App Engine: Request Scheduling and Pending Latency
App Engine routes each request to an available instance. If all instances are busy, App
Engine starts a new instance. This is App Engine’s automatic scaling feature, and is
what makes it especially useful for handling real-time user traffic for web and mobile
clients.
App Engine considers an instance to be “available” for a request if it believes the
instance can handle the request in a reasonable amount of time. With multithreading
disabled, this definition is simple: an instance is available if it is not presently busy
handling a request.
With multithreading enabled, App Engine decides whether an instance is available
based on several factors. It considers the current load on the instance (CPU and
memory) from its active request handlers, and its capacity. It also considers historical
knowledge of the load caused by previous requests to the given URL path. If it seems
likely that the new request can be handled effectively in the capacity of an existing
instance, the request is scheduled to that instance.
Incoming requests are put on a pending queue in preparation for scheduling. App
Engine leaves requests on the queue for a short time while it waits for existing
instances to become available, before deciding it needs to create new instances. This
waiting time is called the pending latency.
You can control how App Engine decides when to start and stop instances in
response to variations in traffic. App Engine uses sensible defaults for typical applications,
but you can tune several variables for your app based on how it uses
computational resources and what traffic patterns you’re expecting.
To set these variables, you edit your appengine-web.xml file and add an
<automatic-scaling> section, like so:
<appengine-web-app xmlns="http://appengine.google.com/ns/1.0">
  <!-- ... -->
  <automatic-scaling>
    <min-pending-latency>automatic</min-pending-latency>
    <max-pending-latency>30ms</max-pending-latency>
  </automatic-scaling>
</appengine-web-app>
The maximum pending latency (<max-pending-latency>) is the longest time
a request will wait on the pending queue before App Engine decides more instances
are needed to handle the current level of traffic. Lowering the maximum pending
latency potentially reduces the average wait time, at the expense of activating more
instances. Conversely, raising the maximum favors reusing existing instances, at the
expense of potentially making the user wait a bit longer for a response. The setting is
a number of milliseconds, with ms as the unit (for example, 30ms).
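As a sketch of the second case, an app that prefers reusing existing instances over the fastest possible response might raise the maximum. The 500ms figure below is an assumed value chosen only for illustration, not a recommendation, and the right number depends on your own traffic:

```xml
<appengine-web-app xmlns="http://appengine.google.com/ns/1.0">
  <!-- ... -->
  <automatic-scaling>
    <!-- Illustrative value only: let requests wait longer on the
         pending queue before more instances are started. -->
    <max-pending-latency>500ms</max-pending-latency>
  </automatic-scaling>
</appengine-web-app>
```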
The minimum pending latency (<min-pending-latency>) specifies the minimum
amount of time a request must be on the pending queue before App Engine can conclude
that a new instance needs to be started. Raising the minimum encourages App
Engine to be more conservative about creating new instances. This minimum applies
only to creating new instances. Naturally, if an existing instance is available for a
pending request, the request is scheduled immediately. The setting is either a number of
milliseconds with the ms unit (for example, 5ms), or automatic (the default) to let
App Engine adjust this value on the fly as needed.
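For comparison with the earlier configuration, here is a minimal sketch that sets both bounds explicitly instead of leaving the minimum on automatic. It reuses the 5ms and 30ms figures mentioned in this section as placeholders, and suitable values for a real app are an assumption you would need to verify against your own traffic:

```xml
<appengine-web-app xmlns="http://appengine.google.com/ns/1.0">
  <!-- ... -->
  <automatic-scaling>
    <!-- A request must wait at least this long before App Engine
         can decide to start a new instance for it. -->
    <min-pending-latency>5ms</min-pending-latency>
    <!-- Once a request has waited this long, App Engine concludes
         that more instances are needed. -->
    <max-pending-latency>30ms</max-pending-latency>
  </automatic-scaling>
</appengine-web-app>
```

With both values set, the window between them is where App Engine has discretion: below the minimum it will not start an instance for a waiting request, and at the maximum it treats the wait as a sign that capacity is short.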