AWS – AutoScaling Health Check Timers

Autoscaling is not possible without health checks on the instances. After all, that is how autoscaling keeps the number of healthy running instances between the minimum and the maximum. Health checks cannot be removed from a group, although the HealthCheck process can be suspended and resumed like the other autoscaling processes.

The autoscaling health check type can be EC2 or ELB. I had an instance that was healthy and running, and I wanted to stop it while it was still attached to the autoscaling group. As soon as I stopped the instance, it was terminated and a new instance popped up. If an EC2 instance is not in the running state, autoscaling considers it unhealthy, terminates it and provisions a replacement. So you cannot simply stop an instance that is in service in an autoscaling group; it would first have to be put into Standby or detached.

The settings below are available when configuring an ELB health check and are largely self-explanatory.

Health Check Interval - The amount of time between health checks.
Response Timeout - The amount of time to wait for a response to a health check.
Healthy Threshold - The number of consecutive successful health checks required before the instance is considered healthy.
Unhealthy Threshold - The number of consecutive failed health checks required before the instance is considered unhealthy.
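
To make the two thresholds concrete, here is a minimal sketch of how consecutive results could flip an instance between healthy and unhealthy. It is a simplified model and not AWS code: the `HealthTracker` class, its field names and the assumption that a new instance starts out unhealthy are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    """Tracks one instance's state from a stream of health check results."""
    healthy_threshold: int
    unhealthy_threshold: int
    healthy: bool = False  # assumption: a new instance starts out unhealthy
    _streak: int = 0       # consecutive results that contradict the current state

    def record(self, success: bool) -> bool:
        """Feed one health check result and return the resulting state."""
        if success == self.healthy:
            # The result agrees with the current state, so any opposing streak is broken.
            self._streak = 0
            return self.healthy
        self._streak += 1
        needed = self.healthy_threshold if success else self.unhealthy_threshold
        if self._streak >= needed:
            # Enough consecutive opposing results: flip the state.
            self.healthy = success
            self._streak = 0
        return self.healthy


tracker = HealthTracker(healthy_threshold=3, unhealthy_threshold=2)
results = [True, True, False, True, True, True, False, False]
states = [tracker.record(ok) for ok in results]
print(states)
```

Note how the single failure in the third position resets the count, so the instance only becomes healthy after three successes in a row.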

Provisioning instances through autoscaling requires the application to come up before the health check grace period expires. The grace period starts as soon as the instance is launched. If the health check type is EC2, the instance must be running; if it is ELB, the configured health check must succeed before the grace period expires. If there is no successful health check by then, autoscaling marks the instance as Unhealthy and starts the process again.

I’ve seen a question on the AWS forum where a user reported that provisioning an EC2 instance took 15 minutes (no issue there) and the health check grace period was 2000 seconds (about 33 minutes), yet autoscaling still marked the instance as unhealthy, even though the health check was returning 200 OK before the grace period expired.

The problem was the user's settings: HealthyThreshold: 10, UnhealthyThreshold: 6, Interval: 300, Timeout: 15. To be considered healthy, the instance must pass 10 consecutive health checks, and with a check every 5 minutes that takes 10 x 5 = 50 minutes (3000 seconds), which is longer than the 2000-second grace period. So even though the instance was responding 200 OK, it never reached the healthy threshold in time.
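
The same arithmetic can be checked with a few lines of Python. This is only a rough sketch: the function names and the `boot_seconds` parameter are hypothetical, and the estimate simply multiplies the interval by the healthy threshold as in the calculation above, ignoring exactly when the first check fires.

```python
def seconds_to_healthy(interval: int, healthy_threshold: int, boot_seconds: int = 0) -> int:
    """Rough time from launch until the healthy threshold can be met."""
    return boot_seconds + interval * healthy_threshold


def fits_grace_period(grace_period: int, interval: int, healthy_threshold: int,
                      boot_seconds: int = 0) -> bool:
    """True if the instance can reach the healthy threshold before the grace period ends."""
    return seconds_to_healthy(interval, healthy_threshold, boot_seconds) <= grace_period


# Values from the forum question: Interval 300, HealthyThreshold 10, grace period 2000.
needed = seconds_to_healthy(interval=300, healthy_threshold=10)
print(needed, needed / 60)                # 3000 seconds, i.e. 50.0 minutes
print(fits_grace_period(2000, 300, 10))   # False: 3000 > 2000

# Hypothetical alternative: 15 minutes of boot time, a 30-second interval, threshold of 2.
print(fits_grace_period(2000, 30, 2, boot_seconds=900))  # True: 960 <= 2000
```

Either a longer grace period or a shorter interval and lower healthy threshold would bring the two numbers in line.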

Cooldown is the time autoscaling waits between scaling actions. For example, say we have a scale-up policy that fires when the average CPU of the EC2 instances is above 80% for 5 minutes, and autoscaling has been triggered to add one EC2 instance. The health check grace period timer starts as soon as the instance is launched, and autoscaling waits for the instance to reach the healthy threshold before considering it healthy; of course, it has to reach that threshold before the grace period expires. Once the instance is considered healthy, the cooldown timer starts.

While the cooldown timer is running, autoscaling won't trigger any scaling activities even if it receives alarms. This gives the new instance time to pick up work and lets the average CPU be recalculated to see whether the situation has changed. If the alarm is still active after the cooldown timer expires, autoscaling adds one more instance and the process continues.
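
The cooldown behaviour can be sketched as a small simulation. This is a simplified model of the description in this post, not the actual autoscaling logic: the function name, the alarm timestamps, and the `launch_seconds` and `cooldown_seconds` values are all made up for illustration.

```python
def honored_alarms(alarm_times, launch_seconds, cooldown_seconds):
    """Return the alarm timestamps (in seconds) that result in a scaling activity."""
    honored = []
    blocked_until = 0  # no scaling activity or cooldown in progress at the start
    for t in sorted(alarm_times):
        if t < blocked_until:
            continue  # still launching or cooling down, so the alarm is ignored
        honored.append(t)
        # Model: the instance takes launch_seconds to become healthy,
        # and only then does the cooldown timer start.
        blocked_until = t + launch_seconds + cooldown_seconds
    return honored


alarms = [0, 60, 120, 300, 480, 600, 900]
print(honored_alarms(alarms, launch_seconds=180, cooldown_seconds=300))  # [0, 480]
```

With these numbers the first alarm adds an instance, the alarms at 60, 120 and 300 seconds are ignored, and the alarm at 480 seconds is the first one that can add another instance.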


