Scaling for Tsunami traffic

Kaushik Chandrashekar
Disney+ Hotstar
Apr 18, 2019


Hotstar is basically the Japan of the Internet. When India takes the field to play cricket, we brace for Tsunamis! We’ve spoken before about how auto-scaling is generally a bad idea when traffic swings this violently, precisely because of its Tsunami nature. This year, however, on our K8s platform, we’ve been experimenting with our own auto-scaler.

Concurrent users on Hotstar platform over a three day period

Scaling up our infra pessimistically before the event, with enough headroom to match expected concurrency, resulted in heavy infrastructure costs. To optimise utilization, we started working on a platform that scales up and down seamlessly.

Scaling Policies

To autoscale Hotstar services, we had to identify three things per service:

Scaling Trigger: Some services can be scaled up based on CPU utilization, others can be scaled based on the number of messages in the queue. However, the most common scaling approach is to scale up based on the number of requests being processed by the service.

Scale-up Time: This is the time from when the scaling decision is made to when the new infra starts serving traffic. It includes the time to add more infrastructure, to schedule and spin up new instances, and the application start time (along with any pre-warming required). This time is critical: to reduce infra cost, scale-up time is what needs optimizing.

Scaling Buffers: This is the maximum traffic surge expected by the service during the scale-up time. We keep different types of buffers based on the scaling policy; a minimal per-service sketch of these parameters follows below.
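Here’s a minimal sketch of how those three parameters might be captured per service. The field names and example values are illustrative, not our actual schema.

```python
from dataclasses import dataclass
from enum import Enum

class TriggerType(Enum):
    CPU = "cpu_utilization"           # scale on CPU %
    QUEUE = "queue_depth"             # scale on messages waiting in a queue
    REQUESTS = "in_flight_requests"   # scale on requests being processed
    LADDER = "concurrency_ladder"     # scale on platform-wide concurrency

@dataclass
class ScalingPolicy:
    service: str
    trigger: TriggerType
    scale_up_time_seconds: int   # infra add + scheduling + app start + pre-warm
    buffer: float                # max surge expected during scale_up_time_seconds

# Hypothetical examples, not real Hotstar configuration:
policies = [
    ScalingPolicy("playback-api", TriggerType.REQUESTS, scale_up_time_seconds=240, buffer=0.30),
    ScalingPolicy("subscription", TriggerType.LADDER, scale_up_time_seconds=240, buffer=2_000_000),
]
```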

Scaling policies can differ based on the service:

Traffic based scaling

Number of containers running for a service with Request based scaling

For services that expose metrics about the number of active threads or requests being processed, we use that as the trigger. This works well even for services with unpredictable workloads.
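A minimal sketch of the idea, assuming each container can comfortably handle a known number of in-flight requests (both numbers below are made up):

```python
import math

def desired_containers(in_flight_requests: int,
                       per_container_capacity: int,
                       traffic_buffer: float) -> int:
    """Scale to the current in-flight requests plus a buffer for the surge
    expected while new containers are still coming up."""
    target_load = in_flight_requests * (1 + traffic_buffer)
    return math.ceil(target_load / per_container_capacity)

# e.g. 90,000 requests in flight, 500 per container, 30% traffic buffer
print(desired_containers(90_000, per_container_capacity=500, traffic_buffer=0.30))  # 234
```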

Ladder based scaling

Number of containers running for a service with Ladder based scaling

If the service doesn’t expose these metrics, we use the default strategy: ladder based scaling. We have scaling ladders defined per million concurrent users on the platform (1M, 2M .. 25M). This works well for predictable workloads.
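Conceptually, a ladder is just a lookup from platform concurrency (plus the concurrency buffer) to a container count. A minimal sketch with made-up numbers:

```python
import bisect

# (platform concurrency in millions, containers to run) -- illustrative numbers only
LADDER = [(1, 50), (2, 90), (5, 200), (10, 380), (15, 550), (20, 720), (25, 900)]

def containers_for(concurrency_millions: float, buffer_millions: float = 2.0) -> int:
    """Pick the first rung that covers current concurrency plus the buffer."""
    target = concurrency_millions + buffer_millions
    rungs = [rung for rung, _ in LADDER]
    idx = min(bisect.bisect_left(rungs, target), len(LADDER) - 1)
    return LADDER[idx][1]

print(containers_for(8.3))  # 8.3M users + 2M buffer -> 15M rung -> 550 containers
```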

Putting this all together

We’ve built an internal scaling app called “Infradashboard” that helps people define the types of scaling per application and the scaling parameters required.

Scaling app UI

The autoscaling system reads this data and makes scaling decisions based on current metrics and the buffers we have defined. It works with any type of infrastructure: since we fire API calls for scale-ups and scale-downs, the implementation can be specific to any infrastructure provider.

The Autoscaling system
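Roughly, the control loop looks like the sketch below: read each service’s policy, compute the desired count from current metrics (using request-based or ladder-based logic as above), and fire a scale call against whichever provider backs that service. The names here are illustrative, not our actual interfaces.

```python
from typing import Callable, Protocol

class InfraProvider(Protocol):
    """Any backend (Kubernetes, VM groups, ...) only has to implement scale()."""
    def scale(self, service: str, replicas: int) -> None: ...

class MetricsSource(Protocol):
    def current_value(self, service: str, trigger) -> float: ...
    def running_replicas(self, service: str) -> int: ...

def autoscale_once(policies, metrics: MetricsSource, provider: InfraProvider,
                   compute_desired: Callable) -> None:
    """One pass of the loop: read metrics, compare with desired, fire scale calls."""
    for policy in policies:
        current = metrics.current_value(policy.service, policy.trigger)
        desired = compute_desired(current, policy)   # request- or ladder-based, as above
        if desired != metrics.running_replicas(policy.service):
            provider.scale(policy.service, desired)  # provider-specific API call

# In practice this runs on a timer, e.g. every 30 seconds.
```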

We also monitor the autoscaling system in action and tweak parameters if needed.

Autoscaling in action

Parameters at Hotstar scale

Let’s say adding new infrastructure to the existing pool takes around 90 seconds. If container creation on that node plus the application start-up time is around 75 seconds, we’ve already lost about 3 minutes on scaling up. Adding the time to gather metrics and make the scaling decision, the total delay can be around 4 minutes.

To us, this means we would have missed about two million people joining the platform within those 4 minutes.

During live events, we set higher buffers: for ladder based scaling, we add 2 million as the concurrency buffer; for traffic based scaling, we add 30% as the traffic scaling buffer.
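The buffer sizing follows from that delay: it has to cover the users who can join during the scale-up window. A back-of-the-envelope sketch using the figures above (the split of the last minute into metrics gathering and decision-making is an assumption, and the ramp rate is the one implied by those figures, not a measured number):

```python
# Worked example using the figures above; decision_s is an assumed split.
infra_add_s       = 90    # adding a new node to the pool
container_start_s = 75    # container creation + application start (incl. pre-warm)
decision_s        = 75    # assumed: metrics gathering + scaling decision

scale_up_delay_min = (infra_add_s + container_start_s + decision_s) / 60   # ~4 minutes

ramp_users_per_min = 500_000   # implied by "two million people in 4 minutes"
concurrency_buffer = ramp_users_per_min * scale_up_delay_min

print(f"{scale_up_delay_min:.0f} min delay -> buffer of ~{concurrency_buffer:,.0f} users")
# 4 min delay -> buffer of ~2,000,000 users
```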

That’s not all: we also keep adding and removing infrastructure dynamically, even when there are millions of users on the platform, all without any disruption to the requests served by services on the platform.

Scale Down — Takeaways

Post VIVO IPL 2018, we decided to go back to the drawing board on our infrastructure and double down on our investment in containers. Being able to pull off auto-scaling, at scale, seamlessly was one of our goals, and we’re checking it off our list in 2019.

Auto-scaling is great, but nothing works better than lean services that use only what they need as they churn through all that RPS madness. We relentlessly tune our services to use less compute and deliver higher throughput. This year, for IPL, we’re processing more than 2x the load at 10x less compute than last year.

That, though, is a whole other blog!

We’re hiring — if you want to be part of this sort of insane tech, please check out tech.hotstar.com
