Fortifying your API Gateway: Defending Millions of Requests per second Against Potential Exploitations

Ziheng Wang
Disney+ Hotstar
May 9, 2023

Disney+ Hotstar is the largest OTT provider in India and powers the Disney+ app in the MENA, SEA, and SAARC regions.

One of the key challenges faced by the platform is authenticating requests to origin APIs, while also preventing any potential exploits by hackers or malicious users. Authentication exploits can result in financial losses, availability issues, and a negative impact on the user experience.

In this blog, we’ll explore our journey of building a centralized and robust authentication mechanism using the Emissary open-source Kubernetes-native API gateway (formerly known as Ambassador). We’ll discuss how our solution has evolved and how it can effectively authenticate requests from millions of Hotstar users.

User Authentication — Old Architecture

Let’s walk through our previous-gen solution for request authentication and learn how requests flow through our systems.

At Disney+ Hotstar, we use JWTs for request authentication. Previously, our services were exposed to clients through an AWS Application Load Balancer (ALB), so all requests reached the origin unauthenticated. As a result, each origin service had to integrate with our in-house token SDK to authenticate and decode the JWT.

Limitations & Challenges

  • Auth is every service’s responsibility: In this setup, every client-facing service had to understand authentication in depth, which created a security risk. Distributing token secrets to numerous services also violated the principle of least privilege, and any oversight could lead to a security breach.
  • Inconsistencies due to SDK versions: Services running different versions of the token library made it difficult to roll out token upgrades, including signing key rotation, across teams and services.
  • Inconsistent Error Responses: Error responses for unauthenticated requests could differ across services, making it hard to maintain the contract between clients and services.

Given these limitations, we decided to revisit our design and find a solution that would close these gaps.

Centralized Gateway Authentication

To mitigate these challenges, we opted for a single authentication layer at the ingress that safeguards all external-facing APIs. After careful consideration, we chose Emissary Ingress, which is built on the high-performance Envoy proxy and offers a range of flexible plugins such as ExtAuth, RateLimit, Tracing, and more. This choice suited our use cases well and gave us a high level of extensibility.

Architecture

Figure: Centralized Gateway Authentication Workflow

To achieve granular control over APIs, we implemented authentication checks as plugins in the Emissary API Gateway. This allowed us to invoke the plugin only when specific criteria were met in the incoming request path, ensuring that each API had the appropriate level of authentication. As a result, we not only improved our security measures but also gained greater flexibility for customized authentication.
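
As a rough illustration of invoking a check only when the request path matches, the sketch below maps path prefixes to lists of checks. The route table, header names, and check functions are all hypothetical, and this is not Emissary's actual configuration; it only shows the idea of per-path authentication.

```python
from typing import Callable, Dict, List, Optional

Request = Dict[str, str]            # hypothetical request shape: header name -> value
Check = Callable[[Request], bool]   # a check returns True when the request passes


def has_bearer_token(req: Request) -> bool:
    """Hypothetical check: the request carries a bearer token."""
    return req.get("authorization", "").startswith("Bearer ")


def has_partner_key(req: Request) -> bool:
    """Hypothetical check for third-party integrations."""
    return bool(req.get("x-partner-key"))


# Hypothetical route table: path prefix -> checks to run.
ROUTE_CHECKS: Dict[str, List[Check]] = {
    "/v1/partner/": [has_partner_key],
    "/v1/": [has_bearer_token],
    "/health": [],  # no authentication required
}


def checks_for(path: str) -> Optional[List[Check]]:
    """Return the checks for the longest matching prefix, or None if no prefix matches."""
    matches = [prefix for prefix in ROUTE_CHECKS if path.startswith(prefix)]
    if not matches:
        return None
    return ROUTE_CHECKS[max(matches, key=len)]


def authorize(path: str, req: Request) -> bool:
    checks = checks_for(path)
    if checks is None:
        return False  # deny unknown paths by default (an assumption of this sketch)
    return all(check(req) for check in checks)
```

Choosing the longest matching prefix lets a more specific route, such as the partner path here, override the default checks of its parent path.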

  • Token Authentication: Basic validation of the JWT by checking the token signature, expiration time, and other relevant information.
  • 3rd Party Auth Integration: Pluggable authentication for requests from third-party platforms, allowing for flexible customization of the authentication process.
  • Silent token refresh: The token lifecycle is managed entirely by a single Authentication service, transparently to origin services and clients.
  • User session identity: To avoid the need to pass user tokens and perform validation across multiple services, we introduced a new identity structure known as “Envelope”. This structure is generated once at the Gateway and can be consumed by all origin services on the request chain for common data access.

These improvements allow us to safely manage user identity tokens and protect origin services from invalid requests.
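
The token check described above (signature plus expiration time) can be sketched with only the Python standard library. This is a minimal sketch that assumes an HS256 (HMAC-SHA256) signed JWT; the post does not say which signing algorithm or library is used, and the function names here are hypothetical. A production gateway would normally rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Create a token; included only so the sketch can be exercised end to end."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    signature = hmac.new(secret, f"{header}.{payload}".encode("ascii"), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(signature)}"


def verify_hs256(token: str, secret: bytes, now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if the signature and expiry are valid, otherwise None."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        if not isinstance(header, dict) or header.get("alg") != "HS256":
            return None
        expected = hmac.new(
            secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
        ).digest()
        # Constant-time comparison avoids leaking signature bytes through timing.
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return None
        claims = json.loads(_b64url_decode(payload_b64))
    except (ValueError, TypeError):
        return None  # malformed token
    if not isinstance(claims, dict):
        return None
    current = time.time() if now is None else now
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or current >= exp:
        return None  # missing or expired "exp" claim
    return claims
```

The claims are parsed only after the signature has been verified, so an unauthenticated payload is never trusted.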

Next, we will dive into how we solved 4 major challenges with the Gateway Authentication solution.

User Session Identity (Envelope)

To securely propagate user identity information to our services without relying on potentially fragile JWT-based propagation, we introduced a new identity structure called “Envelope”. This structure is modeled as a Protocol Buffer and provides a uniform and secure way to propagate personal identity information to origin services.

Figure: Envelope proto structure

  • The Envelope can serve all the information contained in the token and also provides flexibility for serving enriched data based on our business requirements.
  • Each Envelope is a short-lived identity token scoped to the life of the client request. It is consumed and propagated only among internal services in our system.
  • Downstream services can conveniently read the properties in the Envelope by integrating with our Envelope SDK, which is available in several languages.
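
To make the idea concrete, here is a minimal sketch of an Envelope-like structure that is built once from validated token claims, serialized for propagation, and decoded downstream. Everything in it is an assumption for illustration: the field names, the claim names, and the use of a dataclass with JSON in place of the real Protocol Buffer schema and Envelope SDK.

```python
import base64
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass(frozen=True)
class Envelope:
    # All field names are hypothetical; the real schema is a Protocol Buffer message.
    user_id: str
    subscription_plan: str
    cohorts: List[str] = field(default_factory=list)


def envelope_from_claims(claims: dict) -> Envelope:
    """Build the Envelope once at the gateway from already-validated claims."""
    return Envelope(
        user_id=claims["sub"],
        subscription_plan=claims.get("plan", "free"),  # hypothetical claim name
    )


def encode_envelope(env: Envelope) -> str:
    """Serialize for propagation, e.g. in a request header (JSON stands in for protobuf)."""
    return base64.urlsafe_b64encode(json.dumps(asdict(env)).encode()).decode("ascii")


def decode_envelope(value: str) -> Envelope:
    """What a downstream service would do instead of re-validating the user token."""
    return Envelope(**json.loads(base64.urlsafe_b64decode(value)))
```

Downstream services in this sketch only call `decode_envelope` and never see the token secret.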

Centralized Data Enrichment

Figure: Centralized Data Enrichment Workflow

There are multiple use cases where downstream services need similar customer data to serve a rich user experience. User cohorts are one such piece of data, and they play a critical role in the Hotstar ecosystem. We use cohort data to group customers who show similar patterns, so that we can design effective engagement strategies for each cohort.

Let’s take a practical example to understand this better. We tag users who prefer watching certain sports into one cohort and send them a push notification whenever a relevant tournament is being streamed on Hotstar. This ensures that our customers do not miss out on their favourite content.

Another use case is customers whose subscription plan has just expired or is about to expire. They are tagged into another cohort and periodically reminded to renew their subscription.

We recognized that enriching these properties once at the edge, while generating the Envelope, would significantly improve the system’s non-functional requirements (NFRs), since downstream services no longer each need to fetch the same data.
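
The effect of enriching once at the edge can be sketched as follows. The cohort store, cohort names, and service functions are hypothetical stand-ins; the point is only that the cohort lookup happens once per request, no matter how many downstream services read the result.

```python
from typing import Dict, List, Tuple


class CohortStore:
    """Hypothetical stand-in for whatever system owns cohort data."""

    def __init__(self, cohorts_by_user: Dict[str, List[str]]) -> None:
        self._cohorts_by_user = cohorts_by_user
        self.lookups = 0  # counts how often the store is queried

    def fetch(self, user_id: str) -> List[str]:
        self.lookups += 1
        return list(self._cohorts_by_user.get(user_id, []))


def build_envelope(user_id: str, store: CohortStore) -> dict:
    # Enrichment happens here, once, at the gateway.
    return {"user_id": user_id, "cohorts": store.fetch(user_id)}


def notification_service(envelope: dict) -> bool:
    """Downstream service: should this user get tournament notifications?"""
    return "cricket_fans" in envelope["cohorts"]


def renewal_service(envelope: dict) -> bool:
    """Downstream service: should this user get renewal reminders?"""
    return "plan_expiring_soon" in envelope["cohorts"]


def handle_request(user_id: str, store: CohortStore) -> Tuple[bool, bool]:
    envelope = build_envelope(user_id, store)
    # Both services read from the Envelope; neither queries the store.
    return notification_service(envelope), renewal_service(envelope)
```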

Force Session Block and Token Refresh

Context

At times, it’s necessary to block user sessions by invalidating their tokens after they log out or if they’re flagged as malicious by our RiskEngine (read more in our RiskEngineBlog). To accomplish this, we need a contract between the Authentication Service and other parts of our system for token force blocks.

There are also cases where certain events, such as a user purchasing a new subscription plan or upgrading their existing plan, require asynchronous updates to their token properties. This is where token force refresh becomes necessary. By implementing token force block and refresh, we ensure that our system remains secure and our users’ access remains up-to-date.

Solution

Two approaches come to mind naturally: storing the invalidation and refresh lists in a data store such as Redis, or caching them locally. Both have drawbacks in production. With the first, checking Redis on every request becomes costly as traffic volume grows. With the second, space usage is a major concern, since the locally stored set could be very large.

To address these concerns, we introduced a Bloom filter that listens to token lifecycle events to keep itself up to date. A Bloom filter is a space-efficient probabilistic data structure used to check whether an element is a member of a set. A check against a Bloom filter can only return either “highly possible in set” or “definitely not in set”.

“Highly possible in set” means the Bloom filter can report a session as blocked when it actually is not (a false positive). Blocking the wrong user would hurt the user experience, so on a positive result we still do a deep check in Redis to rule out false positives. Since blocked and refreshed sessions make up a tiny percentage of total traffic, the majority of requests are filtered out by the Bloom filter without querying Redis.
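
The two-step check can be sketched as below: a small Bloom filter answers most lookups locally, and only a positive answer triggers the deep check. This is an illustrative sketch, not the production implementation. The class and method names are hypothetical, a plain Python set stands in for Redis, and the double-hashing scheme over SHA-256 is one common way to derive the bit positions.

```python
import hashlib
from typing import Iterator, Set


class BloomFilter:
    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive all positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


class SessionBlocklist:
    def __init__(self, bloom: BloomFilter, store: Set[str]) -> None:
        self.bloom = bloom
        self.store = store      # stand-in for Redis
        self.deep_checks = 0    # how many lookups reached the store

    def block(self, session_id: str) -> None:
        """Would be called when a token lifecycle event (e.g. logout) arrives."""
        self.store.add(session_id)
        self.bloom.add(session_id)

    def is_blocked(self, session_id: str) -> bool:
        if not self.bloom.might_contain(session_id):
            return False        # "definitely not in set": no store lookup needed
        self.deep_checks += 1   # "highly possible in set": confirm to rule out false positives
        return session_id in self.store
```

Because a false positive only costs one extra store lookup, the final answer is always exact; the Bloom filter affects only how often the store is queried.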

By using a Bloom filter, we reduced our space consumption by 40x.
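
For a sense of where such savings come from, the standard Bloom filter sizing formulas give the number of bits needed per stored element for a target false-positive rate. The sketch below applies them. The inputs (average session ID size and false-positive rate) are assumptions chosen for illustration, not figures from our system; the result is not meant to reproduce the 40x number, and it ignores the per-entry overhead of a real set or hash table.

```python
import math


def bloom_bits_per_element(false_positive_rate: float) -> float:
    """Optimal bits per element: m/n = -ln(p) / (ln 2)^2."""
    return -math.log(false_positive_rate) / (math.log(2) ** 2)


def optimal_num_hashes(bits_per_element: float) -> int:
    """Optimal number of hash functions: k = (m/n) * ln 2, rounded to an integer."""
    return max(1, round(bits_per_element * math.log(2)))


def space_ratio(avg_id_bytes: int, false_positive_rate: float) -> float:
    """How many times smaller the filter is than storing the raw IDs."""
    return (avg_id_bytes * 8) / bloom_bits_per_element(false_positive_rate)


if __name__ == "__main__":
    # Assumed inputs: 36-byte session IDs and a 1% false-positive rate.
    bits = bloom_bits_per_element(0.01)
    print(f"bits per element: {bits:.2f}")
    print(f"hash functions:   {optimal_num_hashes(bits)}")
    print(f"space ratio:      {space_ratio(36, 0.01):.1f}x")
```

A looser false-positive rate shrinks the filter further but sends more requests to the deep check, so the rate is a trade-off between memory and store traffic.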

Summary

In this blog, we talked about why we re-architected the user authentication flow and how we built a new authentication system from scratch. It handles user token validation and refresh, user logouts, subscription changes, and user data enrichment in the Envelope, and it applies security features at the gateway. We also talked about interesting sub-problems around space constraints when performing authentication checks at high scale.

Want to build stuff like this? We’re hiring & we are always looking for smart engineers who love solving hard problems. Check out open roles at https://tech.hotstar.com.
