Design a Rate Limiter
Questions to ask
- Where is the rate limiter being used?
- For a gRPC/REST API on the server side?
- On the client side, to avoid sending too many requests to the server?
- For DDoS prevention?
- Do we impose a hard limit or a soft limit?
- Stop serving requests the moment traffic exceeds exactly 100 QPS?
- Or is staying roughly around 100 QPS good enough?
- How do we identify the user?
- Using the IP address?
- User account?
- Access key / secret key (AK/SK)?
- The scale of the system?
- QPS?
- Number of users?
- What type of system?
- Distributed system? Monolith?
- Single region? Cross-region?
- Where is the rate limiter located?
- A pass-through gateway? Frontend service?
- A microservice that determines whether a request should be processed?
- How do we inform the user?
- Error code
- HTTP 429 Too Many Requests
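A minimal sketch of what the rejection could look like (the function name is illustrative; `Retry-After` is a standard HTTP header, while the `X-Ratelimit-*` headers are a common convention rather than a standard):

```python
from http import HTTPStatus


def rejection_response(retry_after_seconds: int) -> tuple[int, dict[str, str]]:
    """Build an illustrative 429 rejection: status code plus rate-limit headers."""
    headers = {
        "Retry-After": str(retry_after_seconds),  # standard header: when to retry
        "X-Ratelimit-Remaining": "0",             # conventional, not standardized
        "X-Ratelimit-Limit": "100",               # the caller's quota, for transparency
    }
    return HTTPStatus.TOO_MANY_REQUESTS, headers  # 429
```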
Off-the-shelf solutions
- AWS: API Gateway - configure throttling for the API
- GCP: API Gateway - quotas and limits
- Azure: API Management - advanced API throttling
Rate-limiting algorithms
- token bucket (see the sketch after this list)
- leaking bucket
- fixed window counter
- sliding window log
- sliding window counter
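A minimal single-process token bucket sketch (the class name and parameter values are illustrative; a production limiter would also need locking or per-shard buckets):

```python
import time


class TokenBucket:
    """Minimal token bucket: at most `capacity` tokens, refilled at `rate` tokens/sec."""

    def __init__(self, capacity: int, rate: float):
        self.capacity = capacity          # maximum burst size
        self.rate = rate                  # steady-state refill rate (tokens/sec)
        self.tokens = float(capacity)     # start full
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1              # consume one token for this request
            return True
        return False                      # bucket empty: reject with HTTP 429


bucket = TokenBucket(capacity=10, rate=100.0)  # ~100 QPS steady state, bursts of 10
```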
High-level design
Where to store the counter/queue/log?
- not on disk (no local disk, DynamoDB, or MySQL) - disk access is too slow for a per-request check
- in memory (local memory, Redis, Memcached) - fast access, and Redis/Memcached support key expiration
Data consistency (race conditions)
- requires atomic operations, e.g. the Redis INCR command (see the sketch below)
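A fixed-window sketch on top of redis-py, assuming a shared Redis instance (host/port are deployment-specific). INCR itself is atomic even with many concurrent limiter servers; the follow-up EXPIRE is a separate command, so a Lua script or pipeline would close that small gap:

```python
import time

import redis

r = redis.Redis()  # assumed shared instance; connection details are illustrative


def allow(user_id: str, limit: int = 100, window_seconds: int = 1) -> bool:
    """Fixed-window counter: one atomic INCR per (user, window) key."""
    window = int(time.time()) // window_seconds
    key = f"rate:{user_id}:{window}"
    count = r.incr(key)                    # atomic across all limiter servers
    if count == 1:
        r.expire(key, window_seconds * 2)  # let stale windows age out
    return count <= limit
```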
Synchronization issues
- with more than one rate limiter server, each server holds a different slice of the state
- solution:
- consistent hashing, so the same user always routes to the same limiter (see the ring sketch below)
- a shared data store - Redis / Memcached
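A toy hash ring illustrating the consistent-hashing option, assuming the goal is only sticky routing so that each user's counters live on exactly one limiter (node names and vnode count are illustrative):

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring: maps a user key to a rate limiter node."""

    def __init__(self, nodes: list[str], vnodes: int = 100):
        self.ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)        # virtual nodes smooth the distribution
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        # First ring position at or after the key's hash, wrapping around.
        idx = bisect.bisect(self.keys, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]


ring = HashRing(["limiter-1", "limiter-2", "limiter-3"])
ring.node_for("user-42")  # the same user always routes to the same limiter
```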
TODO
- what if we want to rate limit the number of open gRPC streams / WebSocket connections?
- assuming a user can have 5 streams at a time
- reject new connections if 5 streams are already open (see the sketch below)
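A sketch of that concurrent-stream cap using the same shared Redis counter pattern (the limit of 5 comes from the question above; key names are illustrative, and crash recovery, e.g. TTL heartbeats so a dead server's streams do not leak the count, is left out):

```python
import redis

r = redis.Redis()   # shared store, so every gateway sees the same count
MAX_STREAMS = 5     # per-user concurrent stream limit (from the question above)


def on_connect(user_id: str) -> bool:
    """Increment the user's open-stream count; reject if over the limit."""
    key = f"streams:{user_id}"
    count = r.incr(key)
    if count > MAX_STREAMS:
        r.decr(key)           # roll back and reject the new connection
        return False
    return True


def on_disconnect(user_id: str) -> None:
    r.decr(f"streams:{user_id}")  # free a slot when the stream closes
```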
References
- Alex Xu - System Design Interview - An Insider's Guide