Design a Rate Limiter

Question to ask

  • Where is the rate limiter being used?
    • For a gRPC/REST API on the server side?
    • On the client side, to avoid sending too many requests to the server?
    • For DDoS prevention?
  • Do we impose a hard limit or a soft limit?
    • Stop serving requests the moment they exceed exactly 100 QPS?
    • Or just keep them roughly around 100 QPS?
  • How do we identify the user?
    • Using IP address?
    • User account?
    • Access key / secret key (AK/SK)?
  • The scale of the system?
    • QPS?
    • Number of users?
  • Type of system
    • Distributed system? Monolith?
    • Single region? Cross-region?
  • Where is the rate limiter located?
    • A pass-through gateway? Frontend service?
    • A microservice that determines whether a request should be processed?
  • How do we inform the user?
    • Error code
    • HTTP 429 - Too Many Requests (optionally with a Retry-After header)

Off the shelf solution

  • AWS: AWS API Gateway - configure throttling for the API
  • GCP: GCP API Gateway - quota and limits
  • Azure: Azure API Management - advanced API throttling

Rate limit algorithms

  • token bucket (see the sketch after this list)
  • leaking bucket
  • fixed window counter
  • sliding window log
  • sliding window counter
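
A minimal single-process token bucket sketch (class name and parameters are illustrative, not from the source; a real limiter would also need thread safety or a shared store):

    import time

    class TokenBucket:
        """Allow bursts up to `capacity`; refill at `rate` tokens per second."""

        def __init__(self, rate: float, capacity: int):
            self.rate = rate              # refill speed, tokens per second
            self.capacity = capacity      # maximum burst size
            self.tokens = float(capacity)
            self.last = time.monotonic()

        def allow(self) -> bool:
            now = time.monotonic()
            # Refill in proportion to elapsed time, capped at capacity.
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1          # spend one token on this request
                return True
            return False                  # bucket empty: reject (HTTP 429)

    # Roughly 100 QPS, with bursts of up to 100 requests:
    bucket = TokenBucket(rate=100, capacity=100)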

High level design

Where to store the counter/queue/log?

  • Not on disk (no local disk, DynamoDB, or MySQL): too slow for a check on every request
  • In memory (local memory, Redis, Memcached)

Data consistency (race conditions & atomicity)

  • Requires an atomic operation, e.g. Redis INCR (see the sketch below)
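
A sketch of a fixed window counter on Redis, assuming the redis-py client and a reachable Redis instance (the key scheme and default limits are made up for illustration):

    import time

    import redis

    r = redis.Redis(host="localhost", port=6379)

    def allow(user_id: str, limit: int = 100, window_secs: int = 1) -> bool:
        """At most `limit` requests per `window_secs`-second window per user."""
        window = int(time.time()) // window_secs
        key = f"rate:{user_id}:{window}"   # one counter per user per window
        pipe = r.pipeline()                # MULTI/EXEC: both commands apply together
        pipe.incr(key)                     # atomic even with many limiter servers
        pipe.expire(key, window_secs * 2)  # let stale windows clean themselves up
        count, _ = pipe.execute()
        return count <= limit

Because INCR executes atomically inside Redis, concurrent limiter servers sharing this instance agree on the count without any extra locking.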

synchronization issue

  • With more than one rate limiter server, each server ends up holding different state
  • Solutions:
    • consistent hashing - pin each user to one limiter server (see the sketch after this list)
    • a shared data store - Redis / Memcached
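
A minimal consistent-hash ring sketch (server names and vnode count are hypothetical): each user key always routes to the same limiter server, so that one server holds all of the user's state.

    import bisect
    import hashlib

    class HashRing:
        """Map a user key to one limiter server via consistent hashing."""

        def __init__(self, servers, vnodes: int = 100):
            # Many virtual nodes per server smooth out the key distribution.
            self.ring = sorted(
                (self._hash(f"{s}#{i}"), s) for s in servers for i in range(vnodes)
            )
            self.hashes = [h for h, _ in self.ring]

        @staticmethod
        def _hash(s: str) -> int:
            return int(hashlib.md5(s.encode()).hexdigest(), 16)

        def server_for(self, user_key: str) -> str:
            # The first virtual node clockwise from the key's hash owns it.
            idx = bisect.bisect(self.hashes, self._hash(user_key)) % len(self.hashes)
            return self.ring[idx][1]

    ring = HashRing(["limiter-1", "limiter-2", "limiter-3"])
    ring.server_for("user-42")  # always routes this user to the same server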

TODO???

  • what if we want to rate limit the number of open gRPC streams / WebSocket connections? (see the sketch after this list)
    • assuming a user can have 5 streams at a time
    • reject new connections if there are already 5 streams open
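
A sketch of a per-user concurrent-stream cap (in-memory, single process; the 5-stream limit comes from the note above, everything else is assumed). The open/close hooks would be called from the connection handlers:

    import threading
    from collections import defaultdict

    class ConnectionLimiter:
        """Cap the number of concurrently open streams per user."""

        def __init__(self, max_streams: int = 5):
            self.max_streams = max_streams
            self.open_count = defaultdict(int)
            self.lock = threading.Lock()

        def try_open(self, user_id: str) -> bool:
            """Call when a stream opens; False means reject the connection."""
            with self.lock:
                if self.open_count[user_id] >= self.max_streams:
                    return False
                self.open_count[user_id] += 1
                return True

        def close(self, user_id: str) -> None:
            """Call when a stream closes, otherwise the counter leaks."""
            with self.lock:
                self.open_count[user_id] = max(0, self.open_count[user_id] - 1)

Unlike QPS limiting, this counts long-lived connections, so the counter must be decremented when a stream closes or its owner dies, which is why the state has to follow the connection lifecycle (or live in a shared store with TTL-based cleanup).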

References

  • Alex Xu - System Design Interview - An Insider's Guide
