Build a rate limiter using Durable Objects and Workers.
This example shows how to build a rate limiter using Durable Objects and Workers that can be used to protect upstream resources, including third-party APIs that your application relies on and/or services that may be costly for you to invoke.
This example also discusses some decisions that need to be made when designing a system, such as a rate limiter, with Durable Objects.
The Worker creates a RateLimiter Durable Object on a per IP basis to protect upstream resources. IP based rate limiting can be effective without negatively impacting latency because any given IP will remain within a small geographic area colocated with the RateLimiter Durable Object. Furthermore, throughput is also improved because each IP gets its own Durable Object.
It might seem simpler to implement a global rate limiter, const id = env.RATE_LIMITER.idFromName("global");, which can provide better guarantees on the request rate to the upstream resource. However:
This would require all requests globally to make a sub-request to a single Durable Object.
Implementing a global rate limiter would add additional latency for requests not colocated with the Durable Object, and global throughput would be capped to the throughput of a single Durable Object.
A single Durable Object that all requests rely on is typically considered an anti-pattern. Durable Objects work best when they are scoped to a user, room, service and/or the specific subset of your application that requires global co-ordination.
The Durable Object uses a token bucket algorithm to implement rate limiting. The naive idea is that each request requires a token to complete, and the tokens are replenished according to the reciprocal of the desired number of requests per second. As an example, a 1000 requests per second rate limit will have a token replenished every millisecond (as specified by milliseconds_per_request) up to a given capacity limit.
This example uses Durable Object's Alarms API to schedule the Durable Object to be woken up at a time in the future.
When the alarm's scheduled time comes, the alarm() handler method is called, and in this case, the alarm will add a token to the "Bucket".
The implementation is made more efficient by adding tokens in bulk (as specified by milliseconds_for_updates) and preventing the alarm handler from being invoked every millisecond. More frequent invocations of Durable Objects will lead to higher invocation and duration charges.
The first implementation of a rate limiter is below: