Have you ever lost sleep because a server suddenly died in the middle of the night? I have, and the culprit is usually a run-of-the-mill DDoS attack or a script that innocently hammers the API. Setting up API rate limiting to stop DDoS and spam has been the silent hero that saved me from those terrible nights. It is not a panacea, but it is certainly the first and most important layer of armor protecting the system, ensuring resources are not drained by malicious requests.
Why Will Your API "Die" Soon Without Rate Limiting?
Without rate limiting, your API can fall victim to server overload, resource exhaustion, and outright crashes within minutes when faced with a sudden traffic spike or a malicious attack.
The definition of API rate limiting is actually very simple: it is a mechanism that controls how many requests a client (a user, device, or service) can send to the server within a given period of time. Think of it as a toll station on a highway: without the station regulating traffic, the road becomes hopelessly congested when too many vehicles arrive at once. To build a solid system, you first need to understand the architectural foundation. If you're new, reviewing What is REST API and RESTful design will help you understand how endpoints communicate before you start layering on protection. Asking why we need API rate limiting is like asking why a house needs a door.
Identify the Enemy: DDoS, Spam, Scraping & More
The main threats include DDoS attacks, junk-data spam, brute-force attacks that guess passwords, and unauthorized web scraping that steals your content.
When you expose an endpoint to the internet, you are unintentionally inviting all sorts of visitors. Small-scale application-layer DDoS attacks (Layer 7) are what I encounter most often: attackers use botnets to send millions of fake requests to paralyze the database. Next come brute-force and credential-stuffing attacks aimed directly at login pages. If you don't limit the number of failed attempts from an IP address or user ID, your user database will be cracked in no time.
In addition, web scraping and comment spam are forms of API abuse that waste bandwidth for nothing. Deploying rate limiting against spam and brute-force attacks is therefore a mandatory step, especially on older systems where vulnerabilities are easier to exploit. You can refer to the methods in Securing PHP applications against hacking and combine them with rate limits to build a multi-layered defense.
More than just Anti-Attack: Golden Benefits for the System
Beyond security, rate limiting also helps manage resources, keep the system stable, control latency, and give every user a fair experience.
Many developers assume the benefits of API rate limiting stop at defending against DDoS attacks. In practice, at Pham Hai we've found its core value lies in resource management and ensuring fair use. Suppose you run a paid API and a VIP user accidentally writes an infinite loop that calls it over and over. Left unchecked, the system's entire capacity gets burned serving a silly bug, degrading the experience of every other customer. Rate limiting helps maintain system stability and keep latency at an ideal level. Of course, application-layer protection means little if the server infrastructure is lax; don't forget to review the steps in Securing Linux VPS against hacking attacks to keep the system safe from the ground up.
Quickly Distinguish Rate Limiting and Throttling: Don't Get Confused!
Rate limiting outright rejects requests that exceed the limit (returning HTTP 429), while throttling slows down the processing of those requests instead of rejecting them immediately.
I see a lot of interviewees stumble over the question of distinguishing rate limiting from throttling. Put simply:
- Rate limiting is like a nightclub bouncer: once the venue hits its 100-person cap, the doors close and nobody else gets in. A blocked client immediately receives HTTP 429 Too Many Requests.
- Throttling is like a traffic officer directing cars during a jam: they don't send you home, but they do force you to slow down.
Depending on the traffic-management problem, you pick one or combine both for the best results.
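To make the bouncer/traffic-officer distinction concrete, here is a minimal in-memory sketch (not production code): the limiter returns 429 once the quota is spent, while the throttler never rejects but sleeps to slow the caller down.

```python
import time

class RateLimiter:
    """Hard limit: reject (HTTP 429) once the quota is spent."""
    def __init__(self, limit):
        self.limit = limit
        self.count = 0

    def allow(self):
        if self.count >= self.limit:
            return 429           # Too Many Requests: reject outright
        self.count += 1
        return 200

class Throttler:
    """Soft limit: never reject, but delay requests past the quota."""
    def __init__(self, limit, delay=0.05):
        self.limit = limit
        self.delay = delay
        self.count = 0

    def handle(self):
        self.count += 1
        if self.count > self.limit:
            time.sleep(self.delay)   # slow the client down instead of dropping it
        return 200

limiter, throttler = RateLimiter(3), Throttler(3)
print([limiter.allow() for _ in range(5)])     # [200, 200, 200, 429, 429]
print([throttler.handle() for _ in range(5)])  # [200, 200, 200, 200, 200]
```

Notice that both enforce the same quota of 3; they differ only in what happens to request number 4.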
"Martial Arts" Of Rate Limiting: The 4 Most Popular Algorithms
There are four core algorithms for implementing rate limiting: Token Bucket, Leaky Bucket, Fixed Window Counter, and Sliding Window Log, each with its own strengths and weaknesses.
To implement API rate limiting effectively in practice, you can't just slap together a counter variable and reset it every minute. Below are the classic rate-limiting algorithms that the technology giants still rely on.
| Algorithm | Outstanding advantages | Main disadvantage | Suitable for |
|---|---|---|---|
| Token Bucket | Handles traffic bursts well. | Refill rate needs careful tuning. | Public APIs, per-user limits. |
| Leaky Bucket | Smooths the flow; processing rate stays constant. | New requests are dropped if the queue is full. | Protecting databases, processing queues. |
| Fixed Window | Easy to code, uses little memory. | Boundary effect can double the load. | Small systems, simple rules. |
| Sliding Window | Highly accurate, no boundary effect. | Memory-hungry: stores many timestamps. | Financial APIs, systems with strict requirements. |
Token Bucket: "Ticket" For Each Request
The token bucket algorithm works by adding tokens to a bucket at a steady rate over time; each request must spend one token to be processed.
The Token Bucket algorithm is the darling of systems like Amazon and Stripe. Imagine you have a bucket that holds at most 100 tokens (coins). Every second, the system automatically drops 10 more tokens into the bucket. When a request arrives, it must grab one token from the bucket to proceed; if the bucket is empty, the request is dropped. This algorithm excels because it allows short bursts of traffic as long as tokens remain in the bucket. The bucket is usually tied to a specific user, so to manage user identity safely before handing out rate-limit tokens, check out the article OAuth 2.0 authentication explained simply to set up a proper authentication flow.
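The 100-token, 10-tokens-per-second example above can be sketched in a few lines of Python. This is a single-process illustration, not a hardened implementation: refilling is computed lazily from elapsed time instead of a background timer.

```python
import time

class TokenBucket:
    """Token bucket: holds at most `capacity` tokens, refilled at `rate` tokens/second."""
    def __init__(self, capacity, rate):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)    # start full, so an initial burst is allowed
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, never past capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1             # spend one token for this request
            return True
        return False                     # bucket empty: drop the request

bucket = TokenBucket(capacity=100, rate=10)
# A burst of 100 requests passes immediately; the very next one is dropped.
assert all(bucket.allow() for _ in range(100))
assert not bucket.allow()
```

The burst-friendliness is visible here: all 100 tokens can be spent at once, and sustained traffic is then capped at the refill rate of 10 req/s.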
Leaky Bucket: A Steady Flow
The leaky bucket algorithm puts all requests into a queue (the bucket) and processes them (the leak) at a fixed rate, which smooths out traffic.
In contrast to Token Bucket, the Leaky Bucket algorithm is hugely popular with platforms like Shopify. Picture a bucket with a hole in the bottom: requests pour in from the top at any pace, but they drain out to the backend server at a fixed, steady rate. If the bucket is full, new requests overflow and are rejected. This is a great way to shield the database from load spikes, ensuring the server always works at its most stable throughput.
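Here is a minimal sketch of that bucket-with-a-hole as a bounded queue drained at a fixed rate (again a single-process illustration; in a real deployment the "leak" would be a worker forwarding requests to the backend):

```python
import time
from collections import deque

class LeakyBucket:
    """Leaky bucket: a bounded queue drained at a fixed rate."""
    def __init__(self, capacity, leak_rate):
        self.capacity = capacity       # max queued requests
        self.leak_rate = leak_rate     # requests forwarded per second
        self.queue = deque()
        self.last_leak = time.monotonic()

    def _leak(self):
        # Forward as many queued requests as the fixed rate allows since the last leak.
        now = time.monotonic()
        leaked = int((now - self.last_leak) * self.leak_rate)
        if leaked:
            for _ in range(min(leaked, len(self.queue))):
                self.queue.popleft()   # this request reaches the backend
            self.last_leak = now

    def submit(self, request):
        self._leak()
        if len(self.queue) >= self.capacity:
            return False               # bucket overflows: reject the request
        self.queue.append(request)
        return True

bucket = LeakyBucket(capacity=5, leak_rate=1)
results = [bucket.submit(i) for i in range(7)]
# The first 5 fill the bucket; with no time for leaking, the rest overflow.
assert results == [True] * 5 + [False, False]
```

The key property is on the output side: no matter how bursty the input, the backend never sees more than `leak_rate` requests per second.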
Fixed Window Counter: Simple but Easy to Exploit
The fixed window algorithm counts requests within static time frames (for example, from 1:00 to 1:01). It is easy to implement but suffers from traffic bursts at the window boundaries.
The Fixed Window Counter algorithm splits time into equal frames (for example, one frame per minute), each with its own counter. An incoming request increments the counter by one; once the limit is exceeded, the system blocks further requests. It is super easy to code and light on the server, but its fatal flaw is the boundary effect. Say the limit is 100 req/minute: an attacker sends 100 requests at second 59, then 100 more at second 01 of the next minute. In just two seconds, the server absorbs 200 requests, completely defeating the point of the limit. Simple as it is, the algorithm is still useful for small projects; for instance, when you build a REST API for WordPress, using a fixed window to block comment spam is not a bad start.
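The boundary effect is easy to reproduce. In this sketch the clock is injected as a plain function (a testing convenience, not part of any real library) so we can jump from second 59 to second 61 and watch 200 requests slip through a 100-per-minute limit:

```python
class FixedWindowCounter:
    """Fixed window: one counter per static time frame (e.g. per minute)."""
    def __init__(self, limit, window_seconds, clock):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock             # injected clock for a deterministic demo
        self.current_window = None
        self.count = 0

    def allow(self):
        window = int(self.clock() // self.window)
        if window != self.current_window:
            self.current_window, self.count = window, 0   # new frame: reset counter
        if self.count >= self.limit:
            return False
        self.count += 1
        return True

now = [59.0]                                       # second 59 of minute 0
limiter = FixedWindowCounter(limit=100, window_seconds=60, clock=lambda: now[0])
burst1 = sum(limiter.allow() for _ in range(100))  # 100 requests accepted
now[0] = 61.0                                      # second 1 of minute 1: counter resets
burst2 = sum(limiter.allow() for _ in range(100))  # another 100 accepted
# 200 requests accepted within 2 "seconds" despite a 100/minute limit.
assert burst1 == burst2 == 100
```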
Sliding Window Log: A More Optimal Solution
The sliding window log algorithm stores the timestamp of every request, allowing an exact count over a continuously sliding period and fixing the shortcomings of fixed windows.
To fix the fixed window's flaw, the Sliding Window Log algorithm was born. Instead of counting within static minutes, it stores the exact timestamp of each request in a log (typically using Redis Sorted Sets). When a new request arrives, it first deletes every timestamp older than the window (say, one minute ago), then counts the remaining ones. If the count is below the limit, the request goes through. Extremely accurate!
The downside is memory: with heavy traffic you end up storing millions of timestamps. To get the best of both worlds in practice, engineers often combine sliding windows with counters (the Sliding Window Counter) to optimize both RAM and accuracy.
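The evict-then-count logic described above can be sketched with a plain deque standing in for the Redis Sorted Set (timestamps are passed in explicitly to keep the demo deterministic):

```python
from collections import deque

class SlidingWindowLog:
    """Sliding window log: keep one timestamp per accepted request."""
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.log = deque()             # a Redis Sorted Set plays this role in production

    def allow(self, now):
        # Evict timestamps that have fallen out of the sliding window.
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) >= self.limit:
            return False               # still at the limit over the last `window` seconds
        self.log.append(now)
        return True

limiter = SlidingWindowLog(limit=100, window_seconds=60)
assert all(limiter.allow(now=59.0) for _ in range(100))  # a burst fills the window
assert not limiter.allow(now=61.0)   # still 100 requests in the last 60s: blocked
assert limiter.allow(now=119.5)      # the second-59 burst has expired: allowed
```

Unlike the fixed window, the second-59 burst still counts against requests at second 61, so the boundary exploit no longer works.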
Real Battle: Where and How to Deploy Rate Limiting?
Implementing rate limiting can be done at many different levels, most commonly at API Gateway, through Middleware in source code, or using a distributed system with Redis.
Enough theory; now let me share some battle-tested experience. Where you place the blocking layer largely determines the effectiveness of your API security and the overall performance of the architecture.
API Gateway Layer: The Diligent "Gatekeeper"
Setting rate limiting at API Gateway helps block bad requests right from the outside before they can reach the application server.
Implementing rate limiting at the API Gateway (Kong, AWS API Gateway, or Nginx) is the #1 best practice today. It acts as a solid shield: malicious requests are dropped right at the network edge, taking the load off the backend entirely. You can easily configure blocking by IP address or API key with just a few lines of configuration, without touching the application logic. If you use Nginx as your gateway, then besides configuring the rate limit, make sure you know how to configure an SSL reverse proxy in Nginx to encrypt all transmitted data for maximum security.
Middleware Layer in Code: Flexible Customization (Node.js, Python, Java)
Using middleware directly in the source code gives you the flexibility to apply complex rate limiting rules based on separate business logic.
Sometimes you need to limit based on a user's plan (for example, Free users get 100 req/day while Pro users get 1,000 req/day). In that case, writing middleware directly in your code is the best choice.
- Rate limiting in Node.js: with Express.js, the `express-rate-limit` library is the gold standard. A few lines of `app.use()` give you a basic layer of protection.
- Rate limiting in Python: if you use FastAPI or Django, libraries like `slowapi` or `django-ratelimit` support the Token Bucket algorithm very well.
- Rate limiting in Java: in the Spring Boot ecosystem, `Bucket4j` is the heavyweight weapon I always recommend for the most fine-grained traffic control.
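As a framework-agnostic sketch of the Free/Pro quota idea, here is a tiny tier-aware middleware in plain Python. The `get_user_tier` lookup is a hypothetical stand-in for your own auth layer, and the in-memory counter dict would be Redis in a multi-instance deployment:

```python
# Daily quotas mirroring the Free (100/day) vs Pro (1000/day) example above.
DAILY_QUOTA = {"free": 100, "pro": 1000}

class TierRateLimitMiddleware:
    def __init__(self, get_user_tier):
        self.get_user_tier = get_user_tier   # hypothetical: resolves a user's plan
        self.counters = {}                   # {(user_id, day): requests used}

    def handle(self, user_id, day):
        tier = self.get_user_tier(user_id)
        key = (user_id, day)
        used = self.counters.get(key, 0)
        if used >= DAILY_QUOTA[tier]:
            return 429                       # over quota for this plan
        self.counters[key] = used + 1
        return 200

# alice is on the Free plan, bob on Pro.
mw = TierRateLimitMiddleware(
    get_user_tier=lambda uid: "free" if uid == "alice" else "pro"
)
assert [mw.handle("alice", "2024-01-01") for _ in range(101)][-1] == 429
assert mw.handle("bob", "2024-01-01") == 200
```

In a real framework this `handle` body would live inside the middleware hook (`app.use()` in Express, a dependency in FastAPI, a filter in Spring), but the quota logic stays the same.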
Difficult Math in Microservices: Distributed Rate Limiting with Redis
In a multi-instance microservices system, you must use Redis as a centralized store to synchronize rate-limit counters across servers.
When your system scales into a microservices architecture with dozens of containers running in parallel, counting requests in local RAM becomes meaningless. Rate limiting in microservices calls for a distributed solution, and this is where Redis shines.
By storing counters in a centralized Redis cluster, every service instance reads and writes to the same place. Running a Lua script on Redis guarantees atomicity when deducting tokens, eliminating race conditions when thousands of requests arrive at once. When operating distributed systems this complex, monitoring is vital: proactively set up monitoring tools such as Uptime Kuma or Netdata to watch Redis and your application nodes in real time and make sure no bottlenecks appear.
Never treat DDoS- and spam-focused API rate limiting as a "nice to have" feature. At Pham Hai, across many real projects, I always treat it as a mandatory item on the security and operations checklist. Deploying it early not only helps you sleep better at night, it also ensures system resources are shared fairly. A little setup effort today saves you from huge headaches, sky-high server bills, and angry customer complaints after an attack.
Which rate-limit algorithm is your system using today? Token Bucket? And do you have any good tips for distributed optimization with Redis? Leave a comment below; I'd love to learn from your real-world experiences!
Note: The information in this article is for reference only. For advice tailored to your actual needs, please contact us directly.