The first time I sat down to benchmark io_uring, I almost didn't believe the numbers. Latency cut by half. Throughput tripled. The kind of performance that makes you wonder if the old ways were ever really that good. Or if we were just too stubborn to change.
For years, epoll was the undisputed king of I/O in Linux. Every web server, every proxy, every network-heavy application leaned on it. Its API was ugly but functional—a necessary evil for the sake of speed. But ugly services are still services, and until recently, nobody had a credible challenger.
Then came io_uring, a new asynchronous I/O interface that sidesteps the one-system-call-per-operation model. Instead of calling into the kernel once for every read and write, you queue a batch of operations in a ring buffer shared with the kernel and collect the results later. With submission-queue polling turned on, a kernel thread picks up new entries without your application making a system call at all.
And that's where the fight gets interesting.
The Old Guard's Last Stand
Epoll's strength is its simplicity—deceptive though it may be. You give it a file descriptor, and it tells you when something's ready to read or write. That's it. No buffers. No memory management. Just a notification switchboard that's been polished for two decades.
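To make the "notification switchboard" concrete, here is a minimal sketch using Python's standard `select.epoll` wrapper (Linux only). The helper names `wait_readable` and `demo` are mine, invented for illustration. Note that epoll only reports readiness; the actual read is still a separate call.

```python
import select
import socket


def wait_readable(sock, timeout=1.0):
    """Return True if `sock` is reported readable within `timeout` seconds."""
    ep = select.epoll()
    try:
        # Ask for a notification when the descriptor has data to read.
        ep.register(sock.fileno(), select.EPOLLIN)
        events = ep.poll(timeout)
        return any(
            fd == sock.fileno() and (mask & select.EPOLLIN)
            for fd, mask in events
        )
    finally:
        ep.close()


def demo():
    a, b = socket.socketpair()
    try:
        before = wait_readable(b, timeout=0)  # nothing has been sent yet
        a.sendall(b"ping")
        after = wait_readable(b, timeout=1.0)
        # epoll handed over no data, only a readiness hint; reading is on us.
        data = b.recv(4)
        return before, after, data
    finally:
        a.close()
        b.close()
```

A real event loop would keep one epoll object alive and register many descriptors on it rather than creating one per wait; the sketch trades that away for brevity.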
But that simplicity is also its curse. Every epoll_wait call is a system call: a transition into kernel space and back, followed by a separate read or write call for each descriptor it reports. On a heavily loaded server, those trips add up, and even with CPUs running at a few billion cycles per second, each one leaves a bruise.
I've worked with teams that optimized their epoll loops to within an inch of a deadlock. Event batching, hand-tuned timeouts, even exotic CPU pinning. It was art. It was also a losing battle. A model that charges a kernel round trip for every operation wasn't built for this scale.
io_uring's Fresh Attack
io_uring throws the old rules out the window. No more system call for every operation. Instead, you set up two ring buffers in memory shared with the kernel: one for submission, one for completion. You fill the submission queue with read requests, write requests, open calls, even fsync. Then you advance the queue's tail and make a single io_uring_enter call to tell the kernel, "Work waiting," for the whole batch at once. The kernel processes the requests asynchronously and posts results to the completion queue. You drain that queue at your leisure.
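The two-ring handoff is easier to see in a toy model. What follows is a pure user-space simulation in Python, not the real kernel interface: `Ring`, `toy_kernel_enter`, the string opcodes, and the handler table are all hypothetical names I made up for the sketch. It only illustrates the head/tail bookkeeping and the idea that one "enter" covers a whole batch.

```python
class Ring:
    """Fixed-size single-producer/single-consumer ring with free-running counters."""

    def __init__(self, entries):
        if entries <= 0 or entries & (entries - 1):
            raise ValueError("entries must be a power of two")
        self.slots = [None] * entries
        self.mask = entries - 1
        self.head = 0  # next slot the consumer will read
        self.tail = 0  # next slot the producer will write

    def __len__(self):
        return self.tail - self.head

    def push(self, item):
        if len(self) == len(self.slots):
            return False  # ring is full; the caller has to drain or retry
        self.slots[self.tail & self.mask] = item
        self.tail += 1  # advancing the tail is what publishes the entry
        return True

    def pop(self):
        if self.head == self.tail:
            return None  # ring is empty
        item = self.slots[self.head & self.mask]
        self.head += 1
        return item


def toy_kernel_enter(sq, cq, handlers):
    """Stand-in for the kernel side: consume pending requests, post one result each."""
    processed = 0
    while len(sq) and len(cq) < len(cq.slots):
        opcode, arg, user_data = sq.pop()
        result = handlers[opcode](arg)
        cq.push((user_data, result))  # user_data ties the result to its request
        processed += 1
    return processed


def demo():
    sq, cq = Ring(4), Ring(8)
    handlers = {"upper": str.upper, "length": len}  # made-up "opcodes"
    sq.push(("upper", "hello", 1))
    sq.push(("length", "hello", 2))
    submitted = toy_kernel_enter(sq, cq, handlers)  # one call covers both requests
    completions = []
    while (cqe := cq.pop()) is not None:
        completions.append(cqe)
    return submitted, completions
```

The toy completes requests in submission order. The real thing makes no such promise, which is why every request carries a `user_data` tag that comes back on its completion.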
The performance jumps are not subtle. In database workloads, io_uring slashes the number of system calls by an order of magnitude. Storage-heavy applications see IOPS numbers that were once theoretical. But it doesn't come free.
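Where does an order of magnitude come from? A back-of-envelope count helps. This sketch assumes an idealized echo-style workload, one receive and one send per ready socket, and assumes every queued operation fits neatly into batches; both function names are hypothetical, and real applications will land somewhere less tidy.

```python
def epoll_syscalls(ready_fds):
    """One epoll_wait, then one recv and one send per ready descriptor."""
    return 1 + 2 * ready_fds


def io_uring_syscalls(ready_fds, batch_size):
    """One io_uring_enter per full or partial batch of queued operations."""
    operations = 2 * ready_fds  # same recv + send work, just queued instead of called
    return -(-operations // batch_size)  # ceiling division


# 64 ready sockets: 129 calls in the epoll loop, 8 with batches of 16.
print(epoll_syscalls(64), io_uring_syscalls(64, 16))
```

The model ignores that a send usually has to wait for its receive to complete, so treat the ratio as an upper bound on the savings, not a measurement.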
The API is complex. You're managing buffer lifetimes yourself, optionally registering buffers so the kernel can pin them, and matching completions back to submissions, because they can arrive out of order. The setup alone can take a week of cursing before it works correctly. And the kernel support isn't universal: io_uring was introduced in Linux 5.1 with significant improvements in 5.6, but many enterprise distributions still ship older kernels by default.
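If you have to decide at startup whether io_uring is even on the table, a version check is a cheap first filter. The sketch below assumes the 5.1 cutoff mentioned above; `kernel_version` and `may_support_io_uring` are hypothetical helpers. Treat the answer as a hint only: distributions backport features, and io_uring can be disabled by policy or sandboxing even on a new kernel, so the reliable test is to try setting up a ring and fall back if that fails.

```python
import platform
import re


def kernel_version(release):
    """Parse the leading 'major.minor' from a release string like '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def may_support_io_uring(release=None):
    """Heuristic only: True when the kernel version is at least 5.1."""
    if release is None:
        release = platform.release()  # release string of the running kernel
    return kernel_version(release) >= (5, 1)
```

Called with no argument, `may_support_io_uring()` inspects the running kernel; passing a release string makes it easy to test against the kernels your fleet actually runs.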
The Trade-Offs Nobody Talks About
Here's the thing the sock puppet benchmarks won't tell you: epoll still wins on latency for low-connection scenarios. If you're running a proxy with 500 connections, io_uring adds overhead without giving much back. The ring buffers themselves need memory. With submission-queue polling enabled, the kernel's polling thread can burn a core that other processes needed. And on hardware without proper NVMe support, the gains shrink to a whisper.
But when you're pushing 100,000 concurrent connections—the kind of load that makes epoll's event loop look like a lazy dog—io_uring starts to sing. The lack of system call overhead means the CPU actually has time to process data instead of spending its life switching rings.
The war isn't about which is faster in a vacuum. It's about which scales better when the vacuum is filled with real traffic.
I've seen the numbers from a major CDN's internal tests. At 500 connections, epoll beats io_uring by 12%. At 50,000 connections, io_uring is 40% faster. At 200,000, epoll can't even keep its head above water while io_uring chugs along at 80% utilization.
The Verdict Isn't in Yet
So where does this leave the average developer? If you're shipping a new project today, start with io_uring. The complexity is real, but the payoff is too big to ignore. Tools like liburing ease the pain. The kernel community is actively investing in it. It's the future.
But if you're maintaining a legacy system that works, don't rewrite it just for the benchmark. Epoll doesn't suck. It's just not the future. io_uring is. And that's the kind of truth that hits you like an interrupt at 2 AM.
The kernel doesn't care about your nostalgia. It just wants to move data. And io_uring moves it faster.



