Simo Virokannas

Writings and ramblings

The Illusion of Concurrency

For the past two decades, the software industry has been marching in two directions at the same time.

Hardware gravitated toward parallelism. We hit the limits of Moore’s law: CPUs stopped getting dramatically faster per core, so manufacturers compensated by adding more cores instead. Four cores became eight. Eight became sixteen.

Software, naturally, followed with multithreading. Frameworks, runtimes, and languages all embraced concurrency in their own way. Yes, even Python, through the multiprocessing module. Thread pools, asynchronous execution, task schedulers – everything promised the ability to do many things at once.
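As a minimal sketch of that promise (the worker count and the stand-in task are illustrative, not taken from any particular framework):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url: str) -> str:
    # Stand-in for I/O-bound work such as a network call.
    return f"response from {url}"

# A thread pool queues tasks and hands each one to whichever of its
# worker threads is free - "many things at once", on paper.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fetch, ["a.example", "b.example"]))
```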

And yet, strangely, many modern systems barely use the parallelism available to them.

Nowhere is this more visible than in modern web applications.

Despite running on machines with many cores, the actual throughput of these systems often scales poorly. Adding more threads doesn’t help much – like adding lanes to an already busy highway, it sometimes makes things worse.

A Typical Modern Web Stack

A modern web application is no longer just a server responding to requests. A large portion of the system now lives in the browser. Frameworks such as React, Vue, or Angular maintain complex client-side state: UI models, cached API results, routing state, authentication tokens, and application data synchronized with the server. In many cases the browser is effectively running a full application runtime.

However, despite running on multi-core machines, the client side typically executes almost all of its logic inside a single-threaded event loop. UI updates, application logic, network responses, timers, and state mutations are funneled through the same serialized execution model. To avoid race conditions, client frameworks centralize state into stores or component trees that are mutated in carefully controlled ways.

Meanwhile, the server side – often implemented with frameworks such as Flask, Laravel, Django, ASP.NET or one of the Node.js things (yes, let’s bring JavaScript to the server, that’s a great idea) – handles requests that interact with shared infrastructure like databases, caches, and many different service layers. Each request may be isolated in principle, but it ultimately touches systems whose state is shared across many requests.

The result is a system where both sides of the web architecture are constrained by the same assumption: state is centralized and must be mutated carefully. On the client this produces a single-threaded execution model, while on the server it creates coordination around shared resources. In both cases, the architecture quietly limits how much real parallel work the system can perform.

Concurrency That Isn’t

On paper, a web server may handle hundreds of concurrent requests. In reality, those requests constantly compete for shared resources.

Database connection pools have limited slots. Caches require synchronization. Logging frameworks serialize writes. ORMs maintain internal state. Authentication contexts rely on shared stores.

To prevent corruption, these systems rely on locks, queues, atomic operations, and other synchronization mechanisms. The result is that threads frequently wait for each other: when ten threads compete for a shared structure, only one can safely modify it at a time and the other nine pause.
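That waiting is easy to reproduce. In this sketch (a toy counter, not any real framework’s internals), ten threads all funnel through one lock, so their updates are fully serialized:

```python
import threading

counter = 0
lock = threading.Lock()

def worker(iterations: int) -> None:
    global counter
    for _ in range(iterations):
        # Every thread must take the same lock to touch the shared
        # counter; while one holds it, the other nine wait.
        with lock:
            counter += 1

def run(threads: int = 10, iterations: int = 1000) -> int:
    pool = [threading.Thread(target=worker, args=(iterations,))
            for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter
```

The lock keeps the count correct, but it also means the ten threads collectively make progress one increment at a time.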

From the outside, the system still appears concurrent. There are many threads. Many requests are “in flight”.

But underneath, the system is busy coordinating access to state rather than doing useful work.

Add to this the fact that most modern frameworks take a 10x – or bigger – performance hit from running as interpreted code or JIT-compiled bytecode instead of as a native binary that would benefit from the full speed of the CPU running it.

The Database Bottleneck

Setting aside the unrealistic option of rewriting every web service in C, the most obvious pain point is the database.

In many web applications, nearly every request interacts with a relational database. The database becomes the central repository of state for the entire system.

This creates a hard ceiling on concurrency.

No matter how many web server threads exist, they eventually converge on the same database engine, the same tables, and sometimes even the same rows.

At that point, the database becomes the global lock of the entire system.
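A toy model of that ceiling (the class here is hypothetical, but real pools such as SQLAlchemy’s QueuePool work on the same principle): no matter how many request threads exist, a semaphore caps how many can hold a database connection at once.

```python
import threading

class ConnectionPool:
    """Toy connection pool: a fixed number of slots behind a semaphore."""

    def __init__(self, size: int):
        self._slots = threading.Semaphore(size)
        self._lock = threading.Lock()
        self.in_use = 0
        self.peak = 0  # highest concurrency ever observed

    def acquire(self) -> None:
        self._slots.acquire()  # blocks once all slots are taken
        with self._lock:
            self.in_use += 1
            self.peak = max(self.peak, self.in_use)

    def release(self) -> None:
        with self._lock:
            self.in_use -= 1
        self._slots.release()

def handle_request(pool: ConnectionPool) -> None:
    pool.acquire()
    try:
        pass  # pretend to run a query here
    finally:
        pool.release()

def simulate(requests: int = 50, slots: int = 5) -> int:
    pool = ConnectionPool(slots)
    threads = [threading.Thread(target=handle_request, args=(pool,))
               for _ in range(requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return pool.peak
```

Fifty request threads exist, but the observed peak concurrency at the “database” never exceeds the five slots.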

Developers often try to solve this by adding read replicas, more caching layers or distributed caches. But these are mostly mitigation strategies that bring their own problems. They don’t fundamentally change the architecture that created the pressure in the first place.

Frameworks Encourage the Pattern

Ironically, many modern frameworks actively encourage this design.

Dependency injection containers provide global services.
ORMs track mutable entity objects.
Application contexts store shared runtime information.
Singleton services become repositories of mutable data.

These patterns feel clean and organized. They structure applications nicely and make individual components easy to write.

But they also quietly encourage the spread of shared mutable state throughout the system.

And every new piece of shared state becomes another coordination point, another place where threads must wait.

Stateless Work Is Different

When work is stateless, parallelism becomes natural.

If a request handler simply receives input data, performs computation, and produces a result without modifying shared structures, it can run anywhere, at any time, on any core.

No locks required.
No coordination necessary.
No thread congestion.

This is why large-scale computing systems increasingly rely on ideas such as immutable data, message queues, and event-driven processing.
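A stateless handler along these lines (the payload shape is made up for illustration) can be fanned out across cores with no locking at all:

```python
from multiprocessing import Pool

def handle(payload: dict) -> dict:
    # Pure function: reads its input, computes, returns a result.
    # It touches nothing shared, so any core can run it at any time.
    return {"id": payload["id"], "total": sum(payload["items"])}

def process_batch(batch: list) -> list:
    # With no shared state there is nothing to coordinate; the pool
    # simply spreads the handlers across the available cores.
    with Pool() as pool:
        return pool.map(handle, batch)
```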

Some great examples of this are found in the VFX industry. I recommend reading through the notes (or, if you can find a recording, even better) of James Reinders’ multithreading talks. Here’s a link to the latest set of slides publicly available. The compute-centric performance approaches are implementable at larger scale as well.

These approaches don’t eliminate state entirely, but they push it to the edges – into databases and storage systems – while keeping the bulk of computation independent.

Parallelism works best when the architecture does not constantly fight it.

Summary: The Illusion

Modern web applications often give the impression of massive concurrency.

Hundreds of requests in flight, dozens of worker threads, async runtimes managing thousands of tasks.

Underneath, much of that concurrency collapses into serialized access to shared state. Parallelism exists, but the architecture quietly prevents it from being used.

And so we end up with something that looks like concurrency from the outside, but behaves very differently once you look inside.

That is the illusion.

A Different Direction for the Web

A realistic way out of this would acknowledge two facts. First, modern applications need state. Second, both servers and clients now run on machines with many cores. The goal, therefore, should not be to eliminate state, but to partition it so that it does not destroy concurrency.

Instead of concentrating state inside large monolithic server processes, application state could be divided into independent shards of authority that can execute concurrently without coordinating among themselves. Each shard would own its portion of the state and process messages sequentially, avoiding shared mutable structures between threads. This keeps the advantages of stateful logic: rich domain models, caches, long-lived objects – while preventing multiple threads from constantly competing for the same data.
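A minimal sketch of such a shard, assuming a thread-per-shard design with a mailbox queue (all of the names here are illustrative, not an existing framework):

```python
import queue
import threading

class Shard:
    """Owns one slice of application state; handles messages one at a time."""

    def __init__(self):
        self.state = {}              # touched only by this shard's thread
        self.mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            msg = self.mailbox.get()
            if msg is None:          # shutdown sentinel
                return
            key, value, reply = msg
            # Sequential processing inside the shard: no lock needed,
            # because no other thread ever mutates self.state.
            self.state[key] = value
            reply.put(value)

    def put(self, key, value):
        # Other threads never touch the state directly; they send
        # a message and wait for the shard's reply.
        reply = queue.Queue()
        self.mailbox.put((key, value, reply))
        return reply.get()

    def stop(self):
        self.mailbox.put(None)
        self._thread.join()

def shard_for(key: str, shards: list) -> Shard:
    # Route each key to the single shard that owns it.
    return shards[hash(key) % len(shards)]
```

Each key has exactly one owner, so two requests touching different shards never contend, and requests touching the same shard are serialized by its mailbox rather than by a lock.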

Just as importantly, the client side should stop pretending to be single-threaded. Modern browsers and client runtimes sit on powerful multi-core machines but typically funnel application logic through a single event loop. A better architecture would allow the client to host multiple concurrent actors or workers, each owning part of the client’s state and interacting with server shards through explicit message passing. The client becomes a genuinely parallel participant in the system rather than a thin UI shell.

In such a model, state is not centralized and shared by everything. Instead it is owned, partitioned, and executed where it lives, both on the server and the client. Stateful programming remains possible, but contention disappears because no two threads fight over the same memory. Concurrency stops being something we simulate with thread pools and async frameworks and becomes what the hardware was built for: many independent computations running at the same time.

Maybe this means getting rid of the DOM. I don’t know. I’ve always been fascinated by a potential return to simpler times: actual designed user interfaces that aren’t limited by a box layout. There’s even a JavaScript version of Dear ImGui.

Example: McMaster-Carr

A good example of this philosophy in practice is the website of McMaster-Carr.

Despite offering a massive catalog and complex product data, the site feels almost impossibly fast and responsive. Pages render immediately, navigation rarely blocks, and the system behaves as though each request is largely independent. Much of the interaction logic lives on the client side, while the server side focuses on delivering well-structured data quickly rather than maintaining large amounts of per-session state. The result is a system that scales cleanly and feels instantaneous to users, not because it uses exotic technology, but because its architecture avoids turning every request into a coordinated dance around shared mutable state.

