Every commerce engineer has a war story about inventory. Mine is from a Black Friday at a previous job where we oversold a sneaker by 1,400 units. Three storefronts had read from three caches, all of which said "stock available." The reconciliation job ran at 2 AM and discovered we owed the universe more shoes than we had ever manufactured. The CTO was a kind person. He did not fire me. He did suggest, however, that the architecture might need a rethink.

That rethink, eight years later, is the inventory graph at the heart of Polluxa. This post covers what we learned building it.

The contract, not the storage

The temptation, when you start a project like this, is to spend the first three months arguing about databases. Postgres versus FoundationDB versus a custom Raft-based store. Don't do that. The hard problem in inventory is not where to put the number. It is what the number means.

Specifically: the number you publish to a storefront has to mean "if a customer hits buy right now, we will be able to ship this." That is a much harder claim than "we have N units in warehouse W." Available-to-promise is a calculation, not a count. And the second you have multi-warehouse, multi-channel, multi-currency, partial-allocation logic on top, the calculation gets gnarly fast.
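To make the count-versus-calculation gap concrete, here is a minimal sketch. It assumes a deliberately simplified formula (on hand, minus allocated, minus reserved, minus a safety buffer, floored at zero per warehouse). The StockPosition fields and the formula itself are illustrative assumptions, not the actual Polluxa contract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StockPosition:
    """Hypothetical per-warehouse figures for one SKU (names are assumptions)."""
    on_hand: int      # units physically in the warehouse
    allocated: int    # units committed to paid orders
    reserved: int     # units tentatively held (carts, draft orders)
    buffer: int = 0   # units nobody may sell


def available_to_promise(positions: list[StockPosition]) -> int:
    """Sum sellable units across warehouses; a warehouse never contributes less than zero."""
    total = 0
    for p in positions:
        sellable = p.on_hand - p.allocated - p.reserved - p.buffer
        total += max(sellable, 0)
    return total


positions = [
    StockPosition(on_hand=40, allocated=25, reserved=10, buffer=2),  # 3 sellable
    StockPosition(on_hand=5, allocated=6, reserved=0),               # over-committed: contributes 0
]
on_hand_count = sum(p.on_hand for p in positions)
print(on_hand_count, available_to_promise(positions))  # 45 3
```

The naive count says 45 units. The promise you can actually keep is 3.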

So step one was writing the contract before writing any code. We spent four weeks on a 14-page document defining every type — OnHand, Allocated, Reserved, InTransit, Available, Promised — and the legal transitions between them. The document had no implementation in it. Just words. We argued about words.
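One way to make "legal transitions" executable is a lookup table that rejects anything it does not list. The sketch below reuses the six type names from the document, but the specific transitions are assumptions made up for illustration; the real 14-page contract is not reproduced here.

```python
from enum import Enum


class StockState(Enum):
    IN_TRANSIT = "InTransit"
    ON_HAND = "OnHand"
    AVAILABLE = "Available"
    RESERVED = "Reserved"
    PROMISED = "Promised"
    ALLOCATED = "Allocated"


# Assumed transition table: each state maps to the states it may move to.
LEGAL_TRANSITIONS = {
    StockState.IN_TRANSIT: {StockState.ON_HAND},
    StockState.ON_HAND: {StockState.AVAILABLE},
    StockState.AVAILABLE: {StockState.RESERVED, StockState.PROMISED},
    StockState.RESERVED: {StockState.AVAILABLE, StockState.PROMISED},
    StockState.PROMISED: {StockState.ALLOCATED, StockState.AVAILABLE},
    StockState.ALLOCATED: set(),
}


class IllegalTransition(Exception):
    """Raised when a move is not listed in the contract table."""


def transition(current: StockState, target: StockState) -> StockState:
    """Return the new state, or raise if the contract does not allow the move."""
    if target not in LEGAL_TRANSITIONS[current]:
        raise IllegalTransition(f"{current.value} -> {target.value} is not allowed")
    return target


state = transition(StockState.AVAILABLE, StockState.RESERVED)
print(state.value)  # Reserved
```

Keeping the table as plain data means the argument about words stays an argument about one dictionary, not about code paths scattered through the codebase.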

Why a graph, not a table

A SKU is not a row. A SKU is a node that connects to warehouses, batches, channels, suppliers, reservations, transit shipments and returns. Each of those is a node too. They have edges with their own properties — quantity, expiry, cost, customer-tier override.
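A rough sketch of that shape, assuming a plain in-memory adjacency list. The class names, edge labels and the 60-unit figure are hypothetical and only show how properties live on edges rather than on rows.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    kind: str  # e.g. "sku", "warehouse", "channel", "reservation"
    key: str


@dataclass
class Edge:
    source: Node
    target: Node
    label: str
    props: dict = field(default_factory=dict)  # quantity, expiry, cost, ...


class InventoryGraph:
    """Minimal adjacency-list graph; not a real graph database."""

    def __init__(self):
        self._out = defaultdict(list)

    def connect(self, source, target, label, **props):
        edge = Edge(source, target, label, props)
        self._out[source].append(edge)
        return edge

    def edges(self, source, label=None):
        return [e for e in self._out[source] if label is None or e.label == label]


graph = InventoryGraph()
sku = Node("sku", "linen-coord-m")
mumbai = Node("warehouse", "mumbai")
overflow = Node("reservation", "marketplace-overflow")
graph.connect(sku, mumbai, "stocked_in", quantity=60)  # 60 is an invented figure
graph.connect(sku, overflow, "reserved_by", quantity=14, borrowable_by="d2c")

stocked = sum(e.props["quantity"] for e in graph.edges(sku, "stocked_in"))
print(stocked)  # 60
```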

Modelling this in flat relational tables works for the simple case. It collapses the moment you try to express "show me the available stock for Linen Co-ord size M in Mumbai, but pretend the 14 units reserved for marketplace overflow can be borrowed if D2C asks." That sentence has six joins. Six joins at a 30,000 RPS read rate is not going to happen.

We model the same query as a graph traversal with hard-cached projections per channel. The cache invalidates on any state change to the involved nodes, and the projection is recomputed in single-digit milliseconds. This is what lets us hit the sub-3-second channel sync figure we publish.
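The invalidation idea can be sketched in a few lines: remember which nodes each projection read, and drop the projection when any of them changes. Everything here (ProjectionCache, the project function, the node ids, the 60-unit stock figure) is a hypothetical stand-in, not the production design.

```python
from collections import defaultdict


class ProjectionCache:
    """Caches per-channel projections and drops them when a node they read changes."""

    def __init__(self, compute):
        self._compute = compute              # (sku, channel) -> (value, node ids read)
        self._values = {}                    # (sku, channel) -> cached value
        self._dependents = defaultdict(set)  # node id -> cache keys that read it
        self.recomputes = 0

    def get(self, sku, channel):
        key = (sku, channel)
        if key not in self._values:
            value, touched = self._compute(sku, channel)
            self._values[key] = value
            for node_id in touched:
                self._dependents[node_id].add(key)
            self.recomputes += 1
        return self._values[key]

    def invalidate(self, node_id):
        for key in self._dependents.pop(node_id, set()):
            self._values.pop(key, None)


stock = {"wh:mumbai": 60, "res:overflow": 14}


def project(sku, channel):
    # Toy traversal: D2C may borrow the overflow reservation, other channels may not.
    available = stock["wh:mumbai"] - stock["res:overflow"]
    if channel == "d2c":
        available += stock["res:overflow"]
    return available, {"wh:mumbai", "res:overflow"}


cache = ProjectionCache(project)
cache.get("linen-coord-m", "d2c")   # computed from the graph
cache.get("linen-coord-m", "d2c")   # served from cache, no recompute
stock["wh:mumbai"] = 55
cache.invalidate("wh:mumbai")       # a state change on an involved node
print(cache.get("linen-coord-m", "d2c"), cache.recomputes)  # 55 2
```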

The number on your storefront is a promise. Either you can keep it or you can't. There is no third option.

The thing that took the longest

It wasn't the projection cache. It was getting Reservation right. A reservation is a tentative claim on stock that hasn't yet been paid for. Carts, draft orders, marketplace overflows, customer-tier holds — all reservations of various ages.

The bug we kept hitting was zombies: reservations that the storefront forgot to release. Across millions of carts, they were holding 3% of the catalog hostage to abandoned shopping baskets. So we wrote a contract for reservations too: every reservation has a holder, a TTL, and a renewal obligation. If the holder doesn't renew before the TTL expires, the reservation evaporates. No exceptions, not even for marketplaces.
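A minimal sketch of that reservation contract, assuming an in-memory book and a clock passed in explicitly so the behaviour is deterministic. ReservationBook, its method names and the 900-second TTL are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Reservation:
    holder: str
    quantity: int
    ttl_seconds: float
    expires_at: float


class ReservationBook:
    """Every reservation has a holder and a TTL; only a live one can be renewed."""

    def __init__(self):
        self._by_id = {}
        self._next_id = 1

    def reserve(self, holder, quantity, ttl_seconds, now):
        rid = self._next_id
        self._next_id += 1
        self._by_id[rid] = Reservation(holder, quantity, ttl_seconds, now + ttl_seconds)
        return rid

    def renew(self, rid, holder, now):
        r = self._by_id.get(rid)
        if r is None or r.holder != holder or r.expires_at <= now:
            return False  # unknown, wrong holder, or already lapsed
        r.expires_at = now + r.ttl_seconds
        return True

    def sweep(self, now):
        """Delete lapsed reservations and return how many were removed."""
        expired = [rid for rid, r in self._by_id.items() if r.expires_at <= now]
        for rid in expired:
            del self._by_id[rid]
        return len(expired)

    def held(self, now):
        return sum(r.quantity for r in self._by_id.values() if r.expires_at > now)


book = ReservationBook()
cart = book.reserve("storefront-cart", 2, ttl_seconds=900, now=0)
overflow = book.reserve("marketplace", 14, ttl_seconds=900, now=0)
book.renew(cart, "storefront-cart", now=600)  # cart now lives until t=1500
book.sweep(now=1000)                          # marketplace hold lapsed at t=900
print(book.held(now=1000))  # 2
```

Note that held() ignores lapsed reservations even before a sweep runs, so a slow sweeper cannot keep stock hostage.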

The marketplaces did not love this. They argued for indefinite reservations. We did not budge. Two years later, the marketplaces are quietly grateful, because their channels also stopped overselling.

Multi-warehouse: the joke is on you

The classic multi-warehouse problem is: same SKU in two warehouses, two channels, who gets to read which? The answer is "neither, both, and it depends" — and you have to be explicit about which.

Our model has four primitives. Pin commits stock to a channel. Pool shares stock across channels with a priority order. Buffer reserves N units that nobody can touch. Spill lets a higher-priority channel temporarily steal from a lower one when its own well runs dry.
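To show how the four primitives might compose, here is a sketch of one possible evaluation order: Buffer first, then Pin, then Pool, then Spill. The order, the AllocationPolicy shape and the rule that Spill draws only on a donor's unused pinned stock are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AllocationPolicy:
    buffer: int = 0                                    # Buffer: units nobody can touch
    pins: dict = field(default_factory=dict)           # Pin: channel -> units committed to it
    pool_priority: list = field(default_factory=list)  # Pool: sharing channels, highest priority first
    spill: dict = field(default_factory=dict)          # Spill: channel -> lower-priority donor channel


def allocate(on_hand, policy, demand):
    """Return units granted per channel for the given demand (assumed semantics)."""
    usable = max(on_hand - policy.buffer, 0)
    granted = {ch: 0 for ch in demand}
    pin_left = {}

    # Pin: each channel first draws on stock committed to it.
    for ch, pinned in policy.pins.items():
        pinned = min(pinned, usable)
        usable -= pinned
        take = min(pinned, demand.get(ch, 0))
        granted[ch] = granted.get(ch, 0) + take
        pin_left[ch] = pinned - take

    # Pool: whatever is not pinned is shared in priority order.
    for ch in policy.pool_priority:
        take = min(usable, demand.get(ch, 0) - granted.get(ch, 0))
        granted[ch] = granted.get(ch, 0) + take
        usable -= take

    # Spill: a channel that is still short may take its donor's unused pinned stock.
    for ch, donor in policy.spill.items():
        short = demand.get(ch, 0) - granted.get(ch, 0)
        take = min(short, pin_left.get(donor, 0))
        if take > 0:
            granted[ch] += take
            pin_left[donor] -= take

    return granted


policy = AllocationPolicy(
    buffer=10,
    pins={"marketplace": 30},
    pool_priority=["d2c", "marketplace"],
    spill={"d2c": "marketplace"},
)
result = allocate(100, policy, {"d2c": 75, "marketplace": 10})
print(result)  # {'d2c': 75, 'marketplace': 10}
```

In this run D2C drains the 60 pooled units, then spills into 15 of the marketplace's 20 unused pinned units. The 10 buffered units are never granted to anyone.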

Customers set these per-SKU per-channel-pair in a matrix. The matrix is huge and our compliance team wanted to ship it as a YAML file. We shipped a real UI instead. Buyers love it. The compliance team forgave us.

What we'd do differently

Two things. First, we'd turn the contract into tests before we wrote the implementation. We had a contract document and an implementation, and a year in we discovered they had quietly diverged. Tests against the contract document, not the implementation, would have caught it sooner.

Second, we'd have invested earlier in the simulator. We now have a simulator that replays a synthetic Black Friday — 14 million events across 12 channels in 6 hours — against any candidate change. It's the single most useful tool we built. The mistake was thinking we'd build it after the first prod incident. Just build it first.
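The core of such a simulator is small: generate a reproducible event stream, feed it to a candidate handler, and check an invariant afterwards. This sketch is a toy under stated assumptions. The event shape, the handler signature and the "never oversell" invariant are invented for illustration, and the volumes are far below the 14 million events mentioned above.

```python
import random


def synthetic_events(n, channels, seed=0):
    """Yield n reproducible order events spread across the given channels."""
    rng = random.Random(seed)
    for _ in range(n):
        yield {"channel": rng.choice(channels), "quantity": rng.randint(1, 3)}


def replay(events, stock, handle):
    """Feed every event to `handle`; report units sold and units oversold."""
    sold = 0
    for event in events:
        if handle(event, stock - sold):
            sold += event["quantity"]
    return {"sold": sold, "oversold": max(sold - stock, 0)}


def naive(event, remaining):
    return True  # accepts every order, like three storefronts trusting stale caches


def guarded(event, remaining):
    return event["quantity"] <= remaining  # candidate change under test


events = list(synthetic_events(1000, ["d2c", "marketplace", "retail"], seed=42))
baseline = replay(events, stock=500, handle=naive)
candidate = replay(events, stock=500, handle=guarded)
print(baseline["oversold"] > 0, candidate["oversold"])  # True 0
```

Fixing the seed matters: two candidate changes are only comparable if they face the same Black Friday.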

If you want the deep technical detail, the platform team is happy to nerd out on a follow-up call. And if you have an inventory war story, send it. We collect them.

Tagged: engineering · inventory · graph