The geopolitics of artificial intelligence is debated almost entirely as a production problem: who makes the chips, builds the data centers, trains the frontier models. For most countries the more immediate exposure sits elsewhere, in the flow of inference; call it the token flow problem. The 2026 Hormuz disruption restated the oil century’s lesson that control over flow can confer leverage that ownership of production does not. This brief makes the equivalent analytical move for AI.
The LLM market presents a paradox. Model catalogs are diversifying and commodity-tier token prices have collapsed, roughly 600-fold since 2020 (Du, 2026). Yet enterprise API dollar flow has grown more concentrated: an estimated 88% lands with three U.S. providers (Menlo Ventures, 2025). The mistake is to read the first fact as fading dependence. The leverage has moved from model production to the recurring passage of inference requests through enforceable gates: account eligibility, supported-country rules, payment rails, rate limits, routing platforms, contracts. The gates are documented, not hypothetical: providers publish their supported-country lists, and payment methods originating outside those countries are grounds for blocking.
The production blind spot
In February 2026, the lesson the oil century taught repeatedly stopped being a reminder and became an event. The Strait of Hormuz, through which about one-fifth of global petroleum liquids consumption passed in early 2025, was disrupted by regional conflict. Control over flow, exercised at the narrow points a commodity must pass through, confers leverage that ownership of production does not capture, and no producer-side statistic predicted what disrupting the strait would do.
The AI policy debate of 2024–2026 has been, with few exceptions, a production debate. Export controls target chips; national strategies fund compute clusters and sovereign foundation models; “sovereign AI” offerings bundle local data centers with locally hosted models. The underlying theory of vulnerability is extraction-shaped: whoever owns the means of producing intelligence holds the power. Much of the leverage sits elsewhere, in the gates every hosted inference request must clear (Figure 1).
Consider one such request. A three-person startup in Tunis has built a document-processing tool for regional clients; each job sends a few hundred thousand tokens to a frontier model and streams the answer back. Before a single token returns, the request must clear a series of gates, none of which the startup controls. The account must exist, which requires the provider to serve Tunisia at all. It must be funded, and in Tunisia that is a binding constraint: dinar cards are not approved for foreign-currency transactions, and neither Stripe nor Adyen onboards businesses anywhere in North Africa. The request must fall within a rate-limit tier that advances with cumulative payments. The model must be available in the jurisdiction, and the use must comply with terms defined under foreign law. Only then does the request travel (national gateway, submarine cable, European edge node, the provider’s private network), and the route itself is not guaranteed. Every gate is ordinary, defensible, and open on most days. That is precisely what the oil century teaches about chokepoints: their politics are invisible until the flow matters more than usual, or until the gatekeeper’s incentives change.
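The gate sequence in this example can be modeled as an ordered chain of checks in which the first closed gate stops the request. The sketch below is a minimal Python illustration under stated assumptions: the country codes, the supported-country and model-availability sets, and the rate-tier thresholds are hypothetical placeholders rather than any provider’s published values, and each gate is reduced to simply open or closed.

```python
from dataclasses import dataclass
from typing import Callable, List

# Minimal, hypothetical model of the gates one hosted inference request must clear.
# All lists and thresholds below are illustrative placeholders, not provider data.

@dataclass
class Request:
    country: str                # account holder's country code (illustrative)
    funded: bool                # did a payment method clear?
    cumulative_paid_usd: float  # total paid so far; drives the rate tier
    tokens: int                 # tokens requested in this window

@dataclass
class Gate:
    name: str
    is_open: Callable[[Request], bool]

SUPPORTED_COUNTRIES = {"TN", "AE"}  # hypothetical supported-country list
MODEL_REGIONS = {"TN", "AE"}        # hypothetical model availability
ROUTE_UP = True                     # gateway, cable, edge node, provider backbone

def tier_limit(cumulative_paid_usd: float) -> int:
    """Hypothetical token limit that rises with cumulative payments."""
    if cumulative_paid_usd >= 1_000:
        return 2_000_000
    if cumulative_paid_usd >= 50:
        return 450_000
    return 30_000

GATES: List[Gate] = [
    Gate("account eligibility", lambda r: r.country in SUPPORTED_COUNTRIES),
    Gate("payment rail", lambda r: r.funded),
    Gate("rate limit", lambda r: r.tokens <= tier_limit(r.cumulative_paid_usd)),
    Gate("model availability", lambda r: r.country in MODEL_REGIONS),
    Gate("terms of use", lambda r: True),  # placeholder: assume compliant use
    Gate("network route", lambda r: ROUTE_UP),
]

def pass_gates(req: Request, gates: List[Gate]) -> str:
    """Return 'delivered', or name the first gate that blocks the request."""
    for gate in gates:
        if not gate.is_open(req):
            return f"blocked at {gate.name}"
    return "delivered"

# A Tunis job of ~300,000 tokens whose card payment did not clear:
job = Request(country="TN", funded=False, cumulative_paid_usd=0, tokens=300_000)
print(pass_gates(job, GATES))  # blocked at payment rail
```

Funding alone is not enough in this toy model: a funded account with no payment history still stops at the rate-limit gate, because the hypothetical tier grows only with cumulative payments.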
A growing body of work complicates the production frame: full-stack AI sovereignty is structurally infeasible for almost any country (Tanner, Kerry, et al., 2026); deployment-layer control sits atop U.S.-controlled dependencies — the “sovereignty gap” (Chavez, 2026); vendors selling “sovereign AI” back to states define it on their own terms (Yew et al., 2026). What none provides is a systematic account of the flow itself: how inference moves, through which channels, under whose control, with what options for rerouting.
Figure 1. One inference request, many gates (animated). A token does not travel like oil, but every hosted inference request passes through enforceable gates — supported-country rules, payment rails, rate limits, contracts. The animation traces a request through seven stages and shows what happens when one gate closes. Schematic; gate placement is illustrative.
The form of flow
The method comes from the political economy of energy. Timothy Mitchell’s Carbon Democracy (2011) showed that the political possibilities of the coal and oil eras were shaped less by who owned the resource than by the material form of its flow. Coal moved through branching networks with chokepoints where concentrated labor could interrupt the flow; oil was redesigned around that vulnerability — fluid, capital-intensive, able to reroute around blockages. The transition did not eliminate concentration; it relocated it, from the mine and the railhead to the corporate-state cartels governing routes and prices. Three tools carry over: the form of flow (the channel, not the producer, is the unit of analysis), the control point (a chokepoint becomes political only when actors can act on it), and political affordances (what an infrastructure’s form lets differently positioned actors do).
Applied across computational history, the method reveals three regimes, each with a new layer of concentrated power (Figure 2). The internet regime (1990s–2000s) sold connectivity over a redundant, open-protocol mesh; control points existed but were thinly activated. The cloud regime (2010s) moved concentration to hosting — who runs your workloads, under which contracts and jurisdictions (Srnicek, 2017; Narayan, 2022). The LLM regime (2022–present) sells algorithmic reasoning itself, metered per token, flowing from an application through an aggregator to a hosted model and back, every leg governed by API terms, rate limits, geographic rules, payment rails, and contract.
Training remains a strategic chokepoint — chip export controls prove that. But training is episodic and hosted inference is recurrent: every request, every hour, must pass through an account, a gateway, a payment rail, a rate limit, and a jurisdictional rule. That recurring passage creates an additional layer of leverage — continuous, fine-grained, contractual — that production-side policy does not capture.
How inference actually flows
The most direct measure of leverage is where the money lands. Menlo Ventures’ end-2025 enterprise survey estimates LLM API spend shares at Anthropic 40%, OpenAI 27%, and Google 21%, a combined 88% (Figure 3), with the remaining 12% spread across Meta’s Llama ecosystem, Cohere, Mistral, and a long tail. Nor is this transient: the top three’s combined share rose from roughly 69% in 2023, even as the lead changed hands within the oligopoly. Leadership churn atop durable concentration is precisely what distinguishes a structural chokepoint from a temporary market position. The concentration figure also triangulates: a16z’s January 2026 CIO survey independently places about 90% of enterprise model spend with the same three providers, though it inverts their ordering. The surveys contest the ordering but agree on the concentration (roughly 88–92%).
Prices, meanwhile, have collapsed — but unevenly. Du (2026) estimates a roughly 600-fold decline in token prices since 2020, with commodity tiers halving every one to one-and-a-half years while flagship-tier prices show no smooth decay, sustained by a reasoning premium (Figure 4). The policy meaning is double-edged: commodity intelligence is becoming radically cheap, good news for token importers, but frontier reasoning — the tier high-value applications need — behaves like a differentiated good whose sellers, overlapping heavily with the firms capturing the enterprise dollars, have so far sustained its price. Dependence on the frontier tier is not being eroded by deflation.
Falling prices look like falling dependence; the opposite is closer to the truth. When a useful input gets cheaper, economies do not pocket the savings — they embed the input deeper. The mechanism is the shift from chat to agents. A chat exchange consumes thousands of tokens, human-paced; an agentic workload consumes millions per task, machine-paced, in long autonomous loops. As unit prices fell, total consumption rose steeply: enterprise model-API spending more than doubled in eight months to $8.4 billion by mid-2025, and tokens through OpenRouter grew roughly tenfold in 2025. This loads the flow. A chatbot outage is an inconvenience; an agent outage halts production workflows. Deflation is not eroding this dependence — it is financing its expansion.
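The arithmetic behind deflation financing expansion is the identity spend = unit price × token volume. A minimal sketch with illustrative numbers: the text reports that enterprise spend more than doubled, but the price change over the same eight months and the per-token prices used below are hypothetical assumptions, chosen only to show the mechanism.

```python
def implied_volume_growth(spend_growth: float, price_factor: float) -> float:
    """Token-volume multiplier implied by spend = unit price x volume.

    spend_growth: S1 / S0, e.g. 2.0 for spending that doubled.
    price_factor: P1 / P0, e.g. 0.5 if the average unit price halved (hypothetical).
    """
    return spend_growth / price_factor

def task_cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Cost of one task at a flat per-token price."""
    return tokens / 1_000_000 * usd_per_million_tokens

# Spending doubled while (hypothetically) unit prices halved: volume rose 4x.
print(implied_volume_growth(2.0, 0.5))  # 4.0

# Hypothetical prices: a chat turn at a higher legacy price versus an agentic
# task at a price ten times lower. The agent task still costs far more.
chat = task_cost_usd(5_000, 10.0)       # thousands of tokens, human-paced
agent = task_cost_usd(2_000_000, 1.0)   # millions of tokens, machine-paced
print(agent / chat)                     # ~40
```

Even a tenfold lower unit price leaves the agentic task roughly forty times more expensive than the chat turn under these assumptions, which is why cheaper tokens load the flow rather than lighten it.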
The market generates two statistics that appear contradictory (Figure 5). The Herfindahl-Hirschman Index (HHI) of the broad inference catalog fell sharply over three years, crossing from “highly concentrated” to “moderately concentrated” as open-weight entrants diversified supply, while 88% of enterprise dollars still flow to three firms. The two figures measure different layers. The falling index reflects a vibrant bazaar of models available to experimenters; the 88% reflects enterprise procurement. A falling index at the first layer says nothing about leverage at the second.
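The two statistics can be computed side by side. The HHI is the sum of squared market shares expressed in percent, so it runs up to 10,000. The sketch below assumes the 2010 U.S. Horizontal Merger Guidelines bands (below 1,500 unconcentrated, 1,500 to 2,500 moderately concentrated, above 2,500 highly concentrated), because the brief does not state which thresholds it applies; the catalog shares are hypothetical.

```python
from typing import Sequence

def hhi(shares_pct: Sequence[float]) -> float:
    """Herfindahl-Hirschman Index: sum of squared shares, shares in percent."""
    return sum(s * s for s in shares_pct)

def band(index: float) -> str:
    """Concentration bands, assuming the 2010 U.S. Horizontal Merger Guidelines."""
    if index > 2_500:
        return "highly concentrated"
    if index >= 1_500:
        return "moderately concentrated"
    return "unconcentrated"

# Enterprise layer (Menlo Ventures shares). The unassigned 12% is bounded:
# spread across many tiny firms it adds close to 0; held by one firm it adds 144.
top_three = [40, 27, 21]
low, high = hhi(top_three), hhi(top_three + [12])
print(low, high, band(low))  # 2770 2914 highly concentrated

# Catalog layer: hypothetical shares for a diversified model bazaar.
catalog = [30, 20, 15, 10, 8, 7, 5, 5]
print(hhi(catalog), band(hhi(catalog)))  # 1788 moderately concentrated
```

However the residual 12% is split, the enterprise layer stays above 2,500, so the catalog index can fall into the moderate band without moving the enterprise figure at all.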
The structure follows a three-tier supply chain (Demirer et al., 2025): model creators at the top, where dollars concentrate; inference providers (Azure, Cerebras, Together AI, Groq) running intensely price-competitive compute; and aggregators (OpenRouter and peers) routing demand — simultaneously a resilience tool and a new single point of failure. Dependence is stickier than catalog diversity suggests: only 11% of surveyed builders switched primary vendors in the year to mid-2025, and open models’ token share remains below 30%. Redundancy is growing; substitutability — swapping providers in production at speed without capability loss — still lags.
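The aggregator’s dual role can be made concrete with a fallback router. A minimal sketch, assuming hypothetical provider names and a generic `ProviderUnavailable` error (this is not OpenRouter’s or any vendor’s actual API): the router tries providers in priority order, which adds redundancy, but every path runs through the router itself, so its outage removes all of them at once. A fallback that answers is not proof of substitutability; capability loss is not modeled here.

```python
from typing import Callable, List, Tuple

class ProviderUnavailable(Exception):
    """Raised when a provider (or the aggregator) cannot serve a request."""

Provider = Tuple[str, Callable[[str], str]]

def route(prompt: str, providers: List[Provider], aggregator_up: bool = True) -> Tuple[str, str]:
    """Try providers in priority order; return (provider_name, answer)."""
    if not aggregator_up:
        # Single point of failure: no provider is reachable without the router.
        raise ProviderUnavailable("aggregator down: all routes unavailable")
    failures = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderUnavailable as exc:
            failures.append(f"{name}: {exc}")  # record, then fall through to the next
    raise ProviderUnavailable("all providers failed: " + "; ".join(failures))

# Hypothetical providers for illustration.
def frontier_model(prompt: str) -> str:
    raise ProviderUnavailable("rate limit exceeded")

def open_weight_model(prompt: str) -> str:
    return f"fallback answer to: {prompt}"

chain = [("frontier-a", frontier_model), ("open-weight-b", open_weight_model)]
print(route("summarize invoice", chain))
# ('open-weight-b', 'fallback answer to: summarize invoice')

try:
    route("summarize invoice", chain, aggregator_up=False)
except ProviderUnavailable as exc:
    print(exc)  # aggregator down: all routes unavailable
```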
“U.S. dominance of AI” conflates at least three flows with different geographies and chokepoints (Figure 6): a consumer flow routed through a handful of U.S. applications (chokepoints: app stores, consumer payment rails, content policy); a developer flow that is markedly more international — over half of OpenRouter usage originates outside the U.S. — gated by API geo-availability, developer payment rails, and rate limits; and an enterprise flow that is dollar-concentrated and gated by contracts, data residency, and compliance regimes. Policy that treats the three as one will misdiagnose exposure.
Where MENA is exposed
MENA is not one AI market. It includes Gulf states investing in compute and national models (the production-side pole); middle-income developer economies that mostly import tokens (the primary concern here); sanctioned or conflict-affected contexts where access rules bind differently; and public-sector adopters facing procurement and continuity questions. The regional literature has concentrated on production governance, ethics, and Arabic’s structural penalty under English-centric tokenization — but has left the consumption flow largely unexamined.
At the provider layer, as of June 2026, fifteen of nineteen MENA jurisdictions appear on all four major U.S. positive country lists; Yemen and Sudan each appear on three; Iran and Syria appear on none, and Mistral, the European provider, names the same two exclusions — the EU/U.S. divergence many expect does not exist at this layer. The payment layer is the broader constraint: Stripe onboards businesses in exactly one MENA country (the UAE), Adyen in none, and the Maghreb and Levant operate under documented central-bank restrictions on international card use — Algeria’s effective bar, Libya’s quota, Tunisia’s exclusion of dinar cards from foreign-currency transactions, Lebanon’s fresh-dollar regime. Since every major provider gates API throughput behind cumulative-payment tiers, payment-rail access is a chokepoint stacked on formal availability. Palestine is the distinctive case: rails technically functional, yet every major processor excludes West Bank and Gaza residents.
Iran is the existence proof of correlated constriction — simultaneously absent from every provider list, excluded from every payment rail, and barred by sanctions regulation: constriction at all three layers at once, by law. Syria is the instructive transitional case: comprehensive U.S. sanctions were revoked in July 2025 and the Caesar Act repealed that December, yet Syria remains absent from every major provider’s supported list. The gates have outlived their stated rationale.
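The layered exposure in the two paragraphs above can be encoded as independent checks whose failures stack. A minimal sketch using only the cases the text names; the field names are hypothetical, Tunisia’s full provider-list coverage is an inference from the fifteen-of-nineteen count rather than a stated fact, and Syria’s payment-rail status is left unknown because the text does not give it.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Jurisdiction:
    name: str
    provider_lists: int           # of the four major U.S. supported-country lists
    card_rail_ok: Optional[bool]  # can businesses pay foreign providers? None = unknown
    sanctioned: bool              # comprehensive sanctions currently in force

def blocking_layers(j: Jurisdiction) -> List[str]:
    """Return the layers that block access; an empty list means none is closed."""
    layers = []
    if j.provider_lists == 0:
        layers.append("provider")
    if j.card_rail_ok is False:  # unknown (None) is not counted as blocked
        layers.append("payment")
    if j.sanctioned:
        layers.append("sanctions")
    return layers

IRAN = Jurisdiction("Iran", provider_lists=0, card_rail_ok=False, sanctioned=True)
# Assumption: Tunisia is among the fifteen jurisdictions on all four lists.
TUNISIA = Jurisdiction("Tunisia", provider_lists=4, card_rail_ok=False, sanctioned=False)
# Comprehensive U.S. sanctions revoked July 2025; payment status not given in the text.
SYRIA = Jurisdiction("Syria", provider_lists=0, card_rail_ok=None, sanctioned=False)

for j in (IRAN, TUNISIA, SYRIA):
    print(j.name, blocking_layers(j))
# Iran ['provider', 'payment', 'sanctions']
# Tunisia ['payment']
# Syria ['provider']
```

Iran is closed at every layer by law; Tunisia clears formal availability and stops at payment; Syria’s provider gate stays shut after the sanctions rationale lapsed.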
The strategic risk for non-sanctioned importers is correlation of a softer kind. The jurisdictional instruments deserve precision rather than alarm: under the CLOUD Act, exposure follows jurisdiction over the provider rather than the location of the server, which is narrower than its reputation, but real. Export controls on inference are a foreseeable channel, not yet a binding regime. And the flow layer has a physical stratum: when submarine cables were cut near Jeddah in September 2025, on a corridor carrying ~17% of intercontinental traffic, connectivity degraded from the Gulf to South Asia within hours, and hosted inference rides the same cables. Rerouting absorbed the shock within days, a speed of adjustment oil cannot match. The point is that MENA’s token imports transit a narrow set of physical corridors and a narrow set of jurisdictional gates, governed by overlapping firms and states. Correlated passage is the region’s inherited condition; the token flow is its newest layer.
The production-side response — sovereign compute — addresses little of this. A national GPU cluster does not keep a Cairo or Tunis startup’s API account funded, change the governing law of a Riyadh enterprise’s cloud contract, or provide a frontier-quality fallback when access tightens. The exposure is on the flow side; the spending is on the production side.
Resilience over self-sufficiency
The form-of-flow analysis converges with the “managed interdependence” position (Tanner, Kerry, et al., 2026) but sharpens it: managing interdependence requires mapping flows, not just suppliers. The recommendations, segmented by audience, are ordered by feasibility and deliberately start with what a single ministry can begin within ninety days, because the prerequisite dependency data does not yet exist.
The open-weight window is affordable but conditional: open-weight models are a resilience resource only when the weights are actually held, served, tested, and integrated before a crisis. The 90%-cheaper finding makes a tested fallback stack affordable; the sub-30% consumption share and 11% switching rate measure how much testing stands between the price window and an actual fallback. For the critical-infrastructure agenda, the instrument ladder runs from visibility (dependency inventories, incident reporting) through continuity (service expectations, payment continuity, portability) to international arrangements — the analogue of the energy-security frameworks built after the oil shocks.
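The visibility rung of that ladder starts with a dependency inventory. A minimal sketch of one possible record format, with hypothetical field names and entries throughout: it flags systems whose fallbacks fail the held, served, tested, and integrated test, and counts how many systems pass through each provider as a rough proxy for correlated exposure.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Fallback:
    model: str
    weights_held: bool
    served: bool
    tested: bool
    integrated: bool

    def ready(self) -> bool:
        # Counts only if all four conditions already hold before a crisis.
        return self.weights_held and self.served and self.tested and self.integrated

@dataclass
class Dependency:
    system: str
    provider: str
    aggregator: Optional[str]
    payment_rail: str
    governing_law: str
    fallbacks: List[Fallback] = field(default_factory=list)

def without_ready_fallback(inventory: List[Dependency]) -> List[str]:
    """Systems that would stop if their primary provider became unavailable."""
    return [d.system for d in inventory if not any(f.ready() for f in d.fallbacks)]

def shared(inventory: List[Dependency], attribute: str) -> Counter:
    """How many systems pass through each value of one gate (e.g. 'provider')."""
    return Counter(getattr(d, attribute) for d in inventory)

# Hypothetical entries for illustration only.
inventory = [
    Dependency("permit-triage", "provider-a", "aggregator-x", "corporate card (USD)",
               "US law", [Fallback("open-model-1", True, True, True, True)]),
    Dependency("tax-helpdesk", "provider-a", None, "corporate card (USD)",
               "US law", [Fallback("open-model-1", True, False, False, False)]),
    Dependency("court-transcripts", "provider-b", None, "bank transfer (EUR)",
               "EU member-state law"),
]

print(without_ready_fallback(inventory))  # ['tax-helpdesk', 'court-transcripts']
print(shared(inventory, "provider"))      # Counter({'provider-a': 2, 'provider-b': 1})
```

The same `shared` count applied to `payment_rail` or `governing_law` would show which gates many systems cross at once, the correlated-passage problem in miniature.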
Each infrastructural transition of the past three decades has relocated the layer at which control over computational flow concentrates: from network connectivity, to platform hosting, to the model-API layer that now meters machine reasoning. The current policy conversation, fixated on producing that capability, is analyzing the wells and ignoring the strait, in a year when an actual strait was disrupted and an announced threat alone rerouted shipping. For most countries the relevant question is not “How do we produce intelligence?” but “How do we keep it flowing on acceptable terms when conditions change?” That is a resilience question, answerable with instruments that exist today, beginning with a dependency map any ministry could start this quarter.
Importing Intelligence: full policy brief (PDF), with the complete argument, statistical apparatus, notes, and full bibliography. Licensed CC BY 4.0. Citable archival version of record: doi.org/10.5281/zenodo.20677200 (Zenodo).