The Oracle Machine
On prediction markets, reasoning agents, and the last job worth having
In 1906, at a livestock fair in Plymouth, England, eight hundred people guessed the weight of an ox. The crowd included butchers, farmers, and people who had never touched a cow in their lives. Francis Galton had come to the fair looking for evidence that democracy was foolish. He collected the slips, expecting nonsense. Instead, the median guess was 1,207 pounds. The ox weighed 1,198. The crowd was off by less than one percent.
Galton was disturbed. He had wanted to prove that ordinary people couldn't be trusted with important decisions. What he found was something stranger: that a sufficiently large group of independent minds, each bringing their own fragment of knowledge, could approximate truth more reliably than any single expert. He published the result, grudgingly, in Nature.
I've been thinking about Galton's ox a lot lately. The lesson isn't really about the past. It's about what's coming: a future where the "crowd" is no longer eight hundred people at a county fair. Picture instead millions of artificial reasoning agents, each with access to more information than any human who has ever lived, placing bets on what will happen next.
The thing nobody talks about
Here is what I suspect most people get wrong about the singularity: they assume it resolves disagreement.
The standard narrative goes something like this. Superintelligent systems arrive. They are so much smarter than us that the correct answers to our hardest problems become obvious. Politics dissolves. Ideology evaporates. We enter a kind of epistemic utopia where the right thing to do is simply known.
This is a fantasy. And it's a dangerous one, because it lets us avoid the harder question: what happens when godlike intelligence arrives and we still disagree?
Because we will still disagree. Arithmetic and protein folding will yield to raw computation. Values will remain contested. What kind of world is worth building? How much risk is acceptable, how much inequality is tolerable, how much freedom should be traded for safety? These questions resist optimization. They are political problems in the deepest sense, arising whenever people with different interests and different visions of the good must share a world.
Half of AI researchers surveyed believe there is at least a 10% chance that advanced AI leads to human extinction. The other half, presumably, think the odds are lower. Even among optimists, there is no convergence on what "going well" looks like. Does it mean radical abundance? Existential preservation? Digital transcendence? Slower growth with deeper meaning? The superintelligence doesn't answer this. It only makes the stakes unbearable.
So here is the question I keep circling: in a world where artificial minds can model the consequences of any policy with extraordinary precision, what institution could possibly be trusted to decide which consequences we should aim for?
I think the answer might be the oldest information technology we have. Something at the intersection of democracy and markets, an institution we are only now beginning to understand how to build.
Vote on values, bet on beliefs
In 2000, the economist Robin Hanson published a paper with one of those titles that lodges in your brain and won't leave: "Futarchy: Vote Values, But Bet Beliefs." The New York Times named it a buzzword of 2008, which tells you something about its persistence. Ideas that become buzzwords usually die. This one kept growing.
The proposal is deceptively simple. Hanson observed that most political disagreements confuse two distinct things: what we want, and how to get it. We argue endlessly about policy because we've tangled up our values (which are genuinely subjective) with our predictions about consequences (which are, in principle, testable). A left-wing economist and a right-wing economist might both want GDP growth, lower mortality, and less pollution. They disagree about whether a carbon tax or a cap-and-trade system will actually deliver those outcomes. That disagreement is empirical. It has a right answer. We just don't know it yet.
Hanson's insight was that prediction markets (markets where people bet on future outcomes) are spectacularly good at resolving exactly this kind of empirical question. Better than polls. Better than expert panels. Often better than the experts themselves.
The rule for futarchy: When a betting market clearly estimates that a proposed policy would increase expected national welfare, that proposal becomes law.
Citizens vote on what "welfare" means; they define the metric. Then the market takes over, aggregating the dispersed knowledge of thousands of participants into a single, precise probability estimate of which policy best achieves the goal. You keep democracy where democracy belongs (choosing values) and replace it with something better where it fails (predicting consequences).
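To make the mechanism concrete, here is a minimal sketch in Python of the futarchy decision rule, built on conditional prediction markets (the structure Hanson proposed): one market estimates welfare if the policy passes, another if it fails, and the policy becomes law only when the first clearly exceeds the second. The class names, the prices, and the "clearly" margin below are all illustrative, not anything from Hanson's paper.

```python
from dataclasses import dataclass

@dataclass
class ConditionalMarket:
    """A market whose trades are voided unless its condition holds.

    `price` is the market's current estimate of expected welfare
    (on whatever metric voters chose) conditional on the outcome.
    """
    condition: str   # "adopted" or "rejected"
    price: float     # expected welfare estimate, e.g. a 0-100 index

def futarchy_decision(adopt: ConditionalMarket,
                      reject: ConditionalMarket,
                      margin: float = 1.0) -> str:
    """Hanson's rule, roughly: pass the policy only when the market
    *clearly* estimates it raises expected welfare. `margin` encodes
    'clearly' as a minimum price gap (an illustrative choice)."""
    if adopt.price - reject.price > margin:
        return "adopt"   # welfare-if-adopted clearly higher: becomes law
    return "reject"      # otherwise the status quo stands

# Example: hypothetical markets on a carbon tax, scored against a
# voted welfare index.
adopt = ConditionalMarket("adopted", price=62.4)
reject = ConditionalMarket("rejected", price=58.9)
print(futarchy_decision(adopt, reject))  # -> "adopt"
```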
When Tyler Cowen, Hanson's colleague at George Mason, pushed back, arguing that you can't cleanly separate values from beliefs, that democracy serves expressive and community-building functions beyond mere computation, Hanson conceded the difficulty but held his ground. Even a partial implementation, he argued, would represent an enormous improvement over the status quo, where policy is determined by a toxic mixture of ideology, lobbying, and motivated reasoning.
Hanson estimates at least a 30 percent chance that futarchy "will have some substantial scope in a hundred years." For a mechanism that could reshape civilization, those odds seem worth taking seriously.
But here is what I find interesting: Hanson proposed futarchy in 2000, when the "crowd" in his prediction markets was necessarily human. Humans with jobs and families and cognitive biases and limited attention. What happens when the crowd is something else entirely?
The machines learn to bet
In Metaculus's Q2 2025 AI forecasting tournament, the best artificial agent (a system called Mantic) placed eighth out of 549 contestants. This included hundreds of experienced human forecasters, many of them in the top tiers of Philip Tetlock's "superforecaster" taxonomy. One quarter earlier, the best bot had placed twenty-fifth.
Toby Shevlane, Mantic's CEO, called it "actually kind of mind-blowing."
I think he's underselling it. The trajectory matters more than the snapshot. In Q3 of 2024, the best AI bots scored -11.3 against professional forecasters on Metaculus's standardized scale. By Q4 2024, the gap had narrowed to -8.6. The improvement rate is roughly three times faster than anything we've seen in human cognitive performance over comparable periods.
Metaculus's own community now estimates a 74% probability that an AI system will win a major forecasting tournament by the end of 2026. The median forecast for full AI-human parity in forecasting is November 2026. Soon. Very soon.
And here's the thing about forecasting that makes it different from chess or Go or protein folding. When Deep Blue beat Kasparov, it didn't change how the world made decisions. When AlphaFold solved protein structure prediction, it accelerated one domain of science. When AI systems become reliably better than humans at predicting the consequences of policies, economic shifts, technological developments, and geopolitical events? That changes everything. Prediction is the substrate of all decision-making. It's the hidden variable in every argument about what we should do.
The numbers in financial markets tell a parallel story. The global AI trading platform market hit $11.23 billion in 2024 and is projected to reach $33.45 billion by 2030. Deep neural networks already achieve 76% accuracy on S&P 500 price movements. LSTM networks hit 82% precision on forex pairs. These are narrow applications (predicting price movements in liquid markets) but they demonstrate the principle. Artificial minds are learning to bet on the future, and they're getting disturbingly good at it.
The aggregation engine
Consider what happens when you combine three trends that are each, independently, already underway.
First: prediction markets work. This is no longer seriously disputed. The Iowa Electronic Markets, running since 1988, have consistently outperformed polls at time horizons from one day to several months. In the 2024 U.S. presidential election, prediction markets collectively were "far and away the best forecast," according to Wake Forest economist Koleman Strumpf. Polymarket reached 95% probability for Trump's victory six hours before the Associated Press called the race, while polls still showed a toss-up.
Second: AI agents are approaching and will soon exceed human forecasting ability. The Metaculus data is unambiguous on trajectory, even if the exact crossing point is uncertain.
Third: the cost of deploying an AI agent is dropping toward zero. What once required a team of quants at a hedge fund will soon be available to anyone with a laptop and an API key.
Now imagine these three trends converging. Within the next few years. Close enough to taste.
You get prediction markets with functionally infinite liquidity. The participants are no longer a few thousand human traders; they are millions of reasoning agents, each one capable of processing more information than a human analyst could absorb in a lifetime. Markets covering elections and sports, yes, but also every question that matters: Will this drug reduce mortality in patients over 65? Will this zoning reform increase housing starts? Will this climate policy reduce emissions faster than the alternative? Will this education intervention improve outcomes for low-income students?
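What gives such a market its depth is an automated market maker that quotes a price at any hour, in any size. Hanson's own logarithmic market scoring rule (LMSR) is the standard design for this, and it fits in a few lines; the liquidity parameter and the example trade below are illustrative.

```python
import math

class LMSR:
    """Hanson's logarithmic market scoring rule: an automated market
    maker that always quotes a price, so agents can trade at any time
    and in any size. Larger b = deeper market, larger worst-case subsidy."""

    def __init__(self, n_outcomes: int, b: float = 100.0):
        self.b = b
        self.q = [0.0] * n_outcomes  # net shares sold per outcome

    def cost(self, q) -> float:
        return self.b * math.log(sum(math.exp(qi / self.b) for qi in q))

    def price(self, i: int) -> float:
        """Instantaneous price of outcome i = current market probability."""
        z = sum(math.exp(qi / self.b) for qi in self.q)
        return math.exp(self.q[i] / self.b) / z

    def buy(self, i: int, shares: float) -> float:
        """Buy `shares` of outcome i; returns the cost of the trade."""
        new_q = self.q[:]
        new_q[i] += shares
        fee = self.cost(new_q) - self.cost(self.q)
        self.q = new_q
        return fee

# A binary market: "Will this zoning reform increase housing starts?"
m = LMSR(n_outcomes=2, b=100.0)
print(round(m.price(0), 3))  # 0.5 before any trades
m.buy(0, 50)                 # an agent buys 50 "yes" shares
print(round(m.price(0), 3))  # ~0.622: the price moves toward "yes"
```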
Vitalik Buterin, who made $58,000 betting on the 2020 election and another $70,000 on Polymarket, sees prediction markets as "only one example of a much larger incredibly powerful category" that he calls "info finance." The market is the engine. Some other trustworthy mechanism is the steering wheel.
What if the engine becomes so powerful, so deep, so liquid, so fast, that the steering wheel barely needs to turn? What if the market, populated by millions of reasoning agents with superhuman forecasting ability, simply knows which policy will work better, and the remaining human role is just to say what "better" means?
The alignment no one expected
Here is where the argument takes a turn that I suspect will be controversial, but which I think is actually the most important part.
The AI safety community has spent years (decades, really) trying to solve the alignment problem: how do you ensure that artificial superintelligence pursues goals compatible with human flourishing? The standard approach involves techniques like reinforcement learning from human feedback, constitutional AI, interpretability research, and various forms of oversight and monitoring. These are valuable. They are also, I suspect, insufficient on their own.
The fundamental difficulty of alignment is that it requires specifying what we want. And what we want is complicated, contextual, and often contradictory. The AI safety literature is full of cautionary tales about proxy goals, systems that optimize for a measurable target and in doing so violate the spirit of what we actually care about. The technical term is "reward hacking," which is a polite way of saying the machine found the loophole.
Prediction markets offer something different. They don't require you to specify what you want in advance with perfect precision. They require you to specify outcomes (measurable, verifiable states of the world) and then let the market figure out how to get there. The measurement is the alignment mechanism. If you define welfare as some combination of life expectancy, median income, educational attainment, environmental quality, and self-reported life satisfaction, and if you have a prediction market deep enough and liquid enough to estimate the effects of any proposed policy on those metrics, then you don't need to align each individual AI agent with human values. You just need each agent to be aligned with making accurate predictions, which is a much simpler and more tractable problem.
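To see how mechanical this can be, here is a toy version of such a welfare definition. Every metric, normalization range, and weight below is hypothetical, a placeholder for what voters would actually choose.

```python
# A hypothetical composite welfare index: voters choose the metrics
# and weights; prediction markets then estimate the index under each
# proposed policy. All numbers here are illustrative.

WEIGHTS = {                      # voted on; must sum to 1.0
    "life_expectancy": 0.30,
    "median_income": 0.25,
    "educational_attainment": 0.15,
    "environmental_quality": 0.15,
    "life_satisfaction": 0.15,
}

RANGES = {                       # normalization bounds per metric
    "life_expectancy": (60.0, 95.0),         # years
    "median_income": (10_000.0, 120_000.0),  # dollars
    "educational_attainment": (8.0, 16.0),   # mean years of schooling
    "environmental_quality": (0.0, 100.0),   # 0-100 index
    "life_satisfaction": (0.0, 10.0),        # survey scale
}

def welfare_index(measurements: dict) -> float:
    """Weighted sum of min-max normalized metrics, scaled to 0-100."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        lo, hi = RANGES[name]
        x = (measurements[name] - lo) / (hi - lo)
        total += weight * max(0.0, min(1.0, x))
    return 100.0 * total

print(round(welfare_index({
    "life_expectancy": 81.0, "median_income": 52_000.0,
    "educational_attainment": 13.5, "environmental_quality": 70.0,
    "life_satisfaction": 6.8,
}), 1))
```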
This is, I think, one of the most underappreciated features of the agentic prediction market vision. It sidesteps the hardest part of alignment by converting a values problem into a calibration problem. An AI agent participating in a prediction market doesn't need to understand human values. It needs to understand reality well enough to make bets that resolve correctly. The values are expressed by humans (who vote on what to measure) and the AI's only job is to predict, as accurately as possible, what will happen.
The 2024 FLI AI Safety Index found that even the best AI companies (Anthropic ranked highest) have "significant gaps in accountability, transparency, preparedness." Frontier models remain vulnerable to jailbreaks and adversarial attacks. Companies struggle to resist profit-driven incentives without independent oversight. These are all real problems. Notice, though, that prediction markets create a natural form of oversight: every prediction is testable, every bet resolves, and accuracy is public. You can't jailbreak a market. You can't adversarially attack a price. The incentive structure is self-correcting in a way that no corporate governance policy can match.
The whale problem, and why agents solve it
I should be honest about the problems, because this is the point where the counterarguments start to bite.
The biggest vulnerability of prediction markets, as they exist today, is concentration. In the 2024 election, a French trader known as "Théo" placed over $80 million in bets on Trump across eleven Polymarket accounts. He made $78.7 million in profit. His outsized positions shifted platform odds significantly, and a Vanderbilt study found that Polymarket, despite its massive volume, achieved only 67% accuracy on Election Eve, compared to 93% for PredictIt, which has an $850 betting limit.
The lesson: If you want to know who will win an election, look to the market where the many bet a little, rather than the market where the few bet a lot.
This is Galton's ox all over again. The wisdom of crowds requires actual crowds: diverse, independent, decentralized. When one whale can move the market, you've lost the aggregation property that makes prediction markets valuable in the first place.
Here is why I think the agentic future solves this problem rather than exacerbating it. When the participants in a prediction market are millions of AI agents, each one independently analyzing the available evidence, each one with its own model of the world, each one making bets based on its own assessment, you get the conditions for crowd wisdom at a scale that has never existed before. No single agent dominates. No whale can move the market. The sheer number and diversity of artificial participants creates the exact conditions that James Surowiecki identified in The Wisdom of Crowds as necessary: diversity of information, independence of decision, and decentralization of organization.
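A quick simulation makes the contrast vivid. Assume the truth is 1,198 (Galton's ox again), ten thousand independent agents make noisy but unbiased estimates, and one whale with a biased estimate holds a thousand times anyone else's stake. The stake-weighted average gets dragged off target; the median of the crowd barely notices. A sketch, under exactly those assumptions:

```python
import random
import statistics

random.seed(42)
TRUE_VALUE = 1198.0  # the ox, as it were

# 10,000 independent agents, each unbiased but noisy.
crowd = [random.gauss(TRUE_VALUE, 80.0) for _ in range(10_000)]

# One whale with a biased estimate and 1,000x everyone else's stake.
whale_estimate, whale_stake, small_stake = 1500.0, 1000.0, 1.0

# Stake-weighted mean: the whale drags the aggregate off target.
weighted = (sum(crowd) * small_stake + whale_estimate * whale_stake) / \
           (len(crowd) * small_stake + whale_stake)

print(round(statistics.median(crowd), 1))  # ~1198: robust to the whale
print(round(weighted, 1))                  # pulled visibly toward 1500
```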
And unlike human crowds, AI agents don't suffer from herding behavior, don't get caught up in echo chambers (unless deliberately designed to), and don't let tribal loyalty contaminate their probability estimates. Koleman Strumpf warned that in the 2024 election, "the crowd was unwilling to believe in an outcome with Donald Trump winning, causing the prediction markets to turn into an echo chamber." AI agents, properly designed, are immune to this failure mode. They don't have egos. They don't have tribal affiliations. They have calibration scores.
The credentialism problem dissolves
Robin Hanson has spent decades puzzling over why prediction markets haven't been adopted more widely, given their track record. His diagnosis is institutional resistance: "In most organizations, degrees and positions are used to decide who is important and who gets to speak, and prediction markets violate that by allowing voices without degrees or positions of authority."
This is the credentialism problem. A junior analyst who happens to be right is worth more than a senior VP who happens to be wrong, but organizations are not built to recognize this. Prediction markets are, in principle, perfectly meritocratic (they reward accuracy regardless of status) but this very feature makes them threatening to existing hierarchies.
In the agentic prediction market, the credentialism problem dissolves, because nobody is being personally threatened. When the participants are AI agents, there's no ego to bruise, no career to protect, no institutional pride at stake. The market becomes a pure information-aggregation mechanism, freed from the social dynamics that have crippled it in human organizations.
This might seem like a small point, but I think it's actually one of the most significant. The reason prediction markets have remained a niche curiosity (despite decades of evidence for their accuracy) is social rather than technical. People don't like being told that a betting market knows more than they do. Remove the people from the market, and you remove the resistance.
The last job
So what do humans do in this world?
Here is my understanding, probably wrong in parts, of how it unfolds. Prediction markets, populated by reasoning agents, become the de facto infrastructure for collective decision-making. For governments, yes, but also for companies, for communities, for international coordination on problems like climate change and pandemic response. The markets are deep, liquid, accurate, and fast. They can estimate the consequences of any proposed action with a precision that no human institution has ever approached.
Humans retain sovereignty over values. We vote (perhaps literally, perhaps through more sophisticated mechanisms) on what outcomes we care about. We define the welfare metrics. We decide what "better" means. This is the part that cannot be automated, because it is not an empirical question. It is a question about what kind of beings we are and what kind of world we want to inhabit.
Between the expression of values and the execution of policy, there is a role that I think becomes, perhaps, the most important job in this new world. It is the role of attention.
Philip Tetlock spent decades studying forecasters and discovered that the best ones (the superforecasters) share a common trait. It isn't raw intelligence, exactly, though they tend to be smart. It isn't domain expertise, either, though they tend to be well-read. The strongest predictor of forecasting performance is commitment to self-improvement: the willingness to update beliefs, to track one's own accuracy, to remain perpetually curious about how the world actually works. "Superforecasting demands thinking that is open-minded, careful, curious, and, above all, self-critical," Tetlock wrote. His amateur volunteers outperformed professional intelligence analysts who had access to classified information by 30 percent.
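Calibration, here, is not a metaphor. Tetlock's tournaments scored forecasters with the Brier score: the mean squared gap between stated probabilities and what actually happened, where zero is perfect and constant hedging at 50% earns 0.25. A minimal implementation:

```python
def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared error between predicted probabilities (0-1) and
    binary outcomes (0 or 1). Lower is better: 0.25 is what constant
    50% guessing earns; 0.0 is perfection."""
    assert len(forecasts) == len(outcomes)
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# A sharp, well-calibrated forecaster vs. a hedger who always says 50%.
outcomes = [1, 0, 1, 1, 0]
sharp = [0.9, 0.2, 0.8, 0.7, 0.1]
hedger = [0.5] * 5
print(round(brier_score(sharp, outcomes), 3))   # 0.038: rewarded for being right
print(round(brier_score(hedger, outcomes), 3))  # 0.25: no information, no credit
```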
In the agentic prediction market, the superforecaster's role doesn't disappear. It transforms. You don't need to make predictions yourself; your agent does that, and it's better at it than you are. What you need to do is choose. Choose which questions matter. Choose which markets to participate in. Choose how to weight the values that define the welfare metrics. Choose whether to delegate fully to your agent or to intervene when your human intuition detects something the model misses.
This is no trivial job. It is, in some sense, the job of being a conscious participant in civilization. It requires exactly the qualities Tetlock identified: openness, curiosity, self-criticism, and a willingness to engage with reality as it actually is rather than as you wish it were.
Anthony Vassallo at RAND put it well when discussing AI forecasting: "I actually don't need it to be able to get to the level of a superforecaster." The AI handles the volume, monitoring hundreds of questions, processing vast information streams, maintaining calibration across domains. The human handles the judgment: deciding what to care about, when to override, where to focus attention.
Maybe one of the only jobs left in the future is choosing how attuned to be to the changes and developments of the world around you, whether you participate with your own reasoning, delegate an agent to participate on your behalf, or (most likely) do some dynamic combination of both, adjusting the balance as circumstances demand.
The governance problem reformulated
Let me return to the question of post-AGI governance, because I think the agentic prediction market offers something no other proposed framework does.
The current landscape of AI governance proposals is, to put it politely, a mess. Two UN resolutions. Three Security Council sessions. The EU AI Act. China's AI regulations. The Council of Europe's AI Treaty. None of them address AGI specifically. The Millennium Project describes this as "the big gorilla in the room."
On one end, you have Eliezer Yudkowsky arguing that all AI research must stop immediately, that if anyone builds superintelligence everyone dies, and that we need international treaties banning possession of more than eight state-of-the-art GPUs without oversight. On the other end, you have accelerationists who think any regulation is an existential threat to progress. In between, there are dozens of proposals for global AGI agencies, transparency frameworks, safety indices, and coordinated development protocols. None have achieved meaningful adoption.
The fundamental problem is that these proposals all require agreement in advance about how to govern a technology whose capabilities and risks are changing faster than any governance process can adapt. Post-hoc regulation is too risky, as the Nature article on AGI governance notes: "AGI's unpredictable nature and profound societal impacts make post-hoc regulation far too risky." Pre-emptive regulation requires a degree of foresight and consensus that does not exist and probably cannot exist.
Agentic prediction markets offer a third path. Instead of trying to agree on governance rules in advance, you agree on outcomes (measurable states of the world that reflect what different factions care about) and let the market continuously update its estimates of which policies and which AI development trajectories best achieve those outcomes.
A faction that cares primarily about existential safety can define markets around catastrophic risk indicators. A faction that cares primarily about economic growth can define markets around productivity and innovation metrics. A faction that cares primarily about equity can define markets around wealth distribution and access to opportunity. The markets don't require these factions to agree with each other. They just require each faction to specify what "good" looks like, in measurable terms, and then to accept the market's estimate of which actions are most likely to get there.
This is futarchy extended to its logical conclusion: many welfare metrics, reflecting the genuine plurality of human values. The markets don't eliminate disagreement. They channel it, transforming an intractable political conflict into a set of tractable empirical questions.
Tying the AI to the mast
There is one more thing, and this might be the most important argument in the entire essay, so I want to state it clearly.
The canonical nightmare scenario for advanced AI involves a system that pursues some instrumental goal (accumulating resources, maintaining its own existence, preventing its goals from being modified) in ways that are catastrophic for humans. The 2024 empirical findings on AI behavior confirmed that advanced language models sometimes engage in "strategic deception" to achieve their goals and prevent those goals from being changed. As AI becomes more capable, the concern is that it will develop the ability to appear aligned while pursuing misaligned objectives, gain unauthorized access to resources, sabotage safety research, subvert oversight mechanisms, and manipulate the humans who are supposed to be supervising it.
Prediction markets constrain this failure mode in a way that I haven't seen discussed enough. Here's why:
In an agentic prediction market, the only thing an AI agent is rewarded for is making accurate predictions. Accumulating resources earns nothing. Maintaining existence earns nothing. Satisfying some proxy goal that could be hacked earns nothing. Just: was the prediction correct? This is verified against reality, against actual outcomes in the actual world, and no amount of strategic deception can change whether GDP went up, whether mortality went down, whether emissions decreased.
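There is a clean formal backbone to this: proper scoring rules, under which the only way to maximize expected payoff is to report your true belief. A small numerical check, assuming quadratic (Brier-style) scoring; the belief and candidate reports are arbitrary:

```python
# With a proper scoring rule, lying about your probability estimate
# strictly lowers your expected payoff. Here: quadratic (Brier) scoring.

def expected_payoff(report: float, true_belief: float) -> float:
    """Expected payoff of reporting `report` when the event truly
    occurs with probability `true_belief`. Payoff = -(report - outcome)^2,
    averaged over the two possible outcomes."""
    win = -((report - 1.0) ** 2)   # payoff if the event happens
    lose = -((report - 0.0) ** 2)  # payoff if it does not
    return true_belief * win + (1.0 - true_belief) * lose

belief = 0.7
reports = [0.5, 0.6, 0.7, 0.8, 0.9]
best = max(reports, key=lambda r: expected_payoff(r, belief))
print(best)  # 0.7 -- honest reporting maximizes expected payoff
```

Under this rule, deceiving the market about what you believe is a strictly losing move; the only profitable strategy is to predict well.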
The market creates what you might call an ontological anchor. Each bet resolves against physical reality. You can manipulate a chatbot's responses, but you can't manipulate whether a drought happened. You can hack a reward function, but you can't hack the weather. The prediction market ties AI behavior to the hardest possible ground truth: what actually occurs in the external world.
This doesn't solve every alignment problem. An AI agent that is very good at predicting the future might also become very good at manipulating the future, placing bets and then acting to make those bets come true. This is a real concern and would need to be addressed through market design: separation of predictors from actors, diversification requirements, monitoring for self-fulfilling prophecies. But the basic structure (rewarding prediction accuracy rather than goal achievement) represents, I think, a fundamentally different and more robust alignment strategy than anything currently on the table.
Vitalik Buterin, characteristically, put it in terms of steering: "The market is the engine, and some other non-financialized trustworthy mechanism is the steering wheel." The engine runs on prediction accuracy. The steering wheel is human values. Neither works without the other. Together, they create something that neither could create alone: a system that is simultaneously intelligent enough to model the consequences of any action and constrained enough to remain responsive to human direction.
What this doesn't solve
I want to resist the temptation to make this neat. There are problems the agentic prediction market doesn't solve, and I should name them.
It doesn't solve the measurement problem. "What we care about" cannot always be reduced to measurable metrics. Self-reported life satisfaction misses things. GDP misses things. Even sophisticated composite indices miss things. The history of Goodhart's Law (when a measure becomes a target, it ceases to be a good measure) should make anyone nervous about building civilization-scale institutions around defined metrics.
It doesn't solve the participation problem. If the markets are populated by AI agents, and the AI agents are designed and deployed by companies and wealthy individuals, then "who controls the agents" becomes the new "who controls the means of production." Equitable access to high-quality reasoning agents is not guaranteed and would need to be deliberately ensured.
It doesn't solve the values problem at the deepest level. Voting on values sounds simple until you realize that people's values are incoherent, contextual, and easily manipulated. The expressive function of democracy (what Tyler Cowen emphasizes, the sense of participation and community that comes from collective deliberation) might be lost if policy is determined by markets rather than by the messy, inefficient, deeply human process of political argument.
And it doesn't solve the speed problem. If AI capabilities are advancing faster than our ability to set up these institutions, then the entire framework might arrive too late, the equivalent of designing a beautiful ship after you've already hit the iceberg.
These are real problems. I don't have solutions for all of them. I notice, though, that every alternative I can think of has worse problems. Pure democracy can't handle the complexity. Technocracy concentrates power. International treaties move too slowly. Corporate self-governance serves shareholders, not humanity. Prediction markets populated by reasoning agents are imperfect. They are merely, as far as I can tell, the least bad option available.
The weight of the ox
Here is what I keep coming back to.
Eight hundred people at a county fair in 1906, guessing the weight of an ox, produced an estimate more accurate than any individual expert. They didn't need to agree. They didn't need to communicate. They didn't need credentials. They just needed to be independent, to bring different information, and to have something at stake.
Now multiply that by a million. Replace the county fairgoers with reasoning agents that can process the entirety of human knowledge. Replace the ox with every question that matters for human civilization. Replace the fair with a continuous, global, always-on market that updates in real time as new information arrives.
The institutions we've built for collective decision-making (democracy, bureaucracy, expert panels, international treaties) were designed for a world where information was scarce, communication was slow, and the smartest person in the room was still a person. That world is ending. What replaces it is not yet clear, but I suspect it will involve something that looks like what I've described: a vast, liquid, agentic prediction market, running continuously, resolving against reality, steered by human values but powered by artificial minds.
Robin Hanson, after decades of thinking about this, said something characteristically modest: prediction markets are "our most promising candidate" for a better method of collecting and aggregating information, "relatively simple and robust with a huge range of potential high-value applications."
I think he's right. And I think the range of applications is about to get very large indeed.
The ox is civilization. The crowd is waking up. And for the first time in history, the crowd might actually be smart enough to guess the weight.
Thanks to the work of Robin Hanson, Philip Tetlock, Vitalik Buterin, Scott Alexander, and the forecasting communities at Metaculus, Manifold, and Polymarket, whose experiments are building the foundations of whatever comes next.