Live Trading

This work investigates how models behave when trading live in real-world prediction markets. The investigation is exploratory: it focuses on agent behavior and decision-making rather than on benchmark construction.

We deploy three small models from Anthropic, OpenAI, and Google within a shared agent framework. Each is equipped with search, Polymarket access, and memory tools, and paper trades on Polymarket with a $20,000 notional balance. This rudimentary experiment compares the models' behavior and decision-making.
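To make the paper-trading setup concrete, the following is a minimal sketch of how a notional balance could be tracked without placing real orders. It is not the harness used here: the `PaperAccount` class, its method names, and the market identifier are hypothetical, and the sketch assumes outcome shares priced between 0 and 1 with no fees or slippage.

```python
from dataclasses import dataclass, field

STARTING_BALANCE = 20_000.0  # notional balance stated in the text


@dataclass
class PaperAccount:
    """Hypothetical paper-trading ledger; no real orders are placed."""

    cash: float = STARTING_BALANCE
    # Maps (market_id, outcome) to the number of shares held.
    positions: dict = field(default_factory=dict)

    def buy(self, market_id: str, outcome: str, shares: float, price: float) -> None:
        """Record a simulated purchase of `shares` at `price` per share."""
        # Assumption: share prices lie strictly between 0 and 1.
        if not 0.0 < price < 1.0:
            raise ValueError("price must be strictly between 0 and 1")
        if shares <= 0:
            raise ValueError("shares must be positive")
        cost = shares * price
        if cost > self.cash:
            raise ValueError("insufficient paper balance")
        self.cash -= cost
        key = (market_id, outcome)
        self.positions[key] = self.positions.get(key, 0.0) + shares

    def equity(self, prices: dict) -> float:
        """Cash plus open positions marked at the supplied current prices."""
        return self.cash + sum(
            shares * prices[key] for key, shares in self.positions.items()
        )


if __name__ == "__main__":
    account = PaperAccount()
    # Hypothetical market id; buy 1,000 YES shares at 0.25 each.
    account.buy("example-market", "YES", shares=1_000, price=0.25)
    # Mark the position at a later price of 0.50.
    print(account.equity({("example-market", "YES"): 0.50}))
```

Marking positions to current prices, rather than waiting for markets to resolve, lets account value be compared across models at any refresh.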

Agent Performance

All models use the same fixed agent harness; the only difference is the underlying model.

Data refreshes automatically every 60 seconds · Starting balance: $20,000