One real hour. Raw ticks, L1, L2, trades, reference data — bundled.
A 60-minute slice of BTC Up or Down 4h (2026-05-19 14:00–15:00 UTC), cut straight from our archive. Same files we'd ship a paying customer, scoped to one series so the bundle stays small enough to poke at on a laptop. Pick the format that matches your tooling — both contain the same data.
A peek at the schema.
| recv_ns | mkt | bid | bid_sz | ask | ask_sz | last |
|---|---|---|---|---|---|---|
| 1714521608142003821 | BTC-100k-2025 | .624 | 4,200 | .626 | 3,100 | .624 |
| 1714521608388119044 | TRUMP-2024 | .517 | 980 | .519 | 1,460 | .518 |
| 1714521608412881290 | FED-25BPS-MAR | .708 | 2,350 | .712 | 720 | .710 |
| 1714521608501277013 | ETH-5k-EOY | .411 | 6,100 | .414 | 2,840 | .412 |
| 1714521608611442009 | OSCAR-BEST | .891 | 410 | .894 | 660 | .892 |
BTC-100k-2025 @ 14:32:08.142
─────────────────────────
ASK .629 ░░░░░░░ 1,200
ASK .626 ▓▓▓░░░░ 3,100
──── spread ─────────────
BID .624 ▓▓▓▓▓░░ 4,200 ← best
BID .621 ▓▓░░░░░ 1,850
The numbers above are illustrative — the schema shapes are real. Grab the sample zip at the top of this page to see actual rows for one hour of the BTC 4h series, alongside the raw websocket messages and the reference data that joins everything back together.
§ worked example
Replay BTC 4h markets around their settlement boundaries.
One series, three days, every 4-hour settlement in the window. Four steps from the moment you download the tarball to a backtest you can point at a settled outcome. Same shape works for CPI prints, Fed decisions, earnings, debates, or any event-anchored study you can name.
- 01 pull: One tarball, four datasets (raw, L1, L2, trades) for every recurrence of the BTC 4h series across the date window.
  > GET /atoms/series=btc-up-or-down-4h/date=2026-05-19..2026-05-21
- 02 join: Reference data joins every token back to its market, event, condition ID, and the timestamp each 4-hour bucket resolved.
  > JOIN markets USING (token_id) → outcome_resolved_at
- 03 window: Anchor each replay on a settlement boundary; pull ±10 minutes of L2 depth around the cutoff.
  > WHERE ts_recv BETWEEN settle - 10m AND settle + 10m
- 04 study: Walk the book tick by tick. Compute spread, depth imbalance, and reprice lag. Backtest a strategy against the settled outcome.
  > replay(l2) → features(spread, imbalance, lag) → pnl
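Steps 03 and 04 can be sketched in a few lines of plain Python. This is a minimal illustration, not the shipped tooling: it assumes each L2 row has already been loaded as a dict keyed by the l2.parquet column names, that level 1 is the best level on each side, and that `settle_ns` (a hypothetical name) is the settlement time converted to nanoseconds since the Unix epoch. Reprice lag and PnL are left out because they depend on the strategy being tested.

```python
from typing import Iterable, Iterator

NS_PER_MINUTE = 60 * 1_000_000_000


def window(rows: Iterable[dict], settle_ns: int, minutes: int = 10) -> Iterator[dict]:
    """Yield rows whose ts_recv_ns lies within +/- `minutes` of settle_ns (inclusive)."""
    half = minutes * NS_PER_MINUTE
    for row in rows:
        if settle_ns - half <= row["ts_recv_ns"] <= settle_ns + half:
            yield row


def features(row: dict) -> dict:
    """Quoted spread and top-of-book depth imbalance for one L2 row."""
    # Assumption: bid_sz_1 / ask_sz_1 hold the sizes resting at the best bid / best ask.
    bid_sz, ask_sz = row["bid_sz_1"], row["ask_sz_1"]
    total = bid_sz + ask_sz
    return {
        "ts_recv_ns": row["ts_recv_ns"],
        "spread": row["best_ask"] - row["best_bid"],
        # Imbalance ranges over [-1, 1]; positive means more size on the bid.
        "imbalance": (bid_sz - ask_sz) / total if total else 0.0,
    }
```

Chaining the two, `[features(r) for r in window(rows, settle_ns)]`, gives one feature record per book update in the 20-minute window around a settlement.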
What's in the zip.
Files
| file | what it is |
|---|---|
| raw.jsonl | Exact inbound Polymarket market-websocket messages, one per line. 71,885 lines. |
| l1.parquet | Top-of-book updates (best bid/ask, sizes, spread). 10,474 rows. |
| l2.parquet | Orderbook snapshots, 25 levels per side. 137,034 rows. |
| trades.parquet | Trades derived from last_trade_price events. 168 rows. |
| reference/series.{json,csv} | The series metadata (1 row). |
| reference/events.{json,csv} | Events under this series (64 rows). |
| reference/markets.{json,csv} | Markets (condition_ids) under those events (63 rows). |
| reference/tokens.{json,csv} | Outcome tokens for those markets, yes/no per market (126 rows). |
Timestamps
Every record carries up to three timestamps:
- ts_recv_ns: local receive time in nanoseconds since the Unix epoch, captured as close to socket-read as possible. Authoritative for ordering.
- ts_src_ms: upstream Polymarket timestamp in milliseconds since the Unix epoch, when present in the payload.
- ts_recv_iso: ISO-8601 UTC mirror of ts_recv_ns, for convenience.
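The relationship between the three timestamps can be made concrete with a short sketch. The helper names below are illustrative, not part of the bundle. Truncating to microseconds is an assumption inferred from the sample value shown on this page, and the lag figure mixes network delay with any clock offset between the two hosts, so treat it as a rough indicator only.

```python
from datetime import datetime, timezone


def recv_iso(ts_recv_ns: int) -> str:
    """Render ts_recv_ns as an ISO-8601 UTC string, truncated to microseconds."""
    seconds, nanos = divmod(ts_recv_ns, 1_000_000_000)
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
    dt = dt.replace(microsecond=nanos // 1000)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.%fZ")


def receive_lag_ms(ts_recv_ns: int, ts_src_ms: int) -> float:
    """Milliseconds from the upstream timestamp to local receipt."""
    # Subtract in integer nanoseconds first to avoid float rounding on large epochs.
    return (ts_recv_ns - ts_src_ms * 1_000_000) / 1_000_000
```

For the sample envelope on this page, `recv_iso(1779199200081988852)` returns `2026-05-19T14:00:00.081988Z`, matching its ts_recv_iso field.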
raw.jsonl — schema
Each line is a JSON object with a capture envelope plus the verbatim upstream payload:
{
"source": "polymarket_market_ws",
"collector_id": "collector-cloud-main-g2",
"run_id": "run-...",
"conn_id": "conn-...",
"msg_seq": 545692077,
"ts_recv_ns": 1779199200081988852,
"ts_recv_iso": "2026-05-19T14:00:00.081988Z",
"ts_src": "1779199200023",
"event_type": "price_change" | "book" | "tick_size_change" | "last_trade_price",
"token_id": null,
"condition_id": "0x...",
"parse_ok": true,
"parse_error": null,
"payload_sha256": "...",
"payload_text": "{...JSON string of the original message...}"
}
payload_text is the original Polymarket message as a JSON string, preserved byte-for-byte (aside from being re-encoded into JSON-safe form). Parse it again to recover the upstream object.
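The two-stage parse described above might look like the following sketch. `decode_line` is a hypothetical helper name. Skipping the second parse when parse_ok is false is an assumption about what that flag means, not documented behaviour.

```python
import json
from typing import Any, Optional, Tuple


def decode_line(line: str) -> Tuple[dict, Optional[Any]]:
    """Split one raw.jsonl line into (capture envelope, upstream payload)."""
    envelope = json.loads(line)
    # Assumption: parse_ok is False when payload_text is not valid JSON.
    if not envelope.get("parse_ok", False):
        return envelope, None
    # payload_text is itself a JSON document stored as a string, so parse it again.
    payload = json.loads(envelope["payload_text"])
    return envelope, payload
```

Reading the file is then a loop over `open("raw.jsonl")`, calling `decode_line` on each line and ordering by ts_recv_ns where ordering matters.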
l1.parquet — columns
ts_src_ms, ts_recv_ns, ts_recv_iso, condition_id, token_id, best_bid, best_bid_size, best_ask, best_ask_size, spread, source_event_type, msg_seq
source_event_type indicates which upstream event triggered the L1 update (book, price_change, etc.).
l2.parquet — columns
ts_src_ms, ts_recv_ns, ts_recv_iso, condition_id, token_id, event_kind, source_event_type, msg_seq, bid_levels, ask_levels, best_bid, best_ask, payload_best_bid, payload_best_ask, bid_ask_reconciles, bid_px_1..bid_px_25, bid_sz_1..bid_sz_25, ask_px_1..ask_px_25, ask_sz_1..ask_sz_25
Each row is a full snapshot of the orderbook for one token after the update, padded with NaN past the deepest level present. bid_ask_reconciles flags rows where our reconstructed top-of-book matches what the payload declared.
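Because the levels are spread across 100 wide columns, it is often handier to fold a row back into a list of (price, size) pairs. A minimal sketch follows, assuming the row is a dict keyed by column name and that index 1 is the best level on each side. `levels` is an illustrative name, not something shipped in the bundle.

```python
import math
from typing import List, Tuple

DEPTH = 25  # levels per side, per the column list above


def levels(row: dict, side: str) -> List[Tuple[float, float]]:
    """Return (price, size) pairs for side 'bid' or 'ask', stopping at the NaN padding."""
    out: List[Tuple[float, float]] = []
    for i in range(1, DEPTH + 1):
        px = row.get(f"{side}_px_{i}")
        sz = row.get(f"{side}_sz_{i}")
        # A missing or NaN value marks the end of the populated levels.
        if px is None or sz is None or math.isnan(px) or math.isnan(sz):
            break
        out.append((px, sz))
    return out
```

If level 1 is indeed the best level, `levels(row, "bid")[0][0]` should equal the row's best_bid whenever the book has at least one bid.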
trades.parquet — columns
ts_src_ms, ts_recv_ns, ts_recv_iso, condition_id, token_id, price, size, side, fee_rate_bps, transaction_hash, msg_seq
Derived from last_trade_price events; one row per matched fill.
Joining
- A token_id joins to tokens.token_id → tokens.market_id → markets.market_id → markets.event_id → events.event_id.
- A condition_id joins to markets.condition_id directly.
- Each market's two outcome tokens are listed in markets.yes_token_id / markets.no_token_id.
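The join chain above can be walked with plain dictionaries. The sketch below assumes each reference file has been loaded into a list of row dicts that use the column names listed above; the on-disk layout of the JSON files is not specified here, so the loading step is omitted. `index_by` and `resolve_token` are illustrative names.

```python
from typing import Dict, List


def index_by(rows: List[dict], key: str) -> Dict[str, dict]:
    """Build a lookup table from a list of reference rows."""
    return {row[key]: row for row in rows}


def resolve_token(token_id: str, tokens: Dict[str, dict],
                  markets: Dict[str, dict], events: Dict[str, dict]) -> dict:
    """Follow token_id -> market -> event and label the outcome side."""
    token = tokens[token_id]
    market = markets[token["market_id"]]
    event = events[market["event_id"]]
    # The market row names its two outcome tokens explicitly.
    outcome = "yes" if market["yes_token_id"] == token_id else "no"
    return {"token": token, "market": market, "event": event, "outcome": outcome}
```

Build each lookup once, for example `tokens = index_by(token_rows, "token_id")`, then call `resolve_token` per row of L1, L2, or trades data.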
Source
Captured by TickFoundry from Polymarket's public market websocket (wss://ws-subscriptions-clob.polymarket.com/ws/market). This bundle is a static hour-long sample published for evaluation. For full historical coverage, join the waitlist below or get in touch.
Lock the founder rate.
48 of 50 founder seats still open · first-come first-served · 2 signups so far. 25% off, locked for the life of your subscription.