Graze Turbostream

Details about Turbostream, a hydrated-reference variation of Jetstream.

Overview & Motivation

Turbostream is a real-time, hydrated repeater service built on top of Bluesky’s Jetstream. It bridges the gap between raw event feeds (which contain only URI and DID references) and enriched, context-aware records you can consume directly from a WebSocket. By hydrating those references (user profiles, mentions, parent and root posts, and quoted records), Turbostream delivers the context you need to understand each event without additional API calls.

Key motivations:

  • Reduce boilerplate: Clients don’t need to manage their own caching or multiple API calls.
  • Low latency: Hydration happens in bulk (up to 100 records at once), and enriched records are streamed as soon as a batch is hydrated.
  • Context-rich feeds: Every record includes the author’s full profile, the parent and root posts of a reply, and any quoted record.

WebSocket Usage

Turbostream exposes a single WebSocket endpoint. You can consume it with tools like websocat:

websocat "wss://api.graze.social/app/api/v1/turbostream/turbostream"

Or from a JavaScript client, in the browser or Node:

const ws = new WebSocket("wss://.../turbostream/turbostream");

ws.onmessage = ({ data }) => console.log(JSON.parse(data));

Each message is a JSON array of enriched records:

[
  {
    "at_uri": "at://did:plc:.../app.bsky.feed.post/3lng7e4mr5c2z",
    "did": "did:plc:...",
    "time_us": 1745342977546824,
    "message": { /* raw jetstream record */ },
    "hydrated_metadata": { /* see below */ }
  },
  ...
]
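
For Python, a minimal consumer might look like the sketch below. It assumes the third-party websockets package, and it assumes each message is the JSON array shown above; parse_batch, consume, and main are illustrative names rather than part of any Turbostream client library. Accepting a lone object as well as an array is a defensive choice, not documented behavior.

```python
import asyncio
import json

TURBOSTREAM_URL = "wss://api.graze.social/app/api/v1/turbostream/turbostream"


def parse_batch(raw):
    """Decode one WebSocket message into a list of enriched records."""
    batch = json.loads(raw)
    if isinstance(batch, dict):
        # Defensive assumption: tolerate a single record outside an array.
        batch = [batch]
    # Keep only objects that look like enriched records.
    return [item for item in batch if isinstance(item, dict) and "at_uri" in item]


async def consume(handle_record, url=TURBOSTREAM_URL):
    """Connect once and pass every enriched record to handle_record."""
    import websockets  # third-party dependency: pip install websockets

    async with websockets.connect(url) as ws:
        async for raw in ws:
            for record in parse_batch(raw):
                handle_record(record)


def main():
    # Prints records until the connection closes.
    asyncio.run(consume(print))
```

Calling main() prints every enriched record until the connection closes; swap print for your own handler.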

Hydration Process & Pipeline

  1. Batch collection (up to 100 records): extract all unique DIDs and URIs (including mention facets, reply parent/root URIs, and quote embed URIs).
  2. Cache check: under a read lock, filter out entities that are already cached.
  3. Bulk fetch: retrieve missing profiles (get_user_data_for_dids) and posts (hydrate_records_for_uris) via Bluesky API clients, in parallel.
  4. Cache write: under a write lock, update both the user and post caches with the fresh data.
  5. Enrichment: assemble each record’s hydrated_metadata by merging raw event data with cached profiles and posts.
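
The five steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the service's actual code: the caches are plain dicts, a single mutex stands in for the read and write locks, fetch_users and fetch_posts are hypothetical callables standing in for get_user_data_for_dids and hydrate_records_for_uris (their real signatures are not shown here), and the reference paths follow the public Jetstream commit and app.bsky.feed.post record layouts.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock

MENTION = "app.bsky.richtext.facet#mention"
QUOTE_EMBED = "app.bsky.embed.record"
CACHE_LOCK = Lock()  # stand-in: the stdlib has no reader/writer lock


def extract_refs(event):
    """Step 1: pull the DIDs and AT-URIs that one Jetstream event refers to."""
    record = event.get("commit", {}).get("record") or {}
    mentions = []
    for facet in record.get("facets") or []:
        for feature in facet.get("features") or []:
            did = feature.get("did")
            if feature.get("$type") == MENTION and did and did not in mentions:
                mentions.append(did)
    reply = record.get("reply") or {}
    embed = record.get("embed") or {}
    quote = embed.get("record", {}) if embed.get("$type") == QUOTE_EMBED else {}
    return {
        "author": event["did"],
        "mentions": mentions,
        "parent": reply.get("parent", {}).get("uri"),
        "root": reply.get("root", {}).get("uri"),
        "quote": quote.get("uri"),
    }


def enrich(event, refs, users, posts):
    """Step 5: merge the raw event with whatever the caches hold."""
    commit = event.get("commit", {})
    # reply_post carries the thread root only when it differs from the parent.
    root = refs["root"] if refs["root"] != refs["parent"] else None
    return {
        # Assumed construction of the record's AT-URI from the commit fields.
        "at_uri": f"at://{event['did']}/{commit.get('collection')}/{commit.get('rkey')}",
        "did": event["did"],
        "time_us": event.get("time_us"),
        "message": event,
        "hydrated_metadata": {
            "user": users.get(refs["author"]),
            "mentions": {did: users.get(did) for did in refs["mentions"]},
            "parent_post": posts.get(refs["parent"]),
            "reply_post": posts.get(root),
            "quote_post": posts.get(refs["quote"]),
        },
    }


def hydrate_batch(events, users, posts, fetch_users, fetch_posts):
    """Run steps 1-5 over one batch of at most 100 events."""
    batch = events[:100]
    refs = [extract_refs(event) for event in batch]
    dids = {did for r in refs for did in [r["author"], *r["mentions"]]}
    uris = {r[key] for r in refs for key in ("parent", "root", "quote") if r[key]}

    with CACHE_LOCK:  # step 2: find what the caches lack
        missing_dids = sorted(did for did in dids if did not in users)
        missing_uris = sorted(uri for uri in uris if uri not in posts)

    with ThreadPoolExecutor(max_workers=2) as pool:  # step 3: fetch in parallel
        users_future = pool.submit(fetch_users, missing_dids)
        posts_future = pool.submit(fetch_posts, missing_uris)
        new_users, new_posts = users_future.result(), posts_future.result()

    with CACHE_LOCK:  # step 4: store fresh data, then step 5: enrich
        users.update(new_users)
        posts.update(new_posts)
        return [enrich(event, r, users, posts) for event, r in zip(batch, refs)]
```

Entities that cannot be fetched simply stay absent from the caches, so the corresponding hydrated_metadata fields come out as null.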

All caches use an LRU strategy, ensuring memory is bounded and recently accessed items stay hot.
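
As an illustration of that strategy, a bounded LRU cache can be built on Python's collections.OrderedDict. The class below is a generic sketch; the name LRUCache and the max_size parameter are assumptions, and the service's real cache may differ.

```python
from collections import OrderedDict


class LRUCache:
    """A size-bounded mapping that evicts the least recently used key."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        # A read counts as use: move the key to the "fresh" end.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        # Evict from the "stale" end until the bound holds again.
        while len(self._items) > self.max_size:
            self._items.popitem(last=False)

    def __contains__(self, key):
        return key in self._items

    def __len__(self):
        return len(self._items)
```

Because get refreshes a key's position, frequently requested profiles and posts survive eviction while rarely used ones age out.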

What We Hydrate

  • user: full profile details for the posting DID (handle, avatar, bio, follower counts, etc.).
  • mentions: map of mentioned DIDs to their profile objects, parsed from richtext facets.
  • parent_post: the immediate parent in a reply thread, if present (postView structure).
  • reply_post: the root post of the reply thread, if it differs from the parent (despite the name, this is the thread root, not the reply itself).
  • quote_post: any embedded record that was quoted via app.bsky.embed.record embeds.

Each profile and post field is null if not applicable or unavailable; mentions is an empty map ({}) when a post mentions no one.

Example Enriched Record

{
  "at_uri": "at://did:plc:qipvtyo27owt4isjah3j3dw2/app.bsky.feed.post/3lng7e4mr5c2z",
  "did": "did:plc:qipvtyo27owt4isjah3j3dw2",
  "time_us": 1745342977546824,
  "message": { /* raw jetstream payload */ },
  "hydrated_metadata": {
    "user": { /* profileViewDetailed */ },
    "mentions": {},
    "parent_post": { /* postView */ },
    "reply_post": { /* postView */ },
    "quote_post": null
  }
}
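
On the consuming side, every hydrated field may be null, so it helps to read them defensively. The Python sketch below reduces one enriched record to a few facts; summarize is a hypothetical helper, and the handle and uri keys are assumed from the standard Bluesky profile and post view shapes rather than confirmed by this page.

```python
def summarize(record):
    """Reduce one enriched record to a few facts, tolerating null fields."""
    meta = record.get("hydrated_metadata") or {}
    user = meta.get("user") or {}
    parent = meta.get("parent_post")
    # reply_post is only set when the root differs from the parent, so a
    # direct reply to the root has its thread root in parent_post.
    root = meta.get("reply_post") or parent or {}
    quote = meta.get("quote_post") or {}
    mentions = meta.get("mentions") or {}
    return {
        # Fall back to the DID when the profile could not be hydrated.
        "author": user.get("handle") or record.get("did"),
        "has_parent": parent is not None,
        "thread_root_uri": root.get("uri"),
        "mentioned": sorted(
            profile["handle"]
            for profile in mentions.values()
            if profile and "handle" in profile
        ),
        "quotes": quote.get("uri"),
    }
```

A null parent_post does not prove the post is a top-level post, since hydration can also be unavailable; the raw message remains the source of truth for that.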

Client Integration Tips

  • Backpressure: Monitor Redis Stream lag or WebSocket send buffer.
  • Reconnect logic: Implement exponential backoff on disconnects.
  • Layer your own cache: If you hydrate additional fields yourself, keep the results in a small in-memory map.
  • Throttling: Adjust batch sizes or cache TTLs based on traffic patterns.
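
The reconnect tip can be sketched as a capped exponential backoff in Python. Everything here is illustrative: connect_once is a hypothetical callable that blocks while the stream is healthy and raises ConnectionError when it drops, and the base, cap, and jitter values are arbitrary defaults, not recommendations from the service.

```python
import random
import time


def backoff_delays(base=1.0, cap=60.0, factor=2.0, jitter=0.1):
    """Yield delays that grow by `factor` up to `cap`, plus random jitter."""
    delay = base
    while True:
        # Jitter spreads out reconnects from many clients after an outage.
        yield delay + random.uniform(0, jitter * delay)
        delay = min(cap, delay * factor)


def run_with_reconnect(connect_once, max_attempts=None, sleep=time.sleep, jitter=0.1):
    """Call connect_once() repeatedly, backing off after each failure."""
    delays = backoff_delays(jitter=jitter)
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        try:
            connect_once()
            # The session ran and ended normally: restart the backoff at `base`.
            delays = backoff_delays(jitter=jitter)
        except ConnectionError:
            pass  # keep the current, growing backoff sequence
        sleep(next(delays))
```

Passing sleep as a parameter keeps the loop testable; in production the default time.sleep applies, and max_attempts=None retries indefinitely.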

With Turbostream, consuming a fully enriched Bluesky Jetstream is simple: connect and dive into the context, not the plumbing.