Graze Turbostream

Details about Turbostream, a hydrated-reference variation of Jetstream.

Overview & Motivation

Turbostream is a real-time, hydrated repeater service built on top of Bluesky’s Jetstream. It bridges the gap between raw event feeds (which contain only URI and DID references) and enriched, context-aware records you can consume directly from a WebSocket. By hydrating those bare references (user profiles, mentions, parent/root posts, and quoted records), Turbostream delivers the context you need to understand each event without additional API calls.

Key motivations:

Reduce boilerplate: Clients don’t need to manage their own caching or make multiple API calls.
Low latency: References are hydrated in batches (up to 100 records at once), and each batch is streamed as soon as it is ready.
Context-rich feeds: Every record includes full profile info, reply chains, and embedded content.

WebSocket Usage

Turbostream exposes a single WebSocket endpoint. You can consume it with tools like websocat:

websocat "wss://api.graze.social/app/api/v1/turbostream/turbostream"

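Or from a JavaScript client (browsers, or Node versions that provide a global WebSocket):

// Connect to the same endpoint as above (URL abbreviated here).
const ws = new WebSocket("wss://.../turbostream/turbostream");

// Log every incoming message after decoding it from JSON.
ws.onmessage = ({ data }) => console.log(JSON.parse(data));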

Each message is a JSON array of enriched records:

[
  {
    "at_uri": "at://did:plc:.../app.bsky.feed.post/3lng7e4mr5c2z",
    "did": "did:plc:...",
    "time_us": 1745342977546824,
    "message": { /* raw jetstream record */ },
    "hydrated_metadata": { /* see below */ }
  },
  ...
]
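
For Python, a minimal consumer might look like the sketch below. It assumes the third-party `websockets` package, imported lazily so that the parsing helper works without it, and it uses the endpoint URL from the websocat example. The names `parse_frame` and `consume` are illustrative and not part of Turbostream.

```python
import json

# Endpoint taken from the websocat example above.
TURBOSTREAM_URL = "wss://api.graze.social/app/api/v1/turbostream/turbostream"


def parse_frame(raw):
    """Decode one WebSocket message into a list of enriched records."""
    payload = json.loads(raw)
    # Each message is documented as a JSON array; a single object is
    # tolerated too, so a format change does not crash the consumer.
    records = payload if isinstance(payload, list) else [payload]
    return [r for r in records if isinstance(r, dict)]


async def consume(url=TURBOSTREAM_URL):
    """Print the URI and timestamp of every record until the socket closes."""
    # Assumption: the third-party `websockets` package is installed.
    import websockets

    async with websockets.connect(url) as ws:
        async for raw in ws:
            for record in parse_frame(raw):
                print(record.get("at_uri"), record.get("time_us"))
```

To run it, call `asyncio.run(consume())` from a script that imports `asyncio`.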

Hydration Process & Pipeline

Batch collection: collect up to 100 records and extract all unique DIDs and URIs (including mention facets, reply parent/root URIs, and quote embed URIs).
Cache check: under a read lock, filter out entities that are already cached.
Bulk fetch: fetch missing profiles (get_user_data_for_dids) and posts (hydrate_records_for_uris) in parallel via Bluesky API clients.
Cache write: under a write lock, update both the user and post caches with the fresh data.
Enrichment: assemble each record’s hydrated_metadata by merging raw event data with cached profiles and posts.
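
The batch collection step can be sketched as follows. This is not Turbostream's actual code: `collect_references` is a hypothetical helper, and it assumes events follow the public Jetstream commit layout (`commit.record` holding the post, with `facets`, `reply`, and `embed` fields).

```python
MENTION_TYPE = "app.bsky.richtext.facet#mention"
RECORD_EMBED_TYPE = "app.bsky.embed.record"


def collect_references(events):
    """Return (dids, uris) referenced by a batch of raw Jetstream events."""
    dids, uris = set(), set()
    for event in events:
        if event.get("did"):
            dids.add(event["did"])  # the posting account
        record = (event.get("commit") or {}).get("record") or {}
        # Mentions live in richtext facet features.
        for facet in record.get("facets") or []:
            for feature in facet.get("features") or []:
                if feature.get("$type") == MENTION_TYPE and feature.get("did"):
                    dids.add(feature["did"])
        # Reply parent and root are references of the form {"uri", "cid"}.
        reply = record.get("reply") or {}
        for key in ("parent", "root"):
            uri = (reply.get(key) or {}).get("uri")
            if uri:
                uris.add(uri)
        # Quote posts are embedded as app.bsky.embed.record.
        embed = record.get("embed") or {}
        if embed.get("$type") == RECORD_EMBED_TYPE:
            uri = (embed.get("record") or {}).get("uri")
            if uri:
                uris.add(uri)
    return dids, uris
```

Using sets deduplicates references across the whole batch, so each profile or post is requested at most once per bulk fetch.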

All caches use an LRU strategy, ensuring memory is bounded and recently accessed items stay hot.
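
As an illustration of the cache check and cache write steps, here is a minimal LRU cache sketch. The class and method names are hypothetical. It also simplifies the locking described above: Python's standard library has no read/write lock, so a single mutex guards both lookups and updates.

```python
import threading
from collections import OrderedDict


class LRUCache:
    """Bounded cache that evicts the least recently used entry first."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._items = OrderedDict()
        self._lock = threading.Lock()  # simplification: one lock, not read/write

    def split_cached(self, keys):
        """Return (hits, missing): cached values by key, and keys still to fetch."""
        hits, missing = {}, []
        with self._lock:
            for key in keys:
                if key in self._items:
                    self._items.move_to_end(key)  # mark as recently used
                    hits[key] = self._items[key]
                else:
                    missing.append(key)
        return hits, missing

    def put_many(self, fresh):
        """Store freshly fetched entries, then evict down to max_size."""
        with self._lock:
            for key, value in fresh.items():
                self._items[key] = value
                self._items.move_to_end(key)
            while len(self._items) > self.max_size:
                self._items.popitem(last=False)  # drop least recently used
```

A pipeline built on this would call `split_cached` with the DIDs or URIs of a batch, fetch only the `missing` keys, and pass the results to `put_many`.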

What We Hydrate

user: full profile details for the posting DID (handle, avatar, bio, follower counts, etc.).
mentions: map of mentioned DIDs to their profile objects, parsed from richtext facets.
parent_post: the immediate parent in a reply thread, if present (postView structure).
reply_post: the root of the thread, if different from the parent.
quote_post: any embedded record that was quoted via app.bsky.embed.record embeds.

Each of these fields is null if not applicable or unavailable.
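
The rules above can be expressed as a small assembly function. This is a sketch, not the service's implementation: `build_hydrated_metadata` is a hypothetical name, `users` and `posts` stand in for the caches as plain dicts, and the event layout is assumed to match Jetstream commit events. Mapping a mentioned DID to `None` when its profile is unavailable is also an assumption.

```python
def build_hydrated_metadata(event, users, posts):
    """Assemble hydrated_metadata for one raw event from cached data."""
    record = (event.get("commit") or {}).get("record") or {}

    # mentions: DID -> profile, taken from richtext mention facets.
    mentions = {}
    for facet in record.get("facets") or []:
        for feature in facet.get("features") or []:
            did = feature.get("did")
            if feature.get("$type") == "app.bsky.richtext.facet#mention" and did:
                mentions[did] = users.get(did)  # None if unavailable (assumption)

    reply = record.get("reply") or {}
    parent_uri = (reply.get("parent") or {}).get("uri")
    root_uri = (reply.get("root") or {}).get("uri")

    embed = record.get("embed") or {}
    quote_uri = None
    if embed.get("$type") == "app.bsky.embed.record":
        quote_uri = (embed.get("record") or {}).get("uri")

    # reply_post is only set when the root differs from the parent.
    root_differs = bool(root_uri) and root_uri != parent_uri
    return {
        "user": users.get(event.get("did")),
        "mentions": mentions,
        "parent_post": posts.get(parent_uri) if parent_uri else None,
        "reply_post": posts.get(root_uri) if root_differs else None,
        "quote_post": posts.get(quote_uri) if quote_uri else None,
    }
```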

Example Enriched Record

{
  "at_uri": "at://did:plc:qipvtyo27owt4isjah3j3dw2/app.bsky.feed.post/3lng7e4mr5c2z",
  "did": "did:plc:qipvtyo27owt4isjah3j3dw2",
  "time_us": 1745342977546824,
  "message": { /* raw jetstream payload */ },
  "hydrated_metadata": {
    "user": { /* profileViewDetailed */ },
    "mentions": {},
    "parent_post": { /* postView */ },
    "reply_post": { /* postView */ },
    "quote_post": null
  }
}
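
Because any hydrated field may be null, client code should read these records defensively. The sketch below uses a hypothetical helper, `summarize`, and assumes the `user` object carries a `handle` field, as profile views in the Bluesky API do.

```python
def summarize(record):
    """Reduce an enriched record to a few fields, tolerating null metadata."""
    meta = record.get("hydrated_metadata") or {}
    user = meta.get("user") or {}
    return {
        "uri": record.get("at_uri"),
        # Fall back to the DID when the profile could not be hydrated.
        "author": user.get("handle") or record.get("did"),
        # True only when the parent post was actually hydrated.
        "has_parent": meta.get("parent_post") is not None,
        "has_quote": meta.get("quote_post") is not None,
        "mention_count": len(meta.get("mentions") or {}),
    }
```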

Client Integration Tips

Backpressure: Monitor Redis Stream lag or the WebSocket send buffer.
Reconnect logic: Implement exponential backoff on disconnects.
Keep your own cache layer: If you hydrate further fields yourself, maintain a small in-memory map.
Throttling: Adjust batch sizes or cache TTLs based on traffic patterns.
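
For the reconnect tip, one possible shape for exponential backoff is a generator of delays. The function name and the default values (1 second base, 60 second cap, 50% jitter) are illustrative choices, not values recommended by Turbostream.

```python
import random


def backoff_delays(base=1.0, cap=60.0, jitter=0.5, rng=random.random):
    """Yield reconnect delays that double up to `cap`, with random jitter."""
    delay = base
    while True:
        # Scale by a factor in [1 - jitter, 1 + jitter] so that many clients
        # do not all retry at the same moment.
        yield delay * (1 + jitter * (2 * rng() - 1))
        delay = min(cap, delay * 2)
```

After each failed connection attempt, sleep for `next(delays)` seconds. Create a fresh generator once a connection succeeds so the next outage starts again from the base delay.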

With Turbostream, consuming a fully enriched Bluesky Jetstream is straightforward: connect and work with the context, not the plumbing.