In this blog post, I’ll cover the theory, implementation, and challenges of building GraphQL Subscriptions from scratch.

In case you are unfamiliar with GraphQL, here’s a primer: GraphQL is a query language for APIs and a runtime for fulfilling those queries with your existing data. GraphQL was open-sourced in 2015, and the subscription operation (added to the spec in 2017) allows you to subscribe to real-time data in GraphQL. GraphQL Subscriptions power a number of features on Facebook, including live comments and streaming reactions on live videos. If you’d like a more thorough overview of GraphQL, check out graphql.org.

Imagine we are building an email client with two basic features:

1. When the user starts the application, fetch and display all emails in the inbox. For each email, display the receive time, sender’s email, and subject line.
2. When a new email arrives, add it to the inbox, displaying the receive time, sender’s email, and subject line.

For requirement #1, the client can fetch the relevant fields with a simple GraphQL Query:

query FetchEmailsOnStart($viewer: ID!) {
  allEmailsForViewer(viewer: $viewer) {
    receiveTime,
    sender,
    subject
  }
}

For requirement #2, the client needs to ask the server to be notified whenever a new email arrives. As an example, the server exposes this functionality via a pubsub (publish-subscribe) API:

newEmails.Subscribe(viewerContext, function onPublish(newEmailId) {
  // execute a GraphQL query to fetch the relevant fields from the new email
});
// Elsewhere in the server-side code, we detect the arrival of new emails and publish:
newEmails.Publish(viewerID, newEmailId);
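To make the pubsub contract concrete, here is a minimal in-memory sketch. The `EmailPubSub` class, its lower-camel-case `subscribe`/`publish` methods, and keying callbacks by viewer ID are illustrative assumptions, not a real server API. A production backend would sit on a durable message bus rather than process memory.

```typescript
// Hypothetical in-memory stand-in for the server's newEmails pubsub.
// Callbacks are keyed by viewer ID; nothing here survives a process restart.
type OnPublish = (newEmailId: string) => void;

class EmailPubSub {
  private listeners = new Map<string, Set<OnPublish>>();

  // Register a callback for one viewer and return a function that removes it.
  subscribe(viewerId: string, onPublish: OnPublish): () => void {
    const callbacks = this.listeners.get(viewerId) ?? new Set<OnPublish>();
    callbacks.add(onPublish);
    this.listeners.set(viewerId, callbacks);
    return () => {
      callbacks.delete(onPublish);
    };
  }

  // Invoke every callback registered for this viewer with the new email's ID.
  publish(viewerId: string, newEmailId: string): void {
    this.listeners.get(viewerId)?.forEach((onPublish) => onPublish(newEmailId));
  }
}

// Usage: only viewer-1's events reach the callback, and only until it unsubscribes.
const newEmails = new EmailPubSub();
const received: string[] = [];
const unsubscribe = newEmails.subscribe("viewer-1", (id) => {
  received.push(id); // in the client-driven design, FetchEmailById would run here
});
newEmails.publish("viewer-1", "email-42");
newEmails.publish("viewer-2", "email-43"); // different viewer: not delivered
unsubscribe();
newEmails.publish("viewer-1", "email-44"); // after unsubscribe: not delivered
```

After this runs, `received` holds only `"email-42"`.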

Whenever the onPublish callback function is triggered, we execute another GraphQL query like so:

# query to run in response to a new email
query FetchEmailById($viewer: ID!, $newEmailId: ID!) {
  email(viewer: $viewer, id: $newEmailId) {
    receiveTime,
    sender,
    subject
  }
}

We can now define three key responsibilities of any reactive GraphQL system:

1. Define and track the query to run when data changes.
2. Capture and detect the conditions that would trigger a re-evaluation of the query.
3. Re-evaluate the query and return the result.

What if we tried to do all this on the client? Because it explicitly subscribes to the underlying source event stream, the client now contains imperative logic for how a “new email event” is detected (here, a single pubsub event). That isn’t a big problem today, but imagine this code ships inside a mobile app where a small percentage of users never upgrade from that version. If the detection logic later changes to use multiple pubsub events, or a different pubsub event, those clients could easily break.

What if we moved the responsibilities onto the server? In other words, the client could send the server a GraphQL document. Then the server would:

1. Persist and track the GraphQL document.
2. Capture the trigger conditions (source event stream).
3. Re-evaluate the query and return the result.
4. Maintain a persistent channel to the client and push results back.
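The four server-side steps can be sketched as a small in-memory registry. This is a hypothetical illustration, not a real GraphQL server: `SubscriptionServer`, `onNewEmail`, and the `selectEmailFields` executor are invented names, and the executor ignores the document text and simply selects the three fields `SubscribeToNewEmails` asks for instead of running a real GraphQL engine.

```typescript
type Email = { id: string; receiveTime: string; sender: string; subject: string };
type EmailFields = Omit<Email, "id">;

// Hypothetical stand-in for a GraphQL executor (step 3). It does not parse
// the document; it just returns the fields SubscribeToNewEmails selects.
type Executor = (document: string, email: Email) => EmailFields;
const selectEmailFields: Executor = (_document, email) => ({
  receiveTime: email.receiveTime,
  sender: email.sender,
  subject: email.subject,
});

interface StoredSubscription {
  document: string;                    // step 1: the persisted GraphQL document
  viewerId: string;
  push: (result: EmailFields) => void; // step 4: channel back to the client
}

class SubscriptionServer {
  private subscriptions = new Map<string, StoredSubscription>();
  private nextId = 0;
  private execute: Executor;

  constructor(execute: Executor) {
    this.execute = execute;
  }

  // Step 1: persist and track the document the client sent.
  subscribe(document: string, viewerId: string, push: (r: EmailFields) => void): string {
    const id = `sub-${this.nextId++}`;
    this.subscriptions.set(id, { document, viewerId, push });
    return id;
  }

  unsubscribe(id: string): void {
    this.subscriptions.delete(id);
  }

  // Step 2: backend code calls this when it detects a new email (the source
  // event). Steps 3 and 4: re-evaluate each matching document and push it.
  onNewEmail(viewerId: string, email: Email): void {
    this.subscriptions.forEach((sub) => {
      if (sub.viewerId === viewerId) {
        sub.push(this.execute(sub.document, email));
      }
    });
  }
}

// Usage: the client only supplies the document and a channel; it never sees newEmailId.
const server = new SubscriptionServer(selectEmailFields);
const pushed: EmailFields[] = [];
const subId = server.subscribe(
  "subscription SubscribeToNewEmails($viewer: ID!) { newEmail(viewer: $viewer) { receiveTime sender subject } }",
  "viewer-1",
  (result) => pushed.push(result),
);
server.onNewEmail("viewer-1", { id: "e1", receiveTime: "09:00", sender: "a@example.com", subject: "Hi" });
server.onNewEmail("viewer-2", { id: "e2", receiveTime: "09:01", sender: "b@example.com", subject: "Other" });
server.unsubscribe(subId);
server.onNewEmail("viewer-1", { id: "e3", receiveTime: "09:02", sender: "c@example.com", subject: "Late" });
```

The `subscriptions` map is exactly the extra state the server now has to hold per client, which is what raises the operational questions discussed below.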

Performing these operations on the server hides the details of step #2 from the client, leaving them free to change on the server, and the query becomes entirely declarative:

subscription SubscribeToNewEmails($viewer: ID!) {
  newEmail(viewer: $viewer) {
    receiveTime,
    sender,
    subject
  }
}

Compared to the client-side solution, we save a network roundtrip, avoid exposing the backend’s source event stream to the client, and eliminate the need to specify newEmailId. The client still knows why the data changed, but the why (a new email arrived) is decoupled from the how (a pubsub event called newEmails carrying a newEmailId).

Scaling and Operations

This sounds promising, but the server now needs to maintain more state and manage persistent connections with potentially hundreds of millions of clients. Stateful systems are much more difficult to monitor and debug than stateless ones, and GraphQL Subscriptions are no exception. To highlight a few challenges:

How will the system scale to handle millions of concurrent users?
N-squared fanout: what happens when a chat room has a million participants and everyone is typing at the same time?
What guarantees can we make about in-order, once-and-only-once delivery, and what do those guarantees cost in latency and availability?
How do we measure the reliability of the system?
How do we transfer client connections from one node to another during deployment?
How does the system handle overload on a single node?
Should the stateful and stateless tiers be integrated or separate?
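A back-of-the-envelope calculation shows why N-squared fanout is alarming. Assume, for illustration, that each of the n participants emits a single typing event and the server delivers it to every other participant:

$$
\text{deliveries} = n(n-1), \qquad n = 10^6 \;\Rightarrow\; 10^6 \times (10^6 - 1) = 999{,}999{,}000{,}000 \approx 10^{12}
$$

One keystroke from everyone in the room becomes roughly a trillion pushes, before counting retries or acknowledgements.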

These are hard questions with no single right answer, but it’s a good idea to think about them before building a large-scale real-time API.

Open Source

The community has expressed consistent interest in GraphQL Subscriptions, so we added subscriptions to the GraphQL Specification last year. Here’s the spec text:

If the operation is a subscription, the result is an event stream called the “Response Stream” where each event in the event stream is the result of executing the operation for each new event on an underlying “Source Stream”.

Executing a subscription creates a persistent function on the server that maps an underlying Source Stream to a returned Response Stream.

Mapping this to our example, the “Source Stream” is the “newEmails pubsub event”. The “persistent function on the server” remembers the GraphQL query and listens to the Source Stream for events. Each time a new email arrives, the mapping function executes the stored GraphQL document using the input from the Source Stream event.
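The spec’s “persistent function” can be sketched with async iterables, which is also how the graphql-js reference implementation represents response streams. In this sketch, `mapSourceToResponse` is the mapping function; `fakeNewEmails` (a fixed two-event stand-in for the newEmails pubsub) and `executeStoredDocument` (a stand-in for executing the stored GraphQL document) are hypothetical.

```typescript
// Map a Source Stream to a Response Stream: for every source event,
// execute the stored operation and yield its result.
async function* mapSourceToResponse<E, R>(
  sourceStream: AsyncIterable<E>,
  executeOperation: (event: E) => Promise<R>,
): AsyncGenerator<R> {
  for await (const event of sourceStream) {
    yield await executeOperation(event);
  }
}

// Hypothetical Source Stream: two newEmails events instead of a live pubsub.
type NewEmailEvent = { newEmailId: string };
async function* fakeNewEmails(): AsyncGenerator<NewEmailEvent> {
  yield { newEmailId: "email-1" };
  yield { newEmailId: "email-2" };
}

// Hypothetical executor: pretends to run the stored document for one event.
async function executeStoredDocument(event: NewEmailEvent): Promise<{ subject: string }> {
  return { subject: `Subject of ${event.newEmailId}` };
}

// Drain the Response Stream; a real server would push each result to the client.
async function main(): Promise<Array<{ subject: string }>> {
  const results: Array<{ subject: string }> = [];
  for await (const result of mapSourceToResponse(fakeNewEmails(), executeStoredDocument)) {
    results.push(result);
  }
  return results;
}
```

Because the mapping function is lazy, a slow client naturally applies backpressure: the next source event is not executed until the previous result has been consumed.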

GraphQL Subscriptions are available from key community partners such as Apollo and Prisma, but I hope this blog post has equipped you with the knowledge to build your own GraphQL Subscriptions implementation!