TheGraph more technically explained

What you need to know about the Prime Data Source of Web3

8 min read, Feb 23, 2022

Web3 is one of the most hyped topics on the internet. Some say it’s ******** and that web3 lacks a fundamental use case, while others think it’s the next revolution. Instead of joining one of the two extreme camps, let’s familiarise ourselves with the technology and the value proposition of TheGraph, which is to my knowledge the best data resource for web3.

Understanding TheGraph is hard (in the beginning) if you are missing the right resources

Within the web3 space, TheGraph is one of the more mature projects. It has attracted many users, and last summer the project showed incredible growth, with up to 60 million queries per day.

https://thegraph.com/blog/one-billion-monthly-queries

This success didn’t go unnoticed. Many people call it the “Google” of web3. I became really interested when I first heard about this project. Sadly, in the beginning I didn’t find good explanations of what exactly it does. I was looking for introductory and technical explanations, while many authors focus on how you can earn money quickly and try to include as many buzzwords as possible. I attempted a couple of times to read the documentation of TheGraph, but the jungle of new terminology made the topic opaque to me. I was not able to build my understanding on solid ground until recently, when I discovered an article by the founder of TheGraph that covered the project more technically.

I want to share this solid foundation with you, so that you can make judgments about TheGraph that are deeper and more insightful than just “TheGraph is the Google of web3”. I also hope that the project gains popularity and that the data community gets an additional resource that explains this project from another angle.

Value proposition of TheGraph

The vision of many solid web3 projects* and protocols includes some form of decentralisation and an incentive system. The incentive system of such a protocol aims to steer its users towards behaviour that is good for the whole ecosystem.

Show me the incentives, I show you the outcome — Brandon Ramirez Co-Founder of TheGraph

*(not the scammer ones)

TheGraph is looking to decentralise data consumption and data storage. How is TheGraph different from traditional SaaS companies? How does the protocol achieve decentralisation? The next paragraphs focus on these questions.

https://thegraph.com/blog/the-graph-network-in-depth-part-1

Traditionally, a business stores data and gives access to it via an API which the consumer calls. The consumer is most often a company itself that relies on this data to offer services to its own clients. This doesn’t sound too bad. However, the consumer (the company) is at the mercy of any changes the data-storing business wants to make to the API or to the data it has stored. The company has no choice but to adapt to the changes forced on it.

Now imagine that the entity that stores and exposes data through an API is incentivised to adhere to a manifest that specifies how to store the data and how to expose it to consumers. This is exactly what TheGraph protocol does.

Suppose a data consumer wants some data xyz that is described by a subgraph. In order to receive this data xyz, the consumer can pick from a variety of data providers, which we call indexers. An indexer is in competition with many other indexers that want to serve the same data.

This competition should benefit data consumers: they get a more reliable and trustworthy data source because they are no longer bound by a contract to a single data-providing company with no easy way to switch later. Instead, they have a supply of data providers that have committed to store and serve data xyz. And if one indexer doesn’t deliver a good data experience, the consumer can easily switch to another indexer.

This relationship between consumers and sellers is comparable to the consumer experience in a wide range of other domains. Sellers need to adhere to industry standards, otherwise consumers switch to a competitor. TheGraph brings this dynamic to the business of on-chain data.

What are indexers on TheGraph technically?

Indexers run a Graph Node (a component of TheGraph) and provide chain data to consumers. Indexers get their data from an Ethereum node. They can run their own Ethereum node and connect to it, or they can use a node provider service like Infura, Alchemy or Quicknode (there are many other services).

Note: TheGraph is in the process of including other blockchains besides Ethereum in its ecosystem.

Indexers store the data they get from the Ethereum node in a PostgreSQL database. They then expose the data from the database to the consumers via a GraphQL API. How the database and the API are structured is defined by a subgraph.

What are subgraphs?

What we call a subgraph in the TheGraph ecosystem is technically a set of three files stored on IPFS (a protocol for storing immutable data in a decentralised way).

subgraph.yaml is the manifest. It describes which data sources indexers will index.
schema.graphql describes how the data needs to be structured in the PostgreSQL database by indexers (storing data) and how the data is queried through a GraphQL API by consumers (serving data).
mapping.ts is an AssemblyScript file that defines how data from smart-contract events is transformed into the types defined in the GraphQL schema (creating data).
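To make the role of the mapping a bit more tangible, here is a conceptual sketch in Python. Real mappings are written in AssemblyScript and use TheGraph's own libraries; the Transfer entity, the shape of the event dictionary and the name handle_transfer are all made up for illustration and are not taken from any real subgraph.

```python
from dataclasses import dataclass


@dataclass
class Transfer:
    """Hypothetical entity, standing in for a type declared in schema.graphql."""
    id: str
    sender: str
    receiver: str
    amount: int


def handle_transfer(event: dict, store: dict) -> None:
    """Turn one decoded smart-contract event into an entity and save it."""
    # A transaction can emit several events, so the id combines hash and log index.
    entity_id = f"{event['tx_hash']}-{event['log_index']}"
    store[entity_id] = Transfer(
        id=entity_id,
        sender=event["from"],
        receiver=event["to"],
        amount=int(event["value"]),
    )


# An in-memory dict stands in for the indexer's PostgreSQL database.
store: dict = {}
handle_transfer(
    {"tx_hash": "0xabc", "log_index": 0, "from": "0x01", "to": "0x02", "value": "100"},
    store,
)
```

The point of the sketch is the division of labour: the schema fixes the shape of the entity, and the mapping is the only place that knows how a raw event becomes such an entity.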

How are subgraphs and indexers related to each other?

Indexers have a wide range of subgraphs to choose from. Subgraphs are created by the developer community. Whatever subgraph the indexers choose, they need to adhere to the manifest.

Indexers have skin in the game once they decide on a particular subgraph, since they stake GRT tokens on it. Only if they stake GRT are they visible to consumers in the market. If they act maliciously, or in other words don’t adhere to the manifest, the protocol punishes them by slashing their staked GRT tokens.
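The staking idea can be illustrated with a toy model. This is only a sketch of the incentive described above, not the protocol's actual contracts: the class name, the method names and the slashing fraction used in the example are assumptions chosen for illustration.

```python
class ToyStakingRegistry:
    """Toy model of indexer stakes; not the real protocol contracts."""

    def __init__(self) -> None:
        # Maps (indexer, subgraph) to the amount of staked GRT.
        self.stakes = {}

    def stake(self, indexer: str, subgraph: str, amount: float) -> None:
        key = (indexer, subgraph)
        self.stakes[key] = self.stakes.get(key, 0.0) + amount

    def visible_indexers(self, subgraph: str) -> list:
        """Only indexers with a positive stake on the subgraph are listed."""
        return sorted(
            indexer
            for (indexer, name), amount in self.stakes.items()
            if name == subgraph and amount > 0
        )

    def slash(self, indexer: str, subgraph: str, fraction: float) -> float:
        """Remove a fraction of the stake as a penalty and return the amount taken."""
        key = (indexer, subgraph)
        current = self.stakes.get(key, 0.0)
        penalty = current * fraction
        self.stakes[key] = current - penalty
        return penalty


registry = ToyStakingRegistry()
registry.stake("indexer-a", "subgraph-xyz", 1000.0)
registry.stake("indexer-b", "subgraph-xyz", 500.0)
# The 0.5 fraction is an arbitrary example value, not a protocol parameter.
penalty = registry.slash("indexer-b", "subgraph-xyz", 0.5)
```

An indexer that never staked does not show up in visible_indexers at all, which mirrors the rule that only staked indexers are visible to consumers.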

Since indexers earn value from consumers who need their data, they choose a subgraph/manifest that is popular or is expected to become popular. To be technically more precise, indexers look for smart contract events that are meaningful to many existing or potential consumers.

Challenge: How a new Subgraph can attract Indexers

If a developer from the community creates a new subgraph, it must first be indexed by at least one indexer, but preferably many more. For a given indexer, this basically means that his or her indexing script needs to go through the whole blockchain history to store the desired data from smart contract events in a PostgreSQL database. If the smart contract has existed for a while or is highly popular, indexing will include many events. This process can last hours or even days.
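Why indexing takes so long becomes clearer with a small sketch of such a backfill. The block and event dictionaries below are simplified stand-ins invented for this example; a real indexer reads blocks from an Ethereum node and writes to PostgreSQL instead of a list.

```python
def backfill(blocks: list, contract: str, start_block: int) -> list:
    """Scan historical blocks in order and collect the events of one contract."""
    rows = []
    for block in blocks:
        # Blocks before the contract was deployed cannot contain its events.
        if block["number"] < start_block:
            continue
        for event in block["events"]:
            if event["address"] == contract:
                rows.append({"block": block["number"], "name": event["name"]})
    return rows


# Tiny made-up chain history with two contracts emitting events.
chain = [
    {"number": 1, "events": [{"address": "0xother", "name": "Mint"}]},
    {"number": 2, "events": [{"address": "0xdefi", "name": "Swap"}]},
    {"number": 3, "events": [{"address": "0xother", "name": "Burn"},
                             {"address": "0xdefi", "name": "Swap"}]},
]
rows = backfill(chain, contract="0xdefi", start_block=2)
```

Every block from the start block up to the chain head has to be visited once, so the work grows with the age of the contract and with the number of events it has emitted.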

Now imagine indexers had to guess which subgraph may attract new consumers, so that they can earn fees from their queries. In this scenario the market would not be as efficient as it could be.

This is why TheGraph protocol introduced two additional roles besides indexers and consumers, called curators and delegators.

Curators

Curators are speculators and investors who believe that certain subgraphs have a use case, or who anticipate a future real-life use case for them. They are financially rewarded if their (educated) guess was right. For example, a new DeFi protocol is launched on Ethereum Mainnet and subsequently someone from this protocol creates a subgraph. If curators think that this protocol is going to attract many users, they expect it to attract many data consumers on TheGraph as well. Those data consumers are interested in the smart contract events of the newly launched DeFi protocol (this potentially includes traders who use data analysis, dApps that display information about the protocol, etc.). Thus the curators will stake GRT on this new subgraph to signal to indexers that this subgraph is worth indexing. If they are early believers, their financial return is potentially higher.
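How such a signal can guide indexers is sketched below. This is a deliberately simplified model: it only sums the GRT that curators put on each subgraph and ranks subgraphs by that sum. How the real protocol prices signal and rewards early curators is not modelled here, and all names and numbers are invented.

```python
from collections import defaultdict


def total_signal(signals: list) -> dict:
    """Sum the curated GRT per subgraph from (curator, subgraph, grt) tuples."""
    totals = defaultdict(int)
    for _curator, subgraph, grt in signals:
        totals[subgraph] += grt
    return dict(totals)


def rank_subgraphs(signals: list) -> list:
    """Order subgraphs from most to least curated signal."""
    totals = total_signal(signals)
    return sorted(totals, key=lambda name: totals[name], reverse=True)


signals = [
    ("curator-1", "new-defi-protocol", 500),
    ("curator-2", "new-defi-protocol", 300),
    ("curator-3", "old-nft-project", 200),
]
ranking = rank_subgraphs(signals)
```

An indexer deciding where to spend hours or days of indexing time could start at the top of such a ranking instead of guessing.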

Delegators

Maybe this article reminds you of an interesting web3 project and you are wondering whether you could use your programming knowledge to index a certain subgraph, but you are worried about the GRT stake needed. For exactly such situations, TheGraph protocol created another role called delegators. Delegators support indexers financially by delegating GRT to them, and in return they share in the rewards.
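A toy calculation shows the basic idea of sharing the reward. The purely proportional split below is an assumption made for illustration; the actual protocol lets indexers set their own parameters for how rewards are shared, which this sketch ignores.

```python
def split_rewards(reward: float, indexer_stake: float, delegations: dict) -> dict:
    """Split a reward pro rata between the indexer's own stake and the delegators."""
    total = indexer_stake + sum(delegations.values())
    if total <= 0:
        raise ValueError("total stake must be positive")
    shares = {"indexer": reward * indexer_stake / total}
    for delegator, amount in delegations.items():
        shares[delegator] = reward * amount / total
    return shares


# Made-up numbers: the indexer stakes 1000 GRT, two delegators add 4000 GRT.
shares = split_rewards(500.0, 1000.0, {"alice": 3000.0, "bob": 1000.0})
```

In this example the indexer only brings one fifth of the stake, which is exactly the situation described above: someone with the technical skills but not enough GRT.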

How do I know as a consumer or layman which subgraph is interesting to me?

Maybe you want to play a little with the data offered on TheGraph. This is possible on the hosted service, a transitional, centralised version of TheGraph. TheGraph will gradually sunset the hosted service once the decentralised network reaches feature parity. The decentralised network can be browsed through Graph Explorer; it is the space where indexers, consumers, curators and delegators meet and exchange value.

On the hosted service you can make GraphQL queries to different subgraphs and see which data they offer. You can even integrate them into your own hobby project with the URL that each subgraph provides.
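For a hobby project, a GraphQL query is just an HTTP POST with a JSON body. The sketch below only builds such a request with Python's standard library and does not send it. The endpoint URL and the transfers entity in the query are placeholders, not a real subgraph; replace them with the URL and the schema of the subgraph you are exploring.

```python
import json
import urllib.request


def build_graphql_request(endpoint: str, query: str) -> urllib.request.Request:
    """Wrap a GraphQL query string into a JSON POST request."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder endpoint and entity name, chosen only for illustration.
ENDPOINT = "https://example.com/subgraphs/name/some-org/some-subgraph"
QUERY = "{ transfers(first: 5) { id } }"
request = build_graphql_request(ENDPOINT, QUERY)

# To send it against a real endpoint:
# with urllib.request.urlopen(request) as response:
#     data = json.load(response)
```

Which fields you can ask for is determined by the schema.graphql of the subgraph, so the query has to match the entities defined there.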

Conclusion

You learned about a protocol that tries to reshape how we use data and how data providers and consumers interact. The creators of the protocol rely on incentive systems that are enforced programmatically, and you learned about some basic building blocks that make up this system: indexers, consumers, delegators and curators. Go to their website https://thegraph.com/en/ and check out whether they have additional roles I kept quiet about.

Resources:

A selection of my articles:

Create your GraphQL API and access your mongoDB database via Apollo Server deployed on Heroku

If you follow this tutorial from start to finish you will learn how to organise a backend for your own data science…

towardsdatascience.com

Node embeddings for Beginners

Node embeddings can be hard in the beginning. This article provides you with intuition so that you can read more…

towardsdatascience.com

Graph Coloring with networkx

The solution to the graph coloring problem is conceptually easy but powerful in its application. This tutorial shows…

towardsdatascience.com

Written by Yves Boutellier
