To be clear, I don’t have anything against GraphQL. The concern is that GraphQL is complex, and complexity is the worst enemy of security.
What is GraphQL?
GraphQL is a query language for APIs and a runtime for executing those queries. It was developed by Facebook in 2012 and released as an open-source project in 2015. GraphQL allows clients to request exactly the data they need from a server, rather than fetching fixed responses. This helps reduce the amount of data transferred over the network and enables more efficient data retrieval.
Why use GraphQL over REST API?
GraphQL lets the client specify exactly what data the API returns. This helps when a client needs several different types of data, which with REST would mean multiple API calls. REST endpoints also tend to return a lot of data that the client never uses. With GraphQL, the client controls what the server sends back.
Let's take a look at an example
Examples are essential to comprehensively understand technical concepts. Consider the following scenario: You are employed by a medical company and are tasked with calculating the average age of individuals diagnosed with diabetes. Although the purpose of this calculation may not be immediately apparent, let us proceed with this example for illustrative purposes.
To accomplish this, you will need to retrieve the ages of these individuals from a server. In the context of a REST API, this would involve making a request to a specific endpoint, such as /records/diabetes. This endpoint would return data in the following format:
{
  "diabetes": [
    {
      "name": "John Doe",
      "age": 45,
      "diabetic_since": "2021"
    },
    {
      "name": "Carlos",
      "age": 53,
      "diabetic_since": "2019"
    },
    .....
  ]
}
In this response, the name and diabetic_since parameters are not required for calculating the average age of the individuals. If GraphQL were employed instead, the response would be as follows:
{
  "data": {
    "diabetes": [
      {
        "age": 45
      },
      {
        "age": 53
      },
      .......
    ]
  }
}
Now, since only the age parameter was required, the server returned solely the age information, omitting any extraneous parameters.
What does a query look like?
Before we dive any further, let's see what a GraphQL query looks like, using the responses shown above. The REST response returned every field, while the GraphQL response returned only the field we asked for. The following query produces that smaller response:
query Diabetes {
  diabetes {
    age
  }
}
Here the word diabetes appears twice. The Diabetes on the first line is the operation name. You can choose it freely, and it is useful for logging and debugging, but it does not appear in the response. The lowercase diabetes on the second line is the field being requested, and its name becomes the key under data in the response. If you want that key to be coffee instead, give the field an alias:
query Coffee {
  coffee: diabetes {
    age
  }
}
Note that the REST API response also included the name and diabetic_since fields. These can be requested as well if needed:
query Diabetes {
  diabetes {
    name
    age
    diabetic_since
  }
}
If you specify the additional fields in your request, the server includes them in the response. This lets you tailor data retrieval to specific requirements.
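As a rough illustration of the client side, here is a minimal Python sketch that wraps the age-only query in the JSON envelope commonly used for GraphQL over HTTP and computes the average age from a reply. The helper names build_request_body and average_age, and the sample reply, are assumptions made for this example rather than part of any real API.

```python
import json

# The age-only query from this section.
QUERY = """
query Diabetes {
  diabetes {
    age
  }
}
"""


def build_request_body(query: str) -> str:
    """Wrap a GraphQL query in the JSON envelope usually sent via HTTP POST."""
    return json.dumps({"query": query})


def average_age(response_text: str) -> float:
    """Average the `age` field across the records in a GraphQL reply."""
    records = json.loads(response_text)["data"]["diabetes"]
    return sum(record["age"] for record in records) / len(records)


# Hypothetical reply holding only the two sample records from the article.
sample_reply = '{"data": {"diabetes": [{"age": 45}, {"age": 53}]}}'

print(average_age(sample_reply))  # 49.0
```

Because the reply carries nothing but age, the client does no filtering of its own before doing the arithmetic.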
Okay, but what's the vulnerability here?
When discussing GraphQL, the focus isn’t on GraphQL itself being a vulnerability, but rather on other potential vulnerabilities that require attention. In this blog, we will address these issues systematically:
- Introspection enabled: This feature can provide valuable insights for hackers or penetration testers.
- Denial of Service (DoS): Techniques that can overwhelm a system’s resources.
- Circular Queries: Queries that abuse circular references in the schema to nest to an arbitrary depth.
- Query Batching DoS: Sending many queries in a single request to exhaust server resources or to brute-force.
- Alias Overloading: Techniques abusing aliasing for malicious purposes.
- Excessive Data Exposure: Unnecessarily exposing sensitive information.
- Access Controls: Ensuring proper access restrictions are in place.
- Query-Cost Analysis: A method to detect and mitigate DoS attacks specific to GraphQL.
These topics will guide our exploration into securing applications using GraphQL effectively.
GraphQL introspection enabled
Imagine asking someone what they know about themselves, and having them reply with everything. Suppose the answer is: “I know about my name, age, date of birth, place of birth, SSN, passport details, bank info which includes account number, bank in which account is there, bank balance, and bank password, my home address, my work address, my social accounts info of Google, Meta (aka good old Facebook), X (miss you Twitter), Reddit, and Mastodon which include their account usernames/emails and their passwords, ……….”
Surprising as it sounds, such instances do exist: GraphQL servers with introspection enabled. Introspection is a GraphQL feature that lets a client ask the server to describe its own schema (types, fields, and arguments) through what is known as an “introspection query.” Here’s an example showcasing introspection on DVGA (Damn-Vulnerable-Graphql-Application):
Or, if you use Burp Suite, it also has a built-in feature to load an introspection query for GraphQL servers.
And last but not least, if you are a Caido enthusiast (like myself), you can utilize a workflow specifically designed for this purpose. It can be found at Caido Introspection Query Workflow. This tool is excellent for detecting whether introspection is enabled. Subsequently, you can employ other API clients like Postman for further analysis.
Postman is highly effective for GraphQL exploration and works seamlessly out of the box. Additionally, there are other tools tailored specifically for GraphQL, such as the Altair GraphQL client. However, I find that Postman is the most efficient tool for API testing.
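If you prefer to script the check, the idea can be sketched in a few lines of Python. This is a minimal sketch under assumptions: the probe asks only for type names, the helper names are made up for illustration, and the error message in the second sample reply is hypothetical (real servers word it differently). Sending the request is left out so the snippet stays self-contained.

```python
import json

# A small probe: it asks only for type names, not the full schema.
INTROSPECTION_PROBE = "query Introspection { __schema { types { name } } }"


def build_probe_body() -> str:
    """JSON body to POST to the GraphQL endpoint."""
    return json.dumps({"query": INTROSPECTION_PROBE})


def introspection_enabled(response_text: str) -> bool:
    """Return True if the reply actually contains schema type data."""
    try:
        reply = json.loads(response_text)
    except json.JSONDecodeError:
        return False
    if not isinstance(reply, dict):
        return False
    data = reply.get("data")
    if not isinstance(data, dict):
        return False
    schema = data.get("__schema")
    return isinstance(schema, dict) and bool(schema.get("types"))


# Hypothetical replies for illustration only.
enabled_reply = '{"data": {"__schema": {"types": [{"name": "Query"}]}}}'
disabled_reply = '{"errors": [{"message": "introspection is disabled"}]}'

print(introspection_enabled(enabled_reply))   # True
print(introspection_enabled(disabled_reply))  # False
```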
Denial of Service
Circular Queries
Ever come across this?
while True:
    print("Hack the Planet!")
If so, imagine this:
You are asked to purchase ‘X’ from store ‘A’. The person at store ‘A’ tells you that ‘X’ is available at store ‘B’, so you go to store ‘B’. Now, the person at store ‘B’ tells you that ‘X’ is available at ‘A’. Imagine this cycle goes on for a long time, until you finally get ‘X’ from one of the two stores.
In technical terms, a circular query in GraphQL can be understood as follows: GraphQL allows you to define queries specifying the exact resources you wish to retrieve, including resources that are reachable through other resources.
In this context, imagine that a field in the first object of the schema references a second object, and a field in the second object references the first object. This creates a circular relationship between the two objects.
Reading and understanding such relationships from a text file can be cumbersome and error-prone. Visualizing the schema makes circular relationships much easier to spot. The following is visualized using GraphQL Voyager.
The above image illustrates the GraphQL schema for DVGA (Damn Vulnerable GraphQL Application). In this visualization, under PasteObject there is an arrow pointing to OwnerObject, and from OwnerObject two arrows point back to PasteObject. This circular relationship can be exploited to craft a deeply nested query that makes the server resolve the same objects again and again. Two examples:
query Pastes {
  pastes {
    owner {
      pastes {
        owner {
          pastes {
            owner {
              pastes {
                owner {
                  pastes {
                    owner {
                      pastes {
                        owner {
                          pastes {
                            id
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
query Search {
  search {
    ...on PasteObject {
      owner {
        pastes {
          owner {
            pastes {
              owner {
                pastes {
                  owner {
                    pastes {
                      owner {
                        pastes {
                          owner {
                            pastes {
                              owner {
                                id
                              }
                            }
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
If you send either of these requests to the server, you will notice a significant delay compared to other responses. On a fresh DVGA installation with the default (i.e., minimal) data, or on a machine with substantial resources, the response may still be relatively quick. In real-world scenarios, however, servers often hold vast amounts of data and have limited computing resources, so the delays are more pronounced.
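Writing such queries by hand gets tedious, so here is a small Python sketch that generates the pastes/owner nesting to any depth, plus a rough depth check of the kind a server could apply before executing a query. The function names and the MAX_DEPTH value are assumptions for illustration, and the brace-counting heuristic ignores braces inside string arguments, so a real implementation would use a proper GraphQL parser.

```python
MAX_DEPTH = 10  # hypothetical limit, chosen only for this example


def build_circular_query(depth: int) -> str:
    """Nest pastes -> owner `depth` times, ending in the scalar field `id`."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    query = "pastes { id }"
    for _ in range(depth):
        query = f"pastes {{ owner {{ {query} }} }}"
    return f"query Pastes {{ {query} }}"


def nesting_depth(query: str) -> int:
    """Maximum brace nesting, used here as a rough proxy for query depth."""
    depth = deepest = 0
    for char in query:
        if char == "{":
            depth += 1
            deepest = max(deepest, depth)
        elif char == "}":
            depth -= 1
    return deepest


def exceeds_depth_limit(query: str, limit: int = MAX_DEPTH) -> bool:
    """True if the query nests deeper than the configured limit."""
    return nesting_depth(query) > limit


print(build_circular_query(1))
# query Pastes { pastes { owner { pastes { id } } } }
print(nesting_depth(build_circular_query(6)))  # 14
```

With depth set to 6 the generator reproduces the nesting of the Pastes query shown above, and the depth check flags it against the example limit of 10.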
Query Batching DoS
Query batching is a feature in GraphQL that allows you to send multiple queries within a single request. An example of a simple GraphQL query is as follows:
POST /graphql HTTP/1.1
accept: application/json
content-type: application/json
user-agent: PostmanClient/11.1.25 (AppId=21e0a631-e472-46fa-bbc4-5e04e34bf593)
Host: <host>:5013
Content-Length: 64
Connection: close
{"query": "query Introspection { __schema { types { name } } }"}
A batched query looks like this:
POST /graphql HTTP/1.1
accept: application/json
content-type: application/json
user-agent: PostmanClient/11.1.25 (AppId=21e0a631-e472-46fa-bbc4-5e04e34bf593)
Host: <host>:5013
Content-Length: 521
Connection: close
[{"query": "query Introspection { __schema { types { name } } }"},{"query": "query Introspection { __schema { types { name } } }"},{"query": "query Introspection { __schema { types { name } } }"},{"query": "query Introspection { __schema { types { name } } }"},{"query": "query Introspection { __schema { types { name } } }"},{"query": "query Introspection { __schema { types { name } } }"},{"query": "query Introspection { __schema { types { name } } }"},{"query": "query Introspection { __schema { types { name } } }"}]
In a typical GraphQL query, the data is sent as a single object at the root level. However, with a batched query, an array of objects is sent instead. The potential issue arises when an attacker sends 500 or more queries in a single request. Processing such a large number of queries simultaneously can consume a significant amount of server resources, leading to potential performance degradation or denial of service.
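The difference between the two request shapes, and a simple guard against oversized batches, can be sketched in Python as follows. The helper names and the MAX_BATCH_SIZE value are assumptions for illustration; many GraphQL servers offer their own setting to limit or disable batching, and that should be preferred where it exists.

```python
import json

MAX_BATCH_SIZE = 10  # hypothetical limit, chosen only for this example

QUERY = "query Introspection { __schema { types { name } } }"


def build_batch(query: str, count: int) -> str:
    """Build a batched request body: a JSON array of query objects."""
    return json.dumps([{"query": query} for _ in range(count)])


def batch_size(body: str) -> int:
    """Count operations in a body: 1 for a single object, len() for an array."""
    payload = json.loads(body)
    return len(payload) if isinstance(payload, list) else 1


def is_batch_allowed(body: str, limit: int = MAX_BATCH_SIZE) -> bool:
    """Reject bodies that carry more operations than the limit."""
    return batch_size(body) <= limit


single = json.dumps({"query": QUERY})
print(batch_size(single))                            # 1
print(batch_size(build_batch(QUERY, 8)))             # 8
print(is_batch_allowed(build_batch(QUERY, 500)))     # False
```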
Alias Overloading
This issue is quite similar to the previous one. Suppose you need to make a single query multiple times for different users, such as retrieving data for user IDs 1, 2, 3, 4, and 5. If the server returns the same key for all these queries, it would be challenging to distinguish which data belongs to which user. Requiring multiple requests for this purpose contradicts the fundamental principle of GraphQL, which is to retrieve only the necessary data in a single request.
This problem can be addressed using aliases. Aliases allow you to assign custom names to the data returned for a particular query, making it easier to identify the results. Here is an example demonstrating the use of aliases:
query Users {
  user1: user(id: "1") {
    name
    age
  }
  user2: user(id: "2") {
    name
    age
  }
  user3: user(id: "3") {
    name
    age
  }
  user4: user(id: "4") {
    name
    age
  }
  user5: user(id: "5") {
    name
    age
  }
}
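In this example, each user query is assigned a unique alias (e.g., user1, user2), ensuring that the data for each user is clearly identifiable in the response. The same feature enables alias overloading: an attacker can repeat an expensive field hundreds or thousands of times under different aliases in a single query, which forces the server to resolve every copy and can get around limits that only count requests or batched operations.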
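To see how quickly an aliased query grows, here is a Python sketch that generates one for any list of user IDs and counts the aliases with a regular expression. The function names are assumptions for illustration, and the regex is only a rough heuristic: it detects aliases on fields that take arguments or a selection set, and a production check would walk the parsed query instead.

```python
import re


def build_aliased_query(user_ids) -> str:
    """Request the same `user` field once per ID, each under its own alias."""
    fields = " ".join(
        f'user{uid}: user(id: "{uid}") {{ name age }}' for uid in user_ids
    )
    return f"query Users {{ {fields} }}"


# Heuristic: `alias: field` followed by "(" or "{".
ALIAS_PATTERN = re.compile(
    r"\b[_A-Za-z][_0-9A-Za-z]*\s*:\s*[_A-Za-z][_0-9A-Za-z]*\s*[({]"
)


def count_aliases(query: str) -> int:
    """Approximate number of aliased fields in a query string."""
    return len(ALIAS_PATTERN.findall(query))


print(count_aliases(build_aliased_query(range(1, 6))))    # 5
print(count_aliases(build_aliased_query(range(1, 501))))  # 500
```

A server-side check could reject any query whose alias count passes a chosen threshold, in the same spirit as a batch size limit.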
Excessive Data Exposure
Excessive Data Exposure falls under API3:2023 Broken Object Property Level Authorization (BOPLA) in the OWASP API Security Top 10. BOPLA combines excessive data exposure and mass assignment. Here we will focus on excessive data exposure, which occurs when an API returns more data in the response than what is displayed in the user interface or required by the application.
Consider a scenario where you visit a forum hosted on a company’s subdomain, with accounts linked directly to the company’s main application. Companies often collect extensive information, such as emails, phone numbers, names, addresses, Social Security numbers, credit card information, etc. Most of this information is meant to be kept private, with only specific details like names and, in some cases, emails being publicly accessible, depending on the company’s business nature.
When accessing the forum, you should only see the name and username of other users. If the API response reveals additional private information, such as email addresses (which are not displayed in the UI), it constitutes excessive data exposure.
In a REST API, a request for a resource typically returns a fixed set of fields chosen by the server. If the endpoint is vulnerable to excessive data exposure, the sensitive fields simply arrive in the response. In GraphQL, however, you need to specify which fields you want to retrieve.
To exploit this vulnerability in forum software, an attacker would need to name sensitive fields such as email, address, credit_card, etc., in the query.
Here’s an example of a query made to retrieve a user’s post on a forum:
query Post {
  post(id: 1) {
    id
    title
    content
    owner {
      name
    }
  }
}
Now, if we wanted the email address of the author and the IP address of the machine from which the post was created, we could use the following query:
query Post {
  post(id: 1) {
    id
    title
    content
    ip_address
    owner {
      name
      email
    }
  }
}
If the server answers this query, we obtain the IP address and the email of the user who created the post, even though the developers of the app never intended to expose them.
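When testing for this, it helps to scan replies automatically for fields that should never reach the client. The Python sketch below walks a parsed GraphQL reply and reports the path of every key found in a set of sensitive names. The SENSITIVE_KEYS set, the function name, and the sample reply (including its example IP address and email) are all assumptions for illustration, so adapt the names to the schema you are testing.

```python
# Hypothetical field names to flag; adjust to the target schema.
SENSITIVE_KEYS = {"email", "ip_address", "address", "credit_card"}


def find_sensitive_paths(value, path=""):
    """Recursively collect the paths of sensitive keys in a parsed reply."""
    found = []
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else key
            if key in SENSITIVE_KEYS:
                found.append(child_path)
            found.extend(find_sensitive_paths(child, child_path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            found.extend(find_sensitive_paths(child, f"{path}[{index}]"))
    return found


# Hypothetical reply to the second Post query.
sample_reply = {
    "data": {
        "post": {
            "id": 1,
            "title": "Hello",
            "ip_address": "203.0.113.7",
            "owner": {"name": "John Doe", "email": "john@example.com"},
        }
    }
}

print(find_sensitive_paths(sample_reply))
# ['data.post.ip_address', 'data.post.owner.email']
```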
I hope that you enjoyed this article. We will soon publish the next part, a case study in which we discuss how we found critical vulnerabilities in one of our client’s GraphQL instances using the techniques mentioned in this article.
See you next time
Peace ✌️