
What Is XML? A Complete Guide to Extensible Markup Language

By S. Nitya · 14 min read

XML (Extensible Markup Language) quietly powers much of the internet's infrastructure. Every RSS feed your reader subscribes to, every SVG icon on your screen, and every SOAP web service call is XML, and every .docx file you open is a ZIP archive of XML files. This guide covers XML from its SGML roots to modern usage: syntax rules, namespaces, schema validation, XPath querying, and parsing in multiple languages, with concrete examples throughout.

What Is XML?

XML (Extensible Markup Language) is a markup language that defines rules for encoding documents in a format that is both human-readable and machine-readable. XML is often described as a “meta-language”: not a fixed language like HTML, but a framework for creating your own markup languages tailored to your data.

Key characteristics that define XML:

  • Self-describing — tags describe the meaning of the data they contain, not how it should be displayed.
  • Extensible — you define your own tags; there is no fixed vocabulary imposed by the format.
  • Hierarchical — data is organised as a tree of nested elements, with exactly one root.
  • Platform-independent — any system that can read UTF-8 text can read and produce XML.
  • Strictly parsed — unlike HTML, XML parsers reject any document that violates the well-formedness rules.

Here is a well-formed XML document representing a book catalogue:

<?xml version="1.0" encoding="UTF-8"?>
<library>
  <book isbn="978-0-13-468599-1" available="true">
    <title>The Pragmatic Programmer</title>
    <author>David Thomas</author>
    <year>2019</year>
    <price currency="USD">49.99</price>
  </book>
</library>

Every piece of data is wrapped in descriptive tags. Anyone reading the document, and any program written against this vocabulary, can tell that 49.99 is a price in US dollars without separate documentation.

A Brief History of XML

XML's lineage stretches back to 1969 when Charles Goldfarb, Ed Mosher, and Raymond Lorie developed GML (Generalized Markup Language) at IBM for document management. GML evolved into SGML (Standard Generalized Markup Language), standardised as ISO 8879 in 1986 — a powerful but immensely complex language used by governments, publishers, and aerospace firms.

Early versions of HTML (up to HTML 4.01) were themselves applications of SGML. But HTML's permissive parsing rules and fixed tag set made it unsuitable for general data exchange. In 1996, the W3C formed the XML Working Group to create a simplified, web-friendly subset of SGML.

Key people who shaped XML:

  • Jon Bosak (Sun Microsystems): chaired the XML Working Group; often called “the father of XML”.
  • Tim Bray (Netscape): co-edited the XML 1.0 specification; later co-created Atom and worked on JSON at Google.
  • C. M. Sperberg-McQueen (University of Illinois at Chicago): co-edited the XML 1.0 spec with Tim Bray and Jean Paoli (Microsoft).

Key milestones:

  • February 10, 1998: XML 1.0 published as a W3C Recommendation, a landmark in web standards history.
  • 2000: SOAP 1.1 (Simple Object Access Protocol), an XML-based web services protocol, is published as a W3C Note and goes on to drive mass enterprise adoption.
  • 2002–2005: RSS 2.0 (2002) and Atom 1.0 (RFC 4287, 2005) cement XML as the format for feed syndication.
  • 2004: XML 1.1 published (handles additional Unicode characters; rarely adopted in practice).
  • November 2008: XML 1.0 Fifth Edition, the current definitive version.
  • Today: XML underpins RSS/Atom, SVG, MathML, XHTML, ODF, OOXML, SAML, XMPP, and hundreds of industry-specific formats.

XML Syntax Rules

XML syntax comes down to seven rules (the first is a strong recommendation rather than a hard requirement). Unlike HTML parsers, XML parsers are strict: any well-formedness violation is a fatal error, and the parser stops normal processing.

  1. Include an XML declaration (strongly recommended): <?xml version="1.0" encoding="UTF-8"?>
  2. Exactly one root element. All other elements must be descendants of this single root.
  3. All elements must be closed. Either <name>Alice</name> or self-closing <br/> — never an unclosed tag.
  4. XML is case-sensitive. <Name> and <name> are completely different elements.
  5. Elements must be properly nested. <a><b></b></a> is valid; <a><b></a></b> is not.
  6. Attribute values must be quoted. Both single and double quotes are valid: id="001" or id='001'.
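  7. Special characters must be escaped. < and & must always be escaped in text and attribute values; > must be escaped when it would form the sequence ]]>, and a quote must be escaped inside an attribute value delimited by the same quote. XML predefines five entity references: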
    • &amp; — encodes the ampersand (&)
    • &lt; — encodes less-than (<)
    • &gt; — encodes greater-than (>)
    • &quot; — encodes double quote (")
    • &apos; — encodes single quote (')

Valid vs Invalid XML

✓ Well-formed

<?xml version="1.0"?>
<user id="001">
  <name>Alice &amp; Bob</name>
  <active>true</active>
</user>

✗ Malformed (3 errors)

<user id=001>
  <Name>Alice & Bob
  <active>true</active>
</user>

Validate and format your XML instantly

Paste any XML into the free XML Formatter — it highlights every well-formedness error and pretty-prints the document, entirely in your browser.


XML Core Components

An XML document is built from six kinds of construct. Understanding each one is essential for reading and writing XML correctly.

1. Elements

The primary building block. An element has a start tag, optional content (text or child elements), and an end tag. Elements can also be self-closing when empty.

<product id="SKU-001">
  <name>Wireless Keyboard</name>
  <stock>142</stock>
</product>

<separator/>    <!-- empty element, self-closing -->

2. Attributes

Name-value pairs inside a start tag. Use attributes for metadata that describes the element — identifiers, flags, units — rather than data that belongs in child elements.

<image src="photo.jpg" width="800" height="600" alt="A sunset photo"/>
<measurement value="98.6" unit="fahrenheit" timestamp="2026-05-23T10:00:00Z"/>

3. Text Content

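The text between element tags. Any text containing < or & must use entity references (or a CDATA section). Parsers pass whitespace in text content through to the application, although line endings are normalised to a single line feed.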
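A minimal sketch using Python's standard xml.etree.ElementTree module: when you set text through the API, the serializer escapes the special characters for you, and parsing turns the entity references back into the original characters. The element name note is only illustrative.

```python
import xml.etree.ElementTree as ET

# Build an element whose text contains characters that are special in XML.
note = ET.Element("note")
note.text = "a < b && c > d"

# tostring() escapes &, < and > in text content automatically.
serialized = ET.tostring(note, encoding="unicode")
print(serialized)  # <note>a &lt; b &amp;&amp; c &gt; d</note>

# Parsing reverses the escaping: .text holds the original characters again.
parsed = ET.fromstring(serialized)
print(parsed.text)  # a < b && c > d
```

Building documents through a library rather than by string concatenation is the simplest way to avoid unescaped-character errors.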

4. CDATA Sections

Blocks of raw text where special characters do not need escaping. The parser passes CDATA content through unchanged. Ideal for embedding code, HTML, or regex patterns. The one sequence a CDATA section cannot contain is ]]>, because that sequence ends the section.

<script><![CDATA[
  if (a < b && b > c) {
    doSomething('<tag>');
  }
]]></script>

<pattern><![CDATA[^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$]]></pattern>
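A short sketch with Python's ElementTree showing what a parser actually hands back for a CDATA section: plain text, character for character. It also shows a common surprise: ElementTree does not keep the CDATA wrapper when it writes the tree back out.

```python
import xml.etree.ElementTree as ET

doc = """<script><![CDATA[
  if (a < b && b > c) { doSomething('<tag>'); }
]]></script>"""

script = ET.fromstring(doc)

# The CDATA content arrives as ordinary text, with no entity decoding needed.
print(script.text.strip())  # if (a < b && b > c) { doSomething('<tag>'); }

# On output ElementTree does not recreate the CDATA wrapper; it escapes instead.
out = ET.tostring(script, encoding="unicode")
print(out)
```

If a downstream consumer requires literal CDATA sections in the output, a library that preserves them (lxml, for example) is needed.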

5. Comments

Annotations for human readers that parsers ignore. Comments cannot contain the sequence -- and cannot be nested.

<!-- This comment is ignored by XML parsers -->
<!-- TODO: add version 2 schema validation -->

6. Processing Instructions

Directives passed through to the application processing the document. The XML declaration looks like a processing instruction but is technically not one: the specification defines it as a separate construct.

<?xml-stylesheet type="text/xsl" href="transform.xsl"?>
<?xml-stylesheet type="text/css" href="style.css"?>
<?php echo "Hello"; ?>
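Parsers do not always surface comments and processing instructions. A minimal sketch with Python's ElementTree (Python 3.8+, where TreeBuilder gained the insert_comments and insert_pis options): the default parser drops both, while a configured TreeBuilder keeps them as nodes. The render target is a hypothetical PI name, and ElementTree only keeps comments and PIs that sit inside the root element.

```python
import xml.etree.ElementTree as ET

doc = """<catalog>
  <!-- Book catalogue, version 2.1 -->
  <?render mode="compact"?>
  <book id="B001"/>
</catalog>"""

# Default parser: comments and processing instructions are discarded.
default_root = ET.fromstring(doc)
print([child.tag for child in default_root])  # ['book']

# Ask the TreeBuilder to keep them as nodes in the tree.
builder = ET.TreeBuilder(insert_comments=True, insert_pis=True)
root = ET.fromstring(doc, parser=ET.XMLParser(target=builder))

for child in root:
    if child.tag is ET.Comment:
        print("comment:", child.text.strip())
    elif child.tag is ET.ProcessingInstruction:
        print("PI:", child.text)  # target and data joined by a space
    else:
        print("element:", child.tag)
```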

Complete document combining all six

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="catalog.css"?>
<catalog xmlns="https://example.com/catalog">
  <!-- Book catalogue, version 2.1 -->
  <book id="B001" available="true">
    <title>Clean Code</title>
    <author>Robert C. Martin</author>
    <price currency="USD">39.99</price>
    <summary><![CDATA[
      A handbook of agile software craftsmanship.
      Learn to write code that is readable & maintainable.
    ]]></summary>
  </book>
</catalog>

XML vs JSON

XML and JSON are the two dominant formats for structured data exchange. XML 1.0 (1998) predates JSON, which Douglas Crockford popularised in the early 2000s, and carries richer structural features; JSON wins on simplicity and JavaScript integration.

| Feature | XML | JSON |
| --- | --- | --- |
| Readability | Verbose: opening + closing tags | Compact: key-value pairs |
| Data types | Everything is text by default | 6 native types (string, number, boolean, null, array, object) |
| Comments | Supported (<!-- -->) | Not supported |
| Attributes | Full attribute support | No concept of attributes |
| Namespaces | Full namespace support | Not supported |
| Schema validation | DTD, XSD, RELAX NG | JSON Schema |
| Parsing speed | Slower: heavier specification | Generally faster |
| Browser support | DOMParser, XSLT | Native JSON.parse |
| Transformation | XSLT, XQuery | No standard equivalent |
| Best for | Documents, SOAP, legacy enterprise | Web APIs, config files, logs |

Same data, two formats

XML — expressive

<user id="001">
  <name>Alice</name>
  <age>30</age>
</user>

JSON — compact

{
  "id": "001",
  "name": "Alice",
  "age": 30
}
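Because XML carries no type information, converting the XML version of this record to JSON takes explicit decisions. A minimal sketch with Python's ElementTree and json modules; the mapping used here (attributes and child elements both become keys) is one hand-rolled convention among many, not a standard XML-to-JSON rule.

```python
import json
import xml.etree.ElementTree as ET

xml_text = """<user id="001">
  <name>Alice</name>
  <age>30</age>
</user>"""

user = ET.fromstring(xml_text)

# Attributes and child elements both become JSON keys (our own convention).
record = dict(user.attrib)
for child in user:
    record[child.tag] = child.text

print(record)  # {'id': '001', 'name': 'Alice', 'age': '30'}  (age is still text)

# XML has no number type, so the conversion must be done explicitly.
record["age"] = int(record["age"])
print(json.dumps(record))  # {"id": "001", "name": "Alice", "age": 30}
```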

Bottom line: Use XML for documents, SOAP services, configurations requiring attributes and namespaces, and any domain where XML schemas (XSD) are already defined. Use JSON for modern REST APIs, browser-side JavaScript, and situations where parsing speed and payload size matter most. For a deep dive into JSON, see our What Is JSON guide.

Convert XML to JSON (or JSON to XML)

Need to migrate from XML to JSON or vice versa? The free XML to JSON Converter handles it instantly, in your browser.


XML Namespaces

Namespaces solve a fundamental problem: when two XML vocabularies define elements with the same name, how do you tell them apart? For example, both an HTML vocabulary and a database vocabulary might define a <table> element — but they mean entirely different things.

XML namespaces assign a unique URI to each vocabulary. The URI acts as a globally unique identifier — it does not need to be a real web page, just a unique string.

<!-- Two vocabularies, same element name — no conflict -->
<root xmlns:html="http://www.w3.org/1999/xhtml"
      xmlns:data="https://example.com/schema">
  <html:table>
    <html:tr><html:td>HTML table row</html:td></html:tr>
  </html:table>
  <data:table>
    <data:row id="1">Database row</data:row>
  </data:table>
</root>

The prefix (like html or data) is a local alias; only the URI assigned to xmlns:prefix is the actual identifier. You can choose any prefix you like.
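A short sketch with Python's ElementTree that makes the point concrete: after parsing, each prefix is replaced by its namespace URI, and queries can use any prefixes you like as long as they map to the same URIs. The prefixes h and db below are deliberately different from the document's own.

```python
import xml.etree.ElementTree as ET

doc = """<root xmlns:html="http://www.w3.org/1999/xhtml"
      xmlns:data="https://example.com/schema">
  <html:table><html:tr><html:td>HTML table row</html:td></html:tr></html:table>
  <data:table><data:row id="1">Database row</data:row></data:table>
</root>"""

root = ET.fromstring(doc)

# ElementTree stores names as {URI}localname; the original prefixes are gone.
print([child.tag for child in root])
# ['{http://www.w3.org/1999/xhtml}table', '{https://example.com/schema}table']

# Our own prefixes work because only the URIs have to match.
ns = {"h": "http://www.w3.org/1999/xhtml", "db": "https://example.com/schema"}
print(root.find("h:table/h:tr/h:td", ns).text)     # HTML table row
print(root.find("db:table/db:row", ns).get("id"))  # 1
```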

Default namespace (no prefix)

When you declare xmlns="..." without a prefix, that namespace applies to the element and all unprefixed descendant elements (but not to unprefixed attributes). SVG files use this pattern:

<svg xmlns="http://www.w3.org/2000/svg"
     viewBox="0 0 100 100" width="100" height="100">
  <circle cx="50" cy="50" r="40" fill="steelblue"/>
  <text x="50" y="55" text-anchor="middle" fill="white">SVG</text>
</svg>
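The default namespace is a frequent source of "my query returns nothing" bugs. A minimal sketch with Python's ElementTree, using a trimmed copy of the SVG above: a bare element name no longer matches, while attributes are still looked up by their plain names.

```python
import xml.etree.ElementTree as ET

svg = """<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">
  <circle cx="50" cy="50" r="40" fill="steelblue"/>
</svg>"""

root = ET.fromstring(svg)
print(root.tag)  # {http://www.w3.org/2000/svg}svg

# The default namespace applies to unprefixed child elements, so a bare name misses.
print(root.find("circle"))  # None

ns = {"svg": "http://www.w3.org/2000/svg"}
circle = root.find("svg:circle", ns)

# Unprefixed attributes are not in the default namespace: plain names work.
print(circle.get("r"))  # 40
```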

Parsing XML

Every major programming language includes an XML parser in its standard library or ecosystem. Here are the most common patterns.

JavaScript (Browser & Node.js)

Browsers provide DOMParser for parsing and XMLSerializer for serialising. Node.js does not include a native XML parser — common choices are fast-xml-parser or xml2js.

// Browser: parse an XML string into a DOM document
const xmlString = `<library>
  <book isbn="978-0-13-468599-1"><title>The Pragmatic Programmer</title></book>
</library>`;
const parser = new DOMParser();
const doc = parser.parseFromString(xmlString, 'application/xml');

// DOMParser does not throw on bad input; it returns a document containing <parsererror>
const error = doc.querySelector('parsererror');
if (error) throw new Error('Invalid XML: ' + error.textContent);

// Query elements (querySelector returns null when nothing matches)
const title = doc.querySelector('title')?.textContent;
const books = doc.querySelectorAll('book');
const isbn = doc.querySelector('book')?.getAttribute('isbn');

// Serialize DOM → string
const serializer = new XMLSerializer();
const xmlOut = serializer.serializeToString(doc);

Python

Python's standard library ships xml.etree.ElementTree (fast, C-accelerated) and xml.dom.minidom (DOM-level interface). ElementTree supports only a limited subset of XPath; for full XPath 1.0, XSD validation, and XSLT, the third-party lxml library is the usual choice.

import xml.etree.ElementTree as ET

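# Parse from a file (assumes catalog.xml exists in the working directory)
tree = ET.parse('catalog.xml')
root = tree.getroot()

# Parse from string
xml_string = '<catalog><book isbn="978-0-13-468599-1"><title>The Pragmatic Programmer</title><price>49.99</price></book></catalog>'
root = ET.fromstring(xml_string)

# Find elements (findall('book') matches direct children of root only)
for book in root.findall('book'):
    title = book.findtext('title')    # None if <title> is missing
    isbn  = book.get('isbn')          # attribute
    price = book.findtext('price')
    print(f"{title} ({isbn}) — ${price}")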

# Build XML programmatically
catalog = ET.Element('catalog')
book = ET.SubElement(catalog, 'book', isbn="978-0-13-468599-1")
ET.SubElement(book, 'title').text = 'The Pragmatic Programmer'

tree = ET.ElementTree(catalog)
ET.indent(tree, space='  ')           # Python 3.9+
tree.write('output.xml', encoding='UTF-8', xml_declaration=True)

Other Languages — Quick Reference

| Language | Parse (XML → object) | Serialize (object → XML) |
| --- | --- | --- |
| Java | DocumentBuilder.parse() | Transformer.transform() |
| PHP | simplexml_load_string() | $xml->asXML() |
| Go | xml.Unmarshal() | xml.Marshal() |
| Ruby | Nokogiri::XML() | doc.to_xml |
| C# | XDocument.Parse() | doc.ToString() |
| Rust | quick_xml::Reader | quick_xml::Writer |

Compare two XML documents

Spot every addition, removal, and change between two XML files with the free XML Compare tool — side-by-side diff with deep structural analysis.


XML Schema (XSD)

XSD (XML Schema Definition) is the modern replacement for the older DTD (Document Type Definition). Unlike DTD, an XSD is itself a valid XML document, which means it can be parsed, validated, and transformed with the same XML tooling.

An XSD defines:

  • Which elements and attributes are allowed
  • What data types they must contain (xs:string, xs:integer, xs:date, etc.)
  • Whether they are required or optional
  • Their ordering and cardinality (minOccurs / maxOccurs)
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="user">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name"  type="xs:string"          minOccurs="1"/>
        <xs:element name="email" type="xs:string"          minOccurs="1"/>
        <xs:element name="age"   type="xs:positiveInteger" minOccurs="0"/>
        <xs:element name="roles">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="role" type="xs:string" maxOccurs="unbounded"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="id"     type="xs:string"  use="required"/>
      <xs:attribute name="active" type="xs:boolean" use="optional"/>
    </xs:complexType>
  </xs:element>
</xs:schema>

A document that satisfies its schema is called valid, a stricter guarantee than merely well-formed. Other schema languages for XML include RELAX NG (simpler syntax, with an optional compact non-XML form) and Schematron (rule-based validation using XPath assertions).
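Python's built-in ElementTree cannot validate against an XSD. A minimal sketch, assuming the third-party lxml package is installed (pip install lxml), that checks one valid and one invalid document against the user schema shown above. The sample documents are hypothetical.

```python
from lxml import etree  # third-party: pip install lxml

XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="user">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name"  type="xs:string"          minOccurs="1"/>
        <xs:element name="email" type="xs:string"          minOccurs="1"/>
        <xs:element name="age"   type="xs:positiveInteger" minOccurs="0"/>
        <xs:element name="roles">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="role" type="xs:string" maxOccurs="unbounded"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="id"     type="xs:string"  use="required"/>
      <xs:attribute name="active" type="xs:boolean" use="optional"/>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = etree.XMLSchema(etree.fromstring(XSD))

good = etree.fromstring(
    '<user id="001" active="true"><name>Alice</name><email>alice@example.com</email>'
    "<age>30</age><roles><role>admin</role></roles></user>")

# Well-formed but invalid: no id attribute, negative age, and no <roles>.
bad = etree.fromstring(
    "<user><name>Bob</name><email>bob@example.com</email><age>-5</age></user>")

print(schema.validate(good))  # True
print(schema.validate(bad))   # False
for err in schema.error_log:  # errors from the most recent validate() call
    print(err.line, err.message)
```

Both documents parse without error; only the schema check tells them apart, which is the difference between well-formed and valid.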

XPath & XSLT

XPath and XSLT are two complementary W3C standards that turn XML from a storage format into a query and transformation platform.

XPath — Query the XML tree

XPath expressions navigate the XML document tree using a path-like syntax, similar to filesystem paths.

/catalog/book                     → direct children named <book> of <catalog>
//title                           → all <title> elements anywhere in the document
/catalog/book[@isbn]              → books that have an isbn attribute
/catalog/book[@available='true']  → books where available="true"
//price[@currency='USD']          → prices in USD anywhere
//book[price > 30]                → books with a price element whose value > 30
count(/catalog/book)              → total number of book elements
/catalog/book[1]/title            → title of the first book (1-indexed)
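Python's ElementTree implements a limited XPath subset through find() and findall(). A minimal sketch mapping several of the expressions above onto it, using a hypothetical two-book catalogue. Paths are evaluated relative to the element you call them on, and numeric comparisons such as [price > 30] are not supported, so that one is filtered in Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-book catalogue used only for this illustration.
catalog = ET.fromstring("""<catalog>
  <book isbn="111" available="true"><title>A</title><price currency="USD">25</price></book>
  <book available="false"><title>B</title><price currency="EUR">45</price></book>
</catalog>""")

# Relative to `catalog`, "book" plays the role of /catalog/book.
print(len(catalog.findall("book")))                        # count(/catalog/book) → 2
print([t.text for t in catalog.findall(".//title")])       # //title → ['A', 'B']
print(len(catalog.findall("book[@isbn]")))                 # book[@isbn] → 1
print(catalog.find("book[@available='true']/title").text)  # → A
print(catalog.find("book[1]/title").text)                  # first book (1-indexed) → A

# No numeric predicates in ElementTree: emulate //book[price > 30] in Python.
expensive = [b.findtext("title") for b in catalog.iter("book")
             if float(b.findtext("price")) > 30]
print(expensive)                                           # → ['B']
```

For the full XPath 1.0 language, including numeric predicates and functions like count(), lxml's xpath() method is the usual route.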

XSLT — Transform XML into anything

XSLT (Extensible Stylesheet Language Transformations) uses XPath to select nodes from a source XML document and defines templates for what to output — another XML document, HTML, plain text, or any other format.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/catalog">
    <html>
      <body>
        <h1>Book Catalogue</h1>
        <ul>
          <xsl:for-each select="book">
            <li>
              <xsl:value-of select="title"/>
              — $<xsl:value-of select="price"/>
            </li>
          </xsl:for-each>
        </ul>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
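Python's standard library has no XSLT processor. A minimal sketch, assuming the third-party lxml package is installed, that applies a trimmed version of the stylesheet above (the html and body wrapper is dropped and the separator is a plain hyphen) to a hypothetical two-book catalogue.

```python
from lxml import etree  # third-party: pip install lxml

XSLT = """<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/catalog">
    <ul>
      <xsl:for-each select="book">
        <li><xsl:value-of select="title"/> - $<xsl:value-of select="price"/></li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>"""

SOURCE = """<catalog>
  <book><title>Clean Code</title><price>39.99</price></book>
  <book><title>The Pragmatic Programmer</title><price>49.99</price></book>
</catalog>"""

# Compile the stylesheet once, then apply it to any number of source documents.
transform = etree.XSLT(etree.fromstring(XSLT))
result = transform(etree.fromstring(SOURCE))
print(str(result))  # serialized <ul> with one <li> per book
```

lxml delegates to libxslt, which implements XSLT 1.0; stylesheets written for XSLT 2.0 or 3.0 need a different processor such as Saxon.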

Together, XPath and XSLT form a powerful pipeline for querying, transforming, and reporting on XML data without writing general-purpose code. XSLT is still widely used in publishing workflows, enterprise data integration, and generating HTML from XML source documents.

Common XML Use Cases

XML is not a niche format — it is baked into the toolchain of web, mobile, enterprise, and desktop software development.

  • RSS & Atom feeds: Blog posts, podcasts, and news sites syndicate content as XML. Podcast directories such as Apple Podcasts and Spotify ingest RSS 2.0, an XML format. Atom 1.0 (RFC 4287) is the newer IETF standard for feed syndication.
  • SVG (Scalable Vector Graphics): SVG images are XML documents rendered natively by browsers. Every icon, chart, and illustration in SVG format is an XML file you can open in a text editor.
  • SOAP web services: Many enterprise APIs built before REST became dominant use SOAP, a protocol that wraps messages in XML envelopes. Banking, insurance, healthcare, and government systems still depend heavily on SOAP.
  • Maven (pom.xml): Java project configuration, dependency management, and build lifecycle in Maven are defined in XML.
  • Android UI layouts: Android interface layouts (activity_main.xml, fragment_home.xml) are XML documents that the Android build system compiles to binary resources.
  • Office Open XML (.docx, .xlsx, .pptx): Microsoft Office file formats are ZIP archives of XML files. Unzip any .docx to find word/document.xml, word/styles.xml, and more.
  • SAML (Security Assertion Markup Language): Enterprise single sign-on (SSO) relies on SAML, which passes XML-encoded assertions between identity providers (Okta, Azure AD) and service providers.

XML Best Practices

  1. Always include the XML declaration. Start every document with <?xml version="1.0" encoding="UTF-8"?>. Declaring UTF-8 explicitly prevents ambiguity across platforms and ensures correct handling of non-ASCII characters.
  2. Use elements for content, attributes for metadata. Data that users read or that has complex structure belongs in child elements. Identifiers, flags, units, and version numbers belong in attributes. Don't pack content into attributes just to avoid nesting.
  3. Keep naming consistent. Pick a convention (kebab-case, camelCase, or PascalCase) and apply it to all element and attribute names in your schema. Mixing styles makes documents harder to query and consume.
  4. Validate against an XSD. Define an XSD for any XML format you own and validate all incoming documents at the system boundary. Schema validation catches type mismatches, missing required elements, and structural errors before they reach business logic.
  5. Avoid excessive nesting. Beyond five or six levels, deeply nested XML becomes difficult to read and slow to query. Consider whether a flatter structure with ID references would be cleaner — similar to how relational databases normalise data.

Common XML Errors

These four mistakes are among the most common causes of XML parse failures.

Error: Missing closing tag

✗ Invalid

<user>
  <name>Alice

✓ Fixed

<user>
  <name>Alice</name>
</user>
Every element must have a closing tag or use self-closing syntax: <br/>

Error: Unquoted attribute value

✗ Invalid

<user id=001 active=true>

✓ Fixed

<user id="001" active="true">
All attribute values must be quoted; single or double quotes are both valid.

Error: Unescaped special character

✗ Invalid

<condition>a < b && c > d</condition>

✓ Fixed

<condition>a &lt; b &amp;&amp; c &gt; d</condition>
Use &lt; for <, &gt; for >, &amp; for &, &quot; for ", &apos; for '.

Error: Multiple root elements

✗ Invalid

<?xml version="1.0"?>
<user>Alice</user>
<user>Bob</user>

✓ Fixed

<?xml version="1.0"?>
<users>
  <user>Alice</user>
  <user>Bob</user>
</users>
Every XML document must have exactly one root element containing all others.
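To see how a real parser reports these mistakes, here is a minimal sketch that feeds the four broken snippets from this section (declarations omitted) to Python's ElementTree. Each one raises ET.ParseError, whose position attribute gives the (line, column) where the parser gave up; the exact message wording comes from the underlying expat parser.

```python
import xml.etree.ElementTree as ET

broken = {
    "missing closing tag": "<user>\n  <name>Alice",
    "unquoted attribute": "<user id=001 active=true></user>",
    "unescaped character": "<condition>a < b && c > d</condition>",
    "multiple roots": "<user>Alice</user>\n<user>Bob</user>",
}

errors = {}
for label, text in broken.items():
    try:
        ET.fromstring(text)
        print(f"{label}: parsed OK")
    except ET.ParseError as err:
        errors[label] = err.position  # (line, column) of the failure
        print(f"{label}: {err}")
```

Logging the position alongside the message makes it much faster to locate the offending tag in a large file.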

Frequently Asked Questions

What does XML stand for?
XML stands for Extensible Markup Language. The "extensible" part means you define your own tags rather than using a fixed vocabulary like HTML. XML 1.0 was published as a W3C Recommendation on February 10, 1998.
What is the difference between XML and HTML?
Both use angle-bracket tags, but they serve different purposes. HTML is a fixed vocabulary for presenting content in browsers, and browsers recover from malformed markup using the error-handling rules of the HTML parsing algorithm. XML is a meta-language for defining custom vocabularies. XML is strict: any well-formedness violation stops parsing. HTML is designed for presentation; XML is designed for data exchange.
Is XML still relevant in 2026?
Yes. XML remains the backbone of RSS and Atom feeds, SVG graphics, SOAP web services, SAML single sign-on, Maven build files (pom.xml), Android UI layouts, and all Microsoft Office file formats (OOXML: .docx, .xlsx, .pptx). While JSON has displaced XML in most new web API work, XML's role in document formats and enterprise infrastructure is deeply embedded and stable.
What is the difference between XML and JSON?
JSON is lighter, faster to parse, and maps directly to native data structures in most programming languages. XML supports attributes, namespaces, CDATA sections, processing instructions, and formal schema validation (XSD/DTD), making it more expressive for complex document formats. For new web APIs, JSON is almost always the better choice. For documents, configuration, and enterprise systems, XML often remains preferable.
How do I validate XML online for free?
Use the free XML Formatter at jsonbuddy.net/xml-formatter. Paste your XML and the tool instantly highlights syntax errors, validates well-formedness, and formats the document — all in your browser without sending data to any server.

Ready to work with XML?

JSON Buddy gives you free browser-based tools for XML — format, validate, compare, and convert XML to JSON or YAML. No signup. Your data never leaves your browser.