19. Structured Markup Processing Tools
Python supports a variety of modules for working with structured data markup. These include modules for the Standard Generalized Markup Language (SGML) and the Hypertext Markup Language (HTML), and several interfaces for working with the Extensible Markup Language (XML).
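As a small illustration of the HTML side, here is a minimal sketch that uses the HTMLParser module (section 19.1) to pull link targets out of a fragment. It is not a recommended scraper: the `LinkCollector` class and the sample markup are hypothetical, and the import fallback is an assumption added so the same code runs under Python 2 (module `HTMLParser`) and Python 3 (module `html.parser`).

```python
# Minimal sketch: collect link targets with the HTMLParser module (section 19.1).
# The module is named HTMLParser in Python 2 and html.parser in Python 3.
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2


class LinkCollector(HTMLParser):
    """Hypothetical example class that records href values of <a> tags."""

    def __init__(self):
        # Explicit base-class call: HTMLParser is an old-style class in
        # Python 2, so super() cannot be used there.
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of
        # (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


collector = LinkCollector()
collector.feed('<p>See <a href="https://www.python.org/">Python</a> '
               'and <A HREF="/doc/">the docs</A>.</p>')
collector.close()
print(collector.links)  # ['https://www.python.org/', '/doc/']
```

The uppercase `<A HREF=...>` tag is still matched because the parser normalizes tag and attribute names to lower case before calling the handler.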
Modules in the xml package require at least one SAX-compliant XML parser to be available. Starting with Python 2.3, the Expat parser is included with Python, so the xml.parsers.expat module will always be available. You may still want to be aware of the PyXML add-on package, which provides an extended set of XML libraries for Python.
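Because Expat ships with Python, xml.parsers.expat can be used directly without any add-on package. The following minimal sketch registers three handler callbacks, feeds a short document as a single final chunk, and then shows the `ExpatError` raised by a mismatched tag. The handler function names and the sample documents are illustrative only.

```python
# Minimal sketch: drive the bundled Expat parser directly.
import xml.parsers.expat

events = []


def start_element(name, attrs):
    # attrs is a dict mapping attribute names to values
    events.append(("start", name, attrs))


def end_element(name):
    events.append(("end", name))


def char_data(data):
    # Expat may split character data across several calls in general
    events.append(("text", data))


parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = start_element
parser.EndElementHandler = end_element
parser.CharacterDataHandler = char_data
parser.Parse('<doc><item id="1">first</item></doc>', True)  # True: final chunk

for event in events:
    print(event)

# Malformed input raises ExpatError; its code maps to a readable message.
try:
    xml.parsers.expat.ParserCreate().Parse("<doc><open></doc>", True)
except xml.parsers.expat.ExpatError as err:
    error_line = err.lineno
    error_text = xml.parsers.expat.ErrorString(err.code)
    print("line %d: %s" % (error_line, error_text))
```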
The documentation for the xml.dom and xml.sax packages is the definition of the Python bindings for the DOM and SAX interfaces.
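The DOM and SAX bindings differ mainly in how the document reaches your code. The DOM builds a tree that you query afterwards, while SAX calls handler methods as it reads. Here is a minimal sketch of the same document read both ways with xml.dom.minidom and xml.sax; the sample catalog and the `TagCounter` handler are invented for illustration.

```python
# Minimal sketch: one document read through the DOM and SAX bindings.
import xml.dom.minidom
import xml.sax

DOC = (b'<catalog><book id="b1"><title>Alpha</title></book>'
       b'<book id="b2"><title>Beta</title></book></catalog>')

# DOM: build an in-memory tree of Node objects, then query it.
dom = xml.dom.minidom.parseString(DOC)
book_ids = [book.getAttribute("id") for book in dom.getElementsByTagName("book")]
titles = [title.firstChild.data for title in dom.getElementsByTagName("title")]
dom.unlink()  # release the tree's internal references when done


# SAX: stream parse events to a ContentHandler subclass.
class TagCounter(xml.sax.ContentHandler):
    """Hypothetical handler that counts start tags by name."""

    def __init__(self):
        # Explicit base-class call works for both old- and new-style classes.
        xml.sax.ContentHandler.__init__(self)
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1


counter = TagCounter()
xml.sax.parseString(DOC, counter)

print(book_ids)
print(titles)
print(sorted(counter.counts.items()))
```

The DOM version keeps the whole document in memory, which makes random access convenient; the SAX version keeps only the counters, which is why SAX is usually preferred for large inputs.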
- 19.1. HTMLParser — Simple HTML and XHTML parser
  - HTMLParser
  - HTMLParseError
  - 19.1.1. Example HTML Parser Application
  - 19.1.2. HTMLParser Methods
    - HTMLParser.feed()
    - HTMLParser.close()
    - HTMLParser.reset()
    - HTMLParser.getpos()
    - HTMLParser.get_starttag_text()
    - HTMLParser.handle_starttag()
    - HTMLParser.handle_endtag()
    - HTMLParser.handle_startendtag()
    - HTMLParser.handle_data()
    - HTMLParser.handle_entityref()
    - HTMLParser.handle_charref()
    - HTMLParser.handle_comment()
    - HTMLParser.handle_decl()
    - HTMLParser.handle_pi()
    - HTMLParser.unknown_decl()
  - 19.1.3. Examples
- 19.2. sgmllib — Simple SGML parser
  - SGMLParser
  - SGMLParseError
  - SGMLParser.reset()
  - SGMLParser.setnomoretags()
  - SGMLParser.setliteral()
  - SGMLParser.feed()
  - SGMLParser.close()
  - SGMLParser.get_starttag_text()
  - SGMLParser.handle_starttag()
  - SGMLParser.handle_endtag()
  - SGMLParser.handle_data()
  - SGMLParser.handle_charref()
  - SGMLParser.convert_charref()
  - SGMLParser.convert_codepoint()
  - SGMLParser.handle_entityref()
  - SGMLParser.convert_entityref()
  - SGMLParser.handle_comment()
  - SGMLParser.handle_decl()
  - SGMLParser.report_unbalanced()
  - SGMLParser.unknown_starttag()
  - SGMLParser.unknown_endtag()
  - SGMLParser.unknown_charref()
  - SGMLParser.unknown_entityref()
- 19.3. htmllib — A parser for HTML documents
- 19.4. htmlentitydefs — Definitions of HTML general entities
- 19.5. XML Processing Modules
- 19.6. XML vulnerabilities
- 19.7. xml.etree.ElementTree — The ElementTree XML API
  - 19.7.1. Tutorial
  - 19.7.2. XPath support
  - 19.7.3. Reference
    - 19.7.3.1. Functions
    - 19.7.3.2. Element Objects
      - Element
      - Element.tag
      - Element.text
      - Element.tail
      - Element.attrib
      - Element.clear()
      - Element.get()
      - Element.items()
      - Element.keys()
      - Element.set()
      - Element.append()
      - Element.extend()
      - Element.find()
      - Element.findall()
      - Element.findtext()
      - Element.getchildren()
      - Element.getiterator()
      - Element.insert()
      - Element.iter()
      - Element.iterfind()
      - Element.itertext()
      - Element.makeelement()
      - Element.remove()
    - 19.7.3.3. ElementTree Objects
    - 19.7.3.4. QName Objects
    - 19.7.3.5. TreeBuilder Objects
    - 19.7.3.6. XMLParser Objects
- 19.8. xml.dom — The Document Object Model API
  - 19.8.1. Module Contents
  - 19.8.2. Objects in the DOM
    - 19.8.2.1. DOMImplementation Objects
    - 19.8.2.2. Node Objects
      - Node.nodeType
      - Node.parentNode
      - Node.attributes
      - Node.previousSibling
      - Node.nextSibling
      - Node.childNodes
      - Node.firstChild
      - Node.lastChild
      - Node.localName
      - Node.prefix
      - Node.namespaceURI
      - Node.nodeName
      - Node.nodeValue
      - Node.hasAttributes()
      - Node.hasChildNodes()
      - Node.isSameNode()
      - Node.appendChild()
      - Node.insertBefore()
      - Node.removeChild()
      - Node.replaceChild()
      - Node.normalize()
      - Node.cloneNode()
    - 19.8.2.3. NodeList Objects
    - 19.8.2.4. DocumentType Objects
    - 19.8.2.5. Document Objects
    - 19.8.2.6. Element Objects
      - Element.tagName
      - Element.getElementsByTagName()
      - Element.getElementsByTagNameNS()
      - Element.hasAttribute()
      - Element.hasAttributeNS()
      - Element.getAttribute()
      - Element.getAttributeNode()
      - Element.getAttributeNS()
      - Element.getAttributeNodeNS()
      - Element.removeAttribute()
      - Element.removeAttributeNode()
      - Element.removeAttributeNS()
      - Element.setAttribute()
      - Element.setAttributeNode()
      - Element.setAttributeNodeNS()
      - Element.setAttributeNS()
    - 19.8.2.7. Attr Objects
    - 19.8.2.8. NamedNodeMap Objects
    - 19.8.2.9. Comment Objects
    - 19.8.2.10. Text and CDATASection Objects
    - 19.8.2.11. ProcessingInstruction Objects
    - 19.8.2.12. Exceptions
  - 19.8.3. Conformance
- 19.9. xml.dom.minidom — Minimal DOM implementation
- 19.10. xml.dom.pulldom — Support for building partial DOM trees
- 19.11. xml.sax — Support for SAX2 parsers
- 19.12. xml.sax.handler — Base classes for SAX handlers
  - ContentHandler
  - DTDHandler
  - EntityResolver
  - ErrorHandler
  - feature_namespaces
  - feature_namespace_prefixes
  - feature_string_interning
  - feature_validation
  - feature_external_ges
  - feature_external_pes
  - all_features
  - property_lexical_handler
  - property_declaration_handler
  - property_dom_node
  - property_xml_string
  - all_properties
  - 19.12.1. ContentHandler Objects
    - ContentHandler.setDocumentLocator()
    - ContentHandler.startDocument()
    - ContentHandler.endDocument()
    - ContentHandler.startPrefixMapping()
    - ContentHandler.endPrefixMapping()
    - ContentHandler.startElement()
    - ContentHandler.endElement()
    - ContentHandler.startElementNS()
    - ContentHandler.endElementNS()
    - ContentHandler.characters()
    - ContentHandler.ignorableWhitespace()
    - ContentHandler.processingInstruction()
    - ContentHandler.skippedEntity()
  - 19.12.2. DTDHandler Objects
  - 19.12.3. EntityResolver Objects
  - 19.12.4. ErrorHandler Objects
- 19.13. xml.sax.saxutils — SAX Utilities
- 19.14. xml.sax.xmlreader — Interface for XML parsers
  - XMLReader
  - IncrementalParser
  - Locator
  - InputSource
  - AttributesImpl
  - AttributesNSImpl
  - 19.14.1. XMLReader Objects
    - XMLReader.parse()
    - XMLReader.getContentHandler()
    - XMLReader.setContentHandler()
    - XMLReader.getDTDHandler()
    - XMLReader.setDTDHandler()
    - XMLReader.getEntityResolver()
    - XMLReader.setEntityResolver()
    - XMLReader.getErrorHandler()
    - XMLReader.setErrorHandler()
    - XMLReader.setLocale()
    - XMLReader.getFeature()
    - XMLReader.setFeature()
    - XMLReader.getProperty()
    - XMLReader.setProperty()
  - 19.14.2. IncrementalParser Objects
  - 19.14.3. Locator Objects
  - 19.14.4. InputSource Objects
  - 19.14.5. The Attributes Interface
  - 19.14.6. The AttributesNS Interface
- 19.15. xml.parsers.expat — Fast XML parsing using Expat
  - ExpatError
  - error
  - XMLParserType
  - ErrorString()
  - ParserCreate()
  - 19.15.1. XMLParser Objects
    - xmlparser.Parse()
    - xmlparser.ParseFile()
    - xmlparser.SetBase()
    - xmlparser.GetBase()
    - xmlparser.GetInputContext()
    - xmlparser.ExternalEntityParserCreate()
    - xmlparser.SetParamEntityParsing()
    - xmlparser.UseForeignDTD()
    - xmlparser.buffer_size
    - xmlparser.buffer_text
    - xmlparser.buffer_used
    - xmlparser.ordered_attributes
    - xmlparser.returns_unicode
    - xmlparser.specified_attributes
    - xmlparser.ErrorByteIndex
    - xmlparser.ErrorCode
    - xmlparser.ErrorColumnNumber
    - xmlparser.ErrorLineNumber
    - xmlparser.CurrentByteIndex
    - xmlparser.CurrentColumnNumber
    - xmlparser.CurrentLineNumber
    - xmlparser.XmlDeclHandler()
    - xmlparser.StartDoctypeDeclHandler()
    - xmlparser.EndDoctypeDeclHandler()
    - xmlparser.ElementDeclHandler()
    - xmlparser.AttlistDeclHandler()
    - xmlparser.StartElementHandler()
    - xmlparser.EndElementHandler()
    - xmlparser.ProcessingInstructionHandler()
    - xmlparser.CharacterDataHandler()
    - xmlparser.UnparsedEntityDeclHandler()
    - xmlparser.EntityDeclHandler()
    - xmlparser.NotationDeclHandler()
    - xmlparser.StartNamespaceDeclHandler()
    - xmlparser.EndNamespaceDeclHandler()
    - xmlparser.CommentHandler()
    - xmlparser.StartCdataSectionHandler()
    - xmlparser.EndCdataSectionHandler()
    - xmlparser.DefaultHandler()
    - xmlparser.DefaultHandlerExpand()
    - xmlparser.NotStandaloneHandler()
    - xmlparser.ExternalEntityRefHandler()
  - 19.15.2. ExpatError Exceptions
  - 19.15.3. Example
  - 19.15.4. Content Model Descriptions
  - 19.15.5. Expat error constants