Represents the DOM in memory. Provides functions to parse documents and access individual elements (see HtmlNode
).
Public Properties
Property | Description |
---|---|
root |
Root node of the document. |
nodes |
List of top-level nodes in the document. |
callback |
Callback function that is called for each element in the DOM when generating outertext. |
lowercase |
If enabled, all tag names are converted to lowercase when parsing documents. |
original_size |
Original document size in bytes. |
size |
Current document size in bytes. |
_charset |
Charset of the original document. |
_target_charset |
Target charset for the current document. |
default_br_text |
Text to return for <br> elements. |
default_span_text |
Text to return for <span> elements. |
Protected Properties
Property | Description |
---|---|
pos |
Current parsing position within doc . |
doc |
The original document. |
char |
Character at position pos in doc . |
cursor |
Current element cursor in the document. |
parent |
Parent element node. |
noise |
Noise from the original document (i.e. scripts, comments, etc...). |
token_blank |
Tokens that are considered whitespace in HTML. |
block_tags |
A list of tag names where remaining unclosed tags are forcibly closed. |
optional_closing_tags |
A list of tag names where the closing tag can be omitted. |