The type for an allocator function. Takes the 'userdata' member of the GumboParser struct as its first argument. Semantics should be the same as malloc: return a block of the requested number of bytes on success, or NULL on failure. Allocating a block of 0 bytes behaves as it does for malloc.
The type for a deallocator function. Takes the 'userdata' member of the GumboParser struct as its first argument.
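Here is a minimal sketch of an allocator/deallocator pair. It follows the malloc/free semantics above and uses the userdata pointer for per-parser bookkeeping, in this case counting live blocks. The typedef names AllocatorFn and DeallocatorFn, the AllocStats struct, and both functions are hypothetical names for this example, not the library's declarations.

```c
#include <stdlib.h>

/* Hypothetical typedefs with the (userdata, size) and (userdata, ptr) shapes. */
typedef void* (*AllocatorFn)(void* userdata, size_t size);
typedef void (*DeallocatorFn)(void* userdata, void* ptr);

/* Per-parser bookkeeping reached through the userdata pointer. */
typedef struct {
  size_t live_blocks;   /* allocations not yet freed */
  size_t total_blocks;  /* allocations ever made */
} AllocStats;

/* Same contract as malloc: a block of `size` bytes, or NULL on failure. */
static void* counting_alloc(void* userdata, size_t size) {
  AllocStats* stats = (AllocStats*)userdata;
  void* block = malloc(size);
  if (block != NULL) {
    stats->live_blocks++;
    stats->total_blocks++;
  }
  return block;
}

/* Same contract as free: NULL is accepted and ignored. */
static void counting_free(void* userdata, void* ptr) {
  AllocStats* stats = (AllocStats*)userdata;
  if (ptr != NULL) {
    stats->live_blocks--;
  }
  free(ptr);
}
```

Because the state travels through userdata rather than a global, two parsers can use independent allocators at the same time.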
Attribute namespaces. HTML includes special handling for the XLink, XML, and XMLNS namespaces on attributes. Everything else goes in the generic "NONE" namespace.
Namespaces. Unlike in X(HT)ML, namespaces in HTML5 are not denoted by a prefix. Rather, anything inside an <svg> tag is in the SVG namespace, anything inside a <math> tag is in the MathML namespace, and anything else is in the HTML namespace. No other namespaces are supported, so a plain enum is sufficient.
Enum denoting the type of node. This determines which member of the node.v union is valid.
Parse flags. We track the reasons the parser inserted nodes and store them in a bitvector in the node itself. This lets client code optimize out nodes that are implied by the HTML structure of the document, flag constructs that a style guide may not allow, or track the prevalence of incorrect or tricky HTML code.
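The following deliberately simplified node layout shows how the type tag selects the valid union member. NodeType, Node, node_text, and all fields are hypothetical names for this sketch, and it covers only a few node kinds.

```c
#include <stddef.h>

/* Hypothetical, reduced set of node kinds. */
typedef enum { NODE_DOCUMENT, NODE_ELEMENT, NODE_TEXT, NODE_COMMENT } NodeType;

typedef struct { const char* tag_name; size_t child_count; } ElementData;
typedef struct { const char* text; } TextData;

typedef struct {
  NodeType type;           /* says which member of v may be read */
  union {
    ElementData element;   /* valid when type == NODE_ELEMENT */
    TextData text;         /* valid when type == NODE_TEXT or NODE_COMMENT */
  } v;
} Node;

/* Returns the node's text, or NULL for node kinds that carry none. */
static const char* node_text(const Node* node) {
  switch (node->type) {
    case NODE_TEXT:
    case NODE_COMMENT:
      return node->v.text.text;
    default:
      return NULL;  /* reading v.text here would use the wrong member */
  }
}
```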
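Here is a minimal sketch of storing insertion reasons as bits and filtering on them. The flag names and the SketchNode struct are hypothetical; the library defines its own flag set.

```c
#include <stddef.h>

/* Hypothetical flags; each insertion reason gets its own bit. */
enum {
  INSERTED_NORMALLY    = 0,
  INSERTED_BY_PARSER   = 1u << 0,  /* node had no tag in the source */
  INSERTED_IMPLIED_END = 1u << 1,  /* end tag was implied, not written */
  INSERTED_FROM_FOSTER = 1u << 2   /* node was moved by foster parenting */
};

typedef struct {
  const char* name;
  unsigned flags;  /* bitwise OR of the reasons above */
} SketchNode;

/* Counts nodes that actually appeared in the source markup. */
static size_t count_source_nodes(const SketchNode* nodes, size_t n) {
  size_t count = 0;
  for (size_t i = 0; i < n; i++) {
    if ((nodes[i].flags & INSERTED_BY_PARSER) == 0) {
      count++;
    }
  }
  return count;
}
```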
An enum for all the tags defined in the HTML5 standard. These correspond to the tag names themselves. Enum constants exist only for tags which appear in the spec itself (or for tags with special handling in the SVG and MathML namespaces); any other tags appear as GUMBO_TAG_UNKNOWN and the actual tag name can be obtained through original_tag.
Release the memory used for the parse tree & parse errors.
Given a vector of GumboAttributes, look up the one with the specified name and return it, or NULL if no such attribute exists. This uses a case-insensitive match, as HTML is case-insensitive.
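Here is a minimal sketch of the lookup described above. It assumes a plain array and length in place of the library's vector, and ASCII-only case folding. Attr, ascii_iequals, and find_attr are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical, simplified attribute record. */
typedef struct {
  const char* name;
  const char* value;
} Attr;

/* ASCII-only case-insensitive equality of two NUL-terminated strings. */
static int ascii_iequals(const char* a, const char* b) {
  for (; *a != '\0' && *b != '\0'; a++, b++) {
    char ca = (*a >= 'A' && *a <= 'Z') ? (char)(*a + ('a' - 'A')) : *a;
    char cb = (*b >= 'A' && *b <= 'Z') ? (char)(*b + ('a' - 'A')) : *b;
    if (ca != cb) return 0;
  }
  return *a == *b;  /* both strings must end at the same point */
}

/* Returns the first attribute whose name matches, or NULL if none does. */
static const Attr* find_attr(const Attr* attrs, size_t count, const char* name) {
  for (size_t i = 0; i < count; i++) {
    if (ascii_iequals(attrs[i].name, name)) return &attrs[i];
  }
  return NULL;
}
```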
Fixes the case of SVG elements that are not all lowercase. http://www.whatwg.org/specs/web-apps/current-work/multipage/tree-construction.html#parsing-main-inforeign
This is not done at parse time because there's no place to store a mutated tag name. The element's tag field is an enum (which will be GUMBO_TAG_UNKNOWN for most SVG tags without special handling), while original_tag points into the original buffer. Instead, we provide this helper function that clients can use to rename SVG tags as appropriate. Returns the case-normalized SVG tag name if a replacement is found, or NULL if no normalization is called for. The return value is static data owned by the library.
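The following sketches the table-driven approach. It assumes a NUL-terminated input string rather than a string piece. The table holds only a few entries from the spec's SVG tag-name list, and kSvgFixes and normalize_svg_tag are hypothetical names. The result is either static data or NULL, so the caller never frees it.

```c
#include <stddef.h>

/* Small excerpt of the spec's SVG tag-name adjustments; the full list is longer. */
static const struct { const char* from; const char* to; } kSvgFixes[] = {
  {"clippath", "clipPath"},
  {"foreignobject", "foreignObject"},
  {"lineargradient", "linearGradient"},
  {"radialgradient", "radialGradient"},
  {"textpath", "textPath"},
};

/* ASCII-only case-insensitive equality of two NUL-terminated strings. */
static int ascii_iequals(const char* a, const char* b) {
  for (; *a != '\0' && *b != '\0'; a++, b++) {
    char ca = (*a >= 'A' && *a <= 'Z') ? (char)(*a + ('a' - 'A')) : *a;
    char cb = (*b >= 'A' && *b <= 'Z') ? (char)(*b + ('a' - 'A')) : *b;
    if (ca != cb) return 0;
  }
  return *a == *b;
}

/* Returns a static, correctly cased name, or NULL if no fix applies. */
static const char* normalize_svg_tag(const char* name) {
  for (size_t i = 0; i < sizeof kSvgFixes / sizeof kSvgFixes[0]; i++) {
    if (ascii_iequals(kSvgFixes[i].from, name)) return kSvgFixes[i].to;
  }
  return NULL;
}
```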
Returns the normalized (usually all-lowercased, except for foreign content) tag name for a GumboTag enum. Return value is static data owned by the library.
Parses a buffer of UTF-8 text into a GumboNode parse tree. The buffer must live at least as long as the parse tree, as some fields (e.g. original_text) point directly into the original buffer.
Extended version of gumbo_parse that takes an explicit options structure, buffer, and length.
Compares two GumboStringPieces, and returns true if they're equal or false otherwise.
Compares two GumboStringPieces ignoring case, and returns true if they're equal or false otherwise.
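Here is a minimal sketch of both comparisons on a pointer-plus-length type. Piece and the two functions are hypothetical stand-ins for this example. Case folding is limited to ASCII letters, which is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a string piece: pointer plus length, no NUL. */
typedef struct {
  const char* data;
  size_t length;
} Piece;

/* Equal when lengths match and bytes match; empty pieces are always equal. */
static bool piece_equal(const Piece* a, const Piece* b) {
  return a->length == b->length &&
         (a->length == 0 || memcmp(a->data, b->data, a->length) == 0);
}

static char ascii_lower(char c) {
  return (c >= 'A' && c <= 'Z') ? (char)(c + ('a' - 'A')) : c;
}

/* Same as piece_equal, but folds ASCII letters before comparing. */
static bool piece_equal_ignore_case(const Piece* a, const Piece* b) {
  if (a->length != b->length) return false;
  for (size_t i = 0; i < a->length; i++) {
    if (ascii_lower(a->data[i]) != ascii_lower(b->data[i])) return false;
  }
  return true;
}
```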
Converts a tag name string (which may be in upper or mixed case) to a tag enum.
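Here is a sketch of a case-insensitive name-to-enum lookup over a tiny, hypothetical subset of tags. Tag, kTagNames, and tag_from_name are illustrative names. Anything not in the table maps to TAG_UNKNOWN, mirroring the fallback for unrecognized tags.

```c
#include <stddef.h>

/* Hypothetical subset; enum order must match kTagNames order. */
typedef enum { TAG_HTML, TAG_BODY, TAG_DIV, TAG_SPAN, TAG_UNKNOWN } Tag;

static const char* const kTagNames[] = {"html", "body", "div", "span"};

static char ascii_lower(char c) {
  return (c >= 'A' && c <= 'Z') ? (char)(c + ('a' - 'A')) : c;
}

/* Case-insensitive: "DiV" and "div" both map to TAG_DIV. */
static Tag tag_from_name(const char* name) {
  for (size_t t = 0; t < sizeof kTagNames / sizeof kTagNames[0]; t++) {
    const char* a = kTagNames[t];  /* table entries are already lowercase */
    const char* b = name;
    while (*a != '\0' && ascii_lower(*b) == *a) {
      a++;
      b++;
    }
    if (*a == '\0' && *b == '\0') return (Tag)t;
  }
  return TAG_UNKNOWN;
}
```

A linear scan keeps the sketch short. With a full tag table, a hash-based lookup would be a more natural fit.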
Extracts the tag name from the original_text field of an element or token by stripping off the </> characters and attributes and adjusting the passed-in GumboStringPiece accordingly. The tag name keeps its original case and shares a buffer with the original text, to simplify memory management. Behavior is undefined if the string piece passed in does not represent an HTML tag (<tagname> or </tagname>). If the string piece is completely empty (NULL data pointer), this function returns successfully as a no-op.
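Here is a sketch of the narrowing step. It assumes a well-formed tag and treats ASCII whitespace, '/', and '>' as the end of the name. Piece, is_tag_name_end, and narrow_to_tag_name are hypothetical names. The sketch only moves the data pointer and shrinks the length, so nothing is copied or allocated.

```c
#include <stddef.h>

/* Hypothetical stand-in for a string piece. */
typedef struct { const char* data; size_t length; } Piece;

static int is_tag_name_end(char c) {
  return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f' ||
         c == '/' || c == '>';
}

/* Narrows "<name ...>" or "</name>" down to "name", in place.
   Assumes a well-formed tag; a NULL data pointer is left untouched. */
static void narrow_to_tag_name(Piece* text) {
  if (text->data == NULL) return;
  size_t start = 1;  /* skip '<' */
  if (start < text->length && text->data[start] == '/') start++;
  size_t end = start;
  while (end < text->length && !is_tag_name_end(text->data[end])) end++;
  text->data += start;
  text->length = end - start;
}
```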
Returns the first index at which an element appears in this vector (testing by pointer equality), or -1 if it never does.
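Here is a sketch of lookup by pointer equality over a hypothetical PtrVector that stores void* elements. Two distinct objects with equal contents still count as different elements.

```c
#include <stddef.h>

/* Hypothetical vector of untyped element pointers. */
typedef struct {
  void** data;
  unsigned int length;
  unsigned int capacity;
} PtrVector;

/* Compares addresses, not contents; returns -1 when absent. */
static int ptr_vector_index_of(const PtrVector* v, const void* element) {
  for (unsigned int i = 0; i < v->length; i++) {
    if (v->data[i] == element) return (int)i;
  }
  return -1;
}
```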
A struct representing a single attribute on an HTML tag. This is a name-value pair, but also includes information about source locations and original source text.
Information specific to document nodes.
The struct used to represent all HTML elements. This contains information about the tag, attributes, and child nodes.
A supertype for GumboElement and GumboText, so that we can include one generic type in lists of children and cast as necessary to subtypes.
Input struct containing configuration options for the parser. These let you specify alternate memory managers, provide different error handling, etc. Use kGumboDefaultOptions for sensible defaults, and only set what you need.
The output struct containing the results of the parse.
A struct representing a character position within the original text buffer. Line and column numbers are 1-based and offsets are 0-based, which matches how most editors and command-line tools work. Also, columns measure positions in characters while offsets measure them in bytes. The offset field is often used to pull out a particular region of text (which, in most languages that bind to C, implies pointer arithmetic on a buffer of bytes), while the column field is often used to reference a particular column on a printable display, where text nowadays is usually UTF-8 and a single character may span several bytes.
A struct representing a string or part of a string. Strings within the parser are represented by a char* and a length; the char* points into an existing data buffer owned by some other code (often the original input). By convention, GumboStringPieces are immutable, because they may share data. Use GumboStringBuffer if you need to construct a string. Clients should assume that a GumboStringPiece is not NUL-terminated and should always use explicit lengths when manipulating one.
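Here is a sketch of how the three fields diverge when text contains multi-byte UTF-8 characters. SourcePos and advance_position are hypothetical names. The sketch counts every code point, including a tab, as one column, which is a simplifying assumption.

```c
#include <stddef.h>

typedef struct {
  unsigned int line;    /* 1-based */
  unsigned int column;  /* 1-based, counted in characters */
  unsigned int offset;  /* 0-based, counted in bytes */
} SourcePos;

/* Advances `pos` over `n` bytes of UTF-8 text. */
static void advance_position(SourcePos* pos, const char* text, size_t n) {
  for (size_t i = 0; i < n; i++) {
    unsigned char byte = (unsigned char)text[i];
    pos->offset++;  /* every byte moves the offset */
    if (byte == '\n') {
      pos->line++;
      pos->column = 1;
    } else if ((byte & 0xC0) != 0x80) {
      /* Only non-continuation bytes start a new character;
         continuation bytes (10xxxxxx) extend the current one. */
      pos->column++;
    }
  }
}
```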
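Because a piece is not NUL-terminated, code that needs a C string must copy it first. printf, by contrast, can consume a piece directly through the %.*s precision. Here is a minimal sketch using the hypothetical names Piece, piece_to_cstring, and print_piece.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a string piece. */
typedef struct { const char* data; size_t length; } Piece;

/* Copies a piece into a new NUL-terminated string.
   The caller owns the result and must free() it; NULL on failure. */
static char* piece_to_cstring(const Piece* piece) {
  char* out = malloc(piece->length + 1);
  if (out == NULL) return NULL;
  if (piece->length > 0) memcpy(out, piece->data, piece->length);
  out[piece->length] = '\0';
  return out;
}

/* No copy needed: the precision bounds how many bytes printf reads. */
static void print_piece(const Piece* piece) {
  printf("%.*s\n", (int)piece->length, piece->data);
}
```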
The struct used to represent TEXT, CDATA, COMMENT, and WHITESPACE nodes. It contains just a block of text and its position.
A simple vector implementation. This stores a pointer to a data array and a length. All elements are stored as void*; client code must cast to the appropriate type. Appending to a full vector reallocates the data array, doubling its capacity to keep the amortized cost of each append O(1). There is no removal function, as none is needed by the operations within this library. Iteration can be done by inspecting the structure directly in a for-loop.
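Here is a minimal sketch of append-with-doubling for a vector of void* elements. PtrVector, ptr_vector_push, and the initial capacity of 2 are assumptions for this example, not the library's values.

```c
#include <stdlib.h>

/* Hypothetical vector of untyped element pointers. */
typedef struct {
  void** data;
  unsigned int length;
  unsigned int capacity;
} PtrVector;

/* Appends `element`, doubling capacity when full so a run of pushes
   costs O(1) amortized. Returns 0 on success, -1 if allocation fails. */
static int ptr_vector_push(PtrVector* v, void* element) {
  if (v->length == v->capacity) {
    unsigned int new_capacity = v->capacity == 0 ? 2 : v->capacity * 2;
    void** grown = realloc(v->data, new_capacity * sizeof(void*));
    if (grown == NULL) return -1;  /* the old array is still valid */
    v->data = grown;
    v->capacity = new_capacity;
  }
  v->data[v->length++] = element;
  return 0;
}
```

Iteration is then a plain for-loop over data[0] through data[length - 1], casting each element back to its real type.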
Default options struct; use this with gumbo_parse_with_options.
A SourcePosition used for elements that have no source position, i.e. parser-inserted elements.
A constant to represent a 0-length null string.
An empty (0-length, 0-capacity) GumboVector.