gumbo.capi

Undocumented in source.

Members

Aliases

GumboAllocatorFunction
alias GumboAllocatorFunction = void* function(void* userdata, size_t size)

The type for an allocator function. Takes the 'userdata' member of the GumboParser struct as its first argument. Semantics should be the same as malloc, i.e. return a block of 'size' bytes on success or NULL on failure. Allocating a block of 0 bytes behaves as per malloc.

GumboDeallocatorFunction
alias GumboDeallocatorFunction = void function(void* userdata, void* ptr)

The type for a deallocator function. Takes the 'userdata' member of the GumboParser struct as its first argument.
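As an illustration of how the 'userdata' argument can carry state into an allocator/deallocator pair, here is a minimal sketch in C. The typedef names, the AllocStats struct, and both function names are hypothetical and exist only for this example; the sketch simply wraps malloc and free and counts calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* C-side shapes matching the two aliases above (typedef names are hypothetical). */
typedef void* (*AllocatorFn)(void* userdata, size_t size);
typedef void (*DeallocatorFn)(void* userdata, void* ptr);

/* Hypothetical bookkeeping struct handed to the callbacks through 'userdata'. */
typedef struct {
  size_t allocations;
  size_t deallocations;
} AllocStats;

/* malloc-like: returns a block of 'size' bytes, or NULL on failure. */
void* counting_alloc(void* userdata, size_t size) {
  AllocStats* stats = userdata;
  assert(stats != NULL);
  void* block = malloc(size);
  if (block != NULL) stats->allocations++;
  return block;
}

/* free-like: releases 'ptr' and records the call. */
void counting_free(void* userdata, void* ptr) {
  AllocStats* stats = userdata;
  assert(stats != NULL);
  if (ptr != NULL) stats->deallocations++;
  free(ptr);
}
```

Comparing the two counters after a parse tree has been destroyed is one simple way to check for leaks.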

Enums

GumboAttributeNamespaceEnum
enum GumboAttributeNamespaceEnum

Attribute namespaces. HTML includes special handling for XLink, XML, and XMLNS namespaces on attributes. Everything else goes in the generic "NONE" namespace.

GumboNamespaceEnum
enum GumboNamespaceEnum

Namespaces. Unlike in X(HT)ML, namespaces in HTML5 are not denoted by a prefix. Rather, anything inside an <svg> tag is in the SVG namespace, anything inside the <math> tag is in the MathML namespace, and anything else is inside the HTML namespace. No other namespaces are supported, so this can be an enum only.

GumboNodeType
enum GumboNodeType

Enum denoting the type of node. This determines the type of the node.v union.

GumboParseFlags
enum GumboParseFlags

Parse flags. We track the reasons for parser insertion of nodes and store them in a bitvector in the node itself. This lets client code optimize out nodes that are implied by the HTML structure of the document, or flag constructs that may not be allowed by a style guide, or track the prevalence of incorrect or tricky HTML code.
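The bitvector idea can be sketched in C as follows. The DEMO_ constants are hypothetical placeholders chosen for this example, not the library's actual flag names or values; only the masking technique is the point.

```c
#include <stdbool.h>

/* Hypothetical flag values for illustration; not the library's real constants. */
enum {
  DEMO_INSERTION_NORMAL = 0,
  DEMO_INSERTION_BY_PARSER = 1 << 0,
  DEMO_INSERTION_IMPLICIT_END_TAG = 1 << 1
};

/* True if any bit of 'mask' is set in 'flags'. */
bool demo_has_flag(unsigned int flags, unsigned int mask) {
  return (flags & mask) != 0;
}

/* A node written out in full by the author carries no insertion bits at all. */
bool demo_is_author_written(unsigned int flags) {
  return flags == DEMO_INSERTION_NORMAL;
}
```

Because several reasons can apply to one node, client code should test individual bits rather than compare the whole value against a single constant.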

GumboQuirksModeEnum
enum GumboQuirksModeEnum

The quirks mode of a document, as defined at http://www.whatwg.org/specs/web-apps/current-work/complete/dom.html#quirks-mode

GumboTag
enum GumboTag

An enum for all the tags defined in the HTML5 standard. These correspond to the tag names themselves. Enum constants exist only for tags which appear in the spec itself (or for tags with special handling in the SVG and MathML namespaces); any other tags appear as GUMBO_TAG_UNKNOWN and the actual tag name can be obtained through original_tag.

Functions

gumbo_destroy_output
void gumbo_destroy_output(GumboOptions* options, GumboOutput* output)

Releases the memory used for the parse tree and parse errors.

gumbo_get_attribute
GumboAttribute* gumbo_get_attribute(GumboVector* attrs, char* name)

Given a vector of GumboAttributes, look up the one with the specified name and return it, or NULL if no such attribute exists. This uses a case-insensitive match, as HTML is case-insensitive.
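The lookup described above amounts to a linear scan with a case-insensitive comparison. The following C sketch uses pared-down, hypothetical stand-ins (DemoAttribute, DemoVector) rather than the real structs, and assumes ASCII attribute names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical, pared-down stand-ins for GumboAttribute and GumboVector. */
typedef struct { const char* name; const char* value; } DemoAttribute;
typedef struct { void** data; unsigned int length; } DemoVector;

/* ASCII case-insensitive comparison of two NUL-terminated strings. */
int demo_equals_ignore_case(const char* a, const char* b) {
  while (*a != '\0' && *b != '\0') {
    if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) return 0;
    a++;
    b++;
  }
  return *a == *b; /* equal only if both strings ended together */
}

/* Returns the first attribute whose name matches, or NULL if none does. */
DemoAttribute* demo_get_attribute(const DemoVector* attrs, const char* name) {
  assert(attrs != NULL && name != NULL);
  for (unsigned int i = 0; i < attrs->length; i++) {
    DemoAttribute* attr = attrs->data[i]; /* elements are stored as void* */
    if (demo_equals_ignore_case(attr->name, name)) return attr;
  }
  return NULL;
}
```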

gumbo_normalize_svg_tagname
char* gumbo_normalize_svg_tagname(GumboStringPiece* tagname)

Fixes the case of SVG elements that are not all lowercase. http://www.whatwg.org/specs/web-apps/current-work/multipage/tree-construction.html#parsing-main-inforeign

This is not done at parse time because there's no place to store a mutated tag name. tag_name is an enum (which will be TAG_UNKNOWN for most SVG tags without special handling), while original_tag_name is a pointer into the original buffer. Instead, we provide this helper function that clients can use to rename SVG tags as appropriate. Returns the case-normalized SVG tagname if a replacement is found, or NULL if no normalization is called for. The return value is static data and owned by the library.
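A table-driven version of this kind of helper might look like the C sketch below. It is an assumption-laden illustration: DemoStringPiece and the function name are hypothetical, the table holds only four of the mixed-case SVG element names, and the library's own lookup may work differently.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for GumboStringPiece: a pointer plus an explicit length. */
typedef struct { const char* data; size_t length; } DemoStringPiece;

/* A few mixed-case SVG element names; a complete table would be longer. */
static const char* const kDemoSvgNames[] = {
  "clipPath", "foreignObject", "linearGradient", "textPath"
};

/* Returns the case-normalized name (static data), or NULL if no entry matches. */
const char* demo_normalize_svg_tagname(const DemoStringPiece* tagname) {
  assert(tagname != NULL);
  size_t count = sizeof(kDemoSvgNames) / sizeof(kDemoSvgNames[0]);
  for (size_t i = 0; i < count; i++) {
    const char* candidate = kDemoSvgNames[i];
    if (strlen(candidate) != tagname->length) continue;
    size_t j = 0;
    while (j < tagname->length &&
           tolower((unsigned char)tagname->data[j]) ==
               tolower((unsigned char)candidate[j])) {
      j++;
    }
    if (j == tagname->length) return candidate;
  }
  return NULL;
}
```

Since the result points at static data, the caller must not free or modify it.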

gumbo_normalized_tagname
char* gumbo_normalized_tagname(GumboTag tag)

Returns the normalized (usually all-lowercased, except for foreign content) tag name for a GumboTag enum. Return value is static data owned by the library.

gumbo_parse
GumboOutput* gumbo_parse(char* buffer)

Parses a buffer of UTF-8 text into a GumboNode parse tree. The buffer must live at least as long as the parse tree, as some fields (e.g. original_text) point directly into the original buffer.

gumbo_parse_with_options
GumboOutput* gumbo_parse_with_options(GumboOptions* options, char* buffer, size_t buffer_length)

Extended version of gumbo_parse that takes an explicit options structure, buffer, and length.

gumbo_string_equals
bool gumbo_string_equals(GumboStringPiece* str1, GumboStringPiece* str2)

Compares two GumboStringPieces, and returns true if they're equal or false otherwise.

gumbo_string_equals_ignore_case
bool gumbo_string_equals_ignore_case(GumboStringPiece* str1, GumboStringPiece* str2)

Compares two GumboStringPieces ignoring case, and returns true if they're equal or false otherwise.
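Because string pieces carry an explicit length instead of a terminator, both comparisons have to check lengths first. A minimal C sketch, using a hypothetical DemoStringPiece stand-in and ASCII-only case folding:

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for GumboStringPiece. */
typedef struct { const char* data; size_t length; } DemoStringPiece;

/* Exact comparison: same length and same bytes. */
bool demo_string_equals(const DemoStringPiece* a, const DemoStringPiece* b) {
  assert(a != NULL && b != NULL);
  return a->length == b->length &&
         (a->length == 0 || memcmp(a->data, b->data, a->length) == 0);
}

/* Case-insensitive comparison over the explicit length (ASCII folding only). */
bool demo_string_equals_ignore_case(const DemoStringPiece* a,
                                    const DemoStringPiece* b) {
  assert(a != NULL && b != NULL);
  if (a->length != b->length) return false;
  for (size_t i = 0; i < a->length; i++) {
    if (tolower((unsigned char)a->data[i]) !=
        tolower((unsigned char)b->data[i])) {
      return false;
    }
  }
  return true;
}
```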

gumbo_tag_enum
GumboTag gumbo_tag_enum(char* tagname)

Converts a tag name string (which may be in upper or mixed case) to a tag enum.

gumbo_tag_from_original_text
void gumbo_tag_from_original_text(GumboStringPiece* text)

Extracts the tag name from the original_text field of an element or token by stripping off </> characters and attributes and adjusting the passed-in GumboStringPiece appropriately. The tag name is in the original case and shares a buffer with the original text, to simplify memory management. Behavior is undefined if a string-piece that doesn't represent an HTML tag (<tagname> or </tagname>) is passed in. If the string piece is completely empty (NULL data pointer), then this function will exit successfully as a no-op.
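The in-place adjustment described above can be sketched in C like this. DemoStringPiece and the function name are hypothetical; the sketch uses assertions where the text says behavior is undefined, and it narrows the piece without copying any characters.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for GumboStringPiece. */
typedef struct { const char* data; size_t length; } DemoStringPiece;

/* Narrows 'text' from "<tag attrs>" or "</tag>" down to just the tag name. */
void demo_tag_from_original_text(DemoStringPiece* text) {
  assert(text != NULL);
  if (text->data == NULL) return; /* completely empty piece: no-op */
  assert(text->length >= 2);
  assert(text->data[0] == '<' && text->data[text->length - 1] == '>');
  if (text->data[1] == '/') {
    /* End tag: skip "</" and drop the trailing ">". */
    assert(text->length >= 3);
    text->data += 2;
    text->length -= 3;
  } else {
    /* Start tag: skip "<", drop ">", then cut at the first space or "/". */
    text->data += 1;
    text->length -= 2;
    for (size_t i = 0; i < text->length; i++) {
      unsigned char c = (unsigned char)text->data[i];
      if (isspace(c) || c == '/') {
        text->length = i;
        break;
      }
    }
  }
}
```

The resulting piece still points into the original buffer, so it stays valid only as long as that buffer does.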

gumbo_vector_index_of
int gumbo_vector_index_of(GumboVector* vector, void* element)

Returns the first index at which an element appears in this vector (testing by pointer equality), or -1 if it never does.
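Pointer-equality search is a plain linear scan. In this C sketch, DemoVector is a hypothetical stand-in for GumboVector and the function name is invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for GumboVector. */
typedef struct {
  void** data;
  unsigned int length;
  unsigned int capacity;
} DemoVector;

/* Returns the first index holding the same pointer as 'element', or -1. */
int demo_vector_index_of(const DemoVector* vector, const void* element) {
  assert(vector != NULL);
  for (unsigned int i = 0; i < vector->length; i++) {
    if (vector->data[i] == element) return (int)i;
  }
  return -1;
}
```

Two distinct objects with identical contents do not match each other here; only the address is compared.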

Structs

GumboAttribute
struct GumboAttribute

A struct representing a single attribute on an HTML tag. This is a name-value pair, but also includes information about source locations and original source text.

GumboDocument
struct GumboDocument

Information specific to document nodes.

GumboElement
struct GumboElement

The struct used to represent all HTML elements. This contains information about the tag, attributes, and child nodes.

GumboNode
struct GumboNode

A supertype for GumboElement and GumboText, so that we can include one generic type in lists of children and cast as necessary to subtypes.

GumboOptions
struct GumboOptions

Input struct containing configuration options for the parser. These let you specify alternate memory managers, provide different error handling, etc. Use kGumboDefaultOptions for sensible defaults, and only set what you need.

GumboOutput
struct GumboOutput

The output struct containing the results of the parse.

GumboSourcePosition
struct GumboSourcePosition

A struct representing a character position within the original text buffer. Line and column numbers are 1-based and offsets are 0-based, which matches how most editors and command-line tools work. Also, columns measure positions in terms of characters while offsets measure by bytes; this is because the offset field is often used to pull out a particular region of text (which in most languages that bind to C implies pointer arithmetic on a buffer of bytes), while the column field is often used to reference a particular column on a printable display, which nowadays is usually UTF-8.
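To make the difference between character-based columns and byte-based offsets concrete, here is a small C sketch that derives a position from a byte offset in a UTF-8 buffer. The struct and function names are hypothetical, and the sketch deliberately ignores tab stops and carriage returns, which a real tokenizer may treat specially.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for GumboSourcePosition. */
typedef struct {
  unsigned int line;   /* 1-based */
  unsigned int column; /* 1-based, counted in characters */
  unsigned int offset; /* 0-based, counted in bytes */
} DemoSourcePosition;

/* 'offset' must fall on a character boundary within the buffer. */
DemoSourcePosition demo_position_at(const char* buffer, size_t length,
                                    size_t offset) {
  assert(buffer != NULL && offset <= length);
  DemoSourcePosition pos = {1, 1, (unsigned int)offset};
  for (size_t i = 0; i < offset; i++) {
    unsigned char byte = (unsigned char)buffer[i];
    if (byte == '\n') {
      pos.line++;
      pos.column = 1;
    } else if ((byte & 0xC0) != 0x80) {
      /* Count only lead bytes; UTF-8 continuation bytes look like 10xxxxxx. */
      pos.column++;
    }
  }
  return pos;
}
```

For a two-byte character such as "é", the offset advances by 2 while the column advances by 1.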

GumboStringPiece
struct GumboStringPiece

A struct representing a string or part of a string. Strings within the parser are represented by a char* and a length; the char* points into an existing data buffer owned by some other code (often the original input). GumboStringPieces are assumed (by convention) to be immutable, because they may share data. Use GumboStringBuffer if you need to construct a string. Clients should assume that a GumboStringPiece is not NUL-terminated, and should always use explicit lengths when manipulating one.
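When a NUL-terminated string is needed, for example to pass to code that expects one, the piece has to be copied using its explicit length. A minimal C sketch, with DemoStringPiece and the function name as hypothetical stand-ins:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for GumboStringPiece. */
typedef struct { const char* data; size_t length; } DemoStringPiece;

/* Returns a malloc'd, NUL-terminated copy (caller frees), or NULL on failure. */
char* demo_piece_to_cstring(const DemoStringPiece* piece) {
  assert(piece != NULL);
  char* copy = malloc(piece->length + 1);
  if (copy == NULL) return NULL;
  if (piece->length > 0) memcpy(copy, piece->data, piece->length);
  copy[piece->length] = '\0';
  return copy;
}
```

If the goal is only to print the piece, printf("%.*s", (int)piece.length, piece.data) avoids the copy altogether.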

GumboText
struct GumboText

The struct used to represent TEXT, CDATA, COMMENT, and WHITESPACE elements. This contains just a block of text and its position.

GumboVector
struct GumboVector

A simple vector implementation. This stores a pointer to a data array and a length. All elements are stored as void*; client code must cast to the appropriate type. Overflows upon addition result in reallocation of the data array, with the size doubling to maintain O(1) amortized cost. There is no removal function, as this isn't needed for any of the operations within this library. Iteration can be done through inspecting the structure directly in a for-loop.
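The doubling strategy and direct iteration can be sketched in C as below. DemoVector and both functions are hypothetical stand-ins; the starting capacity of 2 and the use of realloc are choices made for this example, not a description of the library's internals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for GumboVector. */
typedef struct {
  void** data;
  unsigned int length;
  unsigned int capacity;
} DemoVector;

/* Appends 'element', doubling the array when it is full. False on failure. */
bool demo_vector_add(DemoVector* vector, void* element) {
  assert(vector != NULL);
  if (vector->length == vector->capacity) {
    unsigned int new_capacity = vector->capacity == 0 ? 2 : vector->capacity * 2;
    void** grown = realloc(vector->data, new_capacity * sizeof(void*));
    if (grown == NULL) return false;
    vector->data = grown;
    vector->capacity = new_capacity;
  }
  vector->data[vector->length++] = element;
  return true;
}

/* Iteration by inspecting the struct directly in a for-loop. */
unsigned int demo_count_non_null(const DemoVector* vector) {
  assert(vector != NULL);
  unsigned int count = 0;
  for (unsigned int i = 0; i < vector->length; i++) {
    if (vector->data[i] != NULL) count++;
  }
  return count;
}
```

Doubling means that n appends trigger only about log2(n) reallocations, which is where the O(1) amortized cost comes from.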

Variables

kGumboDefaultOptions
GumboOptions kGumboDefaultOptions;

Default options struct; use this with gumbo_parse_with_options.

kGumboEmptySourcePosition
GumboSourcePosition kGumboEmptySourcePosition;

A SourcePosition used for elements that have no source position, i.e. parser-inserted elements.

kGumboEmptyString
GumboStringPiece kGumboEmptyString;

A constant to represent a 0-length null string.

kGumboEmptyVector
GumboVector kGumboEmptyVector;

An empty (0-length, 0-capacity) GumboVector.
