YuNetSurf HTML5 parser and tree builder with CSS3 tokenizer, parser, and selection engine.
- Parse HTML, good and bad.
  - HTML5 tokenizer with SAX-style events.
  - Flexible tree builder API for custom DOM implementations.
  - Ready-made HTML5 parser classes convert HTML documents into DOM trees:
    - Simple DOM tree: Fast and efficient.
    - LibXml2 tree: Fully compatible with LibXml2 functions, including XPath and LibXslt transformations.
    - XDOM tree: Feasibility study limited by XDOM restrictions.
  - Fast and efficient, with low memory usage.
- Parse CSS, good and bad.
- Apply CSS rules to DOM nodes:
  - CSS2 and CSS3 selectors.
  - Handles most CSS2 properties and some CSS3 properties.
  - Ready-made selector engines for:
    - Simple DOM tree.
    - LibXml2 tree.
  - Flexible selection API to apply CSS to custom DOM implementations.
  - Fast and efficient, with low memory usage.
- Extract information from HTML documents:
  - Locate HTML elements based on their tag, attribute, or CSS properties.
  - Extract values of attributes from elements.
  - Find the n-th sibling or child element in the parsed data.
- Clean up, change, or extend HTML documents:
  - Add, change, or remove attributes of elements.
  - Manipulate the inner content of elements.
  - Wrap element content inside new elements.
- Use CSS to detect hidden, highlighted, or invisible HTML fragments.
- Apply custom CSS as an HTML element filter or locator.
- Download HTML documents with linked CSS resources.
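
To make "SAX-style events" and "extract values of attributes from elements" concrete, here is a minimal sketch of the idea. It does not use the YuNetSurf API, which is not shown here. It uses Python's standard `html.parser` module, an event-driven parser that tolerates malformed markup but is not a full HTML5 tokenizer. The names `AttributeCollector`, `extract_attribute`, and `BAD_HTML` are hypothetical and exist only for this illustration.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Collect the values of one attribute from every matching start tag."""

    def __init__(self, tag, attribute):
        super().__init__()
        self.tag = tag
        self.attribute = attribute
        self.values = []

    def handle_starttag(self, tag, attrs):
        # Event callback: invoked once per start tag as the input is scanned.
        # Tag and attribute names arrive lowercased; attrs is a list of
        # (name, value) pairs, where value is None for valueless attributes.
        if tag != self.tag:
            return
        for name, value in attrs:
            if name == self.attribute and value is not None:
                self.values.append(value)


def extract_attribute(html, tag, attribute):
    """Return all values of `attribute` found on `tag` elements, in order."""
    collector = AttributeCollector(tag, attribute)
    collector.feed(html)
    collector.close()
    return collector.values


# Deliberately malformed input: unclosed <p> and <a>, no </body>.
BAD_HTML = '<body><p>See <a href="/docs">docs<p><a class="x" href="/faq">FAQ</a>'

print(extract_attribute(BAD_HTML, "a", "href"))  # ['/docs', '/faq']
```

No tree is built in this sketch: the callbacks react to tokens as they arrive. A tree builder, as described in the list above, would sit on top of such an event stream and turn it into a DOM.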