(PHP 5, PHP 7, PHP 8)
DOMDocument::loadHTML — Load HTML from a string
The function parses the HTML contained in the string source.
Unlike loading XML, HTML does not have to be well-formed to load.
This function parses the input using an HTML 4 parser. The parsing rules of HTML 5, which is what modern web browsers use, are different. Depending on the input this might result in a different DOM structure. Therefore this function cannot be safely used for sanitizing HTML.
The behavior when parsing HTML can depend on the version of libxml that is being used, particularly with regard to edge conditions and error handling.
For parsing that conforms to the HTML5 specification,
use Dom\HTMLDocument::createFromString() or
Dom\HTMLDocument::createFromFile(), added in PHP 8.4.
As an example, some HTML elements will implicitly close a parent element when encountered. The rules for automatically closing parent elements differ between HTML 4 and HTML 5 and thus the resulting DOM structure that DOMDocument sees might be different from the DOM structure a web browser sees, possibly allowing an attacker to break the resulting HTML.
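As a minimal sketch of the alternative mentioned above, the following prefers the HTML5 parser when it is available and otherwise falls back to this function. The helper name load_html_document() is hypothetical and not part of PHP; the fallback branch still uses the HTML 4 parser, so it carries the same limitations.

```php
<?php
// Hypothetical helper (not part of PHP): prefer HTML5 parsing when available.
function load_html_document(string $html): object
{
    if (class_exists('Dom\HTMLDocument')) {
        // PHP 8.4+: parsing that conforms to the HTML5 specification.
        return Dom\HTMLDocument::createFromString($html);
    }

    // Older PHP versions: fall back to the HTML 4 parser.
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    return $doc;
}

$doc = load_html_document('<!DOCTYPE html><html><body><p>Test</p></body></html>');
$text = $doc->getElementsByTagName('p')->item(0)->textContent;
echo $text, "\n";
?>
```

Because the two branches may build different DOM structures for the same input, code that relies on this kind of fallback should not be used for sanitizing HTML.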
If an empty string is passed as the source, a warning will be generated. This warning is not generated by libxml and cannot be handled using libxml's error handling functions.
While malformed HTML should load successfully, this function may generate E_WARNING errors when it encounters bad markup. libxml's error handling functions may be used to handle these errors.
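The following sketch shows one way to collect parser errors with libxml's error handling functions instead of letting them surface as E_WARNING. The sample markup is an assumption chosen for illustration; which errors are reported, and how they are worded, depends on the libxml version in use.

```php
<?php
// Collect libxml errors internally rather than emitting E_WARNING.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
// Deliberately sloppy markup: an unknown tag and an unclosed <p>.
$loaded = $doc->loadHTML('<html><body><p>Test<bogus></body></html>');

// Each entry is a LibXMLError object; the list may be empty.
$errors = libxml_get_errors();
foreach ($errors as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}

// Clear the error buffer and restore the previous setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);

if ($loaded === false) {
    echo "The document could not be loaded.\n";
}
?>
```

Restoring the previous libxml_use_internal_errors() value keeps the change local to this piece of code, so other code that relies on warnings is not affected.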
Version | Description |
---|---|
8.3.0 | This function now has a tentative bool return type. |
8.0.0 | Calling this function statically will now throw an Error. Previously, an E_DEPRECATED was raised. |
Example #1 Creating a Document
<?php
$doc = new DOMDocument();
$doc->loadHTML("<html><body>Test<br></body></html>");
echo $doc->saveHTML();
?>