
I like to use the PEAR library HTML_Safe to clean up any user input I collect from forms and such before saving it to a database. (It is also downloadable separately from PixelApes.)
It strips out any potentially dangerous HTML and code, such as:
- opening tag without its closing tag
- closing tag without its opening tag
- any of these tags: “base”, “basefont”, “head”, “html”, “body”, “applet”, “object”,
“iframe”, “frame”, “frameset”, “script”, “layer”, “ilayer”, “embed”, “bgsound”,
“link”, “meta”, “style”, “title”, “blink”, “xml” etc.
- any of these attributes: on*, data*, dynsrc
- javascript:/vbscript:/about: etc. protocols
- expression/behavior etc. in styles
- any other active content
It’s been stuck at 0.9.9 beta since 2005 but the oldies are the goodies (See qmail, 1 & 2).
Usage: say, for example, I want to make the $_GET['show'] variable, which is passed in the query string, safe:
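require_once 'HTML/Safe.php';
// Plain assignment: "=& new" is deprecated in PHP 5.3 and a parse error in PHP 7+.
$safehtml = new HTML_Safe();
// parse() returns the cleaned-up markup as a string.
$show_safe = $safehtml->parse($_GET['show']);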
For a lazy, simple programmer it is simple to use, even with ADOdb’s AutoExecute() function, which I have been using more and more recently:
$safehtml = new HTML_Safe();
// Loop over key => value pairs so each cleaned value is written back to its own key.
foreach ($_POST as $key => $value) {
    $_POST[$key] = $safehtml->parse($value);
}
$insert_rs = $conn->AutoExecute('SOME_TABLE', $_POST, 'INSERT');
Simple as.
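One thing a flat loop over $_POST does not cover is form fields that arrive as arrays (anything named like tags[] in the form). A rough sketch of how that could be handled is below. Note that sanitize_all() is a name I made up for illustration and is not part of HTML_Safe or ADOdb, and strip_tags() is only a stand-in cleaner so the snippet runs without PEAR installed; with HTML_Safe you would presumably pass array($safehtml, 'parse') as the callback instead.

```php
<?php
// Hypothetical helper (not part of HTML_Safe or ADOdb): applies a cleaning
// callback to every value in an array, descending into nested arrays.
function sanitize_all(array $input, $cleaner)
{
    $clean = array();
    foreach ($input as $key => $value) {
        if (is_array($value)) {
            // Fields named like "tags[]" arrive as arrays, so recurse into them.
            $clean[$key] = sanitize_all($value, $cleaner);
        } else {
            // Cast to string and hand the value to the cleaning callback.
            $clean[$key] = call_user_func($cleaner, (string) $value);
        }
    }
    return $clean;
}

// Made-up sample data standing in for $_POST.
$fake_post = array(
    'name' => '<b>Bob</b>',
    'tags' => array('<i>php</i>', 'pear'),
);

// strip_tags is an assumption for demo purposes only; it is a much blunter
// tool than HTML_Safe and is not a replacement for it.
$safe_post = sanitize_all($fake_post, 'strip_tags');
```

The helper returns a new array rather than overwriting its input, so the raw values are still around if you need to log or compare them, and $safe_post is what would go to AutoExecute().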
It is also worth looking at HTML Purifier, which seems to be more recently updated.