I am retrieving some HTML strings from my database and I would like to parse them into my DOMDocument. The problem is that DOMDocument emits warnings on special characters.
I wonder why, and I wonder how to solve this. These are some code fragments from my page. How can I fix this kind of warning?
I also found something about validation, but when I apply it, my page won't load anymore. The code I tried for that was something like this.
Thanks in advance!
There is no &nbsp; entity defined in XML.
That means you have to use the numeric equivalent of a non-breaking space, which is &#160;
If you are trying to save HTML into an XML container, then save it as text. HTML and XML may look similar but they are very distinct.
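A minimal sketch of that numeric-entity fix (the sample string is made up): XML predefines only lt, gt, amp, apos and quot, so &nbsp; has to become &#160; before the string reaches the parser.

```php
<?php
// XML knows only &lt; &gt; &amp; &apos; &quot; — swap &nbsp; for its
// numeric form &#160; and loadXML() stops warning about it.
$html = '<p>Hello&nbsp;world</p>';

$doc = new DOMDocument();
$doc->loadXML(str_replace('&nbsp;', '&#160;', $html));

// The parsed text now contains a real non-breaking space (U+00A0).
echo $doc->documentElement->textContent;
```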
That’s a tricky one because it’s actually multiple issues in one.
Like Tomalak points out, there is no &nbsp; entity in XML itself; it is defined in the (X)HTML DTD. The parser can resolve it if you let it load that DTD from w3.org,
but because there are millions of requests to that page daily, the W3C decided to block access to it unless a User-Agent is sent in the request. To supply a User-Agent you have to create a custom stream context.
This still takes some time to complete (don't ask me why), but in the end you get the parsed document without warnings.
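A sketch of that stream-context setup (the User-Agent string is an arbitrary example): libxml fetches external DTDs through PHP streams, so a context registered with libxml_set_streams_context() applies to those requests.

```php
<?php
// Give libxml's HTTP requests (e.g. DTD fetches) a User-Agent header.
// The UA string is just an example.
$context = stream_context_create([
    'http' => ['header' => "User-Agent: example-dtd-fetcher/1.0\r\n"],
]);
libxml_set_streams_context($context);

$doc = new DOMDocument();
// Any load that resolves a remote DTD (an XHTML DOCTYPE, say) now goes
// out with that header — uncomment against a real document:
// $doc->loadXML(file_get_contents('page.xhtml'));
```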
Also see DOMDocument::validate() problem
I see the problem in question, and that it has already been answered, but if I may, I'd like to share a thought from my past experience with similar problems.
It may well be that your task requires including tagged data from the database in the resulting XML, but it may or may not require parsing that data. If it's merely data for inclusion, and not structured parts of your XML, you can place the strings from the database in CDATA section(s), effectively bypassing all validation errors at this stage.
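A sketch of the CDATA approach (the element name and sample string are made up): the database string goes in verbatim and comes back out through the node's text content.

```php
<?php
// Wrap untrusted HTML from the database in a CDATA section so the XML
// parser treats it as opaque text. 'content' is an illustrative element name.
$fromDb = '<p>Some&nbsp;markup the parser would choke on</p>';

$doc = new DOMDocument('1.0', 'UTF-8');
$root = $doc->appendChild($doc->createElement('content'));
$root->appendChild($doc->createCDATASection($fromDb));

// Serializes with the raw string intact inside <![CDATA[ ... ]]>
echo $doc->saveXML();
```

One caveat: a literal ]]> inside the data terminates the section early, so split or escape it first.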
While Smarty might be a good bet (why reinvent the wheel for the 14th time?), etranger might have a point. There are situations in which you don't want to pull in something as heavyweight as a complete new (and unstudied) package, and you merely want to output some data from a database that happens to contain HTML an XML parser has issues with.
Warning: the following is a simple solution, but don't do it unless you're SURE you can get away with it! (I did this when I had about two hours before a deadline and didn't have time to study, let alone implement, something like Smarty...)
Before sticking the string into the appendXML function, run it through a preg_replace. For instance, replace every &nbsp; entity with [some_prefix]_nbsp. Then, on the page where you show the HTML, do it the other way around.
And Presto! =)
The same replacement applies in both places: the code that parses the string, and the code that writes the HTML back out.
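A sketch of what that round trip could look like (the prefix and sample string are invented; the answer mentions preg_replace, but a literal str_replace does the same job here):

```php
<?php
// The placeholder prefix is arbitrary — e.g. a hardcoded md5(time()) fragment.
$prefix = 'q1w2e3';
$htmlFromDb = '<p>Hello&nbsp;world</p>';

$doc = new DOMDocument();
$root = $doc->appendChild($doc->createElement('root'));

// Step 1 — before parsing: hide the entity from the XML parser.
$safe = str_replace('&nbsp;', $prefix . '_nbsp', $htmlFromDb);
$fragment = $doc->createDocumentFragment();
$fragment->appendXML($safe);          // no warning now
$root->appendChild($fragment);

// Step 2 — when writing the HTML out: restore the entity.
echo str_replace($prefix . '_nbsp', '&nbsp;', $doc->saveXML($root));
```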
It's probably a good idea to think up a stronger replacement token. If you insist on being thorough: do an md5 of a time() value once, and hardcode the result as the prefix, using it both when you replace and when you restore.
Do the same for any other tags and entities you need to circumvent.
This is a hack, and not good code by any stretch of the imagination. But it saved my life, and I wanted to share it with other people who run into this particular problem with minutes to spare.
Use the above at your own risk.
Ted Guild (W3C): Due to the volume of DTD requests, W3C not only blocks user agents that do not identify themselves but has set up a tarpit to encourage software and library authors to use a catalog. Wouldn't it be more efficient to validate your markup using local resources instead of going over the net, each and every time your code runs, to retrieve something that has not changed since August of 2002? At the very least these libraries should more fully implement HTTP and take advantage of its caching directives, which would mean they would only retrieve the DTD once every three months. W3C's tarpit delay is meant to emphasize this, so that developers notice and file bug reports/feature requests with the library maintainers.
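In PHP, one way to follow that advice is libxml_set_external_entity_loader(), which lets you resolve well-known public identifiers from a local copy instead of w3.org (the cache path and mapping below are examples):

```php
<?php
// Resolve known DTDs from disk; return null for anything unmapped so the
// code fails fast instead of hitting the network. Paths are examples.
libxml_set_external_entity_loader(function ($publicId, $systemId, $context) {
    $local = [
        '-//W3C//DTD XHTML 1.0 Strict//EN' => __DIR__ . '/dtd/xhtml1-strict.dtd',
    ];
    if ($publicId !== null && isset($local[$publicId]) && is_file($local[$publicId])) {
        return $local[$publicId];  // libxml opens this file instead of the URL
    }
    return null;
});
```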