django-htmlmin
I think this is an error:
https://gist.github.com/vgulaev/4753159
Based on what I know, that is not an error; it's HTML escaping, a security measure against XSS. In htmlmin, that is done by BeautifulSoup:
soup = bs4.BeautifulSoup(html_code, "html5lib")
Here html_code is your original code. After that call, the soup variable will hold your Return.html, encoded by BeautifulSoup.
If you take a look at BeautifulSoup code, you will find this:
CHARACTER_TO_XML_ENTITY = {
    "'": "apos",
    '"': "quot",
    "&": "amp",
    "<": "lt",
    ">": "gt",
}
In other words, any < will be turned into &lt;, any > into &gt;, and so on.
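To make the effect of that table concrete, here is a minimal sketch that applies the same character-to-entity mapping by hand. It is not BeautifulSoup's actual implementation; the function name substitute_entities and the sample markup are made up for illustration.

```python
# Same mapping as quoted from BeautifulSoup above.
CHARACTER_TO_XML_ENTITY = {
    "'": "apos",
    '"': "quot",
    "&": "amp",
    "<": "lt",
    ">": "gt",
}


def substitute_entities(text):
    """Replace each special character with its &name; entity (hypothetical helper)."""
    return "".join(
        "&{};".format(CHARACTER_TO_XML_ENTITY[ch]) if ch in CHARACTER_TO_XML_ENTITY else ch
        for ch in text
    )


# Markup placed inside an escaped context comes out as plain text, not tags.
escaped = substitute_entities('<img src="a.png">')
print(escaped)
```

This is the kind of output you would see if markup inside a tag were treated as text rather than as child elements.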
I hope that helps. If so, may I close this issue, @vgulaev?
Yes, you are absolutely right.
soup = BeautifulSoup(htmlstr, "html5lib")
print(soup.prettify("utf-8"))
returns the same code :((
I'm reopening this issue.
I know that BeautifulSoup escapes the content of noscript, but I think that is wrong behaviour.
I agree, that is wrong behaviour. If you use BeautifulSoup without specifying "html5lib", the result is fine.
But if we don't use html5lib, other things will break. It's time to create our own html5 parser.
When do we start coding?! :)
@vgulaev I created the https://github.com/cobrateam/html5py project :)
There were similar problems with script and css tags which were fixed by #45 - so it looks like adding noscript to the EXCLUDE_TAGS would fix this issue.
It would be nice to be able to configure EXCLUDE_TAGS from the project's settings.py. Adding noscript to that tuple correctly fixes the problem for me. Right now I'm doing:
from htmlmin import minify
minify.EXCLUDE_TAGS += ('noscript',)
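For anyone wondering why a patch like this can work at all, here is a small sketch using a stand-in module. Everything in it is hypothetical (fake_minify, is_excluded, and the initial tuple contents); it does not reproduce htmlmin's real code. It only shows that += on a module-level tuple rebinds the attribute, and that functions reading the global at call time see the new value, so the patch has to run before the minifier is called.

```python
import types

# Stand-in for a module such as htmlmin.minify; names and contents are hypothetical.
fake_minify = types.ModuleType("fake_minify")
exec(
    "EXCLUDE_TAGS = ('pre', 'script')\n"
    "def is_excluded(tag_name):\n"
    "    # Looks up the module-level global each time it is called.\n"
    "    return tag_name in EXCLUDE_TAGS\n",
    fake_minify.__dict__,
)

before = fake_minify.is_excluded("noscript")

# Tuples are immutable, so += builds a new tuple and rebinds the module attribute.
fake_minify.EXCLUDE_TAGS += ("noscript",)

after = fake_minify.is_excluded("noscript")
print(before, after)
```

If the real module copied the tuple somewhere else at import time, a later patch like this would have no effect, which may be one reason the placement of the code matters.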
@honi Where would you put that code? Putting it in settings.py doesn't seem to have any effect here.
Yes, I put it in settings.py. That was for an old project, though. I don't need to do this anymore, but I also don't have any noscript tags in my current project, so...