django-htmlmin

I think error

Open vgulaev opened this issue 12 years ago • 11 comments

https://gist.github.com/vgulaev/4753159

vgulaev avatar Feb 11 '13 07:02 vgulaev

As far as I know, that is not an error; it is HTML escaping, a security measure against XSS. In htmlmin, that is done by BeautifulSoup:

soup = bs4.BeautifulSoup(html_code, "html5lib")

Here html_code is your original markup. After that call, the soup variable holds what ends up in your Return.html, escaped by BeautifulSoup.

If you take a look at BeautifulSoup code, you will find this:

CHARACTER_TO_XML_ENTITY = {
    "'": "apos",
    '"': "quot",
    "&": "amp",
    "<": "lt",
    ">": "gt",
}

In other words, any < is turned into &lt;, any > into &gt;, and so on.
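To make the effect of that table concrete, here is a minimal sketch of how such a mapping can be applied to a string. The helper name escape_markup and the sample input are assumptions made for illustration; this is not BeautifulSoup's actual implementation, only the same idea using the dictionary quoted above.

```python
# Mapping copied from the snippet quoted above.
CHARACTER_TO_XML_ENTITY = {
    "'": "apos",
    '"': "quot",
    "&": "amp",
    "<": "lt",
    ">": "gt",
}


def escape_markup(text):
    """Replace each special character with its named entity (hypothetical helper)."""
    # Working character by character avoids re-escaping the "&" that the
    # replacements themselves introduce.
    return "".join(
        "&%s;" % CHARACTER_TO_XML_ENTITY[ch] if ch in CHARACTER_TO_XML_ENTITY else ch
        for ch in text
    )


escaped = escape_markup('<img src="pixel.gif">')
print(escaped)
```

Markup that is escaped this way is displayed by the browser as literal text instead of being parsed as tags, which is why escaped content inside noscript no longer behaves like the original HTML.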

I hope that helps, if so, may I close this issue @vgulaev?

bernardobarreto avatar Feb 11 '13 17:02 bernardobarreto

Yes, you are absolutely right.

soup = BeautifulSoup(htmlstr, "html5lib")
print(soup.prettify("utf-8"))

returns the same code :((

vgulaev avatar Feb 12 '13 04:02 vgulaev

I'm reopening this issue.

I know that BeautifulSoup escapes the content of noscript, but I think that is the wrong behaviour.

andrewsmedina avatar Feb 12 '13 13:02 andrewsmedina

I agree, that is the wrong behaviour. If I use BeautifulSoup without specifying "html5lib", the result is fine.

vgulaev avatar Feb 13 '13 06:02 vgulaev

But if we don't use html5lib, other things will break. It's time to create our own html5 parser.

andrewsmedina avatar Feb 13 '13 14:02 andrewsmedina

When do we start coding?! :)

vgulaev avatar Feb 14 '13 04:02 vgulaev

@vgulaev I created the https://github.com/cobrateam/html5py project :)

andrewsmedina avatar Feb 14 '13 21:02 andrewsmedina

There were similar problems with script and css tags which were fixed by #45 - so it looks like adding noscript to the EXCLUDE_TAGS would fix this issue.

foobacca avatar Apr 30 '13 09:04 foobacca

It would be nice to be able to configure EXCLUDE_TAGS from the project's settings.py. Adding noscript to that tuple correctly fixes the problem for me. Right now I'm doing:

from htmlmin import minify
minify.EXCLUDE_TAGS += ('noscript',)
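A slightly more defensive variant of the same workaround is sketched below. The helper exclude_tags is hypothetical, and a stand-in object replaces htmlmin.minify so the sketch runs without the package installed; the starting tuple is an assumed value, not the library's real default. The only point is that the tuple is extended at most once, even if the module containing the patch is imported several times.

```python
from types import SimpleNamespace


def exclude_tags(module, *tags):
    """Append tags to module.EXCLUDE_TAGS, skipping any already present."""
    updated = list(module.EXCLUDE_TAGS)
    for tag in tags:
        if tag not in updated:
            updated.append(tag)
    # Reassign as a tuple so the attribute keeps its original type.
    module.EXCLUDE_TAGS = tuple(updated)
    return module.EXCLUDE_TAGS


# Stand-in for htmlmin.minify; the initial contents are an assumption.
fake_minify = SimpleNamespace(EXCLUDE_TAGS=("pre", "script"))

exclude_tags(fake_minify, "noscript")
exclude_tags(fake_minify, "noscript")  # second call changes nothing
print(fake_minify.EXCLUDE_TAGS)
```

In a real project the call would presumably be exclude_tags(minify, "noscript") after from htmlmin import minify, placed in a module that is imported before the first response is minified.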

honi avatar Jun 14 '13 15:06 honi

@honi Where would you put that code? Putting it in settings.py doesn't seem to have any effect here?

reinoudvansanten avatar Mar 20 '14 09:03 reinoudvansanten

Yes, I put it in settings.py. That was for an old project, though. I don't need to do this anymore, but I also don't have any noscript tags in my current project, so...

honi avatar Mar 20 '14 11:03 honi