AttributeError: 'NoneType' object has no attribute 'strip'

Open Amirelkanov opened this issue 1 year ago • 0 comments

Description: Passing an HTML-like string with a corrupted <style> tag to the AdvancedHTMLParser.AdvancedHTMLParser().parseStr method raises an AttributeError.

String input:

<!DOCTYPE html><html><head><title>W33ZpsIOCysn9GGU45y0LW9EpuPHBlAuxCRRusKRvowefQLMy2</title><style:p { color: red; }</style></head><body><ul><li>rp52OnfCuzqBsp7</li><li>wrAAhIfvfpvMeyoTdmoF1oxezMhscNlgTqo0fPhfUS7XWZvECi2iVMsldLpqJq6W34KuOeoJ74cx5</li><li>8ymeXTKNEDb3jDnYwKt3lFMc4s7pJxDIVgSXljWIlOjv7JGr8cXf8SJOmpiyD05PyTzj9UATCFo1XqBpCqXR7KcjUYinCI4kZYI</li></ul> 6L1gB6g0z</body></html>

Bytearray input:

[60, 33, 68, 79, 67, 84, 89, 80, 69, 32, 104, 116, 109, 108, 62, 60, 104, 116, 109, 108, 62, 60, 104, 101, 97, 100, 62, 60, 116, 105, 116, 108, 101, 62, 87, 51, 51, 90, 112, 115, 73, 79, 67, 121, 115, 110, 57, 71, 71, 85, 52, 53, 121, 48, 76, 87, 57, 69, 112, 117, 80, 72, 66, 108, 65, 117, 120, 67, 82, 82, 117, 115, 75, 82, 118, 111, 119, 101, 102, 81, 76, 77, 121, 50, 60, 47, 116, 105, 116, 108, 101, 62, 60, 115, 116, 121, 108, 101, 58, 112, 32, 123, 32, 99, 111, 108, 111, 114, 58, 32, 114, 101, 100, 59, 32, 125, 60, 47, 115, 116, 121, 108, 101, 62, 60, 47, 104, 101, 97, 100, 62, 60, 98, 111, 100, 121, 62, 60, 117, 108, 62, 60, 108, 105, 62, 114, 112, 53, 50, 79, 110, 102, 67, 117, 122, 113, 66, 115, 112, 55, 60, 47, 108, 105, 62, 60, 108, 105, 62, 119, 114, 65, 65, 104, 73, 102, 118, 102, 112, 118, 77, 101, 121, 111, 84, 100, 109, 111, 70, 49, 111, 120, 101, 122, 77, 104, 115, 99, 78, 108, 103, 84, 113, 111, 48, 102, 80, 104, 102, 85, 83, 55, 88, 87, 90, 118, 69, 67, 105, 50, 105, 86, 77, 115, 108, 100, 76, 112, 113, 74, 113, 54, 87, 51, 52, 75, 117, 79, 101, 111, 74, 55, 52, 99, 120, 53, 60, 47, 108, 105, 62, 60, 108, 105, 62, 56, 121, 109, 101, 88, 84, 75, 78, 69, 68, 98, 51, 106, 68, 110, 89, 119, 75, 116, 51, 108, 70, 77, 99, 52, 115, 55, 112, 74, 120, 68, 73, 86, 103, 83, 88, 108, 106, 87, 73, 108, 79, 106, 118, 55, 74, 71, 114, 56, 99, 88, 102, 56, 83, 74, 79, 109, 112, 105, 121, 68, 48, 53, 80, 121, 84, 122, 106, 57, 85, 65, 84, 67, 70, 111, 49, 88, 113, 66, 112, 67, 113, 88, 82, 55, 75, 99, 106, 85, 89, 105, 110, 67, 73, 52, 107, 90, 89, 73, 60, 47, 108, 105, 62, 60, 47, 117, 108, 62, 32, 54, 76, 49, 103, 66, 54, 103, 48, 122, 60, 47, 98, 111, 100, 121, 62, 60, 47, 104, 116, 109, 108, 62]
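The byte list is the ASCII encoding of the string input, so either form can be used once decoded. A short sketch, using only the first 15 values of the list for brevity (the variable names are illustrative):

```python
# First 15 values of the byte list from this issue (truncated for brevity).
prefix = [60, 33, 68, 79, 67, 84, 89, 80, 69, 32, 104, 116, 109, 108, 62]

# bytes() accepts an iterable of ints in range(256); decode it to get the text.
decoded = bytes(prefix).decode("ascii")
print(decoded)  # <!DOCTYPE html>
```

Decoding the full list the same way should give back the string input shown above.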

Code that reproduces the error:

import AdvancedHTMLParser

parser = AdvancedHTMLParser.AdvancedHTMLParser()
parser.parseStr(string_input)  # string_input is the "String input" shown above

Expected Result: The invalid input is ignored, or a specific exception is raised (like MultipleRootNodeException).

Actual Result:

Traceback (most recent call last):
  File "C:\Users\AmEl\IdeaProjects\Joker2023\src\main\python\main.py", line 55, in main
    python_method(input_data)
  File "C:\Users\AmEl\IdeaProjects\Joker2023\venv\Lib\site-packages\AdvancedHTMLParser\Parser.py", line 980, in parseStr
    self.feed(html)
  File "C:\Users\AmEl\IdeaProjects\Joker2023\venv\Lib\site-packages\AdvancedHTMLParser\Parser.py", line 948, in feed
    HTMLParser.feed(self, contents)
  File "C:\Users\AmEl\AppData\Local\Programs\Python\Python312\Lib\html\parser.py", line 111, in feed
    self.goahead(0)
  File "C:\Users\AmEl\AppData\Local\Programs\Python\Python312\Lib\html\parser.py", line 171, in goahead
    k = self.parse_starttag(i)
        ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\AmEl\AppData\Local\Programs\Python\Python312\Lib\html\parser.py", line 338, in parse_starttag
    self.handle_starttag(tag, attrs)
  File "C:\Users\AmEl\IdeaProjects\Joker2023\venv\Lib\site-packages\AdvancedHTMLParser\Parser.py", line 138, in handle_starttag
    newTag = AdvancedTag(tagName, attributeList, isSelfClosing, ownerDocument=self)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\AmEl\IdeaProjects\Joker2023\venv\Lib\site-packages\AdvancedHTMLParser\Tags.py", line 196, in __init__
    myAttributes[key] = value
    ~~~~~~~~~~~~^^^^^
  File "C:\Users\AmEl\IdeaProjects\Joker2023\venv\Lib\site-packages\AdvancedHTMLParser\SpecialAttributes.py", line 96, in __setitem__
    tag.style = StyleAttribute(value, tag)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\AmEl\IdeaProjects\Joker2023\venv\Lib\site-packages\AdvancedHTMLParser\SpecialAttributes.py", line 424, in __init__
    self._styleDict = StyleAttribute.styleToDict(styleValue)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\AmEl\IdeaProjects\Joker2023\venv\Lib\site-packages\AdvancedHTMLParser\SpecialAttributes.py", line 650, in styleToDict
    styleStr = styleStr.strip()
               ^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'strip'
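The last frame shows styleToDict calling .strip() on a value that is None. Below is a minimal sketch of the kind of guard that would give either of the expected behaviours. It is not the library's code: style_to_dict, its strict flag, and InvalidStyleError are hypothetical names made up for illustration, and the parsing is deliberately simplified.

```python
class InvalidStyleError(ValueError):
    """Hypothetical exception for a style value that cannot be parsed."""


def style_to_dict(style_str, strict=False):
    """Parse 'a: b; c: d' into a dict, tolerating a missing (None) value."""
    if style_str is None:
        # A valueless attribute such as <p style> arrives here as None.
        if strict:
            raise InvalidStyleError("style attribute has no value")
        return {}
    result = {}
    for declaration in style_str.strip().split(";"):
        name, sep, value = declaration.partition(":")
        # Skip empty fragments and declarations without a property name.
        if sep and name.strip():
            result[name.strip().lower()] = value.strip()
    return result
```

With strict left at False the missing value is ignored; with strict=True the caller gets a dedicated exception instead of an AttributeError.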

Additional information:

  • OS: Windows 10, 22H2 (19045.4984)
  • Python version: Python 3.12.6
  • The error can also be reproduced with a minimal input such as: <s</style>
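To see what the standard library hands to handle_starttag for the minimal input, the sketch below uses only html.parser, without AdvancedHTMLParser. AttrRecorder is an illustrative helper, not part of either library. On the Python version from this report the malformed tag appears to be tokenized as a start tag carrying a style attribute with no value (None), which would explain the None reaching styleToDict; the exact tokenization may differ between Python versions, so no output is shown.

```python
from html.parser import HTMLParser


class AttrRecorder(HTMLParser):
    """Records every (tag, attrs) pair passed to handle_starttag."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples; value is None when the
        # attribute has no "=value" part.
        self.seen.append((tag, attrs))


recorder = AttrRecorder()
recorder.feed("<s</style>")
recorder.close()
for tag, attrs in recorder.seen:
    print(repr(tag), attrs)
```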

P.S. The same information is available in reportAttributeError.txt

Amirelkanov, Oct 08 '24 19:10