wikitextparser: Infinite loop with CR (\r, \x0d) in table parsing

Status: Open. TrueBrain opened this issue (0 comments).

In one of my test sets I forgot to sanitize the input, and as a result it contained a \r (without any \n). This caused a funny effect I thought you might want to know about.

```python
import wikitextparser
# Never returns: the trailing lone \r sends table parsing into an infinite loop.
wikitextparser.parse("{|\n}\n\r").get_tables()[0].data()
```

This causes an infinite loop. The same happens if you replace \r with \x0b or \x0c, but those make even less sense, of course.
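Since the trigger here was unsanitized input, one defensive option on the caller's side is to normalize line endings before parsing. This is a minimal sketch, assuming it is acceptable to turn CRLF, a lone CR, and the \x0b/\x0c characters into a plain \n; that mapping is our own choice, not something wikitextparser requires, and normalize_newlines is a hypothetical helper name.

```python
import re

# Assumed mapping: CRLF, lone CR, vertical tab (\x0b) and form feed (\x0c)
# all become "\n". CRLF is listed first so it collapses to a single "\n".
_LINE_BREAKS = re.compile(r"\r\n|[\r\x0b\x0c]")


def normalize_newlines(text: str) -> str:
    """Return text with CRLF, lone CR, VT and FF replaced by LF."""
    return _LINE_BREAKS.sub("\n", text)


print(repr(normalize_newlines("{|\n}\n\r")))  # '{|\n}\n\n'
```

This only removes the stray control characters from the input; it does not change how the parser itself handles them.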

The code at https://github.com/5j9/wikitextparser/blob/f64e098b0ba040595f6fc427edf6409308761bd0/wikitextparser/_table.py#L94 returns -1, after which _lstrip_increase increases that back to 0, and the loop starts over.
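To make the failure mode concrete, here is a simplified, hypothetical model of the pattern described above. It is not wikitextparser's actual code: lstrip_increase, scan_buggy and scan_fixed are made-up names, and the model assumes the scan only stops once the position reaches the end of the buffer. A trailing byte with no following \n makes find() return -1, the increment turns that into 0, and the scan restarts from the beginning. In this toy, any trailing byte without a following \n triggers it, so it is broader than the real bug.

```python
def lstrip_increase(buf: bytes, pos: int) -> int:
    """Hypothetical helper: move one past pos, then skip spaces and tabs."""
    pos += 1
    while pos < len(buf) and buf[pos] in b" \t":
        pos += 1
    return pos


def scan_buggy(buf: bytes, max_steps: int = 100) -> list:
    """Mimics the reported pattern; max_steps only exists so the demo halts."""
    visited, pos = [], 0
    for _ in range(max_steps):
        if pos >= len(buf):
            return visited
        pos = buf.find(b"\n", pos)       # -1 when no further "\n" exists
        pos = lstrip_increase(buf, pos)  # -1 + 1 == 0: back to the start
        visited.append(pos)
    raise RuntimeError("no progress: scan wrapped back to position 0")


def scan_fixed(buf: bytes) -> list:
    """Same scan, but treat -1 from find() as 'nothing left' and stop."""
    visited, pos = [], 0
    while pos < len(buf):
        nl = buf.find(b"\n", pos)
        if nl == -1:
            break
        pos = lstrip_increase(buf, nl)
        visited.append(pos)
    return visited
```

Checking for -1 before doing arithmetic on the result is the usual guard for str.find and bytes.find; the max_steps cap in scan_buggy is there only so the demonstration terminates instead of hanging.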

Personally, I don't think this is a bug in your library, as a string ending in a lone \r is just weird. But I didn't want to keep this finding from you either, in case I am missing something here :)

I also found a possibly related issue. For example:

```python
import wikitextparser
# Raises IndexError: bytearray index out of range (line 93 of _table.py).
wikitextparser.parse("{|\n}\n").get_tables()[0].data()
```

triggers:

IndexError: bytearray index out of range (raised on line 93 of _table.py).

Sadly, this is text that TrueWiki users have been entering, but I can catch that error on my side. I'm just mentioning it, as there might actually be something else going on here :)
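We have not traced what line 93 actually indexes, so the following is only a hypothetical model. It assumes the parser reads the byte right after a newline and that the input "{|\n}\n" ends exactly on such a newline. byte_after_newline_unsafe and byte_after_newline are made-up names; the second shows the bounds check that would avoid the IndexError.

```python
from typing import Optional


def byte_after_newline_unsafe(buf: bytearray, nl: int) -> int:
    """Hypothetical: read the byte right after the newline at index nl."""
    # No bounds check: fails when the newline is the last byte of the input.
    return buf[nl + 1]


def byte_after_newline(buf: bytearray, nl: int) -> Optional[int]:
    """Bounds-checked variant: None means the input ends at this newline."""
    nxt = nl + 1
    return buf[nxt] if nxt < len(buf) else None


buf = bytearray(b"{|\n}\n")
print(byte_after_newline(buf, 2))  # 125, i.e. ord("}")
print(byte_after_newline(buf, 4))  # None: nothing follows the final newline
```

Calling byte_after_newline_unsafe(buf, 4) on the same buffer raises an IndexError, the same exception type as in the report.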

As always, tnx for the awesome library! :D

Jun 26 '22 20:06 TrueBrain
