kristian-clausal
Many of those are flagged simply because they contain words that are not in `nltk.corpus.brown`: words like "cellphone", "mousetrap", "He’s" (with a Unicode apostrophe or other character), "dumbfuck", "peppermint"... Hrm, many...
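A rough sketch of that kind of vocabulary check. It assumes the vocabulary is a plain set of words (in practice something like `set(nltk.corpus.brown.words())`; a tiny hypothetical stand-in is used here so it runs without NLTK data) and normalizes curly apostrophes so "He’s" isn't flagged just because of the character.

```python
def normalize(word: str) -> str:
    """Lowercase and map typographic apostrophes to ASCII ones."""
    return word.replace("\u2019", "'").replace("\u2018", "'").lower()


def unknown_words(words, vocabulary):
    """Return the words whose normalized form is not in the vocabulary."""
    vocab = {normalize(w) for w in vocabulary}
    return [w for w in words if normalize(w) not in vocab]


# Hypothetical stand-in for set(nltk.corpus.brown.words()).
VOCAB = {"he's", "phone", "mouse", "trap", "cell"}

print(unknown_words(["cellphone", "mousetrap", "He\u2019s", "peppermint"], VOCAB))
# prints ['cellphone', 'mousetrap', 'peppermint']
```

Normalization only removes the apostrophe false positives; compounds like "cellphone" still need a bigger word list (or a compound-splitting heuristic) to pass.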
I was thinking of that, yeah. We can check whether the arguments map onto the template's expanded output and exit early if they conform to the formatting...
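A minimal sketch of the early-exit idea, assuming the template arguments are available as plain strings and that "conform" means each non-empty argument appears in the expanded output, in order. That definition, and the names `args_conform` and `handle_template`, are hypothetical.

```python
def args_conform(args, expanded):
    """True if every non-empty argument occurs in `expanded`, in order."""
    pos = 0
    for arg in args:
        arg = arg.strip()
        if not arg:
            continue  # empty positional args contribute nothing
        idx = expanded.find(arg, pos)
        if idx < 0:
            return False
        pos = idx + len(arg)  # next arg must come after this one
    return True


def handle_template(args, expanded):
    if args_conform(args, expanded):
        # Early exit: the output only wraps the arguments in formatting.
        return list(args)
    # Hypothetical fallback: the full parsing path would go here;
    # this sketch just returns the expanded text unchanged.
    return [expanded]


print(args_conform(["foo", "bar"], "''foo'' (bar)"))  # True
print(args_conform(["bar", "foo"], "''foo'' (bar)"))  # False
```

Requiring order keeps a template that reshuffles or rewrites its arguments from slipping through the early exit.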
EDIT: This is a post that was left unwritten earlier today, posting it here just for completeness. "cause of death" slipped through because decode_tags classified it as tags. "of tags"...
https://github.com/tatuylonen/wikitextprocessor/commit/3952eda54ffbeff3b7c754424c48aa82dfc436cb This is a partial fix. If you have something like:
```
# Line 1, ref 1...
line 1 continues
# Line 2...
```
The `` doesn't break the list...
At the moment, this is a bit too difficult to fix completely. The issue is that HTML tags are too 'free'. HTML tags (or HTML-like tag entities like nowiki or ref...
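To illustrate why tags spanning lines matter for lists, here is a naive sketch. It assumes (hypothetically) that only `<ref>` and `<nowiki>` can span a newline and that simple open/close counting is good enough. A real fix has to cope with the 'freeness' described above (unbalanced tags, attributes containing `>` or `/`, and so on), so this is an illustration, not how wikitextprocessor does it.

```python
import re

# Hypothetical set of tags whose contents may span a newline.
SPANNING_TAGS = ("ref", "nowiki")
# Opening tag without a '/' before '>', so self-closing <ref name="x"/> is skipped.
OPEN_RE = re.compile(r"<(%s)\b[^>/]*>" % "|".join(SPANNING_TAGS), re.I)
CLOSE_RE = re.compile(r"</(%s)\s*>" % "|".join(SPANNING_TAGS), re.I)


def open_tag_depth(text):
    """Number of spanning tags opened but not yet closed in `text`."""
    return len(OPEN_RE.findall(text)) - len(CLOSE_RE.findall(text))


def split_list_items(wikitext):
    """Group '#' lines into list items; other lines are collected separately.

    A newline inside a still-open spanning tag does not end the item:
    the next line is appended to the current item instead."""
    items, other = [], []
    for line in wikitext.split("\n"):
        if items and open_tag_depth(items[-1]) > 0:
            items[-1] += "\n" + line  # still inside <ref>/<nowiki>
        elif line.startswith("#"):
            items.append(line)
        else:
            other.append(line)
    return items, other


text = "# Line 1<ref>source\nstill in ref</ref> line 1 continues\n# Line 2\nplain text"
items, other = split_list_items(text)
print(len(items), other)  # 2 ['plain text']
```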
We're not using the HTML dump, so it should be fine.
Talk with Tatu about this before you start working on it.
If you find something that seems similar to this issue, highlight it by starting a new thread. Closing this as (mostly) complete for now.
I've *finally* got a working remixed implementation of the sense gloss list parsing code that is purely recursive. Because it is recursive, it can go arbitrarily deep in a gloss...
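A stripped-down sketch of recursive gloss-list parsing, not the actual wiktextract code. It assumes each gloss is a single line whose depth is the number of leading `#` characters, and it doesn't treat `#:`/`#*` example or quotation lines specially. `Gloss` and `parse_glosses` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Gloss:
    text: str
    children: list = field(default_factory=list)


def parse_glosses(lines, depth=1, start=0):
    """Parse '#'-prefixed lines into a tree of Gloss nodes.

    Lines with exactly `depth` leading '#' become siblings; deeper lines
    are handled by a recursive call and attached to the preceding gloss.
    Returns (glosses, index of the first unconsumed line)."""
    glosses = []
    i = start
    while i < len(lines):
        line = lines[i]
        level = len(line) - len(line.lstrip("#"))
        if level < depth:
            break  # belongs to an ancestor level; let the caller handle it
        if level == depth:
            glosses.append(Gloss(line[level:].strip()))
            i += 1
        else:
            # Deeper line: recurse one level down (always consumes >= 1 line).
            children, i = parse_glosses(lines, depth + 1, i)
            if glosses:
                glosses[-1].children.extend(children)
            else:
                glosses.extend(children)  # deeper item with no parent here
    return glosses, i


lines = ["# sense 1", "## sense 1a", "### sense 1a-i", "# sense 2"]
tree, _ = parse_glosses(lines)
print([g.text for g in tree])  # ['sense 1', 'sense 2']
```

Because each call only ever handles one depth and hands deeper lines to another call, there is no fixed limit on nesting; a jump such as `#` straight to `###` is handled by recursing twice.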
Had a bug (`!=` should have been `==`) that broke *dress* specifically; I hadn't tested *that* word because I got stuck on other words, like *break*... But should be...