WiktionaryParser
WiktionaryParser copied to clipboard
Fix sub-definitions not being listed
Previously, a word that had a layered definition, such as:
would return the following definition:
['grapple (countable and uncountable, plural grapples)',
'A tool with claws or hooks which is used to catch or hold something.',
'A close hand-to-hand struggle.',
'(uncountable) The act of grappling.']
So the sub-definitions of A tool with claws or hooks which is used to catch or hold something. weren't listed.
This PR makes it so the returned definition becomes:
['grapple (countable and uncountable, plural grapples)',
['A tool with claws or hooks which is used to catch or hold something.',
'(nautical) A device consisting of iron claws, attached to the end of a rope, used for grasping and holding an enemy ship prior to boarding; a grappling iron.',
'(nautical) A grapnel (“type of anchor”).'],
'A close hand-to-hand struggle.',
'(uncountable) The act of grappling.']
thus fixing #57.
However, this structure (using arrays to depict a layered definition, where the first element represents the top-level definition) may not be the most ideal one. Not sure.
This causes two of the tests to fail (grapple and house), as they both have such sub-definitions that were previously ignored.
Thanks for this! I'll take a look. Can you fix the failing tests in the meanwhile?
Done. Everything should be fine now. However, I'll test some more words on my own, to make sure I didn't miss any edge cases.
Thanks so much for this! I just tested it out with my script that uses WiktionaryParser and (with slight modification) it works great. Hope it gets merged soon! 😁
@Luca1152 @katacarbix while this change works, I'm not sure the definitions list should contain different data types. For your use cases, is it a problem if the sub-definitions are included in the list as top level definitions?
@katacarbix what are the modifications you've had to make? Was wondering if the parser could be enhanced to include them
Thanks so much for this! I just tested it out with my script that uses WiktionaryParser and (with slight modification) it works great. Hope it gets merged soon! 😁
I added a section to loop through sub-definitions and output them indented. I also put back the colons that I was removing before for formatting.
Old:
word = parser.fetch(query)
entries = []
for section in word:
for defn in section['definitions']:
for entry in defn['text']:
if entry[:len(query)] != query:
entries.append('('+defn['partOfSpeech']+') ' + entry.rstrip(':'))
New:
word = parser.fetch(query)
entries = []
for section in word:
for defn in section['definitions']:
for entry in defn['text']:
if type(entry) is list:
entries.append('('+defn['partOfSpeech'])+') ' + entry[0])
for subentry in entry[1:]:
entries.append(' ' + subentry)
elif entry[:len(query)] != query:
entries.append('('+defn['partOfSpeech'])+') ' + entry)
@Suyash458 This PR fixes a very serious bug. Is this ever going to get merged and is pip ever going to get a new release?