WiktionaryParser icon indicating copy to clipboard operation
WiktionaryParser copied to clipboard

Fix sub-definitions not being listed

Open imlutr opened this issue 5 years ago • 6 comments

Previously, a word that had a layered definition, such as:

Screenshot 2020-09-09 at 19 18 18

would return the following definition:

['grapple (countable and uncountable, plural grapples)',
 'A tool with claws or hooks which is used to catch or hold something.',
 'A close hand-to-hand struggle.',
 '(uncountable) The act of grappling.']

So the sub-definitions of A tool with claws or hooks which is used to catch or hold something. weren't listed.

This PR makes it so the returned definition becomes:

['grapple (countable and uncountable, plural grapples)',
 ['A tool with claws or hooks which is used to catch or hold something.',
  '(nautical) A device consisting of iron claws, attached to the end of a rope, used for grasping and holding an enemy ship prior to boarding; a grappling iron.',
  '(nautical) A grapnel (“type of anchor”).'],
 'A close hand-to-hand struggle.',
 '(uncountable) The act of grappling.']

thus fixing #57.

However, this structure (using arrays to depict a layered definition, where the first element represents the top-level definition) may not be the most ideal one. Not sure.

This causes two of the tests to fail (grapple and house), as they both have such sub-definitions that were previously ignored.

imlutr avatar Sep 09 '20 16:09 imlutr

Thanks for this! I'll take a look. Can you fix the failing tests in the meanwhile?

suyashb95 avatar Sep 09 '20 17:09 suyashb95

Done. Everything should be fine now. However, I'll test some more words on my own, to make sure I didn't miss any edge cases.

imlutr avatar Sep 11 '20 13:09 imlutr

Thanks so much for this! I just tested it out with my script that uses WiktionaryParser and (with slight modification) it works great. Hope it gets merged soon! 😁

reeseovine avatar Sep 21 '20 04:09 reeseovine

@Luca1152 @katacarbix while this change works, I'm not sure the definitions list should contain different data types. For your use cases, is it a problem if the sub-definitions are included in the list as top level definitions?

@katacarbix what are the modifications you've had to make? Was wondering if the parser could be enhanced to include them

Thanks so much for this! I just tested it out with my script that uses WiktionaryParser and (with slight modification) it works great. Hope it gets merged soon! 😁

suyashb95 avatar Sep 23 '20 04:09 suyashb95

I added a section to loop through sub-definitions and output them indented. I also put back the colons that I was removing before for formatting.

Old:

word = parser.fetch(query)
entries = []
for section in word:
    for defn in section['definitions']:
        for entry in defn['text']:
            if entry[:len(query)] != query:
                entries.append('('+defn['partOfSpeech']+') ' + entry.rstrip(':'))

New:

word = parser.fetch(query)
entries = []
for section in word:
    for defn in section['definitions']:
        for entry in defn['text']:
            if type(entry) is list:
                entries.append('('+defn['partOfSpeech'])+') ' + entry[0])
                for subentry in entry[1:]:
                    entries.append('    ' + subentry)
            elif entry[:len(query)] != query:
                entries.append('('+defn['partOfSpeech'])+') '  + entry)

reeseovine avatar Sep 23 '20 05:09 reeseovine

@Suyash458 This PR fixes a very serious bug. Is this ever going to get merged and is pip ever going to get a new release?

pragma- avatar Jul 11 '21 18:07 pragma-