WiktionaryParser

returns nothing for Thai

Open garfieldnate opened this issue 7 years ago • 3 comments

>>> from wiktionaryparser import WiktionaryParser
>>> parser = WiktionaryParser()
>>> word = parser.fetch('ฉลาด')
>>> word
[]

The page is clearly there on the website: https://en.wiktionary.org/wiki/%E0%B8%89%E0%B8%A5%E0%B8%B2%E0%B8%94. I'm trying to scrape the pronunciations.
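For reference, the percent-encoded path in that URL is just the UTF-8 bytes of the word. A minimal sketch using only the standard library; the helper name `wiktionary_url` is made up for illustration and is not part of WiktionaryParser:

```python
from urllib.parse import quote, unquote

BASE = "https://en.wiktionary.org/wiki/"


def wiktionary_url(word: str) -> str:
    """Build the English Wiktionary page URL for a headword (hypothetical helper)."""
    # quote() percent-encodes the UTF-8 bytes of each non-ASCII character.
    return BASE + quote(word)


url = wiktionary_url("ฉลาด")
print(url)
# unquote() reverses the encoding, recovering the original headword.
print(unquote(url.rsplit("/", 1)[1]))
```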

garfieldnate avatar Sep 23 '18 11:09 garfieldnate

The language defaults to English, so you need to pass it explicitly:

parser.fetch('ฉลาด', language='thai')
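If the language of a word is not known in advance, one option is to try several in turn. This is only a sketch: `fetch_first_match` is a hypothetical helper, and it assumes nothing about the parser beyond the `fetch(word, language=...)` call shown above returning an empty list when nothing matches.

```python
from typing import Any, Iterable, List


def fetch_first_match(parser: Any, word: str, languages: Iterable[str]) -> List[dict]:
    """Return the first non-empty fetch() result across the candidate languages."""
    for language in languages:
        # Same call as in the comment above, with the language made explicit.
        entries = parser.fetch(word, language=language)
        if entries:
            return entries
    # No candidate language produced any entries.
    return []


# Intended usage (needs the wiktionaryparser package and network access):
#   from wiktionaryparser import WiktionaryParser
#   fetch_first_match(WiktionaryParser(), 'ฉลาด', ['english', 'thai'])
```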

Surkal avatar Sep 28 '18 23:09 Surkal

Ah, that gets it. The info returned is not quite right, though:

[
    {
        'etymology': 'From Khmer ឆ្លាត (chlaat, “clever”). Compare Lao ສະຫລາດ (sa lāt).\n',
        'definitions': [
            {
                'partOfSpeech': 'adjective', 
                'text': ['ฉลาด • (chà-làat) (abstract noun ความฉลาด)', 'clever; smart; intelligent.'],
                'relatedWords': [],
                'examples': []
            }
        ], 
        'pronunciations': {
            'text': ['From Khmer ឆ្លាត (chlaat, “clever”). Compare Lao ສະຫລາດ (sa lāt).\n'], 
            'audio': []
        }
    }, 
    {
        'etymology': '', 
        'definitions': [
            {
                'partOfSpeech': 'noun', 
                'text': ['ฉลาด • (chà-làat)', 'Alternative form of สลาด (slàat)'], 
                'relatedWords': [], 
                'examples': []
            }
        ], 
        'pronunciations': {
            'text': ['From Khmer ឆ្លាត (chlaat, “clever”). Compare Lao ສະຫລາດ (sa lāt).\n'], 
            'audio': []
        }
    }
]

The etymology is in the pronunciation text, and the pronunciation is missing altogether.
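Until this is fixed, results affected by it can at least be detected on the caller's side. A rough sketch that works only on the result structure shown above; `misplaced_etymology` is a hypothetical helper, not something the library provides:

```python
from typing import List


def misplaced_etymology(entries: List[dict]) -> List[int]:
    """Return indexes of entries whose pronunciation text just repeats an etymology."""
    # Gather every non-empty etymology in the result: in the output above, the
    # first entry's etymology also shows up in the second entry's pronunciations.
    etymologies = {entry.get("etymology", "").strip() for entry in entries} - {""}
    flagged = []
    for index, entry in enumerate(entries):
        texts = entry.get("pronunciations", {}).get("text", [])
        if any(text.strip() in etymologies for text in texts):
            flagged.append(index)
    return flagged
```

Applied to the two entries returned for ฉลาด above, both would be flagged, since each carries the etymology string in `pronunciations['text']`.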

garfieldnate avatar Sep 29 '18 06:09 garfieldnate

Yeah, the pronunciation section on this page is formatted differently from most other words. I'm still working on it.

suyashb95 avatar Sep 29 '18 07:09 suyashb95