WiktionaryParser
returns nothing for Thai
>>> from wiktionaryparser import WiktionaryParser
>>> parser = WiktionaryParser()
>>> word = parser.fetch('ฉลาด')
>>> word
[]
The page is clearly there on the website: https://en.wiktionary.org/wiki/%E0%B8%89%E0%B8%A5%E0%B8%B2%E0%B8%94. I'm trying to scrape the pronunciations.
The language defaults to English, so you need to pass it explicitly:
parser.fetch('ฉลาด', language='thai')
Ah, that gets it. The info returned is not quite right, though:
[
    {
        'etymology': 'From Khmer ឆ្លាត (chlaat, “clever”). Compare Lao ສະຫລາດ (sa lāt).\n',
        'definitions': [
            {
                'partOfSpeech': 'adjective',
                'text': ['ฉลาด • (chà-làat) (abstract noun ความฉลาด)', 'clever; smart; intelligent.'],
                'relatedWords': [],
                'examples': []
            }
        ],
        'pronunciations': {
            'text': ['From Khmer ឆ្លាត (chlaat, “clever”). Compare Lao ສະຫລາດ (sa lāt).\n'],
            'audio': []
        }
    },
    {
        'etymology': '',
        'definitions': [
            {
                'partOfSpeech': 'noun',
                'text': ['ฉลาด • (chà-làat)', 'Alternative form of สลาด (slàat)'],
                'relatedWords': [],
                'examples': []
            }
        ],
        'pronunciations': {
            'text': ['From Khmer ឆ្លាត (chlaat, “clever”). Compare Lao ສະຫລາດ (sa lāt).\n'],
            'audio': []
        }
    }
]
The etymology is in the pronunciation text, and the pronunciation is missing altogether.
Yeah, the pronunciation section on that page is formatted differently from most other entries. I'm still working on it.
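Until that is fixed, the symptom reported above (etymology text showing up under 'pronunciations') can be detected on the caller's side. The following is only a rough sketch, not part of WiktionaryParser: the helper name find_misparsed_pronunciations is made up for illustration, and it assumes the result has the same list-of-dicts shape as the output pasted in this thread. The sample data is trimmed from that output.

```python
# Hypothetical helper, not part of WiktionaryParser: flags entries whose
# pronunciation text is really a copy of an etymology string.

ETYMOLOGY = 'From Khmer ឆ្លាត (chlaat, “clever”). Compare Lao ສະຫລາດ (sa lāt).\n'

# Trimmed copy of the result shown earlier in this thread.
sample = [
    {
        'etymology': ETYMOLOGY,
        'pronunciations': {'text': [ETYMOLOGY], 'audio': []},
    },
    {
        'etymology': '',
        'pronunciations': {'text': [ETYMOLOGY], 'audio': []},
    },
]


def find_misparsed_pronunciations(entries):
    """Return the indexes of entries whose pronunciation text repeats an etymology."""
    # Collect every non-empty etymology across all entries, because the
    # second entry above repeats the first entry's etymology.
    etymologies = {entry.get('etymology', '').strip() for entry in entries}
    etymologies.discard('')

    suspects = []
    for index, entry in enumerate(entries):
        texts = entry.get('pronunciations', {}).get('text', [])
        if any(text.strip() in etymologies for text in texts):
            suspects.append(index)
    return suspects


print(find_misparsed_pronunciations(sample))  # prints [0, 1]
```

With a live parser, the list returned by parser.fetch('ฉลาด', language='thai') could be passed in place of sample; flagged entries should be treated as having no usable pronunciation data.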