
Wiktextract: converting Japanese .jsonl to .index fails for the Japanese part of the English Wiktionary

Open franzmondlichtmann opened this issue 7 months ago • 9 comments

OS: EndeavourOS with the newest updates (Arch Linux with the Calamares installer)
Python setup: Micromamba with Python 3.10
Shell: fish

pyglossary was installed with pip. I took the Wiktionary .jsonl files from kaikki.org. The conversion worked for the Spanish part of the English Wiktionary, but when I try it with the Japanese part I get an error:

laptop02@laptop02-pc ~/Downloads> pyglossary kaikki.org-dictionary-Japanese.jsonl kaikki.org-dictionary-Japanese.index
[INFO] Writing to DictOrg file '/home/laptop02/Downloads/kaikki.org-dictionary-Japanese.index'
[ERROR] Exception while calling plugin's write function
Traceback (most recent call last):
  File "/home/laptop02/fish/envs/py3.10/lib/python3.10/site-packages/pyglossary/glossary_v2.py", line 908, in _write
    self._writeEntries(writerList, filename)
  File "/home/laptop02/fish/envs/py3.10/lib/python3.10/site-packages/pyglossary/glossary_v2.py", line 842, in _writeEntries
    for entry in self:
  File "/home/laptop02/fish/envs/py3.10/lib/python3.10/site-packages/pyglossary/glossary_v2.py", line 393, in _readersEntryGen
    yield from self._applyEntryFiltersGen(reader)
  File "/home/laptop02/fish/envs/py3.10/lib/python3.10/site-packages/pyglossary/glossary_v2.py", line 407, in _applyEntryFiltersGen
    for entry in gen:
  File "/home/laptop02/fish/envs/py3.10/lib/python3.10/site-packages/pyglossary/plugins/wiktextract.py", line 156, in __iter__
    yield self.makeEntry(json_loads(line))
  File "/home/laptop02/fish/envs/py3.10/lib/python3.10/site-packages/pyglossary/plugins/wiktextract.py", line 208, in makeEntry
    self.writeSenseList(_hf, data.get("senses"))  # type: ignore
  File "/home/laptop02/fish/envs/py3.10/lib/python3.10/site-packages/pyglossary/plugins/wiktextract.py", line 313, in writeSenseList
    self.makeList(
  File "/home/laptop02/fish/envs/py3.10/lib/python3.10/site-packages/pyglossary/plugins/wiktextract.py", line 653, in makeList
    processor(hf, el)
  File "/home/laptop02/fish/envs/py3.10/lib/python3.10/site-packages/pyglossary/plugins/wiktextract.py", line 612, in writeSense
    self.writeSenseExamples(hf, sense.get("examples"))
  File "/home/laptop02/fish/envs/py3.10/lib/python3.10/site-packages/pyglossary/plugins/wiktextract.py", line 392, in writeSenseExamples
    self.writeSenseExample(hf, example)
  File "/home/laptop02/fish/envs/py3.10/lib/python3.10/site-packages/pyglossary/plugins/wiktextract.py", line 369, in writeSenseExample
    hf.write(text)
  File "src/lxml/serializer.pxi", line 1660, in lxml.etree._IncrementalFileWriter.write
TypeError: got invalid input value of type <class 'list'>, expected string or Element
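
The last frames show `hf.write(text)` in `writeSenseExample` receiving a list where lxml expects a string or an Element, so some field of an example in the Japanese data is probably a list rather than a string. The traceback does not say which field. As a rough diagnostic sketch (not part of pyglossary; the function name is made up here, and it assumes each .jsonl line is an object with a "senses" list whose items may carry an "examples" list of objects), this counts example fields whose value is not a string:

```python
import json
from collections import Counter


def count_non_string_example_fields(lines):
    """Count (field name, type name) pairs for non-string example values.

    `lines` is any iterable of JSON Lines text, e.g. an open file.
    Assumes the layout described above; other layouts are not handled.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        data = json.loads(line)
        for sense in data.get("senses") or []:
            for example in sense.get("examples") or []:
                for key, value in example.items():
                    if not isinstance(value, str):
                        counts[(key, type(value).__name__)] += 1
    return counts


# Possible usage against the file from the command above:
# with open("kaikki.org-dictionary-Japanese.jsonl", encoding="utf-8") as f:
#     for (key, type_name), n in count_non_string_example_fields(f).most_common():
#         print(key, type_name, n)
```

Running the same scan on the Spanish file, which converted without error, may help narrow down which list-valued field appears only in the Japanese data.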

franzmondlichtmann, Jul 07 '24 21:07