leancrawler icon indicating copy to clipboard operation
leancrawler copied to clipboard

pyyaml does not support unicode characters like š•œ

Open pechersky opened this issue 3 years ago ā€¢ 3 comments

On master:

āÆ git checkout pechersky/polynomial-notation
āÆ leanproject get-c
āÆ leanproject mk-all
āÆ leancrawler all poly-notation.yaml
āÆ python
Python 3.7.3 (default, Mar 27 2019, 22:11:17) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from leancrawler import LeanLib, LeanDeclGraph
>>> data_branch = LeanLib.from_yaml("poly-notation", "poly-notation.yaml")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/yakov/code/leancrawler/src/leancrawler/crawler.py", line 182, in from_yaml
    for decl in safe_load(lean_output):
  File "/home/yakov/.cache/pypoetry/virtualenvs/leancrawler--ZxjUoya-py3.7/lib/python3.7/site-packages/yaml/__init__.py", line 94, in safe_load
    return load(stream, SafeLoader)
  File "/home/yakov/.cache/pypoetry/virtualenvs/leancrawler--ZxjUoya-py3.7/lib/python3.7/site-packages/yaml/__init__.py", line 70, in load
    loader = Loader(stream)
  File "/home/yakov/.cache/pypoetry/virtualenvs/leancrawler--ZxjUoya-py3.7/lib/python3.7/site-packages/yaml/loader.py", line 24, in __init__
    Reader.__init__(self, stream)
  File "/home/yakov/.cache/pypoetry/virtualenvs/leancrawler--ZxjUoya-py3.7/lib/python3.7/site-packages/yaml/reader.py", line 79, in __init__
    self.determine_encoding()
  File "/home/yakov/.cache/pypoetry/virtualenvs/leancrawler--ZxjUoya-py3.7/lib/python3.7/site-packages/yaml/reader.py", line 135, in determine_encoding
    self.update(1)
  File "/home/yakov/.cache/pypoetry/virtualenvs/leancrawler--ZxjUoya-py3.7/lib/python3.7/site-packages/yaml/reader.py", line 169, in update
    self.check_printable(data)
  File "/home/yakov/.cache/pypoetry/virtualenvs/leancrawler--ZxjUoya-py3.7/lib/python3.7/site-packages/yaml/reader.py", line 144, in check_printable
    'unicode', "special characters are not allowed")
yaml.reader.ReaderError: unacceptable character #x1d55c: special characters are not allowed
  in "<byte string>", position 26447
āÆ ugrep "\x{1D55C}" poly-notation.yaml | head -c100
  Type: "āˆ€ {š•œ : Type u_1} {E : Type u_2} {Ī² : Type u_4} [_inst_1 : ordered_semiring š•œ] [_in% 

pechersky avatar Feb 08 '22 07:02 pechersky

Thanks for this report. I'm sorry but I can't reproduce any of that. I couldn't find your git repository or mathlib branch. Python has no trouble loading yaml dicts involving unicode as in:

>>> from yaml import safe_load
>>> safe_load('1 : š•œ\nš•œ : 2')
{1: 'š•œ', 'š•œ': 2}

So I'll need a reproducible issue involving clearly located public code and version numbers for all involved libs.

PatrickMassot avatar Feb 12 '22 10:02 PatrickMassot

Sorry for the confusion. That branch got merged into mathlib master, maybe that is why it was not possible to find it.

Here is a more detailed report, ran in the main mathlib repo:

āÆ git rev-parse --short HEAD
af1355c45e
āÆ git rev-parse --short master
af1355c45e
āÆ pip list | grep mathlibtools
mathlibtools                       1.1.0
āÆ leanproject get-c
Looking for mathlib oleans for af1355c45e
  locally...
  Found local mathlib oleans
Using matching cache
Applying cache
  files extracted: 2153 [00:04, 444.17/s]
āÆ pip list | grep leancrawler
leancrawler          0.0.3
āÆ pip list | grep yaml -i
PyYAML               3.13
types-PyYAML         6.0.4
āÆ leancrawler category_theory.category.basic master.yaml
āÆ head crawl.lean -n1
import category_theory.category.basic
āÆ lean -v
Lean (version 3.39.1, commit 1781ded0d006, Release)
āÆ lean --run crawl.lean
āÆ wc -l master.yaml
182940 master.yaml
āÆ python
Python 3.7.3 (default, Mar 27 2019, 22:11:17) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from leancrawler import LeanLib, LeanDeclGraph
>>> data_branch = LeanLib.from_yaml("master", "master.yaml")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/yakov/code/leancrawler/src/leancrawler/crawler.py", line 160, in from_yaml
    for decl in safe_load(lean_output):
  File "/home/yakov/.cache/pypoetry/virtualenvs/leancrawler--ZxjUoya-py3.7/lib/python3.7/site-packages/yaml/__init__.py", line 94, in safe_load
    return load(stream, SafeLoader)
  File "/home/yakov/.cache/pypoetry/virtualenvs/leancrawler--ZxjUoya-py3.7/lib/python3.7/site-packages/yaml/__init__.py", line 70, in load
    loader = Loader(stream)
  File "/home/yakov/.cache/pypoetry/virtualenvs/leancrawler--ZxjUoya-py3.7/lib/python3.7/site-packages/yaml/loader.py", line 24, in __init__
    Reader.__init__(self, stream)
  File "/home/yakov/.cache/pypoetry/virtualenvs/leancrawler--ZxjUoya-py3.7/lib/python3.7/site-packages/yaml/reader.py", line 79, in __init__
    self.determine_encoding()
  File "/home/yakov/.cache/pypoetry/virtualenvs/leancrawler--ZxjUoya-py3.7/lib/python3.7/site-packages/yaml/reader.py", line 135, in determine_encoding
    self.update(1)
  File "/home/yakov/.cache/pypoetry/virtualenvs/leancrawler--ZxjUoya-py3.7/lib/python3.7/site-packages/yaml/reader.py", line 169, in update
    self.check_printable(data)
  File "/home/yakov/.cache/pypoetry/virtualenvs/leancrawler--ZxjUoya-py3.7/lib/python3.7/site-packages/yaml/reader.py", line 144, in check_printable
    'unicode', "special characters are not allowed")
yaml.reader.ReaderError: unacceptable character #x1d7d9: special characters are not allowed
  in "<byte string>", position 368774

So I think that is now erroring on bold ones from category theory:

āÆ ugrep "\x{1D7D9}" src/category_theory/category/basic.lean | grep lemma
lemma id_of_comp_left_id (f : X āŸ¶ X) (w : āˆ€ {Y : C} (g : X āŸ¶ Y), f ā‰« g = g) : f = šŸ™ X :=
lemma id_of_comp_right_id (f : X āŸ¶ X) (w : āˆ€ {Y : C} (g : Y āŸ¶ X), g ā‰« f = g) : f = šŸ™ X :=
lemma cancel_epi_id (f : X āŸ¶ Y) [epi f] {h : Y āŸ¶ Y} : (f ā‰« h = f) ā†” h = šŸ™ Y :=
lemma cancel_mono_id (f : X āŸ¶ Y) [mono f] {g : X āŸ¶ X} : (g ā‰« f = f) ā†” g = šŸ™ X :=

pechersky avatar Feb 13 '22 00:02 pechersky

Here I get

pip list | grep yaml -i
PyYAML                        5.3.1
types-PyYAML                  5.4.10

which seems like a fairly large difference with your setup. Could you please try to update your PyYAML? If this fix the issue then we'll have to bump the requirement.

PatrickMassot avatar Feb 13 '22 13:02 PatrickMassot