leancrawler
leancrawler copied to clipboard
pyyaml does not support unicode characters like š
On master
:
āÆ git checkout pechersky/polynomial-notation
āÆ leanproject get-c
āÆ leanproject mk-all
āÆ leancrawler all poly-notation.yaml
āÆ python
Python 3.7.3 (default, Mar 27 2019, 22:11:17)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from leancrawler import LeanLib, LeanDeclGraph
>>> data_branch = LeanLib.from_yaml("poly-notation", "poly-notation.yaml")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/yakov/code/leancrawler/src/leancrawler/crawler.py", line 182, in from_yaml
for decl in safe_load(lean_output):
File "/home/yakov/.cache/pypoetry/virtualenvs/leancrawler--ZxjUoya-py3.7/lib/python3.7/site-packages/yaml/__init__.py", line 94, in safe_load
return load(stream, SafeLoader)
File "/home/yakov/.cache/pypoetry/virtualenvs/leancrawler--ZxjUoya-py3.7/lib/python3.7/site-packages/yaml/__init__.py", line 70, in load
loader = Loader(stream)
File "/home/yakov/.cache/pypoetry/virtualenvs/leancrawler--ZxjUoya-py3.7/lib/python3.7/site-packages/yaml/loader.py", line 24, in __init__
Reader.__init__(self, stream)
File "/home/yakov/.cache/pypoetry/virtualenvs/leancrawler--ZxjUoya-py3.7/lib/python3.7/site-packages/yaml/reader.py", line 79, in __init__
self.determine_encoding()
File "/home/yakov/.cache/pypoetry/virtualenvs/leancrawler--ZxjUoya-py3.7/lib/python3.7/site-packages/yaml/reader.py", line 135, in determine_encoding
self.update(1)
File "/home/yakov/.cache/pypoetry/virtualenvs/leancrawler--ZxjUoya-py3.7/lib/python3.7/site-packages/yaml/reader.py", line 169, in update
self.check_printable(data)
File "/home/yakov/.cache/pypoetry/virtualenvs/leancrawler--ZxjUoya-py3.7/lib/python3.7/site-packages/yaml/reader.py", line 144, in check_printable
'unicode', "special characters are not allowed")
yaml.reader.ReaderError: unacceptable character #x1d55c: special characters are not allowed
in "<byte string>", position 26447
āÆ ugrep "\x{1D55C}" poly-notation.yaml | head -c100
Type: "ā {š : Type u_1} {E : Type u_2} {Ī² : Type u_4} [_inst_1 : ordered_semiring š] [_in%
Thanks for this report. I'm sorry but I can't reproduce any of that. I couldn't find your git repository or mathlib branch. Python has no trouble loading yaml dicts involving unicode as in:
>>> from yaml import safe_load
>>> safe_load('1 : š\nš : 2')
{1: 'š', 'š': 2}
So I'll need a reproducible issue involving clearly located public code and version numbers for all involved libs.
Sorry for the confusion. That branch got merged into mathlib master, maybe that is why it was not possible to find it.
Here is a more detailed report, ran in the main mathlib
repo:
āÆ git rev-parse --short HEAD
af1355c45e
āÆ git rev-parse --short master
af1355c45e
āÆ pip list | grep mathlibtools
mathlibtools 1.1.0
āÆ leanproject get-c
Looking for mathlib oleans for af1355c45e
locally...
Found local mathlib oleans
Using matching cache
Applying cache
files extracted: 2153 [00:04, 444.17/s]
āÆ pip list | grep leancrawler
leancrawler 0.0.3
āÆ pip list | grep yaml -i
PyYAML 3.13
types-PyYAML 6.0.4
āÆ leancrawler category_theory.category.basic master.yaml
āÆ head crawl.lean -n1
import category_theory.category.basic
āÆ lean -v
Lean (version 3.39.1, commit 1781ded0d006, Release)
āÆ lean --run crawl.lean
āÆ wc -l master.yaml
182940 master.yaml
āÆ python
Python 3.7.3 (default, Mar 27 2019, 22:11:17)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from leancrawler import LeanLib, LeanDeclGraph
>>> data_branch = LeanLib.from_yaml("master", "master.yaml")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/yakov/code/leancrawler/src/leancrawler/crawler.py", line 160, in from_yaml
for decl in safe_load(lean_output):
File "/home/yakov/.cache/pypoetry/virtualenvs/leancrawler--ZxjUoya-py3.7/lib/python3.7/site-packages/yaml/__init__.py", line 94, in safe_load
return load(stream, SafeLoader)
File "/home/yakov/.cache/pypoetry/virtualenvs/leancrawler--ZxjUoya-py3.7/lib/python3.7/site-packages/yaml/__init__.py", line 70, in load
loader = Loader(stream)
File "/home/yakov/.cache/pypoetry/virtualenvs/leancrawler--ZxjUoya-py3.7/lib/python3.7/site-packages/yaml/loader.py", line 24, in __init__
Reader.__init__(self, stream)
File "/home/yakov/.cache/pypoetry/virtualenvs/leancrawler--ZxjUoya-py3.7/lib/python3.7/site-packages/yaml/reader.py", line 79, in __init__
self.determine_encoding()
File "/home/yakov/.cache/pypoetry/virtualenvs/leancrawler--ZxjUoya-py3.7/lib/python3.7/site-packages/yaml/reader.py", line 135, in determine_encoding
self.update(1)
File "/home/yakov/.cache/pypoetry/virtualenvs/leancrawler--ZxjUoya-py3.7/lib/python3.7/site-packages/yaml/reader.py", line 169, in update
self.check_printable(data)
File "/home/yakov/.cache/pypoetry/virtualenvs/leancrawler--ZxjUoya-py3.7/lib/python3.7/site-packages/yaml/reader.py", line 144, in check_printable
'unicode', "special characters are not allowed")
yaml.reader.ReaderError: unacceptable character #x1d7d9: special characters are not allowed
in "<byte string>", position 368774
So I think that is now erroring on bold ones from category theory:
āÆ ugrep "\x{1D7D9}" src/category_theory/category/basic.lean | grep lemma
lemma id_of_comp_left_id (f : X ā¶ X) (w : ā {Y : C} (g : X ā¶ Y), f ā« g = g) : f = š X :=
lemma id_of_comp_right_id (f : X ā¶ X) (w : ā {Y : C} (g : Y ā¶ X), g ā« f = g) : f = š X :=
lemma cancel_epi_id (f : X ā¶ Y) [epi f] {h : Y ā¶ Y} : (f ā« h = f) ā h = š Y :=
lemma cancel_mono_id (f : X ā¶ Y) [mono f] {g : X ā¶ X} : (g ā« f = f) ā g = š X :=
Here I get
pip list | grep yaml -i
PyYAML 5.3.1
types-PyYAML 5.4.10
which seems like a fairly large difference with your setup. Could you please try to update your PyYAML? If this fix the issue then we'll have to bump the requirement.