omegaconf
omegaconf copied to clipboard
String to float conversion edge case
Hello! Hypothesis found an odd test case during one of hydra-zen's nightly jobs.
The yaml field-value '0_e0'
gets converted to a float:
>>> instantiate(OmegaConf.create('x: 0_e0'))
{'x': 0.0}
however this string is not actually a valid representation of 0.0
:
>>> float('0_e0')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-16-12374ef4f47c> in <module>
----> 1 float('0_e0')
I do not think that the underscore can precede e
directly.
Actually, this might be more appropriate for Hydra, since it involves instantiation. If so, I will be happy to move it.
I think that the culprit is this regex in OmegaConf's custom yaml loader :)
Yup! That sure looks like it!
tl;dr we should probably restrict underscore handling to less cases. At least we should disallow trailing underscores (as is the case in this bug).
That would match most underscore allowing languages including python. See https://peps.python.org/pep-0515/
I see yaml 1.2 doesn't even specify _
as allowed in numbers, as the 1.1 spec was too loosely defined wrt _
See: https://github.com/nodeca/js-yaml/issues/627
I see our existing RE probably originates in ruamel. See https://stackoverflow.com/q/30458977/4421 and commit a50bdeea
It's interesting that yaml being a superset of json, diverges from json here ?
I also see some inconsistencies between the antlr (interpolation) and basic loader. pr #995 does not address that, but at least documents it.