omegaconf icon indicating copy to clipboard operation
omegaconf copied to clipboard

String to float conversion edge case

Open rsokl opened this issue 2 years ago • 3 comments

Hello! Hypothesis found an odd test case during one of hydra-zen's nightly jobs.

The yaml field-value '0_e0' gets converted to a float:

>>> instantiate(OmegaConf.create('x: 0_e0'))
{'x': 0.0}

however this string is not actually a valid representation of 0.0:

>>> float('0_e0')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-16-12374ef4f47c> in <module>
----> 1 float('0_e0')

I do not think that the underscore can precede e directly.

rsokl avatar Dec 06 '21 02:12 rsokl

Actually, this might be more appropriate for Hydra, since it involves instantiation. If so, I will be happy to move it.

rsokl avatar Dec 06 '21 02:12 rsokl

I think that the culprit is this regex in OmegaConf's custom yaml loader :)

Jasha10 avatar Dec 06 '21 15:12 Jasha10

Yup! That sure looks like it!

rsokl avatar Dec 06 '21 17:12 rsokl

tl;dr we should probably restrict underscore handling to less cases. At least we should disallow trailing underscores (as is the case in this bug).

That would match most underscore allowing languages including python. See https://peps.python.org/pep-0515/

I see yaml 1.2 doesn't even specify _ as allowed in numbers, as the 1.1 spec was too loosely defined wrt _ See: https://github.com/nodeca/js-yaml/issues/627

I see our existing RE probably originates in ruamel. See https://stackoverflow.com/q/30458977/4421 and commit a50bdeea

It's interesting that yaml being a superset of json, diverges from json here ?

pixelb avatar Aug 09 '22 16:08 pixelb

I also see some inconsistencies between the antlr (interpolation) and basic loader. pr #995 does not address that, but at least documents it.

pixelb avatar Aug 10 '22 12:08 pixelb