ion-python icon indicating copy to clipboard operation
ion-python copied to clipboard

Support trailing commas in `ion.dump[s]()`

Open mavjop opened this issue 1 year ago • 14 comments

I have code that I want to read in ion data from a file, make a modification, and then write it back out to a file. The file will be source controlled in a version control system.

I want to include trailing commas, e.g.:

{
    foo: "bar",
    baz: {
        quux: [
            "thing1",
            "thing2",
            "thing3",
        ],
    },
}

That is, a comma after the last item in a list, a comma after the last item in a map, etc.

Ion supports this, which is a huge plus in my book (kudos, Ion!).

This results in diffs that are minimized (if you add or remove a line, there aren't additional lines of diff which only contain addition or removal of a comma), version conflicts are cleaner, and it reduces the likelihood of errors when humans may also edit the files.

I don't think there is any support for asking dump or dumps to include trailing commas (I spent a little time looking for one), so I would love if we could add one. I'm most interested in ion.dumps(), but a flag could be added to any other methods that make sense (dump()? Others?).

- Stephen

P.S. For more on why trailing commas are a good idea, please see On the virtues of the trailing comma.

mavjop avatar Sep 24 '24 00:09 mavjop

Some notes:

  • ~~We don't have any formatting directives for ion text writer today. Even "pretty printing" (which is what I assume you want), with newlines AFAIK.~~
  • The pure-python implementation of this would be almost trivial, but we'd need to plumb the flag into the ioncmodule.c so that it worked with the c-extension as well.

rmarrowstone avatar Sep 24 '24 15:09 rmarrowstone

It ... does do pretty printing for me? I'm calling it as:

    ion.dumps(content, binary=False, omit_version_marker=True, indent=u'   ')

I assume the indent=u' ' directs it to do multi-line pretty printing, with a (in our case) three space indent.

I was thinking it could take another optional argument such as trailing_commas=True, defaulted to False for backwards compatibility. I saw that all of the library seemed to be written in C. It's been a few years since I've used C in anger, but I did consider having a look at how one might add code. :)

Is there a concern about maintaining feature parity with ion libraries for other languages?

I agree that it would be relatively easy to implement in pure Python, but potentially a little more tricky in C, though perhaps not horrendously so.

mavjop avatar Sep 25 '24 04:09 mavjop

oh. you are correct. corrected comment.

I was thinking it could take another optional argument such as trailing_commas=True, defaulted to False for backwards compatibility.

That sounds good to me.

Is there a concern about maintaining feature parity with ion libraries for other languages?

I'd say no. But I will check with the other maintainers.

rmarrowstone avatar Sep 25 '24 15:09 rmarrowstone

Is there a concern about maintaining feature parity with ion libraries for other languages?

I'd say no. But I will check with the other maintainers.

API feature parity with other libraries is not a concern. The only real cross-library "parity" concern is that they can all read data written by other implementations. Since trailing commas are allowed by the spec, if any libraries cannot read data with trailing commas, it is a defect in that library and should not block the proposed change here.

TLDR; No concerns—do it!

popematt avatar Sep 25 '24 17:09 popematt

I discovered that there is already not feature parity between pure Python and the ion C module. Pretty printing (which trailing commas only makes sense in the context of) is already unsupported. So I'm going to publish a PR without C module support.

I've been trying for some time to run the tests and running into errors no matter how I try. Is there a way to run the tests as they are? With Python 3.12, my default Python, it doesn't find distutils, which I guess is gone from Python 3.12, and when I use pyenv to select 3.8.10 and 3.9.5 like is called for in the ion-python README, when I run tox I get:

...
TypeError: canonicalize_version() got an unexpected keyword argument 'strip_trailing_zero'

and when I run python setup.py test I get:

...
    import setuptools.command.test as orig
ModuleNotFoundError: No module named 'setuptools.command.test'

mavjop avatar Oct 09 '24 23:10 mavjop

New PR: https://github.com/amazon-ion/ion-python/pull/372

mavjop avatar Oct 10 '24 00:10 mavjop

I discovered that there is already not feature parity between pure Python and the ion C module. Pretty printing (which trailing commas only makes sense in the context of) is already unsupported. So I'm going to publish a PR without C module support.

I've been trying for some time to run the tests and running into errors no matter how I try. Is there a way to run the tests as they are? With Python 3.12, my default Python, it doesn't find distutils, which I guess is gone from Python 3.12, and when I use pyenv to select 3.8.10 and 3.9.5 like is called for in the ion-python README, when I run tox I get:

...
TypeError: canonicalize_version() got an unexpected keyword argument 'strip_trailing_zero'

and when I run python setup.py test I get:

...
    import setuptools.command.test as orig
ModuleNotFoundError: No module named 'setuptools.command.test'

The above errors were solved by:

python -m pip install 'setuptools<71.0.0'

setuptools.command.test was apparently removed in setuptools 72, but 71 also had the canonicalize_version() got an unexpected keyword argument 'strip_trailing_zero' error, so I went back to before 71.0.0 (70.3.0) which did not have the problem.

mavjop avatar Oct 10 '24 17:10 mavjop

Sorry for your troubles, we need to prioritize getting off of our legacy build setup: https://github.com/amazon-ion/ion-python/issues/305

rmarrowstone avatar Oct 10 '24 17:10 rmarrowstone

No worries, Rob. :)

mavjop avatar Oct 10 '24 18:10 mavjop

There are 3 unit tests still failing, with all of my changes in #372 ... but they appear to be entirely unrelated to my changes. I'm wondering if a Python / library version difference changed the limit for string parsing of integer values.

The 3 test failures are all because of extremely long integer strings in good/subfieldVarInt.ion.

I've elided some of the enormous integer string in this:

_ test_roundtrips[GOOD_ROUNDTRIP - good/subfieldVarInt_ion (binary) Decimal('0_012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789...012345678')] _

p = <tests.test_vectors._Parameter object at 0x116772bf0>

    @parametrize(*chain(
        _comparison_params(_T.GOOD_ROUNDTRIP, _GOOD_SUBDIR),
        _comparison_params(_T.GOOD_ROUNDTRIP, _GOOD_TIMESTAMP_SUBDIR),
        _comparison_params(_T.GOOD_EQUIVS_TIMESTAMP_INSTANTS_ROUNDTRIP, _EQUIVS_TIMELINE_SUBDIR),
        _comparison_params(_T.GOOD_EQUIVS_ROUNDTRIP, _EQUIVS_SUBDIR),
        _comparison_params(_T.GOOD_EQUIVS_ROUNDTRIP, _EQUIVS_UTF8_SUBDIR),
        _comparison_params(_T.GOOD_NONEQUIVS_ROUNDTRIP, _NONEQUIVS_SUBDIR),
    ))
    def test_roundtrips(p):
        """Tests roundtrips of good files for equality. Pre-parsing the streams is necessary to generate the comparison
        pairs, so there will necessarily be failures in parameter generation for this test if any good files fail to parse.
        """
>       p.test_thunk()

tests/test_vectors.py:401:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/test_vectors.py:343: in roundtrip
    assert ion_equals(value, _roundtrip(value, is_binary))
tests/test_vectors.py:337: in _roundtrip
    roundtripped = load(out)
amazon/ion/simpleion.py:274: in load
    return load_python(fp, catalog=catalog, single_value=single_value, parse_eagerly=parse_eagerly)
amazon/ion/simpleion.py:405: in load_python
    _load(out, reader)
amazon/ion/simpleion.py:472: in _load
    if event.value is None or ion_type is IonType.NULL or event.ion_type.is_container:
amazon/ion/core.py:286: in value
    self.cached_value = self[2]()
amazon/ion/reader_managed.py:111: in value_thunk
    value = ion_event.value
amazon/ion/core.py:286: in value
    self.cached_value = self[2]()
amazon/ion/reader_binary.py:532: in parse_decimal
    return _parse_decimal(BytesIO(data))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

buf = <_io.BytesIO object at 0x11b89c720>

    def _parse_decimal(buf):
        """Parses the remainder of a file-like object as a decimal."""
        exponent = _parse_var_int(buf, signed=True)
        sign_bit, coefficient = _parse_signed_int_components(buf)

        if coefficient == 0:
            # Handle the zero cases--especially negative zero
            value = Decimal((sign_bit, (0,), exponent))
        else:
            coefficient *= sign_bit and -1 or 1
            with localcontext() as context:
                # Adjusting precision for taking into account arbitrarily
                # large/small numbers
>               context.prec = len(str(coefficient))
E               ValueError: Exceeds the limit (4300) for integer string conversion; use sys.set_int_max_str_digits() to increase the limit

mavjop avatar Oct 10 '24 18:10 mavjop

Those tests are pulled in through a submodule: https://github.com/amazon-ion/ion-tests/commits/master/ It looks like there were some recent additions there that are likely the cause.

While fixing it would be good, there should be some skip lists in that test_vectors that you could add that test to for now.

rmarrowstone avatar Oct 10 '24 18:10 rmarrowstone

Issue #229 ("CI/CD failed due to large int<->str conversions") looks like it's still open (since December 2022!) and might be talking about the same issue (i.e. issue is preexisting?).

Or perhaps not, since it's PYPY-specific, which isn't applicable.

mavjop avatar Oct 10 '24 18:10 mavjop

It is in various skip lists, but not in the global one (that applies to the pure Python implementation).

I found the problem!

My Python version is likely a newer update of 3.10 than other devs are using (I'm on 3.10.15).

Python 3.10.7 introduced a breaking change for type conversion, in order to prevent DOS attacks.

So the tests won't fail for Python 3.10 before 3.10.7, and I assume probably won't fail for 3.9.5 or 3.8.10 that are called out in https://github.com/amazon-ion/ion-python/blob/master/README.md.

Verified

I verified with a build/test on Python 3.9.5 that all tests now pass with flying colours.

TL;DR:

The tests failing for me shouldn't actually cause test failures in validation analyzers, if they're not using a newer Python than called for in the README.

mavjop avatar Oct 10 '24 18:10 mavjop

I created a separate PR to update the README about setuptools version needs. https://github.com/amazon-ion/ion-python/pull/373

mavjop avatar Oct 10 '24 19:10 mavjop