detect breaking changes
Implement https://github.com/apache/iceberg-python/issues/334
- Uses griffe to detect breaking changes since the latest release
- Allows users to specify an exclusion list that will silence test failures on the introduction of intended breaking changes
The CI currently reports 58 breaking changes between 0.5.1 and 976c0ea. (I'm trying to understand in which PR these breaking changes were introduced)
When test_api.py was first written, it detected just 1 breaking change related to the backward incompatible Daft change:
Example stack trace:
assert not [{'kind': <BreakageKind.ATTRIBUTE_CHANGED_VALUE: 'Attribute value was changed'>, 'new_value': ExprDict(keys=['1', '2'], values=[ExprCall(function=ExprName(name='StructType', parent=Module(PosixPath('/tmp/griffe-worktree-iceberg-python-HEAD-1b_iv7st/griffe_HEAD/pyiceberg/manifest.py'))), arguments=[ExprCall(function=ExprName(name='NestedField', parent=Module(PosixPath('/tmp/griffe-worktree-iceberg-python-HEAD-1b_iv7st/griffe_HEAD/pyiceberg/manifest.py'))), arguments=[ExprKeyword(name='field_id', value='100', function=ExprName(name='NestedField', parent=Module(PosixPath('/tmp/griffe-worktree-iceberg-python-HEAD-1b_iv7st/griffe_HEAD/pyiceberg/manifest.py')))), ExprKeyword(name='name', value="'file_path'", function=ExprName(name='NestedField', parent=Module(PosixPath('/tmp/griffe-worktree-iceberg-python-HEAD-1b_iv7st/griffe_HEAD/pyiceberg/manifest.py')))), ExprKeyword(name='field_type', value=ExprCall(function=ExprName(name='StringType', parent=Module(PosixPath('/tmp/griffe-worktree-iceberg-python-HEAD-1b_iv7st/griffe_HEAD/pyiceberg/manifest.py'))), arguments=[]), function=ExprName(name='NestedField', parent=Module(PosixPath('/tmp/griffe-worktree-iceberg-python-HEAD-1b_iv7st/griffe_HEAD/pyiceberg/manifest.py')))), ExprKeyword(name='required', value='True', function=ExprName(name='NestedField', parent=Module(PosixPath('/tmp/griffe-worktree-iceberg-python-HEAD-1b_iv7st/griffe_HEAD/pyiceberg/manifest.py')))), ExprKeyword(name='doc', value="'Location URI with FS scheme'", function=ExprName(name='NestedField', parent=Module(PosixPath('/tmp/griffe-worktree-iceberg-python-HEAD-1b_iv7st/griffe_HEAD/pyiceberg/manifest.py'))))]), ExprCall(function=ExprName(name='NestedField', parent=Module(PosixPath('/tmp/griffe-worktree-iceberg-python-HEAD-1b_iv7st/griffe_HEAD/pyiceberg/manifest.py'))), arguments=[ExprKeyword(name='field_id', value='101', function=ExprName(name='NestedField', parent=Module(PosixPath('/tmp/griffe-worktree-iceberg-python-HEAD-1b_iv7st/griffe_HEAD/pyiceberg/manifest.py')))), ExprKeyword(name='name', value="'file_format'", function=ExprName(name='NestedField', parent=Module(PosixPath('/tmp/griffe-worktree-iceberg-python-HEAD-1b_iv7st/griffe_HEAD/pyiceberg/manifest.py')))), ExprKeyword(name='field_type', value=ExprCall(function=ExprName(name='StringType', parent=Module(PosixPath('/tmp/griffe-worktree-iceberg-python-HEAD-1b_iv7st/griffe_HEAD/pyiceberg/manifest.py'))), arguments=[]), function=ExprName(name='NestedField', parent=Module(PosixPath('/tmp/griffe-worktree-iceberg-python-HEAD-1b_iv7st/griffe_HEAD/pyiceberg/manifest.py')))), ExprKeyword(name='required', value='True', function=ExprName(name='NestedField', parent=Module(PosixPath('/tmp/griffe-worktree-iceberg-python-HEAD-1b_iv7st/griffe_HEAD/pyiceberg/manifest.py')))), ExprKeyword(name='doc', value="'File format name: avro, orc, or parquet'", function=ExprName(name='NestedField', parent=Module(PosixPath('/tmp/griffe-worktree-iceberg-python-HEAD-1b_iv7st/griffe_HEAD/pyiceberg/manifest.py'))))]), ExprCall(function=ExprName(name='NestedField', parent=Module(PosixPath('/tmp/griffe-worktree-iceberg-python-HEAD-1b_iv7st/griffe_HEAD/pyiceberg/manifest.py'))), arguments=[ExprKeyword(name='field_id', value='102', function=ExprName(name='NestedField', parent=Module(PosixPath('/tmp/griffe-worktree-iceberg-python-HEAD-1b_iv7st/griffe_HEAD/pyiceberg/manifest.py')))), ExprKeyword(name='name', value="'partition'", function=ExprName(name='NestedField', parent=Module(PosixPath('/tmp/griffe-worktree-iceberg-python-HEAD-1b_iv7st/griffe_HEAD/pyiceberg/manifest.py')))), ExprKeyword(name='field_type', value=ExprCall(function=ExprName(name='StructType', parent=Module(PosixPath('/tmp/griffe-worktree-iceberg-python-HEAD-1b_iv7st/griffe_HEAD/pyiceberg/manifest.py'))), arguments=[]), function=ExprName(name='NestedField', parent=Module(PosixPath('/tmp/griffe-worktree-iceberg-python-HEAD-1b_iv7st/griffe_HEAD/pyiceberg/manifest.py')))), ExprKeyword(name='required', value='True', function=ExprName(name='NestedField', parent=Module(PosixPath('/tmp/griffe-worktree-iceberg-python-HEAD-1b_iv7st/griffe_HEAD/pyiceberg/manifest.py')))), ExprKeyword(name='doc', value="'Partition data tuple, schema based on the partition spec'", function=ExprName(name='NestedField', parent=Module(PosixPath('/tmp/griffe-worktree-iceberg-python-HEAD-1b_iv7st/griffe_HEAD/pyiceberg/manifest.py'))))]), ExprCall(function=ExprName(name='NestedField', parent=Module(PosixPath('/tmp/griffe-worktree-iceberg-python-HEAD-1b_iv7st/griffe_HEAD/pyiceberg/manifest.py'))), arguments=[ExprKeyword(name='field_id', value='103', function=ExprName(name='NestedField', parent=Module(PosixPath('/tmp/griffe-worktree-iceberg-python-HEAD-1b_iv7st/griffe_HEAD/pyiceberg/manifest.py')))), ExprKeyword(name='name', value="'record_count'", function=ExprName(name='NestedField', parent=Module(PosixPath('/tmp/griffe-worktree-iceberg-python-HEAD-1b_iv7st/griffe_HEAD/pyiceberg/manifest.py')))), ExprKeyword(name='field_type', value=ExprCall(function=ExprName(name='LongType', parent=Module(PosixPath('/tmp/griffe-worktree-iceberg-python-HEAD-1b_iv7st/griffe_HEAD
Which of these breaking change types should we include in our unit test?
I've currently removed ATTRIBUTE_CHANGED_VALUE type because it was creating a lot of noise in the output.
Example:
{'kind': <BreakageKind.ATTRIBUTE_CHANGED_VALUE: 'Attribute value was changed'>, 'object_path': 'pyiceberg.catalog.hive.DEFAULT_PROPERTIES', 'old_value': ExprDict(keys=["'write.parquet.compression-codec'"], values=["'zstd'"]), 'new_value': ExprDict(keys=[ExprAttribute(values=[ExprName(name='TableProperties', parent=Module(PosixPath('/tmp/griffe-worktree-iceberg-python-HEAD-v_k3_7xc/griffe_HEAD/pyiceberg/catalog/hive.py'))), ExprName(name='PARQUET_COMPRESSION', parent=ExprName(name='TableProperties', parent=Module(PosixPath('/tmp/griffe-worktree-iceberg-python-HEAD-v_k3_7xc/griffe_HEAD/pyiceberg/catalog/hive.py'))))])], values=[ExprAttribute(values=[ExprName(name='TableProperties', parent=Module(PosixPath('/tmp/griffe-worktree-iceberg-python-HEAD-v_k3_7xc/griffe_HEAD/pyiceberg/catalog/hive.py'))), ExprName(name='PARQUET_COMPRESSION_DEFAULT', parent=ExprName(name='TableProperties', parent=Module(PosixPath('/tmp/griffe-worktree-iceberg-python-HEAD-v_k3_7xc/griffe_HEAD/pyiceberg/catalog/hive.py'))))])])}
| Breakage Kind | Description | Ignore | Rationale |
|---|---|---|---|
| PARAMETER_MOVED | Positional parameter was moved | ||
| PARAMETER_REMOVED | Parameter was removed | ||
| PARAMETER_CHANGED_KIND | Parameter kind was changed | ||
| PARAMETER_CHANGED_DEFAULT | Parameter default was changed | ||
| PARAMETER_CHANGED_REQUIRED | Parameter is now required | ||
| PARAMETER_ADDED_REQUIRED | Parameter was added as required | ||
| RETURN_CHANGED_TYPE | Return types are incompatible | ||
| OBJECT_REMOVED | Public object was removed | ||
| OBJECT_CHANGED_KIND | Public object points to a different kind of object | ||
| ATTRIBUTE_CHANGED_TYPE | Attribute types are incompatible | ||
| ATTRIBUTE_CHANGED_VALUE | Attribute value was changed | Too noisy, and in most cases not a serious breaking change (like changing a constant value for Table Property) | |
| CLASS_REMOVED_BASE | Base class was removed |
Thanks for setting this up @syun64. This looks great. I think we can just give it a try after the 0.6.0 release and see how noisy it is.
@syun64 I was on a merging spree, can you rebase once more? 😓
Looks like we broke something already 😸 Can we make a list to allow breaking changes? Similar to https://github.com/apache/parquet-mr/blob/d8396086b3e3fefc6829f8640917c3bbde0fa9c4/pom.xml#L581-L606
Looks like we broke something already 😸 Can we make a list to allow breaking changes? Similar to https://github.com/apache/parquet-mr/blob/d8396086b3e3fefc6829f8640917c3bbde0fa9c4/pom.xml#L581-L606
Sure! Thank you for the suggestion. I took a stab at implementing this with a YAML file - let me know what you think!
Hey @syun64 Thanks for adding the yaml, that looks neat. What's your gist of Griffe? It looks like there are already some false positives.
Hey @syun64 Thanks for adding the yaml, that looks neat. What's your gist of Griffe? It looks like there are already some false positives.
Thank you for all the feedback @Fokko ! My general impression with griffe is that it's working pretty well, and after carefully reviewing and adding even more breaking changes we've introduced since the last release, I feel convinced that we should include some implementation of this tool.
There are some edge cases that I thought were worth summarizing:
- Cython Class interface inference isn't perfect. As I noted in this comment, either the .so or .pyi files are being used. And when .so file is used to interprete the interface, it returns None return types for all of its members:
>>> griffe.load("pyiceberg")['avro']['decoder_fast']['CythonBinaryDecoder'].relative_filepath
PosixPath('pyiceberg/avro/decoder_fast.cpython-38-x86_64-linux-gnu.so')
>>> griffe.load("pyiceberg")['avro']['decoder_fast']['CythonBinaryDecoder']['tell'].returns
>>> griffe.load_git("pyiceberg")['avro']['decoder_fast']['CythonBinaryDecoder'].relative_filepath
PosixPath('/tmp/griffe-worktree-iceberg-python-HEAD-nu9f66g2/griffe_HEAD/pyiceberg/avro/decoder_fast.pyi')
>>> griffe.load_git("pyiceberg")['avro']['decoder_fast']['CythonBinaryDecoder']['tell'].returns
ExprName(name='int', parent=Class('CythonBinaryDecoder', 20, 73))
- Import removal is also considered as a breaking change because it's a public function/Class that was available on the module, that's been removed.
Outside of these two edge cases, I think all the exclusions listed are very accurate, and I think the test will require the contributors to think twice about making a breaking change, or to document an intended breaking change into the yaml file so we developers are able to trace down breaking changes a lot easier.
Current pyiceberg-0.6.0 exclusion list:
exclude:
- pyiceberg.table.create_mapping_from_schema # Removed import (Alias) #441
- pyiceberg.table.Table.next_sequence_number # Table -> TableMetadata #471
- pyiceberg.table.Table.new_snapshot_id # Table -> TableMetadata #471
- pyiceberg.table.StaticTable.next_sequence_number # Table -> TableMetadata #471
- pyiceberg.table.StaticTable.new_snapshot_id # Table -> TableMetadata #471
- pyiceberg.table.Transaction.add_snapshot # removed in favor of directly using TableUpdate and TableRequirement #471
- pyiceberg.table.Transaction.set_ref_snapshot # removed in favor of directly using TableUpdate and TableRequirement #471
- pyiceberg.catalog.rest.TokenResponse.* # nullability change #466
- pyiceberg.io.pyarrow.write_file # Parameter change #471
- pyiceberg.cli.console.run # Positional parameter change #472
@jaychia 'suggestion -> reorganize modules to the top level
@Fokko @HonahX @jaychia - should we remove this from the 0.7.0 milestone? I think we'd benefit from a prolonged discussion and attempt in organizing our API. This feels like an item we just need to complete before the 1.0.0 release