pycardano icon indicating copy to clipboard operation
pycardano copied to clipboard

Support decoding list/indefinite list from cbor

Open theeldermillenial opened this issue 1 year ago • 6 comments

tl;dr: This PR adds support for proper parsing of fixed vs indefinite lists from CBOR.

Background Currently, pycardano uses a custom encoder to embed indefinite lists when encoding classes to cbor, even if the length of the list is less than 31 elements.

This creates an issue when decoding with cbor2, because cbor2 decodes all lists to the list class without any indication of whether the list was encoded as a fixed/indefinite list. This loss of information is problematic because hashes are frequently generated off of CBOR, so encoding as a fixed list vs indefinite list generates a different hash even if the content of the datum is the same.

Implementation Detail This PR is unfortunately dependent on a minor fix in cbor2. The CBORDecoder class doesn't have a custom decoding mechanism to make modifications to how data is decoded. One approach to fixing this issue would be to subclass CBORDecoder to intervene when an indefinite list is found. However, the CBORDecoder was implemented in such a way that it relies on a global dictionary that references internal method definitions on a class, making subclassing challenging. To resolve this issue a PR was opened to cbor2 that simply moves the global dictionaries inside the CBORDecoder class, which has no impact on any other section of code: https://github.com/agronholm/cbor2/pull/225

This PR takes advantage of that PR to subclass the CBORDecoderand intervene in the case of an indefinite list. When an indefinite list is discovered, it is cast to IndefiniteList and returned, while all other lists are returned as a standard list. This required some additional changes, specifically a change how data is being decoded on the pycardano side.

This should fix a few different Issues, and fixed a previously undiscovered issue related to roundtrip cbor encoding of datums (datum -> cbor ->datum -> cbor).

fixes #311 fixes #330

theeldermillenial avatar Mar 14 '24 02:03 theeldermillenial

Codecov Report

Attention: Patch coverage is 81.57895% with 7 lines in your changes missing coverage. Please review.

Project coverage is 83.92%. Comparing base (3d65c67) to head (b06c153).

Files Patch % Lines
pycardano/serialization.py 87.09% 1 Missing and 3 partials :warning:
pycardano/witness.py 0.00% 0 Missing and 2 partials :warning:
pycardano/hash.py 50.00% 1 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #331      +/-   ##
==========================================
- Coverage   83.98%   83.92%   -0.06%     
==========================================
  Files          29       29              
  Lines        3733     3757      +24     
  Branches      941      946       +5     
==========================================
+ Hits         3135     3153      +18     
- Misses        433      436       +3     
- Partials      165      168       +3     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Mar 14 '24 02:03 codecov-commenter

I should mention I found a few other things as well.

Most importantly, this should not be merged in its current state. It might be appropriate to convert this to WIP until the cbor2 fix is merged.

There seemed to be type errors in code I did not touch. It might be related to the changes I made, but it could also be due to me updating packages.

theeldermillenial avatar Mar 14 '24 02:03 theeldermillenial

pycardano/plutus.py:77: error: Incompatible types in assignment (expression has type "list[Any]", target has type "bytes") [assignment] Installing missing stub packages: pycardano/plutus.py:565: error: Return type "CBORTag" of "to_shallow_primitive" incompatible with return type "list[bytes | bytearray | ByteString | str | int | float | Decimal | bool | tuple[Any, ...] | list[Any] | IndefiniteList | dict[Any, Any] | defaultdict[Any, Any] | OrderedDict[Any, Any] | UndefinedType | datetime | Pattern[Any] | CBORSimpleValue | CBORTag | set[Any] | frozenset[Any] | frozendict[Any, Any] | FrozenList[Any] | IndefiniteFrozenList | None]" in supertype "ArrayCBORSerializable" [override] pycardano/plutus.py:791: error: Return type "PlutusData | dict[Any, Any] | int | bytes | IndefiniteList | RawCBOR | CBORTag" of "to_primitive" incompatible with return type "bytes | bytearray | ByteString | str | int | float | Decimal | bool | tuple[Any, ...] | list[Any] | IndefiniteList | dict[Any, Any] | defaultdict[Any, Any] | OrderedDict[Any, Any] | UndefinedType | datetime | Pattern[Any] | CBORSimpleValue | CBORTag | set[Any] | frozenset[Any] | frozendict[Any, Any] | FrozenList[Any] | IndefiniteFrozenList | None" in supertype "CBORSerializable" [override] pycardano/cip/cip8.py:120: error: Incompatible types in assignment (expression has type "dict[str, Any]", variable has type "str") [assignment] pycardano/transaction.py:296: error: Incompatible types in assignment (expression has type "DatumHash | Any", variable has type "CBORTag") [assignment] /home/runner/.cache/pypoetry/virtualenvs/pycardano-XSOAH-2F-py3.10/bin/python -m pip install types-cachetools types-requests

There seem to be some type errors in the static analyzer. Either there is a bug or the analyzer needs more/better type hints

nielstron avatar Mar 18 '24 09:03 nielstron

Yes, that is exactly what I was talking about. I'm not entirely sure how those type errors got injected into this with my changes.

On the bright side, I do know this works because I've been using it.

I'm trying to get the attention of the cbor2 maintainer to merge the fix.

theeldermillenial avatar Mar 18 '24 11:03 theeldermillenial

I guess suddenly the return type of cbor2.encode which was previously untyped came into scope and unveiled a bunch of mismatching types :)

nielstron avatar Mar 18 '24 20:03 nielstron

So...it's my fault that I actually typed it...Lesson learned. Never type anything.

Thanks for working through it.

theeldermillenial avatar Mar 18 '24 23:03 theeldermillenial

Revisiting this PR.. @theeldermillenial is it still WIP, or we can prepare for a merge?

cffls avatar Nov 02 '24 18:11 cffls

@cffls Unfortunately still WIP. The maintainer of cbor2 wants to make sure that the fix I made has feature parity with the C extension. Unfortunately, that is something that will take more time than I currently have.

I am going to be adding someone to my team, and hopefully they can try to tackle this or free up enough of my time to fix it.

theeldermillenial avatar Nov 02 '24 18:11 theeldermillenial