pymp4
pymp4 copied to clipboard
Upgrade from Construct 2.8.8 to 2.10.68 (All tests pass on 3.10)
Warning This has changes to both tests and user code that would be breaking changes. This is not something a user should simply upgrade to without checking and changing code. This WILL break user code. Perhaps a major upgrade is necessary? (4.x). This pull request is based on #9 but with some semantics changed to be as non-breaking as possible.
In this pull request, I've made as few changes to the existing code as possible. I've also reviewed my code multiple times before pushing, unlike in #9 where changes in each commit seem unbalanced, bad commit titles and descriptions, and various debugging code left in.
Breaking changes to users in pymp4 (so far) if pulled:
- All data in the boxes and the high-level
Boxconstruct that were parsed as strings, are no longer left asbytestype. They are now decoded to unicode strings (i.e., "foo" not b"foo", or "tenc" not b"tenc"). This affects everything from box type, ftyp major_brand, and so on. - Since it now uses Construct 2.10, you can no longer define/modify a construct by re-calling the constructor of the constructor return, because it no longer returns the constructor like that. I.e., you must now do:
Container(
offset=0,
type="ftyp",
major_brand="iso5",
minor_version=1,
compatible_brands=["iso5", "avc1"],
end=24
)
Instead of:
Container(offset=0)
(type="ftyp")
(major_brand="iso5")
(minor_version=1)
(compatible_brands=["iso5", "avc1"])
(end=24)
- All custom adapters have been moved from
parser.pytoadapters.py. - TellMinusSizeOf is now TellPlusSizeOf and has been moved from
parser.pytosubconstructs.py. - All boxes field's are now nested within a
datafield. This was simply a requirement. I could not think of any way to embedded the box fields, nor can the maintainer of construct when asked. It seems to simply be a completely pulled feature that will not return. This means all user code that accesses fields must be updated from e.g.,tenc_box.is_encryptedtotenc_box.data.is_encrypted. - All boxes had the type field removed from their individual structs. It is unnecessary as the length information isn't prefixed on them, so they were more or less already risky to use manually/outside of the Box struct. This was required so as to not duplicate the type in the Box container and the Box's
datafield container. - All EmbeddedBitStruct and Embedded(BitStruct()) fields were nested under a "flags" field. This has the same ramifications as change 5.
- The ISO6392TLanguageCode Adapter has been reworked to expect an Int16 input as obj, which will then do Bitwise operations on it itself. This was so we don't need to nest the parsed Language Code under another field (i.e.,
box.language.codelike #9 did).
If you have any ideas for improvements or want to change how those "data"/"flags" fields are named or function, feel free to. Sadly I could not find anything better.
All tests pass on Python 3.10.
(pymp4-py3.10) PS C:\Users\rlaphoenix\Projects\pymp4> python -m pytest tests/
================================= test session starts ==================================
platform win32 -- Python 3.10.10, pytest-7.0.1, pluggy-1.0.0
rootdir: C:\Users\rlaphoenix\Projects\pymp4
plugins: cov-4.0.0
collected 17 items
tests\test_box.py ........ [ 47%]
tests\test_dashboxes.py .. [ 58%]
tests\test_util.py ....... [100%]
================================== 17 passed in 0.09s ==================================
Closes #23 and #3
Codecov Report
Patch coverage: 92.30% and project coverage change: +4.41% :tada:
Comparison is base (
8e00455) 75.24% compared to head (6e47a61) 79.66%.
:exclamation: Current head 6e47a61 differs from pull request most recent head 73a4ba4. Consider uploading reports for the commit 73a4ba4 to get more accurate results
Additional details and impacted files
@@ Coverage Diff @@
## master #24 +/- ##
==========================================
+ Coverage 75.24% 79.66% +4.41%
==========================================
Files 5 7 +2
Lines 202 177 -25
==========================================
- Hits 152 141 -11
+ Misses 50 36 -14
| Files Changed | Coverage Δ | |
|---|---|---|
| src/pymp4/adapters.py | 90.00% <90.00%> (ø) |
|
| src/pymp4/subconstructs.py | 91.66% <91.66%> (ø) |
|
| src/pymp4/parser.py | 93.58% <100.00%> (+9.88%) |
:arrow_up: |
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
@rlaphoenix do you think you could rebase this PR?
@rlaphoenix do you think you could rebase this PR?
Done 👍
Do you think this is good to go? @rlaphoenix
Other than the cons/differences as listed in the pull description, it should be, yeah. All tests are successful, just any client that uses pymp4 should be very hesitant to update pymp4 until they can verify their own calls to construct will support 2.10, as well as understand the changes with pymp4 in regards to accessing most parsed box information, again referencing the pull request description.
Probably best to do a pymp4 v2 release if we do go ahead and pull this, just so ^-based dependency requirements won't update to pymp4 2.x without manual intervention.
I think the box type should be still be byte type since mp4 specs defines them as fourCC, which is just realy int made from readable 4 chars. Also there are specs, such as Apple metadata, which define types as non-ascii, i.e. byte > 0x7F (\251nam).
I would be happen with making them Bytes(4) instead of PaddedString(4, 'ascii'). This will keep existing application working and match the spec. Also makes extending pymp4 much more prodictable and work with boxes not defined in pymp4.
Hi, sorry if this is perhaps not the right place to report this, but I found a bug in this branch - saio boxes are not parsed correctly.
test case:
from pymp4.parser import Box
data = bytes.fromhex("000000147361696f000000000000000100000325")
saio = Box.parse(data)
print(saio)
In the master branch, we get the expected output:
Container(offset=0)(type=b'saio')(version=0)(flags=Container(has_aux_info_type=False))(aux_info_type=None)(aux_info_type_parameter=None)(offsets=[805])(end=20)
But in construct-2.10-patch:
[...]
File "/usr/lib/python3.12/site-packages/construct/expr.py", line 188, in __call__
return self.__parent(obj)[self.__field]
~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
KeyError: 'version'
(the full backtrace is longer)
Relevant code: https://github.com/devine-dl/pymp4/blob/33dc5d620f361cb49d713b60dcb2686ab8696f91/src/pymp4/parser.py#L511-L522
Possibly related: https://github.com/construct/construct/issues/409