Modernization and cleanup
This PR completely restructures the MediaFile library from a single monolithic file into a modular, maintainable package architecture. The core functionality remains unchanged while improving code organization, readability, and future maintainability.
Architectural Restructuring
- Split monolithic
mediafile.py(2342 lines) into logical modules:mediafile/__init__.py- Main MediaFile class and public APImediafile/constants.py- Constants and enums (TYPES, ImageType)mediafile/exceptions.py- Custom exception hierarchymediafile/fields.py- Field descriptor classes (MediaField, DateField, etc.)mediafile/storage/- Format-specific storage strategiesmediafile/utils/- Utility functions and helpers
Build System & Dependencies
- Updated to modern Python packaging with
pyproject.toml - Dropped Python 3.7 and 3.8 support - now requires Python ≥3.9
- Updated dependency specifications and modernized build configuration
- Removed Tox in favor of plain GitHub Actions for testing
- New version:
1.0.0-alpha.1- reflecting the significant architectural changes - Coverage: Code test coverage is now uploaded as artifact (maybe we want to switch to codecov at some point)
To review, try to look at one commit at a time - the changes are organized logically across commits to make the refactoring easier to follow. The source code stayed mostly unchanged: I renamed some functions (removed the underscore _) but other than this it should be functionally identical.
Thank you for the PR! The changelog has not been updated, so here is a friendly reminder to check if you need to add an entry.
Great idea! Restructureing is overdue! I started reading this commit by commit. Thanks for thoroughly doing things in a readable manner! Will try to use it in my main branch too very soon and hope to get back with a review at some point
This looks good. Since this PR does not change the interface to the world, I think, it doesn't require a major version bump. We'd rather save it for the future when we do add breaking changes.
Are you happy to add .git-blame-ignore-revs?
I'm not sure the gitblame is able to pickup the moved files but I can add it.
it doesn't require a major version bump
Are we sure about that? Imports changed all over the place and I also renamed some functions. This will break some upstream consumer. Therefore major increase in the version number.
My idea was to to do an 1.0.0alpha now, than with the typehints do an 1.0.0beta (this will most likely also break some things too). Once we are happy with the changes and are sure everything works as expected we do an 1.0.0 or 1.0.0rc1. Standard pip installs will ignore pre-release versions and will still install the latest stable version.
We'd rather save it for the future when we do add breaking changes.
We can always do another major bump on breaking changes.
I'm not sure the gitblame is able to pickup the moved files but I can add it.
You need to supply -CM flags to git blame to take into account copied and moved code.
I think, commit 226e455a5605cf70a4387ba6a17da85d41c6ce87 also needs ignoring.
Imports changed all over the place
Internally, they have. However, from the perspective of an upstream consumer, they seem to still be available in mediafile.py. On the other hand, I think we should add all classes that were previously defined in mediafile module to __all__ - I think I've seen some of them being used in the wild.
I also renamed some functions.
Those seem to have been private (prefixed by _), unless you made them private in the previous PR?
It seems like commits cbc64e3 and 0c6ccd4 should be moved to #87, right? Presumably, the scope of this PR is restructuring of everything under mediafile folder.
It seems like commits https://github.com/beetbox/mediafile/commit/cbc64e3e0903585e1865f7e80f92a2b066776ac7 and https://github.com/beetbox/mediafile/commit/0c6ccd4478e897e8ea908caafe76229ab5f01790 should be moved to https://github.com/beetbox/mediafile/pull/87, right?
I could move them if you want. I would not put them into the uv PR tho. It is just a simple modernization/version bump of the flit packaging setup. And ofcourse I removed tox.
Imports changed all over the place
Internally, they have. However, from the perspective of an upstream consumer, they seem to still be available in
mediafile.py. On the other hand, I think we should add all classes that were previously defined inmediafilemodule to__all__- I think I've seen some of them being used in the wild.I also renamed some functions.
Those seem to have been private (prefixed by
_), unless you made them private in the previous PR?
I dont care too much tbh. Thought it might be a good time for the package to approach an 1.0.0 :shrug: No hard feelings either way. We can do an 0.14.0 otherwise. Btw, the semver spec is a surprisingly good read. https://semver.org/spec/v1.0.0.html#how-do-i-know-when-to-release-100. According to the spec, the first major version should mark the point at which we define the public api for future releases.
Those seem to have been private (prefixed by _), unless you made them private in the previous PR?
It's indeed the only renaming I have done. We should be fine here :crossed_fingers:
Internally, they have. However, from the perspective of an upstream consumer, they seem to still be available in mediafile.py. On the other hand, I think we should add all classes that were previously defined in mediafile module to all - I think I've seen some of them being used in the wild.
You mean all of them :rofl: ? It was only one file beforehand.
Imports changed all over the place
Internally, they have. However, from the perspective of an upstream consumer, they seem to still be available in
mediafile.py. On the other hand, I think we should add all classes that were previously defined inmediafilemodule to__all__- I think I've seen some of them being used in the wild.I also renamed some functions.
Those seem to have been private (prefixed by
_), unless you made them private in the previous PR?I dont care too much tbh. Thought it might be a good time for the package to approach an 1.0.0 🤷 No hard feelings either way. We can do an 0.14.0 otherwise. Btw, the semver spec is a surprisingly good read. https://semver.org/spec/v1.0.0.html#how-do-i-know-when-to-release-100. According to the spec, the first major version should mark the point at which we define the public api for future releases.
I read this. Thanks for the recommendation. mediafile should have been 1.something a long time ago already IMHO. I think it doesnt really matter what version you give after merging this. I think 1.00-alpha-something is a good idea!
Let's aim for the important part: 1.0.0 should be with new packaging (which then should stay the normal/stable way of packaging aka a stable public api / usage, reading a little bit between the lines this is what the guide tells me!)
Also I think we should merge this one quickly and release it. if it's an alpha, it's ok if something is not ok 100% yet. Even though I think it might be fine already. Am I right that actual changes are:
- Refactor Exceptions
- Rename some methods to
_something
?
Also I think we should merge this one quickly and release it. if it's an alpha, it's ok if something is not ok 100% yet. Even though I think it might be fine already. Am I right that actual changes are: Refactor Exceptions Rename some methods to _something
@JOJ0 Yes you are right here. These should be as far as I remember the only changes. To give a bit of context: I started to fix the exceptions and than thought f* it let's restructure all of mediafile. If we want to review this separately I can move it out.
IMHO moving out not necessary. @snejus agrees too?
No concerns here, I think it's good to get activity going on Mediafile again, and it'll be easier to fix any issues that do arise with this refactor.
That's the reason why I want to do a proper 1.0.0 release to properly define the public api. See here. I dont think we need to care about this too much if we do a major release.
See this comment too: https://github.com/beetbox/mediafile/pull/86#issuecomment-3446868911
That's the reason why I want to do a proper 1.0.0 release to properly define the public api. See here. I dont think we need to care about this too much if we do a major release.
I did not realise that you intended to break the public API here 😅. Major version upgrade makes sense in such case.
However, I am not at all convinced that we should do this, given that we simply need to extend __all__ definition to not break it.
A couple of repositories that depend on something that is not currently included in __all__:
$ gh search code --extension=py mediafile.MediaField --json=repository | jq '.[].repository.url' -r | grep -v beets_pr
https://github.com/beetbox/beets
https://github.com/Neurrone/beets-audible
https://github.com/adamjakab/BeetsPluginXtractor
https://github.com/jaeheonshim/vibenet
https://github.com/clinton-hall/nzbToMedia
https://github.com/kergoth/beets-kergoth
https://github.com/SimonTeixidor/beets-plugins
https://github.com/Morikko/beets-multivalue
https://github.com/AlexChalk/dotfiles
https://github.com/mgoltzsche/beets-ytimport
https://github.com/tweitzel/beets-recordingdate
https://github.com/rembo10/headphones
https://github.com/adamjakab/BeetsPluginGoingRunning
https://github.com/Multipixelone/beets-plugins
https://github.com/grantoesterling/beets-rym
https://github.com/bbaserdem/NixOS-Config
https://github.com/EcoG-One/Web-Media-Server-and-Player
https://github.com/beetbox/beets
Not all of them define a strict mediafile = "^1.0..." requirement, and beets is one of them that don't:
mediafile = ">=0.12.0"
That's the reason why I want to do a proper 1.0.0 release to properly define the public api. See here. I dont think we need to care about this too much if we do a major release.
I did not realise that you intended to break the public API here 😅. Major version upgrade makes sense in such case.
However, I am not at all convinced that we should do this, given that we simply need to extend
__all__definition to not break it.A couple of repositories that depend on something that is not currently included in
__all__:$ gh search code --extension=py mediafile.MediaField --json=repository | jq '.[].repository.url' -r | grep -v beets_pr https://github.com/beetbox/beets https://github.com/Neurrone/beets-audible https://github.com/adamjakab/BeetsPluginXtractor https://github.com/jaeheonshim/vibenet https://github.com/clinton-hall/nzbToMedia https://github.com/kergoth/beets-kergoth https://github.com/SimonTeixidor/beets-plugins https://github.com/Morikko/beets-multivalue https://github.com/AlexChalk/dotfiles https://github.com/mgoltzsche/beets-ytimport https://github.com/tweitzel/beets-recordingdate https://github.com/rembo10/headphones https://github.com/adamjakab/BeetsPluginGoingRunning https://github.com/Multipixelone/beets-plugins https://github.com/grantoesterling/beets-rym https://github.com/bbaserdem/NixOS-Config https://github.com/EcoG-One/Web-Media-Server-and-Player https://github.com/beetbox/beetsNot all of them define a strict
mediafile = "^1.0..."requirement, andbeetsis one of them that don't:mediafile = ">=0.12.0"
This is what you can expect if you don't use lockfiles and do not fix major releases tho. It was planning an alpha release anyways which should not affect these packages in any way. There will probably be quite some more changes if I introduces mypy here too.
Also we already had an __all__ which obviously defines the current "public" api. These exports have not changed so we should be fine on this end too.
It indeed doesn't break! I guess, confuse broke because it had
# confuse/__init__.py
from core import *
And the star import picked up only the members defined in __all__.
Could you explain why do we have these imports in __all__ and nothing else?
__all__ = [
"UnreadableFileError",
"FileTypeError",
"MediaFile",
"Image",
"TYPES",
"ImageType",
]
I did not realise that you intended to break the public API here 😅. Major version upgrade makes sense in such case.
I think you have a misunderstanding here. The semver guide states "1.0.0 DEFINES" the public API, I think that is what @semohr tries to do here. He didn't mention "breaking"
Could you explain why do we have these imports in all and nothing else?
It was like this before. Thought it might be useful to have the types exported too so I added them.
Even tho __all__ practically only defines how an from mediafile export * is handled it also serves as a curated list of names (functions, classes, variables) that constitute the module’s public API. This helps users identify intended entry points and prevents reliance on internal implementation details.
It was like this before. Thought it might be useful to have the types exported too so I added them.
I didn't realise this! This probably got lost in the big diff, and it showed it as if it was added for the first time 🤦🏼
Well, if that's the case then there's barely any change compared to what we had before. Let's merge this in - feel free to ignore the failing build which will get fixed in the next PR.
@semohr I took the liberty and merged-in #88 , you can rebase/update/cleanup this one now if you want/find time. HTH
Have just rebased it - keen for this to be merged!
Thanks! Let's merge this than :upside_down_face:
Attempted to test beets using the most recent commit and found that missing exports are causing issues, as I warned above:
ImportError while importing test module '/home/sarunas/repo/beets/test/plugins/test_art.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../.local/share/uv/python/cpython-3.10.18-linux-x86_64-gnu/lib/python3.10/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
test/plugins/test_art.py:40: in <module>
from beetsplug import fetchart
beetsplug/fetchart.py:30: in <module>
from mediafile import image_mime_type
E ImportError: cannot import name 'image_mime_type' from 'mediafile' (/media/poetry/virtualenvs/beets-yAypcYUQ-py3.10/lib/python3.10/site-packages/mediafile/__init__.py)