git-archive-all
Allow setting tarfile format
For MediaWiki, we would like to be able to use a different tarfile format instead of the default one, after identifying regressions with the new default format in Python 3.8.
We use this as a library, so if it would be possible to add an option to GitArchiver.create to let us specify tarfile.GNU_FORMAT, that would be appreciated.
Our downstream ticket is https://phabricator.wikimedia.org/T257102.
Perhaps you could consider conversion after the archive is created, as explained on StackOverflow?
Thanks for the suggestion, I looked into that but bsdtar doesn't support --format=gnu (or at least the Fedora packaged version doesn't), and GNU tar doesn't support the easy conversion method that bsdtar does.
But while we could do it manually, it just seems pretty inefficient to create a tarball in a brokenish format, then uncompress it and recompress it in the correct format when we could just create it in the correct format to begin with.
There are a myriad of options if you think about it, and not just for tar but for the compressors too. And then there are also their flavors.
What if I extend the archiver to produce an mtree-formatted file, will it suffice? Or perhaps some other intermediate format which you can easily work with using builtin tools and trivial shell pipelines.
There are a myriad of options if you think about it, and not just for tar but for the compressors too. And then there are also their flavors.
True. I haven't fully thought this through, but what about allowing an arbitrary options dict to be passed to ZipFile or TarFile as kwargs? That would allow us to pass through format without needing you to create a parameter for every single potential option, and it gives us flexibility in the future too.
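To illustrate the idea, here is a minimal sketch of such a pass-through using only the standard library. The helper name create_tar and its parameters are hypothetical and are not part of git-archive-all; the only real API relied on is tarfile.open, which forwards extra keyword arguments such as format to TarFile.

```python
import io
import tarfile


def create_tar(fileobj, members, **tar_options):
    """Hypothetical helper: write (arcname, bytes) pairs into a tar stream.

    Any extra keyword arguments are forwarded unchanged to tarfile.open,
    so callers can choose e.g. format=tarfile.GNU_FORMAT themselves.
    """
    with tarfile.open(fileobj=fileobj, mode="w", **tar_options) as tar:
        for arcname, data in members:
            info = tarfile.TarInfo(name=arcname)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))


buf = io.BytesIO()
create_tar(buf, [("README", b"hello\n")], format=tarfile.GNU_FORMAT)

# The magic field of the first header (bytes 257-264) identifies the format.
magic = buf.getvalue()[257:265]
```

With this shape the library would not need a dedicated parameter per option; the caller owns the choice of format.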
What if I extend the archiver to produce an mtree-formatted file, will it suffice? Or perhaps some other intermediate format which you can easily work with using builtin tools and trivial shell pipelines.
The really nice part about using this library is that it just takes care of everything for us, with very little complexity on our side :) But if that's what you think is best, we'll update our script to make it work.
As an update, we're now monkey patching tarfile.DEFAULT_FORMAT, so this isn't a priority for us, but it would be nice to have a less hacky way if possible.
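For readers wanting to keep a patch like this contained, a scoped variant might look like the sketch below. It is an assumption on my part rather than what our script does: it patches the class attribute tarfile.TarFile.format, which as far as I can tell is what TarFile instances fall back to when no format argument is given, and it restores the original value on exit.

```python
import io
import tarfile
from unittest import mock

original_format = tarfile.TarFile.format
buf = io.BytesIO()

# Temporarily make GNU_FORMAT the fallback for archives created in this block.
with mock.patch.object(tarfile.TarFile, "format", tarfile.GNU_FORMAT):
    with tarfile.open(fileobj=buf, mode="w") as tar:  # no format= passed
        info = tarfile.TarInfo(name="README")
        data = b"hello\n"
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

# Bytes 257-264 of the first header carry the format's magic value.
magic = buf.getvalue()[257:265]
restored = tarfile.TarFile.format == original_format
```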
Since you are using it as a library, you should be able to call GitArchiver.archive_all_files directly, passing a custom callable that archives files in whatever way suits you best.
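A rough sketch of what such a callable could look like is below. It assumes the callable receives two arguments, the path of the file on disk and its path inside the archive; please check that against the archive_all_files docstring. The helper make_tar_archiver is a hypothetical name, and the demo calls the callable by hand on a temporary file instead of going through GitArchiver.

```python
import os
import tarfile
import tempfile


def make_tar_archiver(tar):
    """Return a callable(file_path, arcname) that adds one file to `tar`."""
    def archiver(file_path, arcname):
        tar.add(file_path, arcname=arcname, recursive=False)
    return archiver


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "hello.txt")
    with open(src, "w") as f:
        f.write("hello\n")

    out = os.path.join(tmp, "out.tar")
    # The caller opens the archive, so the tar format is the caller's choice.
    with tarfile.open(out, "w", format=tarfile.GNU_FORMAT) as tar:
        archiver = make_tar_archiver(tar)
        # With git-archive-all this would presumably be (assumed signature):
        #   GitArchiver(...).archive_all_files(archiver)
        archiver(src, "project/hello.txt")

    with tarfile.open(out) as tar:
        names = tar.getnames()
    with open(out, "rb") as f:
        magic = f.read(512)[257:265]
```

Because the caller opens the tar file, options such as format or compression stay entirely on the caller's side.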