git-archive-all

Allow setting tarfile format

Open: legoktm opened this issue 4 years ago • 6 comments

For MediaWiki, we would like to be able to use a different tarfile format instead of the default one, after identifying regressions in the new default format introduced in Python 3.8 (PAX_FORMAT, which replaced GNU_FORMAT as the default).

We use this as a library, so it would be appreciated if an option could be added to GitArchiver.create that lets us specify tarfile.GNU_FORMAT.

Our downstream ticket is https://phabricator.wikimedia.org/T257102.

legoktm, Aug 08 '20
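For reference, the standard library already exposes the knob being requested: tarfile.open() accepts a format= argument, and since Python 3.8 tarfile.DEFAULT_FORMAT is PAX_FORMAT. The sketch below (build_tar is a hypothetical helper, not part of git-archive-all) writes the same member in both formats so the difference is visible in the header's magic field at offset 257.

```python
import io
import tarfile


def build_tar(data: bytes, name: str, fmt: int) -> bytes:
    """Return an in-memory tar archive holding one file, written in `fmt`.

    build_tar is a hypothetical helper used only for illustration.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        info.mtime = 1596931680  # fixed integer timestamp keeps output reproducible
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


gnu = build_tar(b"hello\n", "README", tarfile.GNU_FORMAT)
pax = build_tar(b"hello\n", "README", tarfile.PAX_FORMAT)

# The 8-byte magic field at offset 257 of the first header block differs:
# GNU headers use b"ustar  \0", POSIX ustar/pax headers use b"ustar\x0000".
print(gnu[257:265], pax[257:265])
```

Passing format= explicitly is all an option on GitArchiver.create would need to forward.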

Perhaps you could consider converting the archive after it is created, as explained on StackOverflow?

Kentzo, Aug 10 '20
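The StackOverflow method itself is not reproduced in this thread. As a swapped-in alternative that avoids depending on which tar binary is installed, a post-hoc conversion can be done with the tarfile module alone, assuming the source archive is seekable. convert_tar_format is a hypothetical name; it still reads and rewrites every member once.

```python
import io
import tarfile


def convert_tar_format(src, dst, fmt=tarfile.GNU_FORMAT, dst_mode="w"):
    """Copy every member of the tar archive in `src` into a new archive
    written to `dst` with header format `fmt`.

    `src` must be a seekable binary file object; `dst` is a binary file
    object. `dst_mode` may name a compressor, e.g. "w:gz".
    convert_tar_format is a hypothetical helper, not part of git-archive-all.
    """
    with tarfile.open(fileobj=src, mode="r:*") as reader, \
            tarfile.open(fileobj=dst, mode=dst_mode, format=fmt) as writer:
        for member in reader:
            # Only regular files carry data; directories, symlinks and
            # hard links are re-created from their header alone.
            data = reader.extractfile(member) if member.isfile() else None
            writer.addfile(member, data)
```

For files on disk, open both paths in binary mode and pass the file objects.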

Thanks for the suggestion. I looked into that, but bsdtar doesn't support --format=gnu (or at least the version Fedora packages doesn't), and GNU tar doesn't support the easy conversion method that bsdtar does.

But while we could do it manually, it just seems pretty inefficient to create a tarball in a brokenish format, then decompress it and recompress it in the correct format, when we could just create it in the correct format to begin with.

legoktm, Aug 11 '20

There are myriad options if you think about it, and not just for tar but for the compressors too. And then there are their various flavors.

What if I extend the archiver to produce an mtree-formatted file; would that suffice? Or perhaps some other intermediate format that you can easily work with using built-in tools and trivial shell pipelines.

Kentzo, Aug 11 '20
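To make the mtree idea concrete: an mtree manifest is plain text that starts with a #mtree line followed by one entry per path with keyword=value pairs, and libarchive (the library behind bsdtar) can read such a manifest as an archive description. The sketch below is a hypothetical writer, not something git-archive-all provides; it emits only the type, mode, size, link and contents keywords, escapes paths conservatively with octal escapes, and assumes the consuming tool honors contents=.

```python
import os
import stat


def mtree_escape(path: str) -> str:
    """Conservatively escape a path for an mtree manifest.

    Bytes outside printable ASCII, plus space, '#', '=' and backslash, become
    a backslash followed by three octal digits (e.g. space -> \\040).
    """
    out = []
    for byte in os.fsencode(path):
        if byte <= 0x20 or byte >= 0x7F or chr(byte) in "#=\\":
            out.append("\\%03o" % byte)
        else:
            out.append(chr(byte))
    return "".join(out)


def mtree_entry(file_path: str, arcname: str) -> str:
    """Describe one filesystem object as a single mtree line.

    `file_path` is where the object lives on disk; `arcname` is its path
    inside the archive.
    """
    st = os.lstat(file_path)
    fields = ["./" + mtree_escape(arcname.lstrip("/")),
              "mode=%04o" % stat.S_IMODE(st.st_mode)]
    if stat.S_ISLNK(st.st_mode):
        fields += ["type=link", "link=" + mtree_escape(os.readlink(file_path))]
    elif stat.S_ISDIR(st.st_mode):
        fields.append("type=dir")
    elif stat.S_ISREG(st.st_mode):
        # contents= points the reader at the real file, so the manifest can
        # describe a different layout than the one on disk.
        fields += ["type=file", "size=%d" % st.st_size,
                   "contents=" + mtree_escape(os.path.abspath(file_path))]
    else:
        raise ValueError("unsupported file type: %r" % file_path)
    return " ".join(fields)


def write_mtree(entries, out) -> None:
    """Write a manifest for (file_path, arcname) pairs to text stream `out`."""
    out.write("#mtree\n")
    for file_path, arcname in entries:
        out.write(mtree_entry(file_path, arcname) + "\n")
```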

> There are myriad options if you think about it, and not just for tar but for the compressors too. And then there are their various flavors.

True. I haven't fully thought this through, but what about allowing an arbitrary options dict to be passed through to ZipFile or TarFile as kwargs? That would let us pass format through without you needing to add a parameter for every possible option, and it would give us flexibility in the future too.

> What if I extend the archiver to produce an mtree-formatted file; would that suffice? Or perhaps some other intermediate format that you can easily work with using built-in tools and trivial shell pipelines.

The really nice part about using this library is that it just takes care of everything for us, with very little complexity on our side :) But if that's what you think is best, we'll update our script to make it work.

legoktm, Aug 17 '20
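The kwargs pass-through idea could look roughly like the sketch below. create_archive and TAR_MODES are hypothetical names, not git-archive-all's API; the point is that whatever keyword arguments the caller supplies reach tarfile.open() or zipfile.ZipFile() unchanged, so format=tarfile.GNU_FORMAT needs no dedicated parameter.

```python
import tarfile
import zipfile

# Hypothetical mapping from output extension to tarfile write mode.
TAR_MODES = {
    ".tar": "w",
    ".tar.gz": "w:gz",
    ".tgz": "w:gz",
    ".tar.bz2": "w:bz2",
    ".tar.xz": "w:xz",
}


def create_archive(files, output_path, **archive_kwargs):
    """Write (file_path, arcname) pairs to `output_path`.

    Extra keyword arguments are forwarded untouched to tarfile.open() or
    zipfile.ZipFile(), e.g. format=tarfile.GNU_FORMAT or
    compression=zipfile.ZIP_DEFLATED.
    """
    if output_path.endswith(".zip"):
        with zipfile.ZipFile(output_path, "w", **archive_kwargs) as archive:
            for file_path, arcname in files:
                archive.write(file_path, arcname)
        return
    for ext, mode in TAR_MODES.items():
        if output_path.endswith(ext):
            with tarfile.open(output_path, mode, **archive_kwargs) as archive:
                for file_path, arcname in files:
                    # recursive=False: callers list every file explicitly.
                    archive.add(file_path, arcname, recursive=False)
            return
    raise ValueError("unsupported archive extension: %r" % output_path)
```

The trade-off is that invalid keywords only fail when the underlying constructor rejects them, so the library would be documenting tarfile's and zipfile's parameters by reference rather than its own.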

As an update, we're now monkey patching tarfile.DEFAULT_FORMAT, so this isn't a priority for us, but it would be nice to have a less hacky way if possible.

legoktm, Aug 24 '20
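A scoped variant of that monkey patch is sketched below; tar_default_format is a hypothetical name. It patches tarfile.TarFile.format rather than the module constant, because in CPython the class attribute is bound from DEFAULT_FORMAT when the module is imported and is what a TarFile falls back to when no format= is passed.

```python
import contextlib
import tarfile


@contextlib.contextmanager
def tar_default_format(fmt):
    """Temporarily make TarFile objects created without format= use `fmt`.

    TarFile.format is bound from tarfile.DEFAULT_FORMAT when the tarfile
    module is imported, and TarFile.__init__ only overrides it when format=
    is passed explicitly, so the class attribute is what gets patched here.
    The original value is restored on exit, even if an exception is raised.
    """
    saved = tarfile.TarFile.format
    tarfile.TarFile.format = fmt
    try:
        yield
    finally:
        tarfile.TarFile.format = saved
```

Wrapping only the call into git-archive-all keeps the change local, e.g. `with tar_default_format(tarfile.GNU_FORMAT): archiver.create(...)`. The patch is still process-wide while active, so it is not safe if other threads create tar files at the same time.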

Since you are using it as a library, you should be able to call GitArchiver.archive_all_files directly, passing a custom callable that archives files in whatever way suits you best.

Kentzo, May 21 '21
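Following that suggestion, a callable that writes into a TarFile opened with an explicit format might look like this. It assumes archive_all_files invokes the callable as archiver(file_path, arcname), with file_path the absolute path on disk and arcname the path inside the archive; make_tar_archiver is a hypothetical name, and the GitArchiver constructor arguments in the usage comment are assumptions that may differ between releases.

```python
import tarfile


def make_tar_archiver(tar):
    """Return a callable for GitArchiver.archive_all_files that adds each
    file to the already-open TarFile `tar`.

    Assumes the callable is invoked as archiver(file_path, arcname);
    check the installed git-archive-all version before relying on this.
    """
    def archiver(file_path, arcname):
        # recursive=False: files are passed one at a time, so don't let
        # tarfile descend into directories on its own.
        tar.add(file_path, arcname, recursive=False)
    return archiver


# Hypothetical usage (constructor arguments are an assumption):
#
#     from git_archive_all import GitArchiver
#     git_archiver = GitArchiver(prefix="project/", main_repo_abspath="/path/to/repo")
#     with tarfile.open("out.tar.gz", "w:gz", format=tarfile.GNU_FORMAT) as tar:
#         git_archiver.archive_all_files(make_tar_archiver(tar))
```

Opening the TarFile in the caller keeps every tarfile and compressor option under the caller's control, which is the flexibility asked for earlier in the thread.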