ndarray-npy

Add zstd-compressed archives support

Open DCNick3 opened this issue 2 years ago • 3 comments

zstd can offer much better compression ratios and speed than the standard deflate used in zip.

Zip files, which the npz format is based on, already define a standard way to store zstd-compressed entries, so this PR simply uses that mechanism.
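To check which compression method an existing .npz archive actually uses, the per-entry method ID can be read from the zip directory. The following is a minimal sketch using only Python's standard `zipfile` module. The helper name `npz_compression_methods` and the label table are illustrative, not part of numpy or ndarray-npy, and the ID 93 for Zstandard is taken from the ZIP APPNOTE rather than from this PR. It assumes that listing entries works even when the interpreter cannot decompress them.

```python
import io
import zipfile

# Compression method IDs as assigned by the ZIP APPNOTE; 93 is Zstandard.
# The labels are only for display (hypothetical helper, not a numpy API).
METHOD_NAMES = {
    zipfile.ZIP_STORED: "stored",     # method 0
    zipfile.ZIP_DEFLATED: "deflate",  # method 8
    93: "zstd",
}


def npz_compression_methods(file):
    """Map each archive member name to the name of its compression method.

    `file` may be a path or a seekable binary file object. Only the zip
    directory is read, so no member data is decompressed.
    """
    with zipfile.ZipFile(file) as archive:
        return {
            info.filename: METHOD_NAMES.get(
                info.compress_type, f"method {info.compress_type}"
            )
            for info in archive.infolist()
        }


if __name__ == "__main__":
    # Build a small in-memory archive that mimics an .npz layout.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("a.npy", b"x" * 100, compress_type=zipfile.ZIP_DEFLATED)
    buf.seek(0)
    print(npz_compression_methods(buf))
```

An archive written with zstd compression should report "zstd" for its members under these assumptions.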

Note that the Python standard library (and hence numpy) doesn't support this compression scheme yet, but support can be hacked in with this monkeypatch (requires the zipfile_zstd package):

def numpy_zstd_monkeypatch():
	"""
	Monkey-patch numpy to support zstd for npz files.
	"""
	import os

	import numpy as np

	def zipfile_factory(file, *args, **kwargs):
		"""
		Create a ZipFile.
		Allows for Zip64, and the `file` argument can accept file, str, or
		pathlib.Path objects. `args` and `kwargs` are passed to the zipfile.ZipFile
		constructor.
		"""
		# Convert path-like objects to str; leave file-like objects untouched.
		# os.fspath is the stdlib function that numpy.compat.os_fspath aliases.
		if not hasattr(file, 'read'):
			file = os.fspath(file)
		# zipfile_zstd extends the stdlib zipfile module with zstd support
		import zipfile_zstd as zipfile
		kwargs['allowZip64'] = True
		return zipfile.ZipFile(file, *args, **kwargs)

	# Replace the factory numpy uses to open npz archives
	np.lib.npyio.zipfile_factory = zipfile_factory

DCNick3, Mar 21 '23 10:03

Instead of adding a zstd-specific method, let's just add a NpzWriter::new_with_options method which accepts a zip::write::FileOptions instance.

jturner314, Mar 22 '23 00:03

> let's just add a NpzWriter::new_with_options method which accepts a zip::write::FileOptions instance.

Fair enough.

I am also going to remove the compressed_npz_zstd feature. Users who want this method need a dependency on the zip crate anyway, so it wouldn't be hard for them to enable the required features there.

DCNick3, Mar 22 '23 01:03

I think the CI failures are unrelated to my changes.

DCNick3, Mar 22 '23 10:03