
Add training mode compression for zstd

Open tchaton opened this issue 1 year ago • 3 comments

🚀 Feature

Motivation

https://github.com/facebook/zstd

[Image: Screenshot 2024-07-31 at 22 49 37]

https://python-zstandard.readthedocs.io/en/latest/dictionaries.html

Pitch

Alternatives

Additional context

tchaton avatar Jul 31 '24 21:07 tchaton

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Apr 16 '25 05:04 stale[bot]


Also, ChatGPT says:

Image

The shared graph is based on 10K different JSON files of roughly 1 KB each.

LitData chunks will be 64 MB or more on average, definitely not just a few KB.
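A rough way to see why payload size matters, sketched with the standard library's zlib preset dictionary as a stand-in for a trained zstd dictionary (so it runs without extra dependencies). The records are made up, the "large chunk" here is only about 1 MB rather than 64 MB, and the numbers it prints say nothing definitive about zstd on real LitData chunks.

```python
import json
import zlib
from typing import Optional


def make_record(i: int) -> bytes:
    """Return one small JSON record (made-up data, not the benchmark's)."""
    return json.dumps(
        {"id": i, "user": f"user_{i % 97}", "status": "active", "tags": ["a", "b"]}
    ).encode("utf-8")


def deflate(data: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Compress with zlib, optionally primed with a preset dictionary."""
    if zdict is None:
        compressor = zlib.compressobj(6)
    else:
        compressor = zlib.compressobj(6, zdict=zdict)
    return compressor.compress(data) + compressor.flush()


def saving(data: bytes, zdict: bytes) -> float:
    """Fraction of compressed bytes saved by priming with the dictionary."""
    return 1 - len(deflate(data, zdict)) / len(deflate(data))


# Preset dictionary: a few earlier records the compressor can refer back to.
preset = b"".join(make_record(i) for i in range(200, 232))

small_item = make_record(7)  # a single tiny record
large_chunk = b"".join(make_record(i) for i in range(20_000))  # many records in one blob

small_saving = saving(small_item, preset)
large_saving = saving(large_chunk, preset)
print(f"small item: {small_saving:.1%} saved, large chunk: {large_saving:.1%} saved")
```

A tiny item has no history of its own for the compressor to draw on, so the dictionary supplies it. A large chunk builds up that history within the first few KB, which leaves the dictionary little to add.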


Should I close the issue, or give it a try?

deependujha avatar May 05 '25 11:05 deependujha

Let's keep this open for now. Could be fun to try out for learning purposes.

Also, cool to see zstd coming to the Python 3.14 stdlib! 🚀 https://docs.python.org/3.14/whatsnew/3.14.html#whatsnew314-pep784

bhimrazy avatar Jun 04 '25 06:06 bhimrazy