thomas chaton
Hey @cgebbe. Any updates?
Hey @cgebbe, 1. That's perfect. 2. So I set drop_last=False. This would lead to the training hanging due to gradient synchronization. I don't recommend doing so. 3. It might hang...
Hey @cgebbe. I made a new release. Could you try the latest release, 0.2.20? Use `pip install -U litdata` to upgrade.
Hey @cgebbe. That's exactly what I meant. It would hang if you set drop_last=False. Right now, the solution is for you to add more training samples to the dataset.
We could also explore padding with duplicated data, but this would increase litdata's complexity quite a lot.
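To illustrate the hang and the padding idea mentioned above, here is a rough sketch in plain Python. It is not how litdata distributes samples internally: the round-robin split and the helper names `batches_per_rank` and `pad_with_duplicates` are assumptions made only for this example. The point is that with drop_last=False the ranks can end up with different batch counts, so one rank waits on a gradient sync that the other never joins.

```python
import math


def batches_per_rank(num_samples, world_size, batch_size, drop_last):
    """Count the batches each rank sees under an assumed round-robin split."""
    counts = []
    for rank in range(world_size):
        # Assumption: rank r gets samples r, r + world_size, r + 2 * world_size, ...
        n = len(range(rank, num_samples, world_size))
        if drop_last:
            counts.append(n // batch_size)  # incomplete last batch is dropped
        else:
            counts.append(math.ceil(n / batch_size))  # incomplete last batch is kept
    return counts


def pad_with_duplicates(indices, world_size, batch_size):
    """Repeat leading indices until the total is a multiple of world_size * batch_size."""
    chunk = world_size * batch_size
    target = math.ceil(len(indices) / chunk) * chunk
    padded = list(indices)
    i = 0
    while len(padded) < target:
        padded.append(indices[i % len(indices)])  # duplicated sample
        i += 1
    return padded


uneven = batches_per_rank(9, world_size=2, batch_size=4, drop_last=False)
even = batches_per_rank(9, world_size=2, batch_size=4, drop_last=True)
padded = pad_with_duplicates(list(range(9)), world_size=2, batch_size=4)
after_padding = batches_per_rank(len(padded), world_size=2, batch_size=4, drop_last=False)
```

With 9 samples on 2 ranks, the unpadded drop_last=False case gives the ranks different batch counts, while drop_last=True or padding makes them equal. The cost of padding is that some samples are seen twice per epoch.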
Hey @schopra8. Feel free to make a contribution. The main challenge will be to ensure fault tolerance works properly.
Yes, that's what I had in mind. The main challenge will be to make the slicing and reading as fast as possible. It might be worth using https://github.com/pola-rs/polars
The goal is to enable reading pyarrow HF datasets with LitData.
Hey @esivonxay-cognitiv, thanks for the reproducible script. I will have a look into it.
Hey @esivonxay-cognitiv, I am curious: what's your interest in and usage of LitData?