thomas chaton comments

Results: 276 comments of thomas chaton

RuntimeError: Can't start new thread

Hey @cgebbe. Any updates?

RuntimeError: Can't start new thread

Hey @cgebbe, 1. That's perfect. 2. So I set drop_last=False. This would lead to the training hanging due to gradient synchronization. I don't recommend doing so. 3. It might hang...

RuntimeError: Can't start new thread

Hey @cgebbe. I made a new release. Could you try the latest release, 0.2.20? Use `pip install -U litdata` to upgrade.

RuntimeError: Can't start new thread

Hey @cgebbe. That's exactly what I meant. It would hang if you set drop_last=False. Right now, the solution is for you to add more training samples to the dataset.
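To illustrate why uneven data can hang distributed training, here is a minimal sketch in plain Python. It is not LitData's actual sharding logic (which works on chunks); the function name `batches_per_rank` and the "first ranks get the leftover samples" rule are assumptions made for illustration. The point is only that when ranks end up with different batch counts, the rank with more batches waits forever in gradient synchronization for peers that have already finished.

```python
import math


def batches_per_rank(num_samples, world_size, batch_size, drop_last):
    """Hypothetical model: count the batches each rank would iterate over.

    Assumption: samples are split as evenly as possible across ranks.
    With drop_last=True, leftover samples are dropped so every rank holds
    the same number of samples. With drop_last=False, the first ranks
    receive one extra sample each.
    """
    base, extra = divmod(num_samples, world_size)
    counts = []
    for rank in range(world_size):
        samples = base if drop_last else base + (1 if rank < extra else 0)
        counts.append(math.ceil(samples / batch_size))
    return counts


def would_hang(counts):
    """Ranks with different batch counts cannot all join every all-reduce."""
    return len(set(counts)) > 1


# 9 samples, 2 ranks, batch size 4.
uneven = batches_per_rank(9, 2, 4, drop_last=False)
even = batches_per_rank(9, 2, 4, drop_last=True)
print(uneven, would_hang(uneven))
print(even, would_hang(even))
```

In this toy model, adding more training samples (or dropping the remainder) only helps if the batch counts across ranks end up equal.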

RuntimeError: Can't start new thread

We could also explore padding with duplicated data but this would increase litdata complexity quite a lot.
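As a rough sketch of what padding with duplicated data could look like, the snippet below repeats samples from the start of the index list until every rank receives the same number of samples. This is similar in spirit to how PyTorch's `DistributedSampler` pads when `drop_last=False`; it is not something LitData implements, and the helper name `pad_to_even_shards` is hypothetical.

```python
import math


def pad_to_even_shards(indices, world_size):
    """Hypothetical helper: pad indices with duplicates, then shard by rank.

    The list is extended by cycling through its own beginning until its
    length is a multiple of world_size, so every rank gets an equal share.
    """
    indices = list(indices)
    total = math.ceil(len(indices) / world_size) * world_size
    padded = list(indices)
    i = 0
    while len(padded) < total:
        padded.append(indices[i % len(indices)])  # duplicate an earlier sample
        i += 1
    # Rank r takes every world_size-th index starting at r.
    return [padded[rank::world_size] for rank in range(world_size)]


shards = pad_to_even_shards(range(9), world_size=2)
print(shards)
```

The cost is that a few samples are seen twice per epoch, and a real implementation would also have to keep the padding consistent when resuming from a checkpoint, which is where the added complexity comes from.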

Use different batch sizes in CombinedStreamingDataset

Hey @schopra8. Feel free to make a contribution. The main challenge will be to ensure fault tolerance works properly.

Add support for parquet files for storing the chunks

Yes, that's what I had in mind. The main challenge will be to make the slicing and reading as fast as possible. It might be worth using https://github.com/pola-rs/polars

Add support for parquet files for storing the chunks

The goal is to enable reading pyarrow HF datasets with LitData

Using a streaming dataloader with an unbalanced dataset yields unexpected batch sizes.

Hey @esivonxay-cognitiv, thanks for the reproducible script. I will look into it.

Using a streaming dataloader with an unbalanced dataset yields unexpected batch sizes.

Hey @esivonxay-cognitiv, I am curious: what's your interest in LitData, and how are you using it?