Augustinas Malinauskas

Results: 49 comments by Augustinas Malinauskas

Also getting something similar ``` [rank1]: Traceback (most recent call last): [rank1]: File "/home/august/cfdx/ai/dna_fm/lightning_gym.py", line 59, in [rank1]: main() [rank1]: File "/home/august/cfdx/ai/dna_fm/lightning_gym.py", line 55, in main [rank1]: run_experiment(config) [rank1]: File...

Hi @XiaohanZhangCMU, incredible job! I just tested this and can confirm that it solves the problem with a large number of shards. However, there may be another bottleneck with `StreamingDataLoader` on datasets...

Sharing profiler results on time to first batch

I see... So MosaicML is currently not suitable for datasets with a large number of shards 😢. We're using litdata, but have been thinking of migrating to MosaicML as it has great features....

Thank you for the message. I will try on 8 H100 GPUs and report back with the findings.

Hi guys, for me, loading the very large dataset (2.4B rows) takes 2 min 28 sec even with the spanner fix. When it comes to loading the first batch ```python for...

Implemented fix here https://github.com/openai/openai-go/pull/418

Hi @bhimrazy @tchaton, thank you for the reply. Would you say that as long as I am saving data as a torch.tensor or numpy array, there should be no problems loading the...