Jiawei Ou comments

10 comments by Jiawei Ou

Add ContainerArguments to sagemaker.estimator.Estimator

What a bummer. I was planning to start using Hydra to manage the config files. There are just so many irritating things around SageMaker; I should just look for an...

Jupyter Notebook

I just installed Zed this morning and love the style and speed, but since Jupyter Notebook is not supported, I have to go back to VSCode.

Training slowed down as time progress with litdata streaming dataset

I am going to try what @tchaton suggested to reload the dataloader each epoch. I am also going to try to increase the on-disk cache by a lot to see...

Training slowed down as time progress with litdata streaming dataset

I figured out what the issue was. In some cases, I had accidentally set `num_workers=0`. Setting `num_workers=1` solves the problem.
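A minimal sketch of a guard against accidentally passing a worker count of 0 to the data loader (the PyTorch-style parameter is `num_workers`; with 0, batches are loaded in the main process). `safe_num_workers` is a hypothetical helper, not part of litdata or PyTorch, and the CPU-count fallback is an assumption.

```python
import os
from typing import Optional


def safe_num_workers(requested: Optional[int] = None) -> int:
    """Return a DataLoader worker count that is never below 1.

    Hypothetical helper: num_workers=0 is the setting identified above
    as the cause of the slowdown, so it is clamped up to 1.
    """
    if requested is None:
        # Assumed default: one worker per CPU, leaving one CPU for the main process.
        requested = (os.cpu_count() or 2) - 1
    return max(1, requested)
```

The result can then be passed as `num_workers=safe_num_workers(...)` to `litdata.StreamingDataLoader` or `torch.utils.data.DataLoader`.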

Time per sample grows as processed samples grows

I have observed the same issue with some of my datasets. In one case, over about 4 days, the training time grew from about 50 minutes per epoch to almost...
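To put a number on this kind of gradual slowdown, one option is to log the average time per sample over fixed windows of batches and compare later windows to the first. The sketch below assumes a plain Python training loop; `ThroughputMonitor` is a hypothetical helper, not part of litdata or Lightning.

```python
import time
from typing import Callable, List, Optional


class ThroughputMonitor:
    """Record average seconds per sample over fixed-size windows of batches.

    Hypothetical helper: call step(batch_size) once per training batch.
    """

    def __init__(self, window: int = 100,
                 clock: Callable[[], float] = time.perf_counter) -> None:
        self.window = window
        self.clock = clock
        self.history: List[float] = []  # seconds per sample, one entry per window
        self._start = self.clock()
        self._batches = 0
        self._samples = 0

    def step(self, batch_size: int) -> None:
        self._batches += 1
        self._samples += batch_size
        if self._batches == self.window:
            # Close the current window and start timing the next one.
            elapsed = self.clock() - self._start
            self.history.append(elapsed / self._samples)
            self._start = self.clock()
            self._batches = 0
            self._samples = 0

    def slowdown(self) -> Optional[float]:
        """Ratio of the latest window's time per sample to the first window's."""
        if len(self.history) < 2:
            return None
        return self.history[-1] / self.history[0]
```

A ratio that keeps climbing across epochs (for example, printed once per epoch) is the symptom described in this thread; a ratio that stays near 1.0 suggests the data pipeline is not the bottleneck.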

Time per sample grows as processed samples grows

![Screenshot 2024-05-21 at 7 38 55 PM](https://github.com/Lightning-AI/litdata/assets/1028148/4f8b4c59-cfa2-47ce-8f18-8f971f8d9007) Another thing I noticed is that litdata, compared to the streaming dataset from MosaicML, underutilizes memory. The slowdown is potentially coming from heavily...

Assert when deserializing `no_header_numpy` or `no_header_tensor`.