chongxiaoc

Results 37 comments of chongxiaoc

> Now the concern is how to feed the validation dataset after each training epoch and which GPU will process the validation dataset if multiple GPUs are available.

Validation...
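One common answer to "which GPU processes validation" is: every rank evaluates its own shard of the validation set, then the per-rank metrics are averaged. A minimal pure-Python sketch of that idea (all names here are illustrative, not Horovod or Keras APIs; the final mean stands in for what an allreduce-average would compute):

```python
# Hypothetical sketch: shard a validation set across ranks and average
# per-rank metrics. `NUM_RANKS`, `val_rows`, and `evaluate_shard` are
# illustrative names, not library APIs.

NUM_RANKS = 2
val_rows = list(range(10))  # toy validation dataset

def shard(rows, rank, num_ranks):
    """Round-robin shard so every rank sees a disjoint subset."""
    return rows[rank::num_ranks]

def evaluate_shard(rows):
    """Stand-in metric: mean of the shard's values."""
    return sum(rows) / len(rows)

# Each rank evaluates its own shard; the final metric is the mean of the
# per-rank metrics (roughly what an allreduce with averaging would give).
per_rank = [evaluate_shard(shard(val_rows, r, NUM_RANKS)) for r in range(NUM_RANKS)]
val_metric = sum(per_rank) / len(per_rank)
```

This avoids pinning validation to a single GPU while the others idle.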

`steps_per_epoch` is a parameter of the Keras `fit` function that tells it how many batches to use per epoch on a single GPU. Users use this parameter to explicitly force the model to be...
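A common way to pick the value is to divide the global row count by the effective global batch size, so every GPU runs the same number of steps per epoch. A small sketch with made-up numbers:

```python
# Hedged sketch: `steps_per_epoch` caps how many batches each worker
# consumes per epoch. Dividing the global row count by
# (batch_size * num_gpus) gives every GPU the same step count.
# All numbers here are illustrative.

num_rows = 1_000_000   # total rows in the training dataset
batch_size = 500       # per-GPU batch size
num_gpus = 4

# Integer division: leftover rows that don't fill a full global batch
# are simply not used that epoch.
steps_per_epoch = num_rows // (batch_size * num_gpus)
```

The resulting value would then be passed as `model.fit(..., steps_per_epoch=steps_per_epoch)` in Keras.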

> It means it is not compulsory to mention, right?

Right. But users have to make sure every GPU has an equivalent amount of data to train on per epoch (depending on the data distribution). Horovod...
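One simple way to guarantee every GPU sees the same amount of data is to truncate all shards to the shortest one, since round-robin sharding can leave shards differing by one row. A sketch with illustrative helper names (this is not a Horovod API):

```python
# Sketch: after round-robin sharding, shards can differ by one row when
# the row count is not divisible by the number of ranks. Truncating every
# shard to the shortest length keeps the per-epoch step count identical
# on all GPUs. `equal_shards` is a hypothetical helper.

def equal_shards(rows, num_ranks):
    shards = [rows[r::num_ranks] for r in range(num_ranks)]
    min_len = min(len(s) for s in shards)
    return [s[:min_len] for s in shards]

shards = equal_shards(list(range(10)), 3)  # 10 rows across 3 ranks
```

Dropping at most `num_ranks - 1` rows per epoch is usually an acceptable price for keeping the workers in lockstep.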

Attaching some benchmark results using a synthetic dataset with PyTorch:
- Experiment setup: `BatchedDataLoader + make_batch_reader()`, batch_size=50000, shuffle_buffer_size=1000000, thread_pool, 10 workers.
- Datasets from 100M rows to 1.6B rows are tested....
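The usual way to express such benchmark numbers is rows per second over the whole loader iteration. A self-contained sketch of that measurement loop, with a synthetic generator standing in for the real `BatchedDataLoader + make_batch_reader()` pipeline (all names and sizes below are illustrative):

```python
# Hedged sketch of a throughput measurement: iterate a loader-like
# generator and report rows/sec. `synthetic_loader` is a stand-in for a
# real petastorm data loader, not its API.

import time

def synthetic_loader(num_batches, batch_size):
    for _ in range(num_batches):
        yield list(range(batch_size))  # stand-in for a real batch

start = time.perf_counter()
rows = 0
for batch in synthetic_loader(num_batches=20, batch_size=50_000):
    rows += len(batch)  # a training step would consume the batch here
elapsed = time.perf_counter() - start
rows_per_sec = rows / elapsed
```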

Had a discussion offline; starting to work on moving the bottleneck (the data shuffling part) into an asynchronous thread: https://github.com/uber/petastorm/blob/cf1159dc04416ed737eec25bcecef5d5aafa805a/petastorm/pytorch.py#L373 And see how it can hide/overlap the latency.
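The idea can be sketched in a few lines: a background thread maintains the shuffle buffer and pushes random items into a bounded queue, so the consumer (the training loop) overlaps with the shuffle work instead of blocking on it. This is a toy illustration of hiding the latency, not the petastorm implementation:

```python
# Minimal sketch: move shuffling into a background thread that fills a
# bounded queue. The bounded queue provides back-pressure on the producer;
# the consumer runs concurrently with the shuffle work.

import queue
import random
import threading

_SENTINEL = object()

def async_shuffler(source, out_q, buffer_size, seed=0):
    """Maintain a shuffle buffer and emit random items into out_q."""
    rng = random.Random(seed)
    buf = []
    for item in source:
        buf.append(item)
        if len(buf) >= buffer_size:
            # Pop a random element once the buffer is full.
            out_q.put(buf.pop(rng.randrange(len(buf))))
    while buf:  # drain the remaining buffer at end of epoch
        out_q.put(buf.pop(rng.randrange(len(buf))))
    out_q.put(_SENTINEL)

q = queue.Queue(maxsize=128)
t = threading.Thread(target=async_shuffler, args=(iter(range(100)), q, 16))
t.start()

results = []
while (item := q.get()) is not _SENTINEL:
    results.append(item)  # a training loop would consume batches here
t.join()
```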

Just noticed this nice work, @jarandaf, thanks! Is it doable to support shuffling in parallel as well? I think shuffling is usually the bottleneck, and petastorm uses a single thread...
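One possible shape for a parallel shuffle, sketched purely for illustration (this is not how petastorm does it): split the source into shards, shuffle each shard in its own thread, and merge the results into one queue. The shuffle quality differs from a single global shuffle, since rows only move within their shard:

```python
# Hedged sketch of parallelizing the shuffle across threads.
# `shard_shuffler` is a hypothetical helper, not a petastorm API.

import queue
import random
import threading

def shard_shuffler(shard, out_q, seed):
    rows = list(shard)
    random.Random(seed).shuffle(rows)  # independent per-thread shuffle
    for row in rows:
        out_q.put(row)

rows = list(range(1000))
num_threads = 4
out_q = queue.Queue()
threads = [
    threading.Thread(target=shard_shuffler, args=(rows[i::num_threads], out_q, i))
    for i in range(num_threads)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

shuffled = [out_q.get() for _ in range(len(rows))]
```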

Hmm, this is databricks-specific logic for model serialization and deserialization. Not familiar with this.

This is the issue: `Tue Feb 7 15:26:13 2023[0]:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [2,3]`
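The log line looks contradictory but may just reflect CUDA's renumbering: CUDA re-indexes the GPUs listed in `CUDA_VISIBLE_DEVICES`, so local rank 0 sees logical device 0, which is physical GPU 2 here. A small sketch of that mapping (illustrative parsing only; `physical_gpu` is a hypothetical helper):

```python
# Sketch: map a local rank to the physical GPU id it actually runs on,
# given a CUDA_VISIBLE_DEVICES string. CUDA renumbers the visible GPUs
# starting from logical index 0.

def physical_gpu(local_rank, cuda_visible_devices):
    visible = [int(g) for g in cuda_visible_devices.split(",")]
    return visible[local_rank]  # logical index -> physical GPU id

gpu = physical_gpu(0, "2,3")  # LOCAL_RANK 0 lands on physical GPU 2
```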