chongxiaoc issues

Results: 9 issues of chongxiaoc

Spark: reduce freeing memory delay from reading parquet files

Call `pyarrow.jemalloc_set_decay_ms(0)`. Signed-off-by: Chongxiao Cao

## Checklist before submitting
- [ ] Did you read the [contributor guide](https://github.com/horovod/horovod/blob/master/CONTRIBUTING.md)?
- [ ] Did you update the docs?
- [ ] Did...
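As a rough illustration of the call named in this description, the sketch below wraps `pyarrow.jemalloc_set_decay_ms(0)` in a guard. The helper name `release_arrow_memory_eagerly` is hypothetical and is not taken from the pull request; the guard assumes that pyarrow may be missing or built without jemalloc.

```python
def release_arrow_memory_eagerly():
    """Ask Arrow's jemalloc pool to hand freed memory back to the OS at once.

    Hypothetical helper: returns True if the setting was applied,
    False if pyarrow is unavailable or was built without jemalloc.
    """
    try:
        import pyarrow as pa
    except ImportError:
        return False  # pyarrow is not installed in this environment

    try:
        # A decay time of 0 ms disables jemalloc's delayed release of freed pages.
        pa.jemalloc_set_decay_ms(0)
    except Exception:
        # Assumption: builds without jemalloc raise here; treat as "not applied".
        return False
    return True


applied = release_arrow_memory_eagerly()
```

Calling the helper once, before any Parquet files are read, is the assumed usage.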

Test PTL 1.6.3

## Checklist before submitting
- [ ] Did you read the [contributor guide](https://github.com/horovod/horovod/blob/master/CONTRIBUTING.md)?
- [ ] Did you update the docs?
- [ ] Did you write any tests to...

Reader: shuffle row groups

[Tensorflow] Support tf.dataset.repeat() to avoid duplicating and dropping samples in one epoch with shuffle?

The current implementation of `make_petastorm_dataset` for TensorFlow doesn't support multiple iterations: https://github.com/uber/petastorm/blob/7f37e8dde6ff1b13f055d22a6289e2de8bb5d473/petastorm/tf_utils.py#L370 It is recommended to set the reader's `num_epochs` > 1 to support multiple iterations. This will cause possible duplication and drop...
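The concern about duplicated and dropped samples can be illustrated with a small pure-Python simulation. This is only a sketch under stated assumptions: `shuffle_buffer`, `streamed_epochs`, and `repeated_epochs` are hypothetical names, and the buffer logic is a generic shuffling queue, not Petastorm's or TensorFlow's implementation.

```python
import random
from collections import Counter


def shuffle_buffer(stream, capacity, rng):
    """Generic shuffling queue: emit a random buffered item once the buffer is full."""
    buf = []
    for item in stream:
        buf.append(item)
        if len(buf) > capacity:
            idx = rng.randrange(len(buf))
            buf[idx], buf[-1] = buf[-1], buf[idx]
            yield buf.pop()
    rng.shuffle(buf)
    yield from buf


def streamed_epochs(samples, num_epochs, capacity, seed=0):
    """Feed all epochs through ONE buffer, then cut the output into epoch-sized chunks."""
    rng = random.Random(seed)
    stream = (s for _ in range(num_epochs) for s in samples)
    out = list(shuffle_buffer(stream, capacity, rng))
    n = len(samples)
    return [out[i:i + n] for i in range(0, len(out), n)]


def repeated_epochs(samples, num_epochs, seed=0):
    """Shuffle each epoch on its own, so every chunk holds each sample exactly once."""
    rng = random.Random(seed)
    chunks = []
    for _ in range(num_epochs):
        epoch = list(samples)
        rng.shuffle(epoch)
        chunks.append(epoch)
    return chunks


samples = list(range(50))
mixed = streamed_epochs(samples, num_epochs=2, capacity=16)
clean = repeated_epochs(samples, num_epochs=2)
# Samples seen more than once inside the first epoch-sized chunk of the mixed stream.
repeats_in_first_chunk = [s for s, c in Counter(mixed[0]).items() if c > 1]
```

In `streamed_epochs` the buffer can mix the tail of one epoch with the head of the next, so an epoch-sized chunk can contain repeats while missing other samples. `repeated_epochs` keeps the epoch boundary intact, which is the behavior the issue title asks about.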

question

Refactor inmem cache out of BatchedDataLoader, create an inmem dataloader instead?

I think BatchedDataLoader handles the case where files are larger than memory, so it streams rows from disk into memory and shuffles the data in the meantime. However, if in-memory...
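A minimal sketch of what a separate in-memory loader might look like, assuming the whole dataset fits in memory. The class name `InMemDataLoader` and its parameters are hypothetical and are not taken from the Petastorm codebase.

```python
import random


class InMemDataLoader:
    """Hypothetical loader: read all rows once, then reshuffle in memory per epoch."""

    def __init__(self, rows, batch_size, shuffle=True, seed=0):
        self._rows = list(rows)  # the source is consumed only once
        self._batch_size = batch_size
        self._shuffle = shuffle
        self._rng = random.Random(seed)

    def __iter__(self):
        # Each new iteration is one epoch with a fresh permutation of row indices.
        order = list(range(len(self._rows)))
        if self._shuffle:
            self._rng.shuffle(order)
        for start in range(0, len(order), self._batch_size):
            yield [self._rows[i] for i in order[start:start + self._batch_size]]


loader = InMemDataLoader(range(10), batch_size=4)
epoch_one = [row for batch in loader for row in batch]
epoch_two = [row for batch in loader for row in batch]
```

Because the rows are cached, every epoch after the first avoids disk reads entirely, and each epoch covers every row exactly once.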

question

Implementing Asynchronous Data Shuffling Part

The existing dataloader implementation is sequential. In the common use case, this limits training speed if the data shuffling part is slow. This happens when the user defines a fairly large `shuffling queue capacity`,...
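One way to picture the asynchronous approach is a background thread that shuffles and batches while the consumer trains. The sketch below is an assumption-laden illustration: `shuffle_and_batch` and `prefetch` are hypothetical helpers, not the dataloader's real API, and `capacity` is assumed to be at least `batch_size`.

```python
import queue
import random
import threading

_SENTINEL = object()  # marks the end of the producer's stream


def shuffle_and_batch(rows, batch_size, capacity, seed=0):
    """Sequential shuffling queue: fill to `capacity`, shuffle, emit one batch."""
    rng = random.Random(seed)
    buf = []
    for row in rows:
        buf.append(row)
        if len(buf) >= capacity:
            rng.shuffle(buf)
            batch, buf = buf[:batch_size], buf[batch_size:]
            yield batch
    # Drain whatever is left once the input is exhausted.
    rng.shuffle(buf)
    for i in range(0, len(buf), batch_size):
        yield buf[i:i + batch_size]


def prefetch(iterable, max_prefetch=2):
    """Run `iterable` in a background thread and hand items over a bounded queue."""
    q = queue.Queue(maxsize=max_prefetch)

    def worker():
        try:
            for item in iterable:
                q.put(item)  # blocks when the consumer falls behind
        finally:
            q.put(_SENTINEL)  # always signal completion, even after an error

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = q.get()
        if item is _SENTINEL:
            return
        yield item


batches = list(prefetch(shuffle_and_batch(range(100), batch_size=10, capacity=40)))
```

The bounded queue keeps memory use predictable: the producer stays at most `max_prefetch` batches ahead of the consumer. This sketch does not propagate producer exceptions to the consumer; a fuller version would need to.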

Failure in import: undefined symbol error from Python 3.7 + CUDA113

I'm using Python 3.7 and CUDA113. I tried `fbgemm-gpu` and `fbgemm-gpu-nightly` from pip; both versions failed on import:

```
[root@/ml-code/data/michelangelo/examples/torchrec_example/test #]pip3 show fbgemm-gpu
Name: fbgemm-gpu
Version: 0.1.2
Summary: UNKNOWN
Home-page:...
```

[llm_text_generation] RuntimeError: Expected all tensors to be on the same device,

**Describe the bug**

```
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in...
```

[Code] best_model_path in ModelCheckpointCallback (rank 0 and driver node)

The driver node and rank 0 use the same path to save and load weights in ModelCheckpointCallback. It is possible that the driver node and rank 0 are not on the same machine, or...