Results 39 comments of botbw

# trace
previous now
# benchmark
before colossalai run --nproc_per_node 8 --hostfile hosts.txt benchmark.py -g -x -b 4 -s 100 ``` num_samples: 392, dp_world_size: 8, flop_megatron: 9.7808637396779e+16, flop: 86555325938794496, avg_duration:...

# LLaMa trace (static)
## Prefetch = 0
## Prefetch = 10
# Benchmark

Hey @Little-devil1, this should have been resolved. You are welcome to pull the main branch and try again.

> Hello, after pulling the main branch, using the above configuration in 8XH100(80G) for training test, the above situation will still appear, is it the 8XH100(80G) memory problem? > @Little-devil1...

> > > Hello, after pulling the main branch, using the above configuration in 8XH100(80G) for training test, the above situation will still appear, is it the 8XH100(80G) memory problem?...

@hadipash Could you please specify the exact bugs this usage causes? Popping a config might be useful if we don't want it to be consumed twice (but I'm not sure...
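For readers unfamiliar with the trade-off mentioned in this comment, here is a minimal sketch of how `pop` and `get` differ on a plain Python dict. The config and its keys (`mode`, `num_frames`) are hypothetical, chosen only for illustration, and are not taken from the repository under discussion.

```python
# Hypothetical config; the keys are illustrative, not from the actual project.
cfg = {"mode": "i2v", "num_frames": 16}


def read_with_get(config):
    # get() leaves the key in place, so later readers still see it.
    return config.get("mode")


def read_with_pop(config):
    # pop() removes the key, so the value can be consumed only once.
    return config.pop("mode")


first = read_with_get(cfg)
second = read_with_get(cfg)   # still available on the second read

popped = read_with_pop(cfg)   # returns the value and deletes the key
missing = cfg.get("mode")     # the key is gone after pop()
```

Whether popping is the right choice depends on whether any later code path needs to read the same entry again.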

> @botbw It doesn't cause bugs with the current configs (as they use `i2v` configuration only), but if one were to add `v2v` configurations, a single short video would cause...

@zhengzangw A tiny change for the robustness of open-sourced code.

@xilanhua12138 At this point, I would suggest:
1. Set `pin_memory_cache_pre_alloc_numels = None` in `cfg` or `train.py`.
2. Set `pin_memory = False` when initializing the dataloader in `train.py`.

Note: you might still...

@tianyu-l Thanks for sharing the information! Regarding points 2 and 3, I'm not sure why `ProcessGroup` initialization affects p2p comm. Could you please explain further? I thought that all P2P...