botbw
# trace
previous now
# benchmark
before
colossalai run --nproc_per_node 8 --hostfile hosts.txt benchmark.py -g -x -b 4 -s 100 ``` num_samples: 392, dp_world_size: 8, flop_megatron: 9.7808637396779e+16, flop: 86555325938794496, avg_duration:...
# LLaMa trace (static)
## Prefetch = 0
## Prefetch = 10
# Benchmark
Hey @Little-devil1, this should be resolved now. You are welcome to pull the main branch and try again.
> Hello, after pulling the main branch and running a training test with the above configuration on 8XH100(80G), the above situation still appears. Is it a memory problem with the 8XH100(80G)? > @Little-devil1...
> > > Hello, after pulling the main branch and running a training test with the above configuration on 8XH100(80G), the above situation still appears. Is it a memory problem with the 8XH100(80G)?...
@hadipash Could you please specify any exact bugs this usage causes? Popping a config might be useful if we don't want it to be consumed twice (but I'm not sure...
> @botbw It doesn't cause bugs with the current configs (as they use `i2v` configuration only), but if one were to add `v2v` configurations, a single short video would cause...
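To make the trade-off in this exchange concrete, here is a minimal sketch of how `pop` and `get` differ when one config dict is shared by several consumers. The key name `bucket_config`, its contents, and both helper functions are hypothetical and are not taken from the repository under discussion.

```python
def consume_with_pop(cfg):
    # pop removes the key, so a later consumer of the same dict gets None
    return cfg.pop("bucket_config", None)


def consume_with_get(cfg):
    # get leaves the dict untouched, so every consumer sees the same value
    return cfg.get("bucket_config")


# Hypothetical shared config; the keys are illustrative only.
shared = {"bucket_config": {"240p": 16}, "lr": 1e-4}
first = consume_with_pop(shared)
second = consume_with_pop(shared)

shared_again = {"bucket_config": {"240p": 16}, "lr": 1e-4}
first_get = consume_with_get(shared_again)
second_get = consume_with_get(shared_again)
```

With `pop`, only the first consumer receives the value, which is useful when a setting must not be consumed twice but surprising when a second code path also needs it. With `get`, both consumers read the same value.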
@zhengzangw A tiny change for the robustness of open-sourced code.
@xilanhua12138 At this point, I would suggest:
1. set `pin_memory_cache_pre_alloc_numels = None` in `cfg` or `train.py`.
2. set `pin_memory = False` when initializing dataloader in `train.py`.

Note: you might still...
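The two settings suggested above could look roughly like the fragment below. Only `pin_memory_cache_pre_alloc_numels` and `pin_memory` come from the comment. The `dataloader_kwargs` name and the other values are assumptions for illustration, and the project's real `train.py` may build its dataloader differently.

```python
# Step 1: disable the pre-allocated pinned-memory cache (setting named in the comment).
pin_memory_cache_pre_alloc_numels = None

# Step 2: keyword arguments for the dataloader, with pinned memory turned off.
# batch_size and num_workers are placeholder values, not project defaults.
dataloader_kwargs = dict(
    batch_size=1,
    num_workers=0,
    pin_memory=False,
)

# Assumed usage, if the dataloader is a plain PyTorch one:
#   loader = torch.utils.data.DataLoader(dataset, **dataloader_kwargs)
```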
@tianyu-l Thanks for sharing the information! Regarding points 2 and 3, I'm not sure why `ProcessGroup` initialization affects p2p comm. Could you please explain further? I thought that all P2P...