2 comments by YeBin2018
I tried using the default value of cross_device_ops, and now it gets stuck, repeatedly printing the log message "Local rendezvous recv item cancelled. Key hash: 15504120126296904051". Does anyone know anything about this?
Sorry, I can't provide the source code because it may involve company secrets. Our environment: H800 machines with eight cards per machine, using an all-reduce architecture. The...