Chun Cai
Hi @Entropy-Enthalpy, when training a DPA model with the PyTorch backend, the training data for each step is randomly selected and packed into a single input batch. This means...
> @caic99 Would you please look into this issue?

@hongriTianqi Sorry, I have no idea about this one.
Hi @PeizeLin, would you please first check whether an OOM (out-of-memory) error occurred?
@PeizeLin These might be related to errors in MPI communication. I noticed that your program hangs; you can try using `gdb attach` on the hung process to analyze the cause.
@Satinelamp Please contact your cluster admin about access to the computing node.