Denghui Lu
It looks like an error occurred during the training process. The output of lcurve.out should not be nan. Which version of deepmd-kit are you using in the dcu environment?
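To locate where training first diverged, one option is to scan the learning-curve file for NaN entries. The following is a minimal sketch, not part of deepmd-kit: the helper name `find_nan_steps` is hypothetical, and it assumes `lcurve.out` is whitespace-separated text with `#` header lines and the step number in the first column.

```python
import math


def find_nan_steps(lcurve_text):
    """Return the step numbers of rows that contain at least one NaN value.

    Assumes (not confirmed by the source) that each data row is
    whitespace-separated, that header lines start with '#', and that
    the first column is the integer training step.
    """
    bad_steps = []
    for line in lcurve_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split()
        values = [float(tok) for tok in fields[1:]]
        if any(math.isnan(v) for v in values):
            bad_steps.append(int(fields[0]))
    return bad_steps
```

Typical usage would be `find_nan_steps(open("lcurve.out").read())`; the first step in the returned list shows when the loss became NaN.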
> While improvement comes in #913, these 3 functions seem to have problems in unit tests for 100% code coverage. > > * [ ] deepmd.cluster.slurm._pad_zeros > * [ ] deepmd.cluster.slurm._expand_ids...
It seems that on non-data-center GPU cards, the transferred model achieves an impressive speedup. I have tested the new model in a local 1080ti environment and achieved a...
> > @denghuilu Are they the same model? The output looks different. No, they are not the same model; the `ascend method transfer model` was cast from the `original model`. There...
I have reviewed each STDOUTER.log file and found that the abnormal stops were caused by an Out of Memory error. ``` COMMAND: echo ks_solver cg >> INPUT; bash run.sh -o...
This was caused by the execution of the `hipblasCreate` function in `math_kernel_op.hip.cu` (called by the `ESolver_KS_PW` init function). ``` void createGpuBlasHandle(){ if (cublas_handle == nullptr) { hipblasErrcheck(hipblasCreate(&cublas_handle)); } }...
Similar to issue #4017: after upgrading the DTK version, no errors were found.
> I used Bohrium `4 * NVIDIA GPU_24g` to run this example, and the calculation was successful. This indicates that 4*24 G of memory is enough for the GPU. > > I also try...
LGTM! However, I find the namespace `base_device` a bit confusing since it's not a common convention. I've never seen a namespace used like this before. Would it be possible to...
The error encountered appears to be an Out of Memory (OOM) issue, as indicated by the program's output. The computation of stress typically demands additional device memory, which may lead...