Ruochi Zhang
Hey, did you train the model on a GPU or a CPU, and what was the CPU/GPU utilization?
Could you try running `nvidia-smi -q -d Memory | grep -A4 GPU | grep Free` and `nvidia-smi -q -d Memory | grep -A4 GPU` in your command line and see what they return...
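If it's easier to check from Python, here is a minimal sketch of the same memory check. It assumes `nvidia-smi` is on your PATH and uses its `--query-gpu` CSV output rather than the `grep` pipeline above; the helper names `parse_gpu_memory` and `query_gpu_memory` are made up for illustration.

```python
import shutil
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'free, total' CSV lines (MiB, no header, no units) into tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        try:
            free, total = (int(field.strip()) for field in line.split(","))
        except ValueError:
            # Skip lines that are not two integers (e.g. '[N/A]' fields).
            continue
        rows.append((free, total))
    return rows


def query_gpu_memory():
    """Return [(free_mib, total_mib), ...], one entry per GPU, or [] if nvidia-smi is missing."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.free,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_memory(result.stdout)


if __name__ == "__main__":
    for index, (free, total) in enumerate(query_gpu_memory()):
        print(f"GPU {index}: {free} MiB free of {total} MiB")
```

An empty list means either no `nvidia-smi` binary was found or no GPU rows could be parsed, which already hints that training is running on CPU.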
Hi, thank you for your interest. Yeah, with the newer version of TensorFlow, I stopped maintaining the code for the random walk part. The answer is that if you use...
The latter. Training both models did offer some advantages when we did the benchmarking in the paper. But in some later applications of the model to other datasets, I found...
Do you see NaN loss during training? Based on what I've tested, turning off `amp=True` resolves the NaN loss in certain cases (at the cost of being...
For debugging purposes, if you set `OMP_NUM_THREADS=1` in the environment, would that change the observed behavior? I agree that this multiprocessing + global variable function is probably the sticking point.
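In case it helps, a minimal sketch of setting this from inside Python instead of the shell. The helper name `pin_openmp_threads` is hypothetical; the important assumption is that it runs before any OpenMP-backed library (NumPy, PyTorch, etc.) is imported, since the variable is typically read once when the library loads.

```python
import os


def pin_openmp_threads(n=1):
    """Set OMP_NUM_THREADS for this process and any workers it spawns later."""
    os.environ["OMP_NUM_THREADS"] = str(n)
    return os.environ["OMP_NUM_THREADS"]


# Call this at the very top of the script, before importing numerical libraries.
pin_openmp_threads(1)
```

Child processes started afterwards through `multiprocessing` inherit the environment, so the workers see the same setting.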
That's very strange... because these 3 parameters would have default values anyway...
I think the figure and notebook did not finish uploading... I cannot open them. The conda installation mode is a bit tricky here. Could you try it with pip install...
Could you try cloning the repo with git and then running `pip install .`? The conda version is a bit outdated, and I will need some time to recompile it.
Hmm. Could you try re-running that with fewer CPU workers? Or you can try this to increase the maximum number of open files: ``` # Check current...
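The open-file limit can also be inspected and raised from within Python. This is only a sketch using the standard `resource` module (Unix only); the helper name `raise_open_file_limit` and the default target of 4096 are assumptions for illustration, not values taken from the project.

```python
import resource


def raise_open_file_limit(target=4096):
    """Raise the soft RLIMIT_NOFILE toward `target`, never above the hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        # Already unlimited, nothing to do.
        return soft, hard
    new_soft = max(soft, target)
    if hard != resource.RLIM_INFINITY:
        new_soft = min(new_soft, hard)
    if new_soft != soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        except (ValueError, OSError):
            # Some systems reject the request; keep the current limit.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = raise_open_file_limit()
    print(f"open files: soft={soft}, hard={hard}")
```

This only affects the current process and its children, so it needs to run before the data-loading workers are started.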