CUTS
Retina-generate_kmeans
Hello! I finished training the encoder on the retina dataset. However, I ran into a problem when I tried to use generate_kmeans and generate_diffusion. I tried the tips you mentioned about the 'deadlock', but it still does not work. The only script that works is generate_baseline.py.
Hi.
So far, the known problem is that kmeans and diffusion require a decent amount of RAM (e.g., on Slurm, I need to set --mem=20G for them to run successfully). If you are using a scheduler like Slurm, you may need to request more memory.
If that is not the issue,
- Did you try using the latest code?
- Can you give more details (perhaps screenshots) of the error?
I use a single A100 for the experiment. Maybe that is not the problem? I do not find --mem=20G in generate_xx.py. 1. Yes, I have already updated the code. 2. (see the attached screenshot)
Oh, I might have said something confusing. --mem=20G is a setting for running a Slurm job on a server; it means we request 20GB of RAM for that job so the script can run successfully. If you don't have enough RAM, that may be the problem. But in most cases, if you are running on a server without job allocation, you should have more than enough RAM.
A single A100 should be more than enough.
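For reference, this is roughly what a Slurm batch script with that setting looks like. It is only a sketch: the job name and GPU line are placeholders, and the script arguments depend on your own config; the only part that matters here is the --mem line.

```bash
#!/bin/bash
#SBATCH --job-name=generate_kmeans   # arbitrary job name (placeholder)
#SBATCH --gres=gpu:1                 # one GPU; a single A100 is plenty
#SBATCH --mem=20G                    # request ~20GB of RAM for the job

# Script arguments depend on your own config, so they are omitted here.
python generate_kmeans.py
```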
Regarding your screenshot:
What if you do not use the --rerun argument? I recently found that to be more helpful.
Actually, I tried --rerun already. However, it does not work.
When using generate_diffusion.py, it looks like it also gets into a deadlock?
Unfortunately I don't really understand the root cause of the problem.
So far the setting that works on my end is:
- Do NOT use the --rerun flag.
- Make sure you have around 20GB of RAM. (I have not tested the limits, but what I can say is that <=10GB of RAM will not work on my server. A quick way to check this is shown right after this list.)
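If you are not on a scheduler and are unsure how much memory the machine actually has free, a quick check with standard Linux tools (nothing specific to this repo) is:

```bash
# Show total / used / available memory in human-readable units.
free -h

# Optionally watch it refresh every few seconds while the script runs.
watch -n 5 free -h
```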
If this still does not work, the following resources might be helpful.
- You may try running export MKL_THREADING_LAYER=GNU before running generate_diffusion.py or generate_kmeans.py (see the snippet below).
- Additional resources at https://github.com/joblib/threadpoolctl/blob/master/multiple_openmp.md
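Concretely, the environment-variable route looks like the sketch below. The OMP_NUM_THREADS line is a general mitigation for clashing OpenMP runtimes rather than something specific to this repo, so treat it as optional.

```bash
# Force MKL to use the GNU OpenMP runtime so that only one OpenMP
# library is loaded into the process (a common cause of this kind
# of deadlock).
export MKL_THREADING_LAYER=GNU

# Optional, general mitigation (not specific to this repo): cap the
# thread pools; single-threaded BLAS/OpenMP is slower but avoids
# most runtime conflicts.
export OMP_NUM_THREADS=1

python generate_kmeans.py      # or generate_diffusion.py
```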
I set export MKL_THREADING_LAYER=GNU on Linux, but it still gets stuck in the 'deadlock'.
Thanks for trying these out. At this moment I am basically clueless. Sorry for not being able to be more helpful.
I still suspect it's a RAM problem, but I don't have solid proof.
I have updated the code for generating kmeans and diffusion condensation. I believe it should be good now?