Error with relion5 during 2D classification on AWS g6 instances
Hi there, I am using relion5 via SGE/qsub on AWS clusters.
So far everything ran fine on g5 instances, which use NVIDIA A10G Tensor Core GPUs. We have now switched to g6 instances, which use NVIDIA L4 Tensor Core GPUs. During 2D classification we get the error: "failed to create cufft plan".
Any idea what could be wrong?
Thanks and best
Toby
Environment:
- OS: Ubuntu 18.04.5 LTS
- MPI runtime: [e.g. OpenMPI 2.0.1]
- RELION version: Relion 5.0
- Memory: 192 GB
- GPU: NVIDIA L4 Tensor Core GPU
Dataset:
- Box size: 180 pix
- Pixel size: 0.71 Å/px
- Number of particles: 50,000
Job options:
- Type of job: Class2D
- Number of MPI processes: 1
- Number of threads: 12
- Full command:
`which relion_refine` --o Class2D/job010/run --grad --class_inactivity_threshold 0.1 --grad_write_iter 10 --iter 100 --i Extract/job006/particles.star --dont_combine_weights_via_disc --pool 30 --pad 2 --ctf --tau2_fudge 2 --particle_diameter 198.0 --K 25 --flatten_solvent --zero_mask --center_classes --oversampling 1 --psi_step 12 --offset_range 5 --offset_step 2 --norm --scale --j 12 --gpu "0,1,2,3" --pipeline_control Class2D/job010/
Error message:
in: /relion/src/projector.cpp, line 362
ERROR: failed to create cufft plan
=== Backtrace ===
/opt/relion/bin/relion_refine(_ZN11RelionErrorC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES7_l+0x77) [0x56106c48bbd7]
/opt/relion/bin/relion_refine(_ZN9Projector26computeFourierTransformMapER13MultidimArrayIdES2_iibbiPKS1_b+0x36a3) [0x56106c52a8c3]
/opt/relion/bin/relion_refine(_ZN7MlModel23setFourierTransformMapsEbidPK13MultidimArrayIdE+0x901) [0x56106c69d271]
/opt/relion/bin/relion_refine(_ZN11MlOptimiser16expectationSetupEv+0x5a) [0x56106c4b16ea]
/opt/relion/bin/relion_refine(_ZN11MlOptimiser11expectationEv+0x34) [0x56106c4e1824]
/opt/relion/bin/relion_refine(_ZN11MlOptimiser7iterateEv+0x37a) [0x56106c4fd63a]
/opt/relion/bin/relion_refine(main+0x51) [0x56106c476c91]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x14c1b6623bf7]
/opt/relion/bin/relion_refine(_start+0x2a) [0x56106c47a5ea]
ERROR: failed to create cufft plan
Which version of CUDA did you use to compile RELION? Is it compatible with "Ubuntu 18.04.5 LTS"? This is a very old OS and you shouldn't use it.
Did you specify CUDA_ARCH? (You shouldn't, if you want to share the binary with different GPUs).
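The reason for asking: the A10G on g5 is compute capability 8.6 (Ampere), while the L4 on g6 is 8.9 (Ada Lovelace), and CUDA only gained support for 8.9 in version 11.8. An older toolkit is one possible explanation for the cuFFT failure, not a confirmed diagnosis. Below is a minimal sketch for checking this on a node. It assumes `nvidia-smi` is on the PATH and that the driver is recent enough to support the `compute_cap` query field. The helper names and the small lookup table are illustrative and not part of RELION.

```python
import subprocess

# Minimum CUDA toolkit release that can target each compute capability.
# Only the two GPUs mentioned in this thread are listed; extend as needed.
MIN_CUDA_FOR_CC = {
    (8, 6): (11, 1),  # A10G (Ampere), g5 instances
    (8, 9): (11, 8),  # L4 (Ada Lovelace), g6 instances
}


def parse_compute_caps(csv_text):
    """Parse 'name, compute_cap' lines into (name, (major, minor)) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # Split on the last comma so GPU names containing commas survive.
        name, cap = (field.strip() for field in line.rsplit(",", 1))
        major, minor = cap.split(".")
        gpus.append((name, (int(major), int(minor))))
    return gpus


def toolkit_supports(cuda_version, compute_cap):
    """True/False if the table knows the capability, None otherwise."""
    needed = MIN_CUDA_FOR_CC.get(tuple(compute_cap))
    if needed is None:
        return None
    return tuple(cuda_version) >= needed


def query_gpus():
    """Ask nvidia-smi for the installed GPUs; empty list if unavailable."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,compute_cap",
             "--format=csv,noheader"],
            check=True, capture_output=True, text=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return parse_compute_caps(result.stdout)
```

On a g6 node, pass each capability returned by `query_gpus()` to `toolkit_supports` together with the toolkit version that `nvcc --version` reports for the build. A `False` result would suggest rebuilding RELION against CUDA 11.8 or newer, without setting CUDA_ARCH.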