relion
DynaMight running issue "Index tensor must have the same number of dimensions as self tensor"
DynaMight flexibility run issue
I recently installed RELION 5 on our computing cluster. 3D classification and auto-refinement jobs with Blush regularization run successfully. However, DynaMight flexibility jobs consistently fail with the error message pasted at the end. Could you help me resolve this?
Environment:
- OS: CentOS7
- MPI runtime: openmpi 4.1.1
- RELION version: RELION-5.0-beta-0-commit-70875e
- Memory: 200GB requested with sbatch
- GPU: A40 with 46GB memory
Dataset:
- Box size: 320 px
- Pixel size: 0.996 Å
- Number of particles: 58890
- Description: 250 kDa transmembrane protein with multiple fragments bound
Job options:
- Type of job: DynaMight Flexibility
- Number of MPI processes: 1
- Number of threads: 8
- Full command:
relion_python_dynamight optimize-deformations --refinement-star-file Refine3D/job025/run_data.star --output-directory DynaMight/job027/ --initial-model PostProcess/job026/postprocess_masked.mrc --n-gaussians 21000 --initial-threshold 0.0025 --regularization-factor 1 --n-threads 8 --preload-images --gpu-id 0 --pipeline-control DynaMight/job027/
Error message:
Initializing the particle dataset
Assigning a diameter of 299 angstrom
Number of particles: 58890
Initialized data loaders for half sets of size 26500 and 26501
consensus updates are done every 1 epochs.
box size: 320 pixel_size: 0.996 virtual pixel_size: 0.003115264797507788 dimension of latent space: 5
Number of used gaussians: 21000
Optimizing scale only
Initializing gaussian positions from reference
100%|##########| 50/50 [00:31<00:00, 1.60it/s]
Final error: 2.1116328241532756e-07
Optimizing scale only
Initializing gaussian positions from reference
100%|##########| 50/50 [00:31<00:00, 1.61it/s]
Final error: 2.1116328241532756e-07
consensus gaussian models initialized
consensus model initialization finished
mean distance in graph for half 1: 1.8233627080917358 Angstrom ;This distance is also used to construct the initial graph
mean distance in graph for half 2: 1.8233627080917358 Angstrom ;This distance is also used to construct the initial graph
Computing half-set indices
100%|##########| 47/47 [00:03<00:00, 12.25it/s]
98%|#########7| 46/47 [00:14<00:00, 3.27it/s]
Index tensor must have the same number of dimensions as self tensor
Possibly related: https://github.com/3dem/DynaMight/issues/8
Hi Sunchang,
This is probably a bug that occurs when the batch size leaves a single image in one of the batches. I will fix that today. As a workaround, force the batch size to a value for which the remainder is not 1 for any of the dataloaders. I think --batch-size 120 would work in your case.
I hope this helps,
Johannes
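The remainder condition in the workaround above can be checked with a few lines of Python. This is only a sketch: the helper names `last_batch_sizes` and `is_safe` are hypothetical and not part of DynaMight, it considers only the two half-set sizes printed in the log (other dataloaders may exist), and the batch size of 100 is used purely as an illustrative failing value, not as a statement about DynaMight's default.

```python
def last_batch_sizes(dataset_sizes, batch_size):
    """Size of the final batch for each dataset when no batch is dropped."""
    sizes = []
    for n in dataset_sizes:
        remainder = n % batch_size
        # A remainder of 0 means the last batch is a full one.
        sizes.append(remainder if remainder else min(batch_size, n))
    return sizes


def is_safe(dataset_sizes, batch_size):
    """True if no dataset ends with a batch containing a single image."""
    return all(size != 1 for size in last_batch_sizes(dataset_sizes, batch_size))


# Half-set sizes taken from the log in the original report.
half_sets = [26500, 26501]

print(last_batch_sizes(half_sets, 120))  # [100, 101]
print(is_safe(half_sets, 120))           # True
print(is_safe(half_sets, 100))           # False: 26501 % 100 == 1
```

For a different dataset, the same check can be run over a range of candidate batch sizes to pick one that passes for every half set.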
Thank you, Johannes, for your quick response!
I can confirm that by adding the --batch-size 120 option, DynaMight training appears to run correctly.
For your information, one new warning message does pop up, but it doesn't stop the training.
/relion5.0/lib/python3.10/sitepackages/dynamight/deformations/optimize_deformations.py:1295:
UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach()
or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
'indices_half1': torch.tensor(half1_indices)
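For reference, PyTorch emits this warning when `torch.tensor()` is called on something that is already a tensor, and the copy pattern it recommends looks roughly like the sketch below. The `half1_indices` tensor here is a hypothetical stand-in; the actual contents and dtype in DynaMight are not shown in the thread.

```python
import torch

# Hypothetical stand-in for the index tensor named in the warning.
half1_indices = torch.arange(5)

# Recommended pattern from the warning text: an explicit copy that is
# detached from the autograd graph, instead of torch.tensor(half1_indices).
indices_copy = half1_indices.clone().detach()

# The copy owns its own storage, so modifying it leaves the source untouched.
indices_copy[0] = 99
print(half1_indices[0].item(), indices_copy[0].item())
```

The warning is harmless in this context, which is consistent with the training continuing to run.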