cryodrgn `downsample` fails using `.cs` file from heterogeneous refinement
Hi,
Sometimes when performing 3D classification without alignments on particularly heterogeneous datasets, I find that starting from the poses of a classification, rather than from a single-class refinement, yields better results.
I would like to try the same strategy using cryodrgn, but when I perform cryodrgn downsample using the all_classes particles.cs file from cryosparc heterogeneous refinement, I get the attached error. Any suggestions for getting this to work? I get the same error when using a particles.cs file from a single class.
Cheers Oli
(cryodrgn2) cp ~/processing/cryosparc_projects/francesca/P2/J3522/cryosparc_P1_J3522_00105_particles.cs .
(cryodrgn2) cryodrgn downsample cryosparc_P1_J3522_00105_particles.cs -D 64 --datadir ~/processing/cryosparc_projects/francesca/P2/ -o ank_preclassify_downsample_64.mrcs
Traceback (most recent call last):
  File "/usr/local/envs/cryodrgn2/bin/cryodrgn", line 8, in <module>
    sys.exit(main())
  File "/usr/local/envs/cryodrgn2/lib/python3.9/site-packages/cryodrgn/__main__.py", line 64, in main
    args.func(args)
  File "/usr/local/envs/cryodrgn2/lib/python3.9/site-packages/cryodrgn/commands/downsample.py", line 44, in main
    old = dataset.load_particles(args.mrcs, lazy=lazy, datadir=args.datadir)
  File "/usr/local/envs/cryodrgn2/lib/python3.9/site-packages/cryodrgn/dataset.py", line 34, in load_particles
    particles = starfile.csparc_get_particles(mrcs_txt_star, datadir, lazy)
  File "/usr/local/envs/cryodrgn2/lib/python3.9/site-packages/cryodrgn/starfile.py", line 153, in csparc_get_particles
    ind = metadata['blob/idx'] # 0-based indexing
ValueError: no field of name blob/idx
(cryodrgn2)
Assuming you haven't done any re-extractions before heterogeneous refinement, you could run downsample on the consensus particle stack. Could you send us the .cs file to see why the blob/idx field is missing?
By the way, you'll need to use the --hetrefine flag in cryodrgn parse_pose_csparc. Fwiw, I haven't used this in a very long time, and cryoSPARC may have changed things around under the hood... You should check the poses with cryodrgn backproject_voxel. Let me know how it goes!
Thanks Ellen! Here is the .cs file:
https://www.dropbox.com/s/ge4cxvozp1zgd7o/cryosparc_P1_J3522_00105_particles.cs?dl=0
Thanks @olibclarke - and sorry for the late reply on this. I looked at the attached .cs file and see the following 18 fields (all numeric) for each of 6 alignment classes across 710,437 particles:
alignments_class_0/split
alignments_class_0/shift
alignments_class_0/pose
alignments_class_0/psize_A
alignments_class_0/error
alignments_class_0/error_min
alignments_class_0/resid_pow
alignments_class_0/slice_pow
alignments_class_0/image_pow
alignments_class_0/cross_cor
alignments_class_0/alpha
alignments_class_0/alpha_min
alignments_class_0/weight
alignments_class_0/pose_ess
alignments_class_0/shift_ess
alignments_class_0/class_posterior
alignments_class_0/class
alignments_class_0/class_ess
So it's missing the blob/path and blob/idx fields that would tell cryoDRGN where to find the .mrcs files (and how to index inside them).
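A quick way to run the same check on any .cs file is to load it with NumPy and list its field names. This is a minimal sketch, assuming the file loads as a NumPy structured array via `np.load`; the helper name `missing_fields` is made up for illustration and is not part of cryoDRGN.

```python
import numpy as np

# Fields that the traceback and the discussion above identify as needed
# to locate particle images.
REQUIRED_FIELDS = ("blob/path", "blob/idx")


def missing_fields(metadata, required=REQUIRED_FIELDS):
    """Return the required field names that are absent from a structured array."""
    names = metadata.dtype.names or ()
    return [name for name in required if name not in names]


# Synthetic stand-in for an alignments-only .cs file (not real data).
demo = np.zeros(3, dtype=[("uid", "<u8"), ("alignments_class_0/class", "<u4")])
print(missing_fields(demo))

# For a real file (assumption: it loads as a structured array):
# metadata = np.load("cryosparc_P1_J3522_00105_particles.cs")
# print(metadata.dtype.names)
# print(missing_fields(metadata))
```

An empty list would mean both fields are present and `cryodrgn downsample` should at least get past the line that raised the error.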
I'm not very familiar with cryoSPARC processing, but is it possible that we need to ask it to merge the output .cs with particle location information before exporting?
hmmm perhaps - maybe it is in the passthrough.cs file...
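If the passthrough file does carry the location fields, one option is to copy them into the particles array by matching rows on a shared ID. The sketch below is untested against real cryoSPARC output and rests on two assumptions: that both files load as NumPy structured arrays with a `uid` field, and that the passthrough contains `blob/path` and `blob/idx`. The function name `merge_on_uid` is hypothetical.

```python
import numpy as np


def merge_on_uid(particles, passthrough, fields=("blob/path", "blob/idx")):
    """Copy `fields` from passthrough into particles, matching rows on 'uid'."""
    if len(passthrough) == 0:
        raise KeyError("passthrough array is empty")

    # Sort passthrough uids once, then look up each particle uid.
    order = np.argsort(passthrough["uid"])
    sorted_uids = passthrough["uid"][order]
    pos = np.searchsorted(sorted_uids, particles["uid"])
    pos = np.clip(pos, 0, len(sorted_uids) - 1)
    if not np.array_equal(sorted_uids[pos], particles["uid"]):
        raise KeyError("some particle uids are missing from the passthrough array")
    rows = passthrough[order[pos]]

    # Build a dtype holding the original fields plus the copied ones.
    new_dtype = [(n, particles.dtype[n]) for n in particles.dtype.names]
    new_dtype += [(n, passthrough.dtype[n]) for n in fields]

    merged = np.empty(len(particles), dtype=new_dtype)
    for n in particles.dtype.names:
        merged[n] = particles[n]
    for n in fields:
        merged[n] = rows[n]
    return merged


# Synthetic example (not real data): rows are deliberately in different orders.
particles = np.zeros(3, dtype=[("uid", "<u8"), ("alignments_class_0/class", "<u4")])
particles["uid"] = [30, 10, 20]
passthrough = np.zeros(3, dtype=[("uid", "<u8"), ("blob/path", "S32"), ("blob/idx", "<u4")])
passthrough["uid"] = [10, 20, 30]
passthrough["blob/path"] = [b"a.mrc", b"b.mrc", b"c.mrc"]
passthrough["blob/idx"] = [0, 1, 2]

merged = merge_on_uid(particles, passthrough)
print(merged["blob/idx"])
```

Matching on `uid` rather than on row order avoids assuming the two files list particles in the same sequence. Note that `np.save` appends `.npy` to a bare filename, so writing the result back out with a `.cs` extension would need an open file handle, e.g. `with open(path, "wb") as f: np.save(f, merged)`.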