Support .cs file writing for export to cryoSPARC
We should have a tool `cryodrgn_utils write_cs` to streamline re-importing particles to cryoSPARC.
If the input to cryoDRGN originated from cryoSPARC, this tool could keep certain (all?) fields like the `uid` from a reference `.cs` file. Then, the reimport to cryoSPARC wouldn't define an entirely new dataset in their database.
I'm not 100% sure what we would need to implement yet, but maybe we can refine the idea in this thread.
Related to issues #72, #101, #148.
One additional complexity is that particles are typically downsampled before training cryoDRGN, so the question is whether the `.cs` file should point to the (new) downsampled particle stack or refer to the original extracted particles (more error-prone).
In the latter case, the information that cryoDRGN provides is an index filtering. So maybe it makes sense to have a `cryodrgn_utils filter_cs` tool instead.
Being able to send a selection of particles back to cryoSPARC would be so useful!
And I think it is best as only a selection, pointing to the original particles in the cryoSPARC project (not re-importing the downsampled particles), because the typical use case is to refine a subset of particles to high resolution, using the particles at their original pixel size.
Related to issues https://github.com/zhonge/cryodrgn/issues/72, https://github.com/zhonge/cryodrgn/issues/101, https://github.com/zhonge/cryodrgn/issues/148.
Also #109
Just recapping some salient points from discussions with @zhonge about this:
- The `write_star` command currently takes in a `.mrcs`/`.txt` file, a required `ctf.pkl`, an optional `index.pkl`, and an optional `poses.pkl`.
- The command will be enhanced to take in a `.star` input in addition to `.mrcs`/`.txt`.
- The `--ref-star`, `--keep-micrograph`, and `--copy-header` flags will go away.
- If a `.star` is provided as input:
  - All inputs other than an optional index `.pkl` will be ignored (and checked to make sure they are not specified).
  - All fields from the input `.star` will be carried over to the output `.star` file (including micrograph attributes, the original image name as `_rlnImageName`, ctf/pose information, anything else). Essentially the output `.star` will have exactly the same columns as in the input `.star`.
  - If an index `.pkl` is provided, the output `.star` will be filtered to have only those rows (row numbers are assumed 0-indexed).
- If a `.mrcs`/`.txt` file is provided as input, the behavior will remain unchanged.
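To make the input rules above concrete, here is a rough sketch of how the argument check could look. It is only an illustration: the function name `parse_write_star_args` is made up, and the flag names (`--ctf`, `--poses`, `--ind`, `-o`) are assumptions that may not match the final command.

```python
import argparse


def parse_write_star_args(argv):
    """Parse arguments for a hypothetical write_star-like command."""
    parser = argparse.ArgumentParser(prog="write_star")
    parser.add_argument("input", help=".star, .mrcs, or .txt file")
    # Flag names below are assumptions made for this sketch.
    parser.add_argument("--ctf", help="ctf.pkl (needed for .mrcs/.txt input)")
    parser.add_argument("--poses", help="optional poses.pkl")
    parser.add_argument("--ind", help="optional index .pkl (0-indexed rows)")
    parser.add_argument("-o", "--output", required=True, help="output .star")
    args = parser.parse_args(argv)

    if args.input.endswith(".star"):
        # With a .star input, only the optional index may accompany it.
        given = (("--ctf", args.ctf), ("--poses", args.poses))
        extra = [flag for flag, value in given if value is not None]
        if extra:
            parser.error(f"{', '.join(extra)} cannot be used with a .star input")
    elif args.ctf is None:
        parser.error("--ctf is required for .mrcs/.txt input")
    return args
```

Rejecting the extra inputs outright (instead of silently ignoring them) is one way to read "checked to make sure they are not specified".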
A corresponding `write_cs` command will be implemented exactly as outlined above, except that it will take in either an input `.cs` file or an input `.mrcs`/`.txt` file. The command will mimic the behavior above, but for `.cs` files.
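For the `.cs` case, the filtering itself could be as small as the sketch below. It assumes the `.cs` file is a NumPy structured array in `.npy` format and that the index `.pkl` holds 0-indexed row numbers; the function name `filter_cs` is just for illustration and is not the planned implementation.

```python
import pickle

import numpy as np


def filter_cs(cs_path, ind_path, out_path):
    """Keep only the rows of a .cs file selected by a 0-indexed index .pkl."""
    # Assumption: the .cs file is a NumPy structured array saved in .npy format.
    particles = np.load(cs_path)
    with open(ind_path, "rb") as f:
        ind = np.asarray(pickle.load(f), dtype=int)
    if ind.size and (ind.min() < 0 or ind.max() >= len(particles)):
        raise IndexError("index selection is out of range for this .cs file")
    # Row selection keeps every field (uid, blob paths, ctf, ...) untouched.
    filtered = particles[ind]
    # Writing through a file handle stops NumPy from appending ".npy".
    with open(out_path, "wb") as f:
        np.save(f, filtered)
    return len(filtered)
```

Because only rows are dropped, fields like `uid` survive unchanged, which is the point of filtering instead of re-importing.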
A couple of additional points I'd like to propose here:
- The `cryodrgn filter_star` command will still be supported for existing users, though it will be refactored to internally use `write_star`. A `DeprecationWarning` will be emitted on its use and it may become unsupported at some future point (`write_star` can provide filtering as outlined above).
- A `cryodrgn filter_cs` command will not be implemented (`cryodrgn write_cs` can provide filtering as outlined above).
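The deprecation path for `filter_star` could follow the usual wrapper pattern, roughly as sketched here. Both function signatures are placeholders invented for this example (the `write_star` below is a stand-in that only records its arguments), so treat this as a shape, not the actual refactor.

```python
import warnings


def write_star(input_path, output_path, ind=None):
    """Stand-in for the real command; it only records what it was asked to do."""
    return {"input": input_path, "output": output_path, "ind": ind}


def filter_star(input_path, ind, output_path):
    """Deprecated entry point kept for existing users; delegates to write_star."""
    warnings.warn(
        "filter_star is deprecated; use write_star with an index .pkl instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this wrapper
    )
    return write_star(input_path, output_path, ind=ind)
```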
Seems like a great plan!
Sorry to bring this up again (https://github.com/zhonge/cryodrgn/pull/70#issuecomment-941061328), maybe you already discussed it and chose to not use an external library, but just in case you didn't: check out the `starfile` library. Its goal is compatibility with star files from RELION, and it might simplify all your handling of star files (it turns star files into pandas dataframes and vice versa).
Hi @Guillawme - I agree on using the `starfile` library. However, I'd like to handle that as a separate issue so the migration can be independent of anything we do here. I'll create an issue on that and hopefully we can get it done quickly.
I think `write_cs` will also need to write a `.csg` file, right? In order to be able to import into cryoSPARC using "Import Result Group"?
e.g. something like this - simple metadata file (see here for details):
```
group:
  description: A stack of imported particles. May or may not contain data, ctfs, pick
    locations, etc.
  name: imported_particles
  title: Imported particles
  type: particle
results:
  blob:
    metafile: '>J4539_imported_particles_exported.cs'
    num_items: 1616
    type: particle.blob
  ctf:
    metafile: '>J4539_imported_particles_exported.cs'
    num_items: 1616
    type: particle.ctf
version: v4.0.1
```
Here's how to reimport a particle stack filtered by cryoDRGN back into cryoSPARC, while pointing to the original particles in the cryoSPARC project.
1. Export the original cryoSPARC particles from the associated Job so that there's a single `.cs` and `.csg` file describing the particle stack (i.e. no `PXXX_JYYY_passthrough_particles.cs` file). You can do this with the "Export" button in the Outputs tab.
2. You'll find the `.cs` and `.csg` files in the `exports` subdirectory of your project directory: `/path/to/project/directory/PXXX/exports/groups/JYYY_particles`
3. Filter the `.cs` file with the index selection `.pkl` file using `cryodrgn_utils write_cs`. For example, here is the command to filter `J929_particles_exported.cs` by a selection saved in `ind_keep.214511_particles.pkl` and save out a new `J929_particles_filtered.cs` file:
(cryodrgn) $ cryodrgn_utils write_cs J929_particles_exported.cs --ind ind_keep.214511_particles.pkl -o J929_particles_filtered.cs
4. Make a copy of the `.csg` text file and replace the `metafile` field with the new `.cs` filename and the `num_items` field with the new number of particles. Here's a comparison of the before and after:
(cryodrgn) [Sat Mar 11 23:50 J929_particles] sdiff J929_particles_exported.csg J929_particles_filtered.csg
created: 2023-03-12 03:52:35.411011 created: 2023-03-12 03:52:35.411011
group: group:
description: All particles that were processed, including a description: All particles that were processed, including a
name: particles name: particles
title: All particles title: All particles
type: particle type: particle
results: results:
alignments2D: alignments2D:
metafile: '>J929_particles_exported.cs' | metafile: '>J929_particles_filtered.cs'
num_items: 286801 | num_items: 214511
type: particle.alignments2D type: particle.alignments2D
alignments3D: alignments3D:
metafile: '>J929_particles_exported.cs' | metafile: '>J929_particles_filtered.cs'
num_items: 286801 | num_items: 214511
type: particle.alignments3D type: particle.alignments3D
blob: blob:
metafile: '>J929_particles_exported.cs' | metafile: '>J929_particles_filtered.cs'
num_items: 286801 | num_items: 214511
type: particle.blob type: particle.blob
ctf: ctf:
metafile: '>J929_particles_exported.cs' | metafile: '>J929_particles_filtered.cs'
num_items: 286801 | num_items: 214511
type: particle.ctf type: particle.ctf
location: location:
metafile: '>J929_particles_exported.cs' | metafile: '>J929_particles_filtered.cs'
num_items: 286801 | num_items: 214511
type: particle.location type: particle.location
pick_stats: pick_stats:
metafile: '>J929_particles_exported.cs' | metafile: '>J929_particles_filtered.cs'
num_items: 286801 | num_items: 214511
type: particle.pick_stats type: particle.pick_stats
version: v4.1.2 version: v4.1.2
5. In cryoSPARC, use the "Import Results Group" job type and reimport the new `.csg` file. :tada:
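If you'd rather not edit the `.csg` copy by hand, the two field replacements can be scripted. This is a minimal sketch that treats the `.csg` as plain text and assumes every result entry should point at the same new `.cs` file, as in the comparison above; `retarget_csg` is a name made up for this example.

```python
import re


def retarget_csg(csg_text, new_cs_name, num_items):
    """Point every result entry of a .csg document at a new .cs file."""
    # Keep the leading '>' that the exported files put in front of the name.
    text = re.sub(
        r"(?m)^([ \t]*metafile:[ \t]*)'>[^']*'",
        lambda m: f"{m.group(1)}'>{new_cs_name}'",
        csg_text,
    )
    # Update the particle count on every num_items line.
    text = re.sub(
        r"(?m)^([ \t]*num_items:[ \t]*)\d+",
        lambda m: f"{m.group(1)}{num_items}",
        text,
    )
    return text
```

For the example above you would read `J929_particles_exported.csg`, call `retarget_csg(text, "J929_particles_filtered.cs", 214511)`, and write the result to `J929_particles_filtered.csg`.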
We can probably have `cryodrgn_utils write_cs` write out the `.csg` file as well, as @olibclarke suggested, to skip over Step 4. It may be worth looking at the new `cryosparc-tools` API to see if there's a better way to write out the `.csg` file.
I just tried this and it seems it is going to work! :tada:
(The file was generated, but I will know for sure when I'm able to copy these newly generated `.csg` and `.cs` files to the correct location; on our cluster we don't have write permission to the cryosparc project directory, but cryosparc will only import result groups from there, so I need somebody else to copy the files for me or change permissions.)
It would be great for usability if this tool could work this way (merging steps 3 and 4 above, as you say):
- we point it to the original `.csg` file (in the cryosparc exports directory) and the `ind.pkl` file (from the cryoDRGN job), and provide a file name for the `.csg` file to be newly created
- the tool then automatically
  - finds the original `.cs` file to filter (the one that the original `.csg` file points to)
  - saves the filtered `.cs` file in the current directory with the same base name as the newly created `.csg` file
  - makes a copy of the original `.csg` file to the file name provided
  - and finally edits this newly created `.csg` file to point to the filtered `.cs` file and contain the correct number of particles.
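The lookup part of that workflow might be sketched like this. Two things here are guesses: that the leading `>` in `metafile` means "relative to the directory containing the `.csg`", and the helper names `find_metafiles` and `filtered_cs_name`, which are invented for the example.

```python
import re
from pathlib import Path


def find_metafiles(csg_path):
    """Return the distinct .cs files referenced by a .csg file."""
    csg_path = Path(csg_path)
    names = re.findall(
        r"(?m)^[ \t]*metafile:[ \t]*'>([^']+)'", csg_path.read_text()
    )
    # Assumption: a leading '>' means "relative to the .csg's own directory".
    unique = list(dict.fromkeys(names))  # de-duplicate, keep first-seen order
    return [csg_path.parent / name for name in unique]


def filtered_cs_name(new_csg_path):
    """Name the filtered .cs after the new .csg, placed in the current directory."""
    return Path(Path(new_csg_path).name).with_suffix(".cs")
```

If `find_metafiles` returns more than one path (e.g. a passthrough file is still present), the tool would need to filter each of them, or refuse and ask for a single-file export first.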