1-based indexing of output volumes instead of 0-based

Open Guillawme opened this issue 3 years ago • 7 comments

Describe the bug

This is not a bug report, only a proposal for a little improvement in user experience.

CryoDRGN numbers all the maps it generates from 0, but ChimeraX numbers all the maps it opens from 1, and this trips me up every single time I look at maps from cryoDRGN.

To Reproduce

cd /path/to/kmeans20
chimerax vol_???.mrc

Then, vol_000.mrc has model ID 1 in ChimeraX, vol_001.mrc has model ID 2, and so on. I always look at the model ID column in the model panel in ChimeraX, but the number in this column doesn't match the number in the UMAP plot.

Expected behavior

It would be a lot easier if the numbering from cryoDRGN started at vol_001.mrc. Or if ChimeraX numbered its open models from 0, if you can convince them that they should be the ones changing their software.

Additional context

It is definitely possible to look at the filename column, where the correct vol_???.mrc names are listed, instead of the model ID column. But this is confusing once one has gotten into the habit of ignoring the filename column (I got into this habit because output files from two RELION jobs have the same names, so in that situation only the model ID column tells you which file comes from which job).
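For anyone hitting the same mismatch in the meantime, here is a minimal sketch of a workaround: create symlinks whose names are shifted by one, so that the filename index and the ChimeraX model ID agree. This is not part of cryoDRGN; the helper names `shifted_name` and `link_one_based` are hypothetical, and the sketch assumes the volumes follow the `vol_NNN.mrc` pattern shown above.

```python
import re
from pathlib import Path

# Matches volume names such as vol_000.mrc (pattern assumed from this issue).
VOL_PATTERN = re.compile(r"^vol_(\d+)\.mrc$")


def shifted_name(filename: str, offset: int = 1) -> str:
    """Return the volume filename with its numeric index shifted by `offset`."""
    match = VOL_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not a volume filename: {filename}")
    digits = match.group(1)
    # Keep the original zero-padding width (3 digits for vol_000.mrc).
    return f"vol_{int(digits) + offset:0{len(digits)}d}.mrc"


def link_one_based(src_dir: Path, dst_dir: Path) -> list:
    """Symlink every vol_NNN.mrc in src_dir into dst_dir, renumbered from 1.

    Raises FileExistsError if a link with the target name already exists.
    """
    dst_dir.mkdir(parents=True, exist_ok=True)
    links = []
    for vol in sorted(src_dir.glob("vol_*.mrc")):
        if VOL_PATTERN.match(vol.name) is None:
            continue  # skip files that only look similar
        link = dst_dir / shifted_name(vol.name)
        link.symlink_to(vol.resolve())
        links.append(link)
    return links
```

Running `link_one_based(Path("kmeans20"), Path("kmeans20_1based"))` and opening the links with `chimerax vol_???.mrc` from the new directory should then give matching numbers, as long as the files are opened in sorted order into an empty session.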

Guillawme avatar Sep 30 '22 08:09 Guillawme

Maybe we could have a flag in cryodrgn analyze for the start index of the volume numbering.

zhonge avatar Oct 09 '22 14:10 zhonge

@Guillawme, @zhonge - ok, so I'm seeing 3 places where volumes are generated by cryoDRGN and this option (start index for volume numbering) would take effect:

cryodrgn analyze command:

  • kmeans<k>/ folder, where k is the number of k-means samples generated.
  • pc<1> to pc<N> folders, where N is the number of PC traversals generated.

cryodrgn eval_vol command:

  • <output_dir>/ folder, where output_dir is the output directory specified to this command.

Let me know if there are any more.

EDIT - I'm seeing one more place where this functionality would need to be added to keep things consistent: the cryodrgn analyze_landscape command. It is essentially similar to analyze in that it generates many volumes in kmeans<k> and pc<1> to pc<N> folders.

vineetbansal avatar Oct 10 '22 14:10 vineetbansal

This is all I can think of too.

Now another question: since this is becoming an option, what should be the default value? 0, to keep behavior consistent with previous versions of cryoDRGN? Or 1, to reduce friction by default?

Guillawme avatar Oct 10 '22 15:10 Guillawme

I'd say we keep it at 0 for now. I'll add the rationale for this flag to the documentation so that you (and other users who, like you, are using ChimeraX) can benefit from it by overriding the default. At some future point we can change the default to 1.

vineetbansal avatar Oct 10 '22 15:10 vineetbansal

@Guillawme - we added a --vol-start-index flag (default value 0) to the cryodrgn analyze command. Can you try this out and see if it addresses your use case? If so, I'll close this issue.
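To illustrate the behavior described here, a minimal sketch of how a start-index option can shift output filenames. This is a toy stand-in, not the cryoDRGN implementation: only the flag name `--vol-start-index` and its default of 0 come from this thread, while `build_parser` and `volume_filenames` are hypothetical names.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Toy parser with a start-index option (not the real cryodrgn CLI)."""
    parser = argparse.ArgumentParser(description="toy volume-numbering demo")
    parser.add_argument(
        "--vol-start-index",
        type=int,
        default=0,  # 0 keeps the numbering of earlier versions
        help="index of the first output volume",
    )
    return parser


def volume_filenames(n_volumes: int, start_index: int = 0) -> list:
    """Names for n_volumes output maps, numbered from start_index."""
    return [
        f"vol_{i:03d}.mrc"
        for i in range(start_index, start_index + n_volumes)
    ]


# Parse an explicit argument list so the example runs without a shell.
args = build_parser().parse_args(["--vol-start-index", "1"])
names = volume_filenames(3, args.vol_start_index)
print(names)
```

With a start index of 1, the first file is vol_001.mrc, which matches model ID 1 when the volumes are opened in order in ChimeraX.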

vineetbansal avatar Dec 08 '22 17:12 vineetbansal

Hello!

Very sorry it took me so long to get back to this.

I have finally tested it, and it works nicely. This is so much easier to read now:

Screenshot from 2023-07-06 14-09-14

I think it will be beneficial if the default value becomes 1 in a future version.

Guillawme avatar Jul 06 '23 12:07 Guillawme

Re-opening this issue because the numbering is still off in the UMAP and PCA plots found in the kmeans directory after running cryodrgn analyze. This is apparent in the very small cluster on the left in this UMAP plot:

umap

And also in this PCA plot:

z_pca

Meanwhile, the volumes from this job start at vol_001.mrc because I used --vol-start-index 1.
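A minimal sketch of the kind of fix this would need, assuming the plot labels are currently produced from a 0-based loop index. This is not cryoDRGN's plotting code; `cluster_labels` and `label_to_volume` are hypothetical helpers that only show the labels and filenames being derived from the same start index.

```python
def cluster_labels(n_clusters: int, start_index: int = 0) -> list:
    """Text labels for cluster centers, offset by the volume start index."""
    return [str(i + start_index) for i in range(n_clusters)]


def label_to_volume(label: str) -> str:
    """Volume filename that a plot label should correspond to."""
    return f"vol_{int(label):03d}.mrc"


# With a start index of 1, the label drawn next to the first cluster
# center and the name of the first volume agree.
labels = cluster_labels(3, start_index=1)
volumes = [label_to_volume(label) for label in labels]
print(list(zip(labels, volumes)))
```

The point of the sketch is that the same start index has to reach both the code that writes the volumes and the code that annotates the UMAP and PCA plots.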

Guillawme avatar Jul 10 '23 08:07 Guillawme