1-based indexing of output volumes instead of 0-based
Describe the bug This is not a bug report, only a proposal for a little improvement in user experience.
CryoDRGN numbers all the maps it generates from 0, but ChimeraX numbers all the maps it opens from 1, and this trips me up every single time I look at maps from cryoDRGN.
To Reproduce
cd /path/to/kmeans20
chimerax vol_???.mrc
Then, vol_000.mrc has model ID 1 in ChimeraX, vol_001.mrc has model ID 2, and so on. I always look at the model ID column in the model panel in ChimeraX, but the number in this column doesn't match the number in the UMAP plot.
Expected behavior
It would be a lot easier if the numbering from cryoDRGN started at vol_001.mrc. Or if ChimeraX numbered its open models from 0, if you can convince them that they should be the ones changing their software.
Additional context
It is definitely possible to look at the filename column, where the correct vol_???.mrc are listed, instead of the model ID column. But it is confusing when one got into the habit of ignoring the filename column (I got into this habit because output files from two RELION jobs have the same names, so in this situation only the model ID column is informative to remember which file comes from which job).
Maybe we could have a flag in cryodrgn analyze for the start index of the volume numbering.
@Guillawme, @zhonge - ok, so I'm seeing 3 places where volumes are generated by cryoDRGN and this option (start index for volume numbering) would take effect:
cryodrgn analyze command:
kMeans<k>/ folder, wherekis the number of k-means samples generated.pc<1>topc<N>folders, whereNis the number of PC traversals generated.
cryodrgn eval_vol command:
<output_dir>/folder, whereoutput_diris the output directory specified to this command.
Let me know if there are any more.
EDIT - I'm seeing one more place where this functionality would need to be added to keep things consistent:
cryodrgn analyze_landscape command - this is essentially similar to analyze in that it generates many volumes in kMeans<k> and pc<1> to pc<N> folders.
This is all I can think of too.
Now another question: since this is becoming an option, what should be the default value? 0, to keep behavior consistent with previous versions of cryoDRGN? Or 1, to reduce friction by default?
I'd say we keep it at 0 for now. I'll add the rationale for this flag in the documentation so you (and other users like you who're using chimera) and benefit from it by overriding it. At some future point we can modify the default to be 1.
@Guillawme - we added a --vol-start-index flag (default value 0) to cryodrgn analyze command. Can you try this out and see if it addresses your use case? If so, I'll close this issue.
Hello!
Very sorry it took me so long to get back to this.
I have finally tested it, and it works nicely. This is so much easier to read now:
I think it will be beneficial if the default value becomes 1 in a future version.
Re-opening this issue because the numbering is still off in the UMAP and PCA plots found in the kmeans directory after running cryodrgn analyze. This is apparent in the very small cluster on the left in this UMAP plot:
And also in this PCA plot:
While the volumes from this job start at vol_001.mrc because I used --vol-start-index 1.