Decide on Run vs. VCF & rename
User feedback indicates some confusion; do we really want to enforce new terminology?
:fearful: no file names please.
Well. We have an "Add BAM" button. So shouldn't the other file upload button follow suit?
I'd rather rename "Add BAM" to "Add reads" and "Add run" to "Add variants" or something similar. We should talk about the entities, not the file formats.
Why the push to avoid file formats?
That's my preference as well (hence my gradual migration from VCF to Run in Cycledash), but we need to think hard about how to communicate this to our users.
I think "variants" is clearer than "run", to start.
Why the push to avoid file names?
It'd be nice to support multiple serializations of the entities of interest; e.g. variants are serialized and annotated in VCFs, MAFs, AdamGenotypes, CSV (within CycleDash) & relations in an RDBMS (within CycleDash, maybe via Impala later).
Supporting import from and export to various formats would be a great use of CycleDash, which just needs to know how to filter/transform/annotate the component variants/reads/etc.
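A minimal sketch of that idea, assuming a hypothetical `Variant` record and two made-up helpers (`variants_from_vcf`, `variants_to_csv`); none of these names come from the CycleDash codebase. The parser reads only the first five fixed VCF columns and ignores multi-allelic ALT values, INFO, and sample columns. The point is just that the entity sits in the middle and formats hang off the edges:

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical format-agnostic variant entity (illustration only)."""
    contig: str
    position: int
    reference: str
    alternate: str


def variants_from_vcf(text):
    """Read variants from VCF text, using only CHROM, POS, REF and ALT."""
    variants = []
    for line in text.splitlines():
        # Skip blank lines and both kinds of header lines ("##meta", "#CHROM").
        if not line or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        variants.append(Variant(chrom, int(pos), ref, alt))
    return variants


def variants_to_csv(variants):
    """Write the same entities back out as CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["contig", "position", "reference", "alternate"])
    for v in variants:
        writer.writerow([v.contig, v.position, v.reference, v.alternate])
    return out.getvalue()
```

Any filtering or annotation would then operate on the list of `Variant` objects, so adding MAF or another format would mean adding one more reader or writer rather than touching that logic.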
@jaclynperrone file formats are artifacts of implementation and hide the scientific entity under consideration. Bioinformatics (and science in general) is broken because it thinks in terms of files, not entities. It's a plague. Cf. http://www.odbms.org/blog/2014/09/david-haussler.
Ahhh I see. I'll keep this in mind when talking to researchers. Would be interested to hear their assumptions when they see something like "Add Variants" or "Add Reads".
Yeah, we're going to have to break a few bad habits. You're going to be asked for faster horses (i.e. better file format handling); we're going to try to push them to hop in a car.
I like "Add Variants" + "Add Reads"!
My gut feeling right now is to keep it as "Add BAM" and to change the other to say "Add VCF". But only until we support other file types. When those are on the table (which could be soon!) then we should regroup and figure out a naming convention that would encapsulate all of them. And by that point, we may already have enough user feedback to point us in a particular direction.
FWIW, I brought this up in the user interview today. I asked Nicole what she thought the "Add Variants" and "Add Reads" buttons would do. For "Add Variants", she would expect to manually type in a variant. As for the "Add Reads" button, she was hesitant at first. She first said "I don't know what that would do", but when pushed she said it would add a read depth.
Sure, I believe that longtime users of standard bioinformatics tools have really bad habits. I don't think we should design our product to reinforce those bad habits though.
From Leo at MSSM, when I asked him what "Runs" refers to: "One discrete instance where the sequencing machine pushes a sample through. Depending on the platform, a machine can (for example) sequence 50 samples or 1 sample in one go. Therefore, I would think a “run” is either a collection of those 50 samples, or just the 1."