
Decide on Run vs. VCF & rename

Open ihodes opened this issue 10 years ago • 14 comments

User feedback indicates some confusion; do we really want to enforce new terminology?

ihodes avatar May 27 '15 17:05 ihodes

:fearful: no file names please.

hammer avatar May 27 '15 17:05 hammer

Well. We have an "Add BAM" button. So shouldn't the other file upload button follow suit?

jaclynperrone avatar May 27 '15 17:05 jaclynperrone

I'd rather rename "Add BAM" to "Add reads" and "Add run" to "Add variants" or something similar. We should talk about the entities, not the file formats.

hammer avatar May 27 '15 17:05 hammer

Why the push to avoid file formats?

jaclynperrone avatar May 27 '15 17:05 jaclynperrone

That's my preference as well (hence my gradual migration from VCF to Run in Cycledash), but we need to think hard about how to communicate this to our users.

I think "variants" is clearer than "run", to start.

ihodes avatar May 27 '15 17:05 ihodes

> Why the push to avoid file names?

It'd be nice to support multiple serializations of the entities of interest; e.g. variants are serialized and annotated in VCFs, MAFs, AdamGenotypes, CSV (within CycleDash) & relations in an RDBMS (within CycleDash, maybe via Impala later).

Supporting import from and export to various formats would be a great use of CycleDash, which just needs to know how to filter/transform/annotate the component variants/reads/etc.
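
A rough sketch of what "entities, not file formats" could look like in code, assuming a hypothetical `Variant` record and hypothetical helper functions (none of these names come from CycleDash itself). The VCF reader is deliberately simplified: it only reads the first five columns and does not split multi-allelic ALT fields.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical format-independent variant entity."""
    contig: str
    position: int
    reference: str
    alternate: str


def variants_from_vcf(text):
    """Read variants from VCF text (simplified: CHROM, POS, REF, ALT only)."""
    variants = []
    for line in text.splitlines():
        # Skip blank lines, meta lines (##...) and the header line (#CHROM...).
        if not line or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        variants.append(Variant(chrom, int(pos), ref, alt))
    return variants


def variants_to_csv(variants):
    """Serialize the same entities as CSV with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["contig", "position", "reference", "alternate"])
    for v in variants:
        writer.writerow([v.contig, v.position, v.reference, v.alternate])
    return out.getvalue()


def variants_from_csv(text):
    """Read variants back from the CSV layout written by variants_to_csv."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        Variant(r["contig"], int(r["position"]), r["reference"], r["alternate"])
        for r in reader
    ]
```

The point of the sketch is that filtering, transforming, and annotating would operate on `Variant` objects, while each file format only needs a reader and a writer at the edges.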

ihodes avatar May 27 '15 17:05 ihodes

@jaclynperrone file formats are artifacts of implementation and hide the scientific entity under consideration. Bioinformatics (and science in general) is broken because it thinks in terms of files, not entities. It's a plague. Cf. http://www.odbms.org/blog/2014/09/david-haussler.

hammer avatar May 27 '15 17:05 hammer

Ahhh I see. I'll keep this in mind when talking to researchers. Would be interested to hear their assumptions when they see something like "Add Variants" or "Add Reads".

jaclynperrone avatar May 27 '15 17:05 jaclynperrone

Yeah, we're going to have to break a few bad habits. You're going to be asked for faster horses (i.e. better file format handling), and we're going to try to push them to hop in a car.

hammer avatar May 27 '15 17:05 hammer

I like "Add Variants" + "Add Reads"!

ryan-williams avatar May 27 '15 18:05 ryan-williams

My gut feeling right now is to keep it as "Add BAM" and to change the other to say "Add VCF". But only until we support other file types. When those are on the table (which could be soon!) then we should regroup and figure out a naming convention that would encapsulate all of them. And by that point, we may already have enough user feedback to point us in a particular direction.

jaclynperrone avatar May 27 '15 18:05 jaclynperrone

FWIW, I brought this up in the user interview today. I asked Nicole what she thought the "Add Variants" and "Add Reads" buttons would do. For "Add Variants", she would expect to manually type in a variant. As for the "Add Reads" button, she was hesitant at first. She first said "I don't know what that would do", but when pushed she said it would add a read depth.

jaclynperrone avatar May 28 '15 19:05 jaclynperrone

Sure, I believe that longtime users of standard bioinformatics tools have really bad habits. I don't think we should design our product to reinforce those bad habits though.

hammer avatar May 28 '15 19:05 hammer

From Leo at MSSM, when I asked him what "Runs" refers to: "One discrete instance where the sequencing machine pushes a sample through. Depending on the platform, a machine can (for example) sequence 50 samples or 1 sample in one go. Therefore, I would think a “run” is either a collection of those 50 samples, or just the 1."

jaclynperrone avatar Jun 09 '15 17:06 jaclynperrone