goleft

Suggestions: paths; case/control

Open: lindenb opened this issue 7 years ago • 3 comments

Hi Brent, here are two suggestions for indexcov:

  • using a file containing the paths to the BAMs (to avoid something like xargs)

  • if we could include the fact that some samples are 'cases' or 'controls', would it improve your algorithm?

Thanks

lindenb, Sep 12 '18 20:09

Could you expand on the first point? Do you mean you want to avoid an 'argument list too long' error or something?

For the second point: indexcov only does within-sample normalization, not between-sample normalization. I did have a mode where you could specify that the first $N samples were of interest and the rest were background, to give an idea of how a possibly small set of $N samples looks against a large background, but I removed it because it made the code and interface more complex. I'm hesitant to revisit, but I might be convinced.
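To make the within-sample vs. between-sample distinction concrete, here is a minimal Python sketch. It is not indexcov's code: the per-bin depth lists, the use of the median as the scaling statistic, and the helper names `normalize_within_sample` and `compare_to_background` are all hypothetical. The first function looks only at one sample's own bins; the second mimics the removed "first $N samples vs. background" idea by dividing a case sample's normalized bins by the per-bin median of the background samples.

```python
from statistics import median


def normalize_within_sample(depths):
    """Scale one sample's per-bin depth estimates so its median bin is 1.0.

    Uses only this sample's own data (within-sample normalization).
    The median is an assumed scaling statistic, not necessarily indexcov's.
    """
    m = median(depths)
    if m == 0:
        raise ValueError("median depth is zero; cannot normalize")
    return [d / m for d in depths]


def compare_to_background(case, background):
    """Hypothetical case/background comparison (between-sample).

    `case` is one within-sample-normalized sample; `background` is a list of
    within-sample-normalized samples with the same bins. Each case bin is
    divided by the median of the background samples at that bin; bins whose
    background median is 0 yield NaN.
    """
    ratios = []
    for i, value in enumerate(case):
        bg = median(sample[i] for sample in background)
        ratios.append(value / bg if bg else float("nan"))
    return ratios


case = normalize_within_sample([10, 20, 20, 40])
background = [normalize_within_sample(s) for s in ([30] * 4, [10] * 4, [7] * 4)]
print(case)                                     # [0.5, 1.0, 1.0, 2.0]
print(compare_to_background(case, background))  # [0.5, 1.0, 1.0, 2.0]
```

In this toy case the last bin sits at 2.0 against a flat background, so it would stand out as a possible gain. The second function is also where the extra interface comes in: the tool would need to be told which samples are cases and which are background.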

brentp, Sep 13 '18 18:09

> you mean you want to avoid argument list too long error or something?

Yes. Something like what the Broad is doing with the '.list' suffix: https://software.broadinstitute.org/gatk/documentation/tooldocs/3.8-0/org_broadinstitute_gatk_engine_CommandLineGATK.php#--input_file

> An input file containing sequence data mapped to a reference, in BAM or CRAM format, or a text file containing a list of input files (with extension .list).

> I'm hesitant to revisit, but I might be convinced
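A minimal Python sketch of the GATK-style convention quoted above: any argument ending in `.list` is read as a text file with one BAM or CRAM path per line, and every other argument is kept as a path. `expand_list_args` is a hypothetical helper, not an existing indexcov option; skipping blank lines and leaving relative paths unresolved are assumptions.

```python
def expand_list_args(args):
    """Expand GATK-style '.list' arguments into the paths they contain.

    Hypothetical helper: an argument ending in '.list' is treated as a text
    file with one BAM/CRAM path per line (blank lines are skipped); any other
    argument is passed through unchanged. Input order is preserved.
    """
    paths = []
    for arg in args:
        if arg.endswith(".list"):
            with open(arg) as fh:
                # Strip newlines/whitespace and drop empty lines.
                paths.extend(line.strip() for line in fh if line.strip())
        else:
            paths.append(arg)
    return paths
```

The list itself can be built without putting the paths on a command line, e.g. `find /path/to/bams -name '*.bam' > samples.list`, and then passed as a single argument.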

I wouldn't be able to convince you. I wondered if there was something to explore here.

lindenb, Sep 13 '18 18:09

>> you mean you want to avoid argument list too long error or something?

> Yes. Something like what the Broad is doing with the '.list' suffix: https://software.broadinstitute.org/gatk/documentation/tooldocs/3.8-0/org_broadinstitute_gatk_engine_CommandLineGATK.php#--input_file

That's doable, though I suspect that if your command is that long, it'll be hard to do much with the output. Then again, I guess you have your own viewer that can overcome the limitations of the included HTML one. Let's keep this open and I'll try to get around to adding it.

>> An input file containing sequence data mapped to a reference, in BAM or CRAM format, or a text file containing a list of input files (with extension .list).

>> I'm hesitant to revisit, but I might be convinced

> I wouldn't be able to convince you. I wondered if there was something to explore here.

There may be, but I don't have the bandwidth for now.

brentp, Sep 14 '18 02:09