downstream analyses: expected counts vs counts
What are the pros and cons of using expected counts from proseg rather than the integer count matrix produced when exporting to 10x format? Which is recommended for use in downstream analyses?
Also, is there a way to regenerate the expected count matrix from the transcripts table? This would be useful, for example, if one wants to filter out some transcripts based on a transcript QC metric after running proseg.
Many thanks!
My recommendation is to use expected counts if you can. Because proseg is a sampler, this is essentially what it natively computes (that is, parameters averaged over multiple samples). The optional integer point estimates shouldn't be drastically different, though.
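To make the distinction concrete, here is a toy sketch with made-up numbers. It treats a stack of sampled count matrices as the sampler's output, averages them to get expected counts, and takes a per-entry median (rounded down) as a stand-in integer point estimate. The `samples` array and the median rule are assumptions for illustration only; proseg does not necessarily expose its samples this way or derive its integer counts this way.

```python
import numpy as np

# Hypothetical sampler output: 4 iterations, each a 2-cell x 3-gene count matrix.
samples = np.array([
    [[3, 0, 1], [0, 2, 5]],
    [[4, 0, 1], [0, 2, 4]],
    [[3, 1, 1], [1, 2, 5]],
    [[4, 0, 2], [0, 2, 5]],
])

# Expected counts: the average over samples, generally non-integer.
expected = samples.mean(axis=0)

# Stand-in integer point estimate: per-entry median, rounded down.
# This rule is an assumption, not necessarily what proseg uses.
point = np.floor(np.median(samples, axis=0)).astype(int)

print(expected)
print(point)
```

Here the two matrices differ by at most 0.5 per entry, in line with the integer estimates not being drastically different, but the expected counts keep the fractional uncertainty that the integer version discards.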
I don't have code to compute a matrix from the transcript metadata, though it shouldn't be too hard to implement. There is a --min-qv option to have proseg filter out low quality transcripts, but a convenient way to do post-hoc filtering would be nice to have. I'll think about how best to implement that.
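Until something like that exists, a rough post-hoc version can be sketched in pandas. This assumes the transcript metadata has columns named `gene`, `qv`, `assignment` and `probability`, and that unassigned (background) transcripts carry a sentinel value in `assignment`. The column names, the sentinel (`-1` here) and the function name `aggregate_counts` are all hypothetical, so check them against your own output. The result is only an approximation and need not match proseg's expected counts.

```python
import pandas as pd


def aggregate_counts(
    transcripts: pd.DataFrame,
    min_qv: float = 20.0,
    unassigned: int = -1,
) -> pd.DataFrame:
    """Cell x gene matrix of summed assignment probabilities after QC filtering.

    Hypothetical columns: 'gene', 'qv', 'assignment' (cell id, or `unassigned`
    for background) and 'probability'. Adjust to match the real metadata.
    """
    # Post-hoc QC: drop low-quality and background transcripts.
    kept = transcripts[
        (transcripts["qv"] >= min_qv) & (transcripts["assignment"] != unassigned)
    ]
    # Each transcript contributes its probability to its recorded cell only.
    return (
        kept.groupby(["assignment", "gene"])["probability"]
        .sum()
        .unstack(fill_value=0.0)
    )


# Usage on a tiny made-up table.
df = pd.DataFrame({
    "gene": ["A", "A", "B", "B", "A"],
    "qv": [30, 10, 40, 25, 35],
    "assignment": [0, 0, 0, 1, -1],
    "probability": [0.9, 0.8, 0.6, 1.0, 0.7],
})
m = aggregate_counts(df)
print(m)
```

The second row is dropped by the QC threshold and the last one as background, leaving cell 0 with 0.9 (A) and 0.6 (B) and cell 1 with 1.0 (B).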
Thanks a lot, that'd be very useful!
Happy to write my own function for a quick try, but I'm unsure how to obtain the expected counts from the transcript metadata. I tried summing the probability column in the transcript metadata, but it doesn't seem to add up to the expected counts.
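One plausible reason the sums don't match, assuming the expected counts are averages of sampled assignments: by linearity, a cell's expected count for a gene is the sum, over all transcripts of that gene, of the probability that each transcript is assigned to that cell. If the metadata records only each transcript's most likely cell and that cell's probability, the mass a transcript spends in other cells is dropped. This toy sketch uses hypothetical per-transcript assignment distributions (proseg's metadata does not necessarily provide them) to compare the two sums.

```python
from collections import defaultdict

# Hypothetical per-transcript data: (gene, {cell_id: fraction of samples
# assigning the transcript to that cell}). A total below 1.0 means the
# transcript was sometimes assigned to background.
transcripts = [
    ("A", {0: 0.6, 1: 0.4}),
    ("A", {0: 1.0}),
    ("B", {1: 0.7, 0: 0.2}),
]

full = defaultdict(float)      # expected counts from the whole distribution
top_only = defaultdict(float)  # sum of the most likely cell's probability only

for gene, dist in transcripts:
    for cell, p in dist.items():
        full[(cell, gene)] += p
    best = max(dist, key=dist.get)
    top_only[(best, gene)] += dist[best]

print(dict(full))
print(dict(top_only))
```

In this example the full-distribution total is 2.9 while the top-assignment total is 2.3, and cell 1's count for gene A (0.4) disappears entirely when only the top assignment is kept.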