
downstream analyses: expected counts vs counts

Open j-bac opened this issue 10 months ago • 2 comments

What are the pros and cons of using the expected counts from proseg rather than the integer count matrix produced when exporting to 10x format? Which is recommended for downstream analyses?

Also, is there a way to regenerate the expected count matrix from the transcripts table? This would be useful, for example, if one wants to filter out some transcripts based on a transcript QC metric after running proseg.

Many thanks!

j-bac, Feb 05 '25 15:02

My recommendation is to use expected counts if you can. Because proseg is a sampler, this is essentially what it natively computes (that is, parameters averaged over multiple samples). The optional integer point estimates shouldn't be drastically different, though.
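To make "parameters averaged over multiple samples" concrete, here is a toy sketch in plain Python. It is not proseg's code: the sample matrices are made up, and the per-entry mode used as the integer point estimate is only an assumption chosen for illustration. It shows why expected counts are generally non-integer while a point estimate stays integer.

```python
from collections import Counter

def expected_counts(samples):
    """Average equally shaped integer count matrices entry by entry."""
    n = len(samples)
    rows, cols = len(samples[0]), len(samples[0][0])
    return [[sum(s[i][j] for s in samples) / n for j in range(cols)]
            for i in range(rows)]

def modal_counts(samples):
    """Per-entry most frequent value (an illustrative point estimate only)."""
    rows, cols = len(samples[0]), len(samples[0][0])
    return [[Counter(s[i][j] for s in samples).most_common(1)[0][0]
             for j in range(cols)]
            for i in range(rows)]

# Four hypothetical sampler draws of a 2 cells x 2 genes count matrix.
samples = [
    [[3, 0], [1, 2]],
    [[2, 1], [1, 2]],
    [[3, 0], [0, 3]],
    [[3, 0], [1, 2]],
]

expected = expected_counts(samples)  # [[2.75, 0.25], [0.75, 2.25]]
point = modal_counts(samples)        # [[3, 0], [1, 2]]
```

In this toy case the two matrices are close but not identical, which matches the remark that the point estimates should not differ drastically.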

I don't have code to compute a matrix from the transcript metadata, though it shouldn't be too hard to implement. There is a --min-qv option to have proseg filter out low-quality transcripts, but a convenient way to do post-hoc filtering would be nice to have. I'll think about how best to implement that.
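For reference, a minimal sketch of what post-hoc filtering could look like: drop transcripts below a quality threshold, then tally the rest per (cell, gene). The column names `gene`, `cell`, and `qv`, the toy table, and the helper `tally_counts` are all hypothetical and not proseg's actual output schema. The result is a plain integer tally from one assignment per transcript, so it should not be expected to reproduce proseg's expected counts.

```python
import csv
import io
from collections import defaultdict

def tally_counts(rows, min_qv=20.0):
    """Count transcripts per (cell, gene), skipping low-quality and unassigned ones."""
    counts = defaultdict(int)
    for row in rows:
        if float(row["qv"]) < min_qv:
            continue  # below the quality threshold
        if row["cell"] == "":
            continue  # transcript not assigned to any cell
        counts[(row["cell"], row["gene"])] += 1
    return dict(counts)

# Hypothetical transcript table; column names are assumptions for illustration.
toy_table = """gene,cell,qv
Cd3e,0,35.0
Cd3e,0,12.5
Cd3e,1,40.0
Ms4a1,1,28.0
Ms4a1,,30.0
Ms4a1,1,22.0
"""

rows = list(csv.DictReader(io.StringIO(toy_table)))
counts = tally_counts(rows, min_qv=20.0)
```

With a real table, the same loop could filter on whichever QC column is available before tallying.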

dcjones, Feb 05 '25 17:02

Thanks a lot, that'd be very useful!

Happy to write my own function for a quick try, but I'm unsure how to obtain the expected counts from the transcript metadata. I tried summing the probability column in the transcript metadata, but the result doesn't seem to match the expected counts.

j-bac, Feb 05 '25 17:02