condition-transcript-expression question
Hi, thanks for isolator! I started playing around and it seems very useful!
I am trying to understand the different summarize functions.
My samples.yaml looks like this:
KO_young:
  RE_8wks_KO_01: bam/RE_8wks_KO_01.bam
  RE_8wks_KO_04: bam/RE_8wks_KO_04.bam
  RE_8wks_KO_05: bam/RE_8wks_KO_05.bam
KO_old:
  RE_25wks_KO_02: bam/RE_25wks_KO_02.bam
  RE_25wks_KO_03: bam/RE_25wks_KO_03.bam
ctrl_young:
  RE_8wks_HET_01: bam/RE_8wks_HET_01.bam
  RE_8wks_HET_02: bam/RE_8wks_HET_02.bam
  RE_8wks_HET_03: bam/RE_8wks_HET_03.bam
  RE_8wks_WT_01: bam/RE_8wks_WT_01.bam
  RE_8wks_WT_03: bam/RE_8wks_WT_03.bam
  RE_8wks_WT_04: bam/RE_8wks_WT_04.bam
ctrl_old:
  RE_25wks_HET_01: bam/RE_25wks_HET_01.bam
  RE_25wks_HET_02: bam/RE_25wks_HET_02.bam
  RE_25wks_WT_02: bam/RE_25wks_WT_02.bam
  RE_25wks_WT_03: bam/RE_25wks_WT_03.bam
  RE_25wks_WT_04: bam/RE_25wks_WT_04.bam
Now with
isolator summarize condition-transcript-expression isolator-output.4_cond.h5
I get a file "condition-transcript-expression" starting with:
gene_name gene_id transcript_id KO_young_adjusted_tpm KO_young_adjusted_tpm KO_young_adjusted_tpm KO_old_adjusted_tpm
mt-Tf ENSMUSG00000064336.1 ENSMUST00000082387.1 3.459316e-02 5.277997e-02 3.047849e-02 2.762534e-02
mt-Rnr1 ENSMUSG00000064337.1 ENSMUST00000082388.1 7.140579e+01 1.003876e+02 7.329236e+01 6.638102e+01
mt-Tv ENSMUSG00000064338.1 ENSMUST00000082389.1 5.484299e-03 7.384523e-03 6.614153e-03 4.906841e-03
...
What are columns 4-7? Why does the same column name appear three times? I would have expected my four conditions to appear in the header.
Is each column the "mean" expression value of one condition?
What is the best way to get a "mean" expression per condition that matches (or closely approximates, with some simple calculation) the expression values used to compute "median_log2_fold_change" in a "differential-transcript-expression.tsv" file?
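To make the last question concrete, here is a minimal sketch of the naive approach I would try otherwise. It rests on assumptions I have not confirmed: that the output file is tab-separated, and that each repeated "<condition>_adjusted_tpm" column holds the value for one sample of that condition. The function name mean_per_condition is my own and not part of isolator, and a plain arithmetic mean may well differ from whatever isolator uses internally for the fold change.

```python
import csv
import io
from collections import defaultdict


def mean_per_condition(tsv_text, suffix="_adjusted_tpm"):
    """Average all columns that share a header name.

    Assumption: columns with the same "<condition>_adjusted_tpm" header
    are the individual samples of that condition.
    Returns {transcript_id: {condition: mean_value}}.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    id_col = header.index("transcript_id")

    # Group column indices by condition name, in first-seen order.
    groups = defaultdict(list)
    for idx, name in enumerate(header):
        if name.endswith(suffix):
            groups[name[: -len(suffix)]].append(idx)

    means = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        means[row[id_col]] = {
            cond: sum(float(row[i]) for i in idxs) / len(idxs)
            for cond, idxs in groups.items()
        }
    return means


if __name__ == "__main__":
    # First data row from the output above, re-joined with tabs.
    example = "\n".join([
        "\t".join(["gene_name", "gene_id", "transcript_id",
                   "KO_young_adjusted_tpm", "KO_young_adjusted_tpm",
                   "KO_young_adjusted_tpm", "KO_old_adjusted_tpm"]),
        "\t".join(["mt-Tf", "ENSMUSG00000064336.1", "ENSMUST00000082387.1",
                   "3.459316e-02", "5.277997e-02", "3.047849e-02",
                   "2.762534e-02"]),
    ])
    print(mean_per_condition(example))
```

With a real file I would pass the file contents (for example open(path).read()) instead of the inline example. Would this be a reasonable approximation, or is there a summarize function that already does it?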
Thanks!