IsoQuant run isoquant on several samples

Hi, when I run 10 samples separately, about 50% of the novel isoforms in each sample have CAGE support. However, I want to generate a single GTF file, so I ran the 10 samples together. The results look strange to me: the percentage of novel isoforms with CAGE support dropped considerably, to 30%. I don't know what causes this. Do you have any suggestions? When I input the 10 BAM files together, IsoQuant processed them as 1 sample and reported that the sample has 10 BAM files.

Thanks. Lina

May 03 '24 21:05 linalu1121

Dear @linalu1121

How do you measure CAGE support?

Could you send me some logs from your runs, both individual and joint?

Best Andrey

May 07 '24 23:05 andrewprzh

Hi Andrey,

Thank you for replying.

I use the transcript_models.gtf as input to SQANTI3 to calculate the CAGE support.
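For reference, here is a minimal sketch of how the percentage could be computed from a SQANTI3 classification table, so that the individual and joint runs are measured the same way. It assumes a tab-separated file with `structural_category` and `within_CAGE_peak` columns, and that "novel" means the `novel_in_catalog` and `novel_not_in_catalog` categories. These names are assumptions, may differ between SQANTI3 versions, and may not match how the numbers above were obtained. The rows in `EXAMPLE` are made up purely for illustration.

```python
import csv
import io

# Assumption: these two SQANTI3 structural categories are what "novel" means here.
NOVEL_CATEGORIES = {"novel_in_catalog", "novel_not_in_catalog"}


def read_classification(handle):
    """Read a tab-separated classification table into a list of dict rows."""
    return list(csv.DictReader(handle, delimiter="\t"))


def cage_support(rows, category_col="structural_category",
                 cage_col="within_CAGE_peak"):
    """Return (supported, total, percent) for novel isoforms.

    Column names are assumptions; adjust them to the header of your file.
    """
    novel = [r for r in rows if r[category_col] in NOVEL_CATEGORIES]
    supported = sum(1 for r in novel if r[cage_col].strip().upper() == "TRUE")
    total = len(novel)
    percent = 100.0 * supported / total if total else 0.0
    return supported, total, percent


# Made-up rows for illustration only (not real results).
EXAMPLE = (
    "isoform\tstructural_category\twithin_CAGE_peak\n"
    "t1\tnovel_in_catalog\tTRUE\n"
    "t2\tnovel_in_catalog\tFALSE\n"
    "t3\tnovel_not_in_catalog\tTRUE\n"
    "t4\tnovel_not_in_catalog\tFALSE\n"
    "t5\tfull-splice_match\tTRUE\n"
)

rows = read_classification(io.StringIO(EXAMPLE))
supported, total, percent = cage_support(rows)
print(f"{supported}/{total} novel isoforms with CAGE support ({percent:.1f}%)")
```

To use it on a real file, replace `io.StringIO(EXAMPLE)` with `open(path)` for the classification file. Reporting the counts alongside the percentage shows whether the number of novel isoforms itself differs between runs.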

I ran 10 samples together, then I obtained the following log:

isoquant.py -d nanopore --bam_list bam_file_list.txt --read_group tag:CB --genedb genes.gtf --complete_genedb --reference genome.fa --output allsamples --prefix allsamples --threads 20 --clean_start

2024-04-19 14:21:20,860 - INFO - Running IsoQuant version 3.3.1
2024-04-19 14:21:33,937 - INFO - Novel unspliced transcripts will not be reported, set --report_novel_unspliced true to discover them
2024-04-19 14:21:33,937 - INFO - === IsoQuant pipeline started ===
2024-04-19 14:21:33,938 - INFO - Converting gene annotation file to .db format (takes a while)...
2024-04-19 14:29:08,993 - INFO - Gene database written to /allsamples/genes.db
2024-04-19 14:29:08,993 - INFO - Provide this database next time to avoid excessive conversion
2024-04-19 14:29:08,994 - INFO - Loading gene database from /allsamples/genes.db
2024-04-19 14:29:08,994 - INFO - Loading reference genome from /all/genome.fa
2024-04-19 14:29:08,996 - INFO - Processing 1 sample
2024-04-19 14:29:08,996 - INFO - Processing sample allsamples
2024-04-19 14:29:08,996 - INFO - Sample has 10 BAM files: sample1.bam, sample2.bam, sample3.bam, sample4.bam, sample5.bam, sample6.bam, sample7.bam, sample8.bam, sample9.bam, sample10.bam

Actually, these 10 BAM files represent 10 different samples. However, I aim to generate a single GTF file, so I ran the analysis on all 10 samples together.

And below is the command for processing each sample individually:

isoquant.py -d nanopore --bam sample1.bam --read_group tag:CB --genedb genes.gtf --complete_genedb --reference genome.fa --output sample1 --prefix sample1 --threads 20 --clean_start

May 07 '24 23:05 linalu1121

@linalu1121

Yes, to get a single GTF it makes sense to provide all BAMs together, so everything is correct in this part. It doesn't really matter that the log says that a sample has 10 BAM files.

Could you send me the entire log files? I'm more interested in the statistics at the end of the log, with respect to discovered transcripts.

Best Andrey

May 07 '24 23:05 andrewprzh

The new IsoQuant 3.5 should be far more efficient in terms of RAM consumption, especially when processing multiple samples.

I'll close this issue for now; please re-open it if needed.

Aug 03 '24 11:08 andrewprzh