RNAseqEval

The exon coordinates calculated by Process_pbsim_data are not the same as in the GTF file?

Open ydLiu-HIT opened this issue 6 years ago • 3 comments

When I use Process_pbsim_data to evaluate the SAM file, I print the coordinates of the expected exons (Items), but I found that they are not the same as the coordinates in the GTF file.

coordinate printed by Process_pbsim_data: image

coordinate in GTF image

ydLiu-HIT avatar Mar 29 '18 02:03 ydLiu-HIT

            Hello, and thanks for reporting this.

            Can you tell me which dataset you tested this on, and which read this is for?

            One possible explanation is that the simulated read doesn't contain the last nucleotide. The other is an error in the script.

            I will have a look at the script to check if it is indeed an error.

            Best regards,
            Krešimir Križanović Ph.D.
            University of Zagreb
            Faculty of Electrical Engineering and Computing
            Croatia


kkrizanovic avatar Mar 29 '18 07:03 kkrizanovic

            Hello again,

            Sorry for taking so long to respond. I've reviewed our Python code, and the difference in exon end coordinates between the GTF file and our Python code is simply a difference in design. Our end coordinate denotes the position after the last nucleotide belonging to the exon, while in a GTF file the end coordinate denotes exactly the last base belonging to the exon.
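The two end conventions can be illustrated with a minimal sketch. It assumes both coordinates sit on the same 1-based axis, which this thread does not state, so the exclusive end is one greater than the GTF end. The function names are hypothetical and are not taken from Process_pbsim_data.py.

```python
# Hypothetical helpers, not part of RNAseqEval.
# Assumption: both conventions use the same 1-based numbering;
# only the meaning of the end coordinate differs.

def gtf_end_to_exclusive(gtf_end):
    """GTF end is the last base of the exon; return the position after it."""
    return gtf_end + 1


def exclusive_end_to_gtf(exclusive_end):
    """Inverse conversion: position after the last base -> last base."""
    return exclusive_end - 1


def exon_length_from_gtf(gtf_start, gtf_end):
    """Length of an exon given a closed GTF interval [start, end]."""
    return gtf_end - gtf_start + 1


# Made-up exon coordinates, for illustration only.
gtf_start, gtf_end = 1000, 1200
end_exclusive = gtf_end_to_exclusive(gtf_end)

print(end_exclusive)                              # 1201
print(exon_length_from_gtf(gtf_start, gtf_end))   # 201
print(end_exclusive - gtf_start)                  # 201, same length
```

With an exclusive end, the exon length is simply end minus start, which may be one reason to prefer that convention internally.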

            However, your remark helped me uncover another error in calculating expected partial alignments, which I have now fixed and pushed to GitHub. I appreciate your interest in our evaluator, and your taking the time to comment on it.

            Hope this clarifies things for you.

            Best regards,
            Krešimir Križanović Ph.D.
            University of Zagreb
            Faculty of Electrical Engineering and Computing
            Croatia


kkrizanovic avatar Apr 12 '18 13:04 kkrizanovic

Thank you for committing your new Python code. It solves my problem very well. I now have another question about the Python code. I am not sure about it, so it is just a suggestion.

In file Process_pbsim_data.py, line 321, I think the statement "while annotation.items[i].getLength() < maf_startpos:" should be "while annotation.items[i].getLength() <= maf_startpos:", because when I test on the dataset dataset4_sim_dm_g2as.fastq, the latter statement works better.
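To show where the two comparisons differ, here is a minimal sketch of a loop of this shape. The loop body in Process_pbsim_data.py is not shown in this thread, so this is a hypothetical reconstruction: `locate_exon` and `exon_lengths` are made-up names, and the start position is assumed to be a 0-based offset into the concatenated exons.

```python
# Hypothetical reconstruction, not the code from Process_pbsim_data.py.
# Assumption: start_offset is a 0-based offset into the spliced transcript.

def locate_exon(exon_lengths, start_offset, inclusive=True):
    """Return (exon_index, offset_within_exon) for a transcript offset.

    inclusive=True  uses `length <= remaining` (the suggested comparison).
    inclusive=False uses `length <  remaining` (the original comparison).
    """
    i = 0
    remaining = start_offset
    while i < len(exon_lengths) and (
        exon_lengths[i] <= remaining if inclusive else exon_lengths[i] < remaining
    ):
        remaining -= exon_lengths[i]
        i += 1
    return i, remaining


# Made-up exon lengths, for illustration only.
exon_lengths = [100, 50, 75]

# The two comparisons differ only when the offset lands exactly on a boundary.
print(locate_exon(exon_lengths, 100, inclusive=False))  # (0, 100)
print(locate_exon(exon_lengths, 100, inclusive=True))   # (1, 0)
print(locate_exon(exon_lengths, 99, inclusive=False))   # (0, 99)
print(locate_exon(exon_lengths, 99, inclusive=True))    # (0, 99)
```

Under the 0-based assumption, offset 100 is the first base of the second exon, so `<=` gives the expected result, while `<` leaves the position one past the end of the first exon. If the start position were 1-based instead, the conclusion would be reversed.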

ydLiu-HIT avatar Apr 17 '18 07:04 ydLiu-HIT