Question about ref_position in modkit extract results
Hello developer!
I am wondering whether the ref_position in the modkit extract results considers the strand orientation of the transcript.
For example, if strand orientation is not taken into account, wouldn't it be possible for a site with a smaller ref_position value to actually be located toward the 3' end if the transcript is on the reverse strand?
Since I use transcriptome.mmi for data preprocessing with dorado_aligner, I assume that the positions are determined relative to the 5'UTR, regardless of strand orientation.
Could you provide any insights or clarification on this?
Thank you for your time!
Hello @Seongmin-Jang-1165,
The ref_position is always the position as it would be in the FASTA reference. So a smaller ref_position is always toward the 5-prime end of the transcript since the FASTA will be 5-prime to 3-prime.
Here is an example, a small selection from modkit extract full:
| read_id | forward_read_position | ref_position | chrom | mod_strand | ref_strand | ref_mod_strand |
|---|---|---|---|---|---|---|
| 7463c233-f41d-41ee-9fbd-675d3a4f2f5d | 1172 | 999 | ENST00000616016.5 | + | + | + |
| fd389cb1-c73a-465c-9a92-056f62b880a2 | 50 | 965 | ENST00000616016.5 | + | - | - |
| fd389cb1-c73a-465c-9a92-056f62b880a2 | 44 | 971 | ENST00000616016.5 | + | - | - |
The top read, 7463c233-f41d-41ee-9fbd-675d3a4f2f5d, is mapped to the forward strand so the reference position is unambiguous. The bottom read, fd389cb1-c73a-465c-9a92-056f62b880a2 is mapped to the reverse strand but the ref_position coordinates are still counting "from the left". I'm also pretty sure that when aligning to the transcriptome, reverse mappings are usually partial and/or spurious, but I'm not going to claim that I know that for certain for every situation.
@ArtRand Thank you for advice!
How to make the tworead.bam file in your IGV photo..? I want to visualize my data in IGV but it didn't work when I put the several files into IGV. I also tried to make bigwig file, but it doesn't work also...
@Seongmin-Jang-1165,
You should be able to use a modBAM directly in IGV. Which version do you have? I'm using 2.19.1. To get the coloring, you need to specify color by base modification:
@ArtRand Thank you !!
I tried with modBAM, but the error occur. my IGV version is 2.19.2 ---------------- error File: C:\Users\tjdal\OneDrive\바탕 화면\랩실\히스톤\m6A seq\NANOPORE\시퀀싱 진행_20240923\분석\modkit\m6a_basecall\modkit_inosine제외 버젼\ModBAM_Modkit의 input_m6abasecall\m6A_barcode1_dorado_aligned_sorted.bam does not contain any sequence names which match the current genome. Sequence names in 'filename': ENST00000456328.2|ENSG00000290825.1|-|OTTHUMT00000362751.1|DDX11L2-202|DDX11L2|1657|lncRNA|, ENST00000450305.2|ENSG00000223972.6|OTTHUMG00000000961.2|OTTHUMT00000002844.2|DDX11L1-201|DDX11L1|632|transcribed_unprocessed_pseudogene|, ENST00000488147.1|ENSG00000227232.5|OTTHUMG00000000958.1|OTTHUMT00000002839.1|WASH7P-201|WASH7P|1351|unprocessed_pseudogene|, ENST00000619216.1|ENSG00000278267.1|-|-|MIR6859-1-201|MIR6859-1|68|miRNA|, ... Sequence names in genome reference sequence: chr1, chr2, chr3, chr4, ...
I think this is because the ID doesn't match between IGV reference and modBam. modBam has transcript ID and IGV has chromosome ID.
can you give me some tips about this problem..?
I may find the clue.
the modBAM file I put into IGV is aligned with DORADO(m6A_barcode1_dorado_aligned.bam), the command is like this
dorado aligner GRCh38.primary_assembly.transcriptome.mmi m6a_basecalled_barcode1.bam > m6A_barcode1_dorado_aligned.bam
this means the modBAM file is aligned with Transcriptome ID
But IGV's default reference setting is genome, not transcriptome, so it is not compatible.
this is my idea, is it right..?
and if so, How can I see the m6a site in the IGV..? Do I need to re-align modBAM with genome reference..? this is direct RNA seq data.
it is good to me to use alternative way of alignment... one of my idea is use Minimap2 instead of DORADO, that means re-align this [m6a_basecalled_barcode1.bam] file. but I'm worry about the m6A tag would be removed...