modkit icon indicating copy to clipboard operation
modkit copied to clipboard

Question about ref_position in modkit extract results

Open Seongmin-Jang-1165 opened this issue 9 months ago • 5 comments

Hello developer!

I am wondering whether the ref_position in the modkit extract results considers the strand orientation of the transcript.

For example, if strand orientation is not taken into account, wouldn't it be possible for a site with a smaller ref_position value to actually be located toward the 3' end if the transcript is on the reverse strand?

Since I use transcriptome.mmi for data preprocessing with dorado_aligner, I assume that the positions are determined relative to the 5'UTR, regardless of strand orientation.

Could you provide any insights or clarification on this?

Thank you for your time!

Seongmin-Jang-1165 avatar Mar 21 '25 02:03 Seongmin-Jang-1165

Hello @Seongmin-Jang-1165,

The ref_position is always the position as it would be in the FASTA reference. So a smaller ref_position is always toward the 5-prime end of the transcript since the FASTA will be 5-prime to 3-prime.

Here is an example, a small selection from modkit extract full:

read_id forward_read_position ref_position chrom mod_strand ref_strand ref_mod_strand
7463c233-f41d-41ee-9fbd-675d3a4f2f5d 1172 999 ENST00000616016.5 + + +
fd389cb1-c73a-465c-9a92-056f62b880a2 50 965 ENST00000616016.5 + - -
fd389cb1-c73a-465c-9a92-056f62b880a2 44 971 ENST00000616016.5 + - -

Image

The top read, 7463c233-f41d-41ee-9fbd-675d3a4f2f5d, is mapped to the forward strand so the reference position is unambiguous. The bottom read, fd389cb1-c73a-465c-9a92-056f62b880a2 is mapped to the reverse strand but the ref_position coordinates are still counting "from the left". I'm also pretty sure that when aligning to the transcriptome, reverse mappings are usually partial and/or spurious, but I'm not going to claim that I know that for certain for every situation.

ArtRand avatar Mar 24 '25 23:03 ArtRand

@ArtRand Thank you for advice!

How to make the tworead.bam file in your IGV photo..? I want to visualize my data in IGV but it didn't work when I put the several files into IGV. I also tried to make bigwig file, but it doesn't work also...

Seongmin-Jang-1165 avatar Mar 31 '25 17:03 Seongmin-Jang-1165

@Seongmin-Jang-1165,

You should be able to use a modBAM directly in IGV. Which version do you have? I'm using 2.19.1. To get the coloring, you need to specify color by base modification:

Image

ArtRand avatar Mar 31 '25 21:03 ArtRand

@ArtRand Thank you !!

I tried with modBAM, but the error occur. my IGV version is 2.19.2 ---------------- error File: C:\Users\tjdal\OneDrive\바탕 화면\랩실\히스톤\m6A seq\NANOPORE\시퀀싱 진행_20240923\분석\modkit\m6a_basecall\modkit_inosine제외 버젼\ModBAM_Modkit의 input_m6abasecall\m6A_barcode1_dorado_aligned_sorted.bam does not contain any sequence names which match the current genome. Sequence names in 'filename':      ENST00000456328.2|ENSG00000290825.1|-|OTTHUMT00000362751.1|DDX11L2-202|DDX11L2|1657|lncRNA|, ENST00000450305.2|ENSG00000223972.6|OTTHUMG00000000961.2|OTTHUMT00000002844.2|DDX11L1-201|DDX11L1|632|transcribed_unprocessed_pseudogene|, ENST00000488147.1|ENSG00000227232.5|OTTHUMG00000000958.1|OTTHUMT00000002839.1|WASH7P-201|WASH7P|1351|unprocessed_pseudogene|, ENST00000619216.1|ENSG00000278267.1|-|-|MIR6859-1-201|MIR6859-1|68|miRNA|, ... Sequence names in genome reference sequence: chr1, chr2, chr3, chr4, ...

I think this is because the ID doesn't match between IGV reference and modBam. modBam has transcript ID and IGV has chromosome ID.

can you give me some tips about this problem..?

Seongmin-Jang-1165 avatar Apr 10 '25 19:04 Seongmin-Jang-1165

I may find the clue.

the modBAM file I put into IGV is aligned with DORADO(m6A_barcode1_dorado_aligned.bam), the command is like this

dorado aligner GRCh38.primary_assembly.transcriptome.mmi m6a_basecalled_barcode1.bam > m6A_barcode1_dorado_aligned.bam

this means the modBAM file is aligned with Transcriptome ID

But IGV's default reference setting is genome, not transcriptome, so it is not compatible.

this is my idea, is it right..?

and if so, How can I see the m6a site in the IGV..? Do I need to re-align modBAM with genome reference..? this is direct RNA seq data.

it is good to me to use alternative way of alignment... one of my idea is use Minimap2 instead of DORADO, that means re-align this [m6a_basecalled_barcode1.bam] file. but I'm worry about the m6A tag would be removed...

Seongmin-Jang-1165 avatar Apr 10 '25 19:04 Seongmin-Jang-1165