xpore
What happens when nanopolish model_kmer is NNNNN?
Looking at the code for xpore 2.0, I can see the following:
assert list(set(g_kmer_array))[0].count('N') == 0 ##to weed out the mapped kmers from tx_seq that contain 'N', which is not in diffmod's model_kmer
Does this mean that any nanopore read that includes an unmapped (not in model_kmer) event for that chromosome/position is discarded? Or do we just discard the mean from the NNNNN event and use the rest?
I ask because in your demo data one of these events sometimes appears in the middle of a mapping, and according to the paper the means of multiple events are averaged, weighted by event length.
ENST00000351111.6 689 GCTGA 3 t 1416 98.74 3.721 0.00232 GCTGA 89.96 2.85 2.75 41380 41387
ENST00000351111.6 689 GCTGA 3 t 1417 87.95 2.286 0.00896 GCTGA 89.96 2.85 -0.63 41353 41380
ENST00000351111.6 689 GCTGA 3 t 1418 102.76 4.575 0.00232 NNNNN 0.00 0.00 inf 41346 41353
ENST00000351111.6 689 GCTGA 3 t 1419 89.48 1.892 0.01228 GCTGA 89.96 2.85 -0.15 41309 41346
ENST00000351111.6 689 GCTGA 3 t 1420 86.30 1.539 0.00365 GCTGA 89.96 2.85 -1.15 41298 41309
ENST00000351111.6 689 GCTGA 3 t 1421 89.38 2.461 0.01228 GCTGA 89.96 2.85 -0.18 41261 41298
ENST00000351111.6 689 GCTGA 3 t 1422 88.21 1.945 0.00432 GCTGA 89.96 2.85 -0.55 41248 41261
ENST00000351111.6 689 GCTGA 3 t 1423 85.51 1.369 0.00332 GCTGA 89.96 2.85 -1.39 41238 41248
ENST00000351111.6 689 GCTGA 3 t 1424 87.93 1.238 0.00365 GCTGA 89.96 2.85 -0.64 41227 41238
ENST00000351111.6 689 GCTGA 3 t 1425 90.67 3.229 0.00432 GCTGA 89.96 2.85 0.22 41214 41227
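To make the second option in the question concrete, here is a minimal sketch of a length-weighted average over the ten events above that skips any event whose model_kmer contains 'N'. It is an illustration only and is not taken from the xpore source; the function name `length_weighted_mean` and the tuple layout are assumptions made for this example.

```python
from typing import Iterable, Optional, Tuple

# (event_level_mean, event_length, model_kmer), copied from the rows above
Event = Tuple[float, float, str]


def length_weighted_mean(events: Iterable[Event]) -> Optional[float]:
    """Average the event means, weighted by event length.

    Events whose model_kmer contains 'N' are skipped.
    Returns None if no usable event remains.
    """
    weighted_sum = 0.0
    total_length = 0.0
    for level_mean, length, model_kmer in events:
        if "N" in model_kmer:
            continue  # unmapped event, e.g. NNNNN
        weighted_sum += level_mean * length
        total_length += length
    if total_length == 0.0:
        return None
    return weighted_sum / total_length


events = [
    (98.74, 0.00232, "GCTGA"),
    (87.95, 0.00896, "GCTGA"),
    (102.76, 0.00232, "NNNNN"),
    (89.48, 0.01228, "GCTGA"),
    (86.30, 0.00365, "GCTGA"),
    (89.38, 0.01228, "GCTGA"),
    (88.21, 0.00432, "GCTGA"),
    (85.51, 0.00332, "GCTGA"),
    (87.93, 0.00365, "GCTGA"),
    (90.67, 0.00432, "GCTGA"),
]

print(length_weighted_mean(events))
```

Under this reading the NNNNN event (mean 102.76) would simply not contribute to the position's signal; under the other reading the whole read would be dropped for this position.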
I also noticed that when not using genome mapping, there doesn't seem to be any check for these 'NNNNN' events in the preprocess_tx function.
Hi @cathoderaymission, yes, k-mers containing 'N' are discarded. And you are right: there is no check for 'NNNNN' k-mers in transcriptome mode. We will add an 'N' check to preprocess_tx soon. Thank you very much.
However, this does not affect xpore diffmod itself, except that the result table may contain those 'N' k-mers, which can be filtered out later.
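As a rough sketch of that later filtering step, the rows of the result table can be dropped whenever the k-mer contains 'N'. The column name `kmer`, the helper `drop_n_kmers`, and the toy rows below are assumptions for illustration; check the header of your own output file before relying on them.

```python
import csv
import io


def drop_n_kmers(rows, kmer_field="kmer"):
    """Keep only rows whose k-mer contains no 'N'.

    kmer_field is an assumed column name; adjust it to match the table.
    """
    return [row for row in rows if "N" not in row[kmer_field]]


# Toy table with made-up values, standing in for a real result file.
example = io.StringIO(
    "id,position,kmer\n"
    "tx1,10,GCTGA\n"
    "tx1,11,NNNNN\n"
    "tx1,12,CTGAN\n"
)
rows = list(csv.DictReader(example))
kept = drop_n_kmers(rows)
print(kept)
```

For a real file, the same function can be applied to the rows read with `csv.DictReader(open(path))` before any downstream analysis.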
Dear Developer, Could you please tell me the kit used for the cell line? Thanks very much!