tombo Levelstats statistics file raw data extract to csv

Open bhargava-morampalli opened this issue 3 years ago • 1 comments

I ran the following command on my direct RNA sequencing data:

tombo detect_modifications level_sample_compare \
   --fast5-basedirs /native/singlefast5/ \
   --alternate-fast5-basedirs /ivt/singlefast5 \
   --statistics-file-basename level_testing_strain \
   --store-p-value \
   --statistic-type ks --processes 30

I want to extract the data from the resulting stats file, and I used the Tombo API as follows:

from tombo import tombo_helper, tombo_stats, resquiggle
import pandas as pd

sample_level_stats = tombo_stats.LevelStats('/data/level_testing_strain.tombo.stats')
reg_level_stats = sample_level_stats.get_reg_stats('chrm', '+', 1, 1525)
pd.DataFrame(reg_level_stats).to_csv("/results/tombotest.csv")

The resulting CSV looks like this:

,stat,pos,cov,control_cov
0,2.6928592163926735e-28,2,219,456
1,1.1185329170881968e-21,3,226,463
2,4.624989306606759e-18,4,261,529
3,1.7881359403179843e-25,5,306,533
4,2.540133370261695e-69,6,880,567
5,9.020681930756034e-76,7,1391,574
6,2.3636818898014833e-85,8,1754,578
7,1.1817788672225994e-58,9,2731,582
8,4.566511057754994e-49,10,3743,586

I assume the first column is just the index.

How do I interpret the statistic in the 2nd column? Does a value closer to one indicate a modified position (my guess), or are the most significant values (<0.05) the modified ones?

The 3rd column is the position of the nucleotide in the reference, and the 4th and 5th columns are the coverage for the sample and the control.

Am I correct in the steps I did for extracting statistics info from the level_sample_compare command?

Jul 08 '21 00:07 bhargava-morampalli

The first column is the DataFrame index, which the to_csv method writes by default. Pass index=False to get rid of it. The second column is the p-value from a Kolmogorov-Smirnov test comparing two populations of current levels, one from the sample and the other from the control. The p-values are lower when the sample and control differ more, so small values, not values near one, mark the positions where the two samples differ most.
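
As a rough illustration of both points, here is a minimal sketch using only pandas. It does not call Tombo: the rows are the first few lines of the CSV posted above, retyped as stand-in data, and the names rows, buf and top_positions are just for this example. It writes the table without the index column and then sorts by p-value so the positions that differ most come first.

```python
import io

import pandas as pd

# Stand-in data: the first rows of the CSV from the question.
# With Tombo this would come from get_reg_stats() instead.
rows = [
    (2.6928592163926735e-28, 2, 219, 456),
    (1.1185329170881968e-21, 3, 226, 463),
    (4.624989306606759e-18, 4, 261, 529),
    (1.7881359403179843e-25, 5, 306, 533),
    (2.540133370261695e-69, 6, 880, 567),
]
df = pd.DataFrame(rows, columns=["stat", "pos", "cov", "control_cov"])

# index=False drops the unnamed first column from the output.
# An in-memory buffer is used here; a file path works the same way.
buf = io.StringIO()
df.to_csv(buf, index=False)
csv_text = buf.getvalue()
header = csv_text.splitlines()[0]

# Smaller p-value = larger difference between sample and control,
# so an ascending sort puts the most different positions first.
most_different = df.sort_values("stat").head(3)
top_positions = most_different["pos"].tolist()

print(header)
print(top_positions)
```

How small a p-value needs to be before you call a position modified is a separate choice (for example after some multiple-testing correction), and this sketch does not try to make it.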

Aug 10 '21 11:08 SycamoreLeaf
