Determine significant coverage and score value in the ipdSummary gff

Open uceleste opened this issue 6 years ago • 1 comments

Hi All,

I would like to know what coverage and score values in the ipdSummary GFF are needed to consider a modified base call confident.

Example:

seqname source feature start end score strand frame coverage context IPDRatio CognateBase
genome kinModCall modified_base 64078 64078 42 - . 764 TTCGCAAGAAGACCTGAAGACCCTAGTGAAGTTTCTTCTTC 1.53 C
genome kinModCall modified_base 63115 63115 21 - . 759 TATAGTGAAATGAGAGGGAGTTACGAGGAGCAATGTAATGC 1.41 T
genome kinModCall modified_base 63168 63168 28 - . 759 AGCCATGCTTCGTTTGTGGAGGGGTGAAACATTTAGCTAAG 1.46 G
genome kinModCall modified_base 63203 63203 62 - . 757 AGGAATCCACATGGTCACAAGGGCAGAGTCACAAGAGCCAT 1.74 G
genome kinModCall modified_base 61924 61924 73 - . 756 TTCGGGAACATGATCTTGGAGGTAAATGTTTTCCACATTGC 1.87 G

Thanks

uceleste, Dec 14 '18 11:12

Score depends on coverage, so it isn't always possible to define a single confident cutoff. I would plot the data you have (coverage vs. score); the modified bases should form a distinct cluster. From the plot you should be able to define a function that distinguishes modified from unmodified bases. This is much easier if you have some kind of control, such as a known modified motif.

rhallPB, Dec 14 '18 18:12
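
The coverage-vs-score plot suggested above could be sketched roughly as follows. This is only a minimal illustration, not part of kineticsTools: it parses the rows in the whitespace-separated layout shown in the question (a real ipdSummary GFF may lay out the coverage, context and IPDRatio fields differently), and the names `parse_rows`, `passes_cutoff` and `plot_coverage_vs_score` are hypothetical. The linear cutoff and its default values are placeholders, not recommended thresholds; the actual function should be chosen after looking at the plot for your own data.

```python
# Rows copied from the question, in the whitespace-separated layout shown there.
ROWS = """\
genome kinModCall modified_base 64078 64078 42 - . 764 TTCGCAAGAAGACCTGAAGACCCTAGTGAAGTTTCTTCTTC 1.53 C
genome kinModCall modified_base 63115 63115 21 - . 759 TATAGTGAAATGAGAGGGAGTTACGAGGAGCAATGTAATGC 1.41 T
genome kinModCall modified_base 63168 63168 28 - . 759 AGCCATGCTTCGTTTGTGGAGGGGTGAAACATTTAGCTAAG 1.46 G
genome kinModCall modified_base 63203 63203 62 - . 757 AGGAATCCACATGGTCACAAGGGCAGAGTCACAAGAGCCAT 1.74 G
genome kinModCall modified_base 61924 61924 73 - . 756 TTCGGGAACATGATCTTGGAGGTAAATGTTTTCCACATTGC 1.87 G
"""


def parse_rows(text):
    """Return one dict per data row with position, score, coverage and IPD ratio."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        # Skip comment lines and anything that does not match the 12-column layout.
        if line.startswith("#") or len(fields) != 12:
            continue
        records.append({
            "start": int(fields[3]),
            "score": float(fields[5]),
            "coverage": int(fields[8]),
            "ipd_ratio": float(fields[10]),
        })
    return records


def passes_cutoff(record, slope=0.0, intercept=30.0):
    """Hypothetical linear cutoff: keep calls with score >= intercept + slope * coverage.

    The default values are placeholders; pick them from the plot of your own data.
    """
    return record["score"] >= intercept + slope * record["coverage"]


def plot_coverage_vs_score(records, path="coverage_vs_score.png"):
    """Scatter coverage against score and save the figure (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter([r["coverage"] for r in records],
               [r["score"] for r in records], s=8)
    ax.set_xlabel("coverage")
    ax.set_ylabel("score")
    fig.savefig(path)


records = parse_rows(ROWS)
confident = [r for r in records if passes_cutoff(r)]
```

Calling `plot_coverage_vs_score(records)` on the full set of calls, rather than the five example rows, is what would show whether a distinct cluster exists. If a control or a known modified motif is available, marking those positions on the same plot makes it easier to choose `slope` and `intercept`.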