What_the_Phage

Validate prediction results via nucleotide shuffling

Open replikation opened this issue 4 years ago • 4 comments

Random nucleotide shuffling for error-prone ONT reads to improve/validate read hits

  • @hoelzer can you add some code and/or github ref for the nucleotide shuffling?
  • it should be added to the barplots in a meaningful way

replikation avatar Nov 25 '19 17:11 replikation

@hoelzer can you add your example code for this?

replikation avatar Nov 29 '19 10:11 replikation

What I did a while ago was something really simple with a small Ruby script:

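#!/usr/bin/env ruby

# Input FASTQ (placeholder to be filled in when used inside a module)
fastq = File.open("\${reads}", 'r')
shuffled = File.open("shuffled.fastq", 'w')

# Position within the current 4-line FASTQ record
i = 0

out = ''
fastq.each do |line|
    i += 1
    # Header, separator and quality lines are copied unchanged
    if i == 1 || i == 3 || i == 4
        out << line
    end
    # Sequence line: shuffle the nucleotides randomly
    if i == 2
        out << line.chomp.split("").shuffle.join << "\n"
    end
    # End of record: write it out and start the next one
    if i == 4
        shuffled << out if out.length > 1
        out = ''
        i = 0 # reset, otherwise only the first read is ever processed
    end
end

shuffled.close
fastq.close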

So this just randomly shuffles the nucleotides within every single read. My idea was simple: if a tool predicts as many viruses from a completely shuffled read set as from the unshuffled set, its predictions mean nothing. Maybe this can also be done more conveniently with some ready-to-use script/tool.
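To make the comparison concrete, here is a minimal sketch of how the two result sets could be compared. The function name `shuffled_hit_ratio`, the 0.5 cutoff and the read IDs are made up for illustration and are not part of the pipeline; it only assumes that each tool reports a list of read IDs it predicts as viral.

```ruby
# Hypothetical helper: fraction of hits that a tool still reports on the
# shuffled read set, relative to the hits on the original read set.
# Both arguments are assumed to be arrays of predicted read IDs.
def shuffled_hit_ratio(original_hits, shuffled_hits)
  return nil if original_hits.empty? # no hits at all, ratio is undefined
  shuffled_hits.uniq.size.to_f / original_hits.uniq.size
end

# Made-up example: 4 hits on the real reads, 1 hit on the shuffled reads
original = %w[read1 read2 read3 read4]
shuffled = %w[read3]

ratio = shuffled_hit_ratio(original, shuffled)
# The 0.5 cutoff is arbitrary and only for illustration
verdict = ratio < 0.5 ? "predictions look informative" : "predictions look suspicious"
puts "shuffled/original hit ratio: #{ratio.round(2)} (#{verdict})"
```

A ratio close to 1 would mean the tool calls about as many viruses on shuffled reads as on real ones, which is the "this means nothing" case. Such a ratio could also be one way to bring the shuffled control into the barplots.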

hoelzer avatar Dec 02 '19 15:12 hoelzer

One could also shuffle while preserving the dinucleotide distribution: https://www.biostars.org/p/134467/ using this script for example: https://github.com/wassermanlab/BiasAway/blob/master/altschulEriksonDinuclShuffle.py

or using uShuffle https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2375906/

But maybe it is enough for now to just put my Ruby code into a module; then we can play around with it and see if it generates helpful insights.
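For reference, a dinucleotide-preserving shuffle could look roughly like the sketch below. It follows the general Altschul-Erikson idea (pick a random "last edge" per nucleotide, check that these edges lead to the final nucleotide, then walk the shuffled edge lists), but it is my own untested-on-real-data sketch, not a port of the linked Python script or of uShuffle. The function name `dinucleotide_shuffle` is hypothetical.

```ruby
# Sketch: shuffle a sequence while keeping its dinucleotide counts,
# its first symbol and its last symbol unchanged.
def dinucleotide_shuffle(seq, rng = Random.new)
  return seq.dup if seq.length < 3
  chars = seq.chars
  last = chars[-1]

  # For every symbol, collect the symbols that follow it in the sequence
  edges = Hash.new { |h, k| h[k] = [] }
  chars.each_cons(2) { |a, b| edges[a] << b }
  vertices = edges.keys - [last]

  # Pick a random "last edge" per symbol until every symbol reaches
  # the final symbol by following these edges (no cycles)
  last_edge = {}
  loop do
    vertices.each { |v| last_edge[v] = edges[v].sample(random: rng) }
    valid = vertices.all? do |v|
      cur = v
      steps = 0
      while cur != last && steps <= vertices.size
        cur = last_edge[cur]
        steps += 1
      end
      cur == last
    end
    break if valid
  end

  # Shuffle each edge list, keeping the chosen last edge at the end
  ordered = {}
  edges.each do |v, targets|
    list = targets.dup
    if v == last
      list.shuffle!(random: rng)
    else
      list.delete_at(list.index(last_edge[v]))
      list.shuffle!(random: rng)
      list << last_edge[v]
    end
    ordered[v] = list
  end

  # Walk the edges from the first symbol to rebuild a sequence
  result = [chars[0]]
  cur = chars[0]
  (chars.length - 1).times do
    cur = ordered[cur].shift
    result << cur
  end
  result.join
end

puts dinucleotide_shuffle("ACGTTGCAACGGTTAACCGGATAT")
```

In the script above, this would replace the plain `shuffle` call on the sequence line. It is stricter than a full shuffle, since the control reads keep the same dinucleotide composition as the originals.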

hoelzer avatar Dec 02 '19 15:12 hoelzer

@replikation @Stormrider935 I adjusted the above ruby code a little bit so that it should be copy-pastable as ruby code into a module

hoelzer avatar Dec 04 '19 02:12 hoelzer