What_the_Phage
Validate prediction results via nucleotide shuffling
Random nucleotide shuffling for error-prone ONT reads to improve/validate read hits
- @hoelzer can you add some code and/or github ref for the nucleotide shuffling?
- it should be added to the barplots in a meaningful way
@hoelzer can you add your example code for this?
What I did the other night was something really simple with a small Ruby script:
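#!/usr/bin/env ruby
# Shuffle the bases of every read in a FASTQ file with 4 lines per record.
fastq = File.open("\${reads}", 'r') # input placeholder, kept as posted
shuffled = File.open("shuffled.fastq", 'w')
i = 0
out = ''
fastq.each do |line|
  i += 1
  # header (1), separator (3) and quality (4) lines are copied unchanged
  if i == 1 || i == 3 || i == 4
    out << line
  end
  # sequence line (2): shuffle the nucleotides
  if i == 2
    out << line.chomp.split("").shuffle.join << "\n"
  end
  # end of record: write it out and reset the line counter for the next read
  if i == 4
    shuffled << out if out.length > 1
    out = ''
    i = 0
  end
end
shuffled.close
fastq.close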
So this just randomly shuffles every single read. My idea was simple: if a tool predicts as many viruses from a completely shuffled read set as from the unshuffled set, its predictions mean nothing. Maybe this can also be done in a more convenient way with some ready-to-use script/tool.
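To make the comparison idea a bit more concrete, here is a minimal sketch of how the hit counts could be put side by side. The helper name `shuffled_hit_ratio`, the tool names and all counts are made up for illustration; nothing here comes from actual pipeline output.

```ruby
# Hypothetical helper (not part of the pipeline): for each tool, report the
# number of hits on shuffled reads as a fraction of the hits on original reads.
# A ratio close to 1.0 would suggest the tool's predictions carry little signal.
def shuffled_hit_ratio(original_hits, shuffled_hits)
  original_hits.to_h do |tool, n_original|
    n_shuffled = shuffled_hits.fetch(tool, 0)
    # avoid dividing by zero when a tool found nothing on the original reads
    ratio = n_original.zero? ? nil : n_shuffled.to_f / n_original
    [tool, ratio]
  end
end

# Made-up counts, purely for illustration
original = { 'tool_a' => 40, 'tool_b' => 25, 'tool_c' => 0 }
shuffled = { 'tool_a' => 2, 'tool_b' => 25 }

ratios = shuffled_hit_ratio(original, shuffled)
ratios.each do |tool, ratio|
  label = ratio.nil? ? 'n/a' : format('%.2f', ratio)
  puts "#{tool}\t#{label}"
end
```

In this toy example, tool_b would be the suspicious one, since it reports as many hits on shuffled reads as on the original ones.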
One could also shuffle while preserving the dinucleotide distribution (https://www.biostars.org/p/134467/), for example using this script: https://github.com/wassermanlab/BiasAway/blob/master/altschulEriksonDinuclShuffle.py
or using uShuffle: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2375906/
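For reference, a dinucleotide-preserving shuffle could look roughly like this in Ruby. This is only a sketch of the Altschul-Erikson idea (a random Eulerian walk over the dinucleotide graph), not a port of the linked Python script or of uShuffle; the function names `dinucleotide_shuffle` and `reaches?` are my own, and it has not been tested on real reads.

```ruby
# Follow the chosen "last edges" from start; true if we arrive at target
# without running into a cycle.
def reaches?(start, target, last_edge)
  seen = {}
  v = start
  until v == target
    return false if seen[v] || !last_edge.key?(v)
    seen[v] = true
    v = last_edge[v]
  end
  true
end

# Shuffle seq so that every dinucleotide count, the first base and the last
# base stay the same (sketch of the Altschul-Erikson approach).
def dinucleotide_shuffle(seq, rng = Random.new)
  return seq.dup if seq.length < 3
  chars = seq.chars
  last = chars[-1]

  # successors[x] = every base that directly follows x in the sequence
  successors = Hash.new { |h, k| h[k] = [] }
  chars.each_cons(2) { |a, b| successors[a] << b }
  vertices = successors.keys.reject { |v| v == last }

  # pick one "last edge" per base until all of them lead to the final base
  last_edge = {}
  loop do
    last_edge = vertices.to_h { |v| [v, successors[v].sample(random: rng)] }
    break if vertices.all? { |v| reaches?(v, last, last_edge) }
  end

  # shuffle the remaining edges of each base and keep its last edge at the end
  order = {}
  successors.each do |v, succ|
    rest = succ.dup
    if last_edge.key?(v)
      rest.delete_at(rest.index(last_edge[v]))
      order[v] = rest.shuffle(random: rng) << last_edge[v]
    else
      order[v] = rest.shuffle(random: rng)
    end
  end

  # walk the graph from the first base, consuming the edges in order
  result = [chars[0]]
  current = chars[0]
  (chars.length - 1).times do
    current = order[current].shift
    result << current
  end
  result.join
end

# Example with a made-up sequence and a fixed seed for reproducibility
puts dinucleotide_shuffle('ACGTTGCAACGGTTAACCGT', Random.new(42))
```

Compared to the plain shuffle, this keeps the local composition of each read, so it should be a somewhat stricter control for tools that rely on k-mer or dinucleotide frequencies.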
But for now it could be enough to just implement my Ruby code as a module, so we can play around with it and see whether it generates helpful insights.
@replikation @Stormrider935 I adjusted the above ruby code a little bit so that it should be copy-pastable as ruby code into a module