
Needed: Speed-optimized FASTQ statistics script

Open: jorvis opened this issue 11 years ago • 1 comment

One of the really common tasks when given a FASTQ file is to find the following statistics:

  • total read count
  • total base count

While this is trivial in itself, the more interesting part is finding the method that performs best. Because this will be an important component of a few other projects, speed and proper error handling are both important. Most apps assume Python, but I'm up for implementations in whatever language gives the best results here, as long as they don't open up a huge can of worms dependency-wise.
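To make the requirements concrete, here is a minimal baseline sketch using only the standard library. It is not the linked script and makes no claim to be the fastest approach. The function name `fastq_stats` is hypothetical, and it assumes the simple four-lines-per-record FASTQ layout (no wrapped sequence lines) and an already-opened text handle.

```python
import io


def fastq_stats(handle):
    """Return (read_count, base_count) for a four-line-per-record FASTQ stream.

    Assumption: sequence and quality strings are not wrapped across lines.
    """
    reads = 0
    bases = 0
    while True:
        header = handle.readline()
        if not header:
            break  # clean end of input
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\r\n")
        record = reads + 1
        # Basic structural checks so malformed or truncated input fails loudly.
        if not header.startswith("@"):
            raise ValueError(f"record {record}: header does not start with '@'")
        if not plus.startswith("+"):
            raise ValueError(f"record {record}: missing '+' separator line")
        if len(seq) != len(qual):
            raise ValueError(f"record {record}: sequence and quality lengths differ")
        reads += 1
        bases += len(seq)
    return reads, bases


if __name__ == "__main__":
    demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGG\n+\nII\n")
    print(fastq_stats(demo))
```

On the two-record demo input this reports 2 reads and 6 bases. The per-record checks cost some speed, so a faster variant might validate less strictly or read in larger chunks. That trade-off is exactly what this issue is asking about.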

jorvis avatar Dec 29 '13 05:12 jorvis

Here is an example implementation that could serve as a starting point to improve upon:

https://github.com/jorvis/biocode/blob/master/fastq/fastq_simple_stats.py

jorvis avatar Dec 29 '13 15:12 jorvis