run the post-processing steps in the prepare pipeline in parallel

Open aryarm opened this issue 4 years ago • 0 comments

At the end of the prepare pipeline, a couple of post-processing steps are performed on the merged TSV before we feed it to the classify pipeline. All of the scripts used in these steps support reading from stdin and writing to stdout, except for fillna.bash.

  • [x] remove the first parameter from fillna.bash and make it read the TSV from stdin instead
  • [x] connect all of the post-processing steps together via pipes
    • this avoids intermediate file I/O and the time spent compressing and decompressing the file between steps
  • [x] remove extra config params that nobody uses (like keepna, pure_numerics, and friends) - they just make things more complicated
  • [x] mark extra files as temp
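
A minimal sketch of what a stdin-reading fill step could look like, so it can sit in the middle of a pipe. This is not the actual fillna.bash: the `fillna` function name, the `fill` argument, and the assumption that missing values show up as empty fields or `.` are all hypothetical, as are the file names in the usage comment.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for fillna.bash: reads a TSV on stdin, writes it to stdout.
# Assumption: missing values appear as "." or as empty fields; the real script may differ.
fillna() {
    local fill="${1:-0}"    # replacement value, defaults to 0
    awk -F'\t' -v OFS='\t' -v fill="$fill" '
        NR == 1 { print; next }    # pass the header row through unchanged
        {
            for (i = 1; i <= NF; i++)
                if ($i == "" || $i == ".") $i = fill
            print
        }'
}

# Example of chaining steps with pipes (file names are placeholders):
#   gzip -dc merged.tsv.gz | fillna 0 | cut -f 1-3 | gzip > final.tsv.gz
```

Because every step reads stdin and writes stdout, the merged TSV is decompressed once at the start and compressed once at the end, with no intermediate files in between.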

aryarm, Jun 09 '20 18:06