varCA
run the post-processing steps in the prepare pipeline in parallel
At the end of the `prepare` pipeline, a couple of post-processing steps are performed on the merged TSV before we feed it to the `classify` pipeline. All of the scripts used in these steps support reading from stdin and writing to stdout, except for `fillna.bash`.
- [x] remove the first parameter from `fillna.bash` and make it read the TSV from stdin instead
- [x] connect all of the post-processing steps together via pipes
  - this will allow us to save on file IO and wasted time compressing and uncompressing the file between steps
- [x] remove extra config params that nobody uses (like `keepna`, `pure_numerics`, and friends) - they just make things more complicated
- [x] mark extra files as temp
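
To illustrate the idea behind the first two tasks, here is a minimal sketch of post-processing steps written as stdin/stdout filters and chained with pipes. The function names (`fill_empty`, `drop_dots`, `postprocess`) and their behavior are hypothetical stand-ins, not the actual contents of `fillna.bash` or the other scripts; the point is only that the file is decompressed once, streamed through every step, and compressed once.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for a fillna-style step that reads the TSV from stdin:
# replace every empty field with a fill value (first argument, default "NA").
fill_empty() {
  awk -v fill="${1:-NA}" 'BEGIN { FS = OFS = "\t" }
    { for (i = 1; i <= NF; i++) if ($i == "") $i = fill; print }'
}

# Hypothetical second step: drop rows whose first column is ".".
drop_dots() {
  awk 'BEGIN { FS = "\t" } $1 != "."'
}

# Chain the steps: decompress once, stream through each filter, compress once.
# Arguments: input .gz path, fill value, output .gz path.
postprocess() {
  gzip -dc "$1" | fill_empty "$2" | drop_dots | gzip -c > "$3"
}
```

Because each step only touches stdin and stdout, no intermediate file is written between steps, and the steps run concurrently as data flows through the pipe. A call would look like `postprocess merged.tsv.gz 0 final.tsv.gz`, where both file names are placeholders.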