Input from stdin

Open koopkaup opened this issue 6 years ago • 2 comments

Is it possible to read the input file from standard input? For example, all my data is compressed, and it would be more convenient to pipe the gunzip output to stdout and then use it as stdin in nonpareil.

koopkaup avatar Jan 26 '18 09:01 koopkaup

Hello @koopkaup. Unfortunately, that would require major changes in the code, because the input files are read multiple times:

  • For -T kmer or -T alignment on one machine: There is an initial pass over the file to sample query reads and count total reads, and a second pass to run the comparisons.
  • For -T alignment with MPI: Each machine makes its own pass over the file, rather than having the data sent directly to the worker nodes, to reduce bandwidth use.
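
To illustrate why multiple passes rule out stdin, here is a minimal Python sketch. It is not Nonpareil's code: the function name count_then_process and the toy FASTA content are made up for the example. It only shows that a regular file can be rewound for a second pass, while a pipe cannot.

```python
import io
import os
import tempfile


def count_then_process(handle):
    """Hypothetical two-pass reader: count lines, rewind, read again."""
    total = sum(1 for _ in handle)   # pass 1: count lines
    handle.seek(0)                   # rewind; this is what fails on a pipe
    second = sum(1 for _ in handle)  # pass 2: stand-in for the comparisons
    return total, second


# Case 1: a regular file on disk can be read twice.
with tempfile.NamedTemporaryFile("w+", suffix=".fa") as fh:
    fh.write(">r1\nACGT\n>r2\nTTGA\n")
    fh.flush()
    fh.seek(0)
    file_result = count_then_process(fh)

# Case 2: a pipe (which is what stdin is in `gunzip -c ... | tool`)
# is not seekable, so the rewind raises an error.
read_fd, write_fd = os.pipe()
with os.fdopen(write_fd, "w") as writer:
    writer.write(">r1\nACGT\n")
with os.fdopen(read_fd, "r") as reader:
    pipe_seekable = reader.seekable()
    try:
        count_then_process(reader)
        pipe_failed = False
    except (io.UnsupportedOperation, OSError):
        pipe_failed = True

print("file passes:", file_result)
print("pipe seekable:", pipe_seekable, "| second pass failed:", pipe_failed)
```

Supporting stdin would mean either buffering the whole stream (in memory or in a temporary file) or restructuring the reader to work in a single pass.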

However, I think we could implement an option to read directly from compressed files (gzip / bzip2). What do you think, @gunturus?

M

lmrodriguezr avatar Jan 26 '18 15:01 lmrodriguezr

For random sampling, we randomly move to a position in the file, so the file needs to be uncompressed to begin with.
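
As a rough sketch of what sampling by seeking looks like (illustrative only, not the actual Nonpareil implementation): the helper sample_record below is hypothetical, and it assumes FASTA input with one sequence line per record. It jumps to a random byte offset and reads the next complete record from there.

```python
import os
import random
import tempfile


def sample_record(path, rng):
    """Hypothetical sampler: seek to a random byte, return the next record.

    Assumes FASTA with single-line sequences. Records are not sampled
    uniformly (records that follow long ones are favored); the point is
    only to show the dependence on seek().
    """
    size = os.path.getsize(path)
    with open(path, "rb") as fh:
        fh.seek(rng.randrange(size))  # needs byte offsets into plain data
        fh.readline()                 # discard the (possibly partial) line
        while True:
            line = fh.readline()
            if not line:              # ran past the end: wrap to the start
                fh.seek(0)
                line = fh.readline()
            if line.startswith(b">"):
                header = line
                break
        sequence = fh.readline()
    return header.decode().strip(), sequence.decode().strip()


records = [(">r1", "ACGT"), (">r2", "TTGACA"), (">r3", "GGC")]
with tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False) as tmp:
    for name, seq in records:
        tmp.write(f"{name}\n{seq}\n")
    fasta_path = tmp.name

rng = random.Random(0)
samples = [sample_record(fasta_path, rng) for _ in range(5)]
os.remove(fasta_path)
print(samples)
```

In a gzip or bzip2 file, a byte offset in the compressed data does not correspond to a record boundary in the decompressed data, so this kind of jump is not available without decompressing first.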

gunturus avatar Jan 26 '18 15:01 gunturus