ray
Re-implement the code that reads input files in parallel.
The code that counts the sequences in a file (Partitioner) is fine.
But after that, the code that reads sequences from file is not very good.
The problem is that too many processes are reading the same file at once.
The code can't really use MPI I/O for that directly because (I think) MPI I/O functions are collectives.
One thing that would be great:
Have just 1 process take care of each file and dispatch the sequences to the other ranks / actors.
code/SequencesLoader/SequencesLoader.cpp
The metadata has to be sent too (LEFT_READ, RIGHT_READ, PAIR MATE and so on).
This is not trivial because code/SequencesLoader/SequencesLoader.cpp is quite ugly.
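A rough sketch of the dispatch idea, not a proposal for the final code: one reader parses a FASTA stream once and fills a per-rank outbox, attaching the metadata to every sequence. Everything here is an assumption made for illustration: the `ReadType` enum, `SequenceMessage`, `ownerOf`, `dispatchFasta`, the round-robin placement, and the interleaved-pairs convention (even index is the left read, odd index is the right read) are hypothetical and do not come from SequencesLoader.cpp. The transport (MPI point-to-point or actor messages) is left out on purpose.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tags standing in for LEFT_READ, RIGHT_READ and friends.
enum class ReadType : std::uint8_t { Single, LeftRead, RightRead };

// Hypothetical message: one sequence plus the metadata that must travel with it.
struct SequenceMessage {
    std::uint64_t index;      // position of the sequence in the file
    ReadType type;            // single, left or right read
    std::uint64_t mateIndex;  // index of the paired mate (same as index if unpaired)
    std::string sequence;
};

// Assumed placement policy: round robin over the ranks.
inline int ownerOf(std::uint64_t index, int ranks) {
    return static_cast<int>(index % static_cast<std::uint64_t>(ranks));
}

// The single reader walks the FASTA stream once and builds one outbox per rank.
// Sending each outbox to its rank is the part that is not shown here.
std::vector<std::vector<SequenceMessage>> dispatchFasta(std::istream& in,
                                                        int ranks,
                                                        bool interleavedPairs) {
    std::vector<std::vector<SequenceMessage>> outboxes(ranks);
    std::string line;
    std::string current;
    std::uint64_t index = 0;
    bool haveRecord = false;

    // Emit the record collected so far, if there is one.
    auto flush = [&]() {
        if (!haveRecord) return;
        SequenceMessage m;
        m.index = index;
        if (!interleavedPairs) {
            m.type = ReadType::Single;
            m.mateIndex = index;
        } else {
            m.type = (index % 2 == 0) ? ReadType::LeftRead : ReadType::RightRead;
            m.mateIndex = index ^ 1u;  // 0<->1, 2<->3, ...
        }
        m.sequence = current;
        outboxes[ownerOf(index, ranks)].push_back(std::move(m));
        ++index;
        current.clear();
        haveRecord = false;
    };

    while (std::getline(in, line)) {
        if (!line.empty() && line[0] == '>') {
            flush();            // a header line closes the previous record
            haveRecord = true;
        } else if (haveRecord) {
            current += line;    // sequences may span several lines
        }
    }
    flush();                    // last record in the file
    return outboxes;
}
```

With this policy only the owning process ever opens the file, and the other ranks just receive messages. Round robin puts the two mates of a pair on different ranks, so `mateIndex` is what lets a receiver locate the mate; a different placement policy could keep pairs together.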