kollector.sh: use all assigned physical cores + hyperthreading for everything except MPI
Parts of ABySS run under Open MPI, which requires specifying the number of physical CPU cores to use. However, there's no such requirement for the rest of the tools used in the pipeline (biobloom). On the assumption that the machine has half as many physical cores as hardware threads (physical cores + hyperthreads), simply halve the -j value when passing it to abyss-pe.
This is tested on Linux only, on a small personal machine, and may or may not be appropriate for other configurations. Perhaps it would be simpler to add another option to kollector.sh (and kollector_multiple.sh) for setting the ABySS job count separately.
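A minimal sketch of the halving described above, assuming exactly two hardware threads per physical core. `abyss_jobs` is a hypothetical helper name used for illustration; it is not part of kollector.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the abyss-pe job count from kollector's -j value.
# Assumption: 2 hardware threads per physical core, so the value is halved.
abyss_jobs() {
    local j=$1
    local half=$(( j / 2 ))
    # Integer division rounds down; never ask for fewer than one process.
    if (( half < 1 )); then
        half=1
    fi
    echo "$half"
}
```

The result could then be passed to abyss-pe in place of the raw -j value, for example `abyss-pe j="$(abyss_jobs "$j")" ...`, where `$j` stands for whatever variable holds the -j value.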
Differences in CPU architecture aside (some CPUs have no hyperthreading, and others have more than 2 hardware threads per core), I'm not sure the physical core vs. hyperthread distinction is that important for ABySS. Did you notice a performance difference compared with using the default number of threads?
If you did, it may have more to do with scaling than with the use of hyperthreads.
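For what it's worth, the "half" assumption could be avoided on Linux by counting physical cores directly. The sketch below assumes util-linux `lscpu` is available; `count_physical_cores` is a hypothetical helper name, and it reads the `lscpu -p=CORE,SOCKET` output from stdin so the counting logic can be checked without the tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count unique (core, socket) pairs in the parseable
# output of `lscpu -p=CORE,SOCKET`, read from stdin.
count_physical_cores() {
    # Drop the '#' comment header, collapse hardware threads that share a
    # core, and count what is left.
    grep -v '^#' | sort -u | wc -l | tr -d ' '
}

# Typical use on Linux:
#   lscpu -p=CORE,SOCKET | count_physical_cores
```

On a machine with 4 cores and 4 hyperthreads this should print 4, regardless of how many hardware threads each core exposes.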
I only noticed this because abyss errored out when run with the same number of threads as kollector (biobloom): its maximum is limited to the number of physical cores.
From what I understand, when kollector is run with -j, abyss-pe calls ABYSS-P through mpirun, and if no hostfile, --host parameter, or resource manager is specified, Open MPI's maximum number of slots defaults to the number of processor cores.
In our case, we had 4 cores + 4 hyperthreads. We ran kollector with -j 8, and when it called abyss-pe we got:
/usr/bin/mpirun -np 8 ABYSS-P -k64 -q3 -v --coverage-hist=coverage.hist -s kct_1-bubbles.fa -o kct_1-1.fa kct_1.recruited_pe.fastq
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 8
slots that were requested by the application:
ABYSS-P
Either request fewer slots for your application, or make more slots
available for use.
A "slot" is the Open MPI term for an allocatable unit where we can
launch a process. The number of slots available are defined by the
environment in which Open MPI processes are run:
1. Hostfile, via "slots=N" clauses (N defaults to number of
processor cores if not provided)
2. The --host command line parameter, via a ":N" suffix on the
hostname (N defaults to 1 if not provided)
3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
4. If none of a hostfile, the --host command line parameter, or an
RM is present, Open MPI defaults to the number of processor cores
[...]
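One way around the slot limit, as a hedged sketch: Open MPI's mpirun accepts `--use-hwthread-cpus`, which asks it to count hardware threads rather than cores as slots (`--oversubscribe` is a blunter alternative). This was not tested with ABySS here, and `mpirun_cmd` is a hypothetical helper that only prints the command line rather than running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) an mpirun command line that lets
# Open MPI treat each hardware thread as a slot.
mpirun_cmd() {
    local np=$1
    shift
    printf '%s' "mpirun --use-hwthread-cpus -np $np"
    # Append the program and its arguments, one space before each.
    printf ' %s' "$@"
    printf '\n'
}

mpirun_cmd 8 ABYSS-P -k64 -q3
```

Whether ABYSS-P actually benefits from running one process per hyperthread is a separate question from whether mpirun will launch it.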
So there are cases in which we could specify more threads for biobloom than for abyss. The patch is wrong anyway (sorry): it should change both invocations of abyss-pe. The point is that it might be useful either to allow changing the mpirun parameters or to provide a way to set a different number of threads for abyss only.
Depending on the system configuration, I guess it does make sense to add a separate parameter for ABySS to use.
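A rough sketch of what a separate thread option might look like. The `-J` flag and the `parse_jobs` function are invented for illustration and are not existing kollector.sh options; the only behaviour assumed is that the ABySS count falls back to -j when it is not given.

```shell
#!/usr/bin/env bash
# Hypothetical option parsing: -j sets the thread count for everything,
# -J (an invented flag) overrides the count used for abyss-pe only.
parse_jobs() {
    local OPTIND=1 opt j=1 abyss_j=""
    while getopts "j:J:" opt; do
        case $opt in
            j) j=$OPTARG ;;
            J) abyss_j=$OPTARG ;;
            *) return 1 ;;
        esac
    done
    # Fall back to the -j value when -J is not given.
    echo "$j ${abyss_j:-$j}"
}
```

With this, `parse_jobs -j 8 -J 4` yields 8 threads for biobloom and 4 for abyss, while `parse_jobs -j 8` keeps the current behaviour of using 8 for both.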