DRAGMAP
Multiple issues with commit c87d93a
I have been rerunning my torture test since issues I posted earlier have apparently been addressed. The test:
- generates random genomes and simulates reads from those genomes parameterizable in many ways
- runs dragen-os to build the hashmaps
- runs dragen-os to map the simulated reads
- verifies the results
The dragen-os executable passes the test more often than not, but it fails roughly 10% of the time in one of two ways:
- infinite loops in the hashtable building phase
- a crash from an assertion macro, always at lib/reference/Hashtable.cpp:96 (even in release builds)
FWIW, the random genomes are small, and unrealistic at least in that sense, but they're not pathologically small, and I see both failures across a range of simulation parameters. The hangs and crashes both seem to be more common in simulated genomes with hundreds of contigs rather than tens, but I have not tried to quantify this.
Hi, and thanks for the update. Just to make sure: are the crashes you see always in the hash building phase, and never in the mapping phase?
No, the crashes (at Hashtable.cpp:96) are always in the mapping phase, and the (apparent) infinite loop is always in the hash building phase.
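For anyone trying to reproduce this, the two failure modes can be told apart from the exit status alone. Below is a minimal sketch, assuming GNU coreutils `timeout` is available. The helper name `run_phase`, the `PHASE_RESULT` variable and any time limit you pick are hypothetical and not part of my test script. GNU `timeout` exits with 124 when the limit expires, and the shell reports 134 (128 + 6) for a process killed by SIGABRT, which is what a failed `assert()` normally raises.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one phase under a time limit and classify the outcome.
# Usage: run_phase LIMIT_SECONDS command [args...]
# The classification is stored in PHASE_RESULT so the command's own
# stdout (e.g. SAM output) is left untouched.
PHASE_RESULT=""
run_phase() {
    local limit="$1"; shift
    local status=0
    timeout "${limit}" "$@" || status=$?
    case "${status}" in
        0)   PHASE_RESULT="ok" ;;
        124) PHASE_RESULT="hang" ;;    # GNU timeout: time limit expired
        134) PHASE_RESULT="abort" ;;   # 128 + SIGABRT (6), e.g. a failed assertion
        *)   PHASE_RESULT="error:${status}" ;;
    esac
    return "${status}"
}
```

A limit far above the normal hash-building time for genomes this small should be enough to flag the apparent infinite loop without false positives, but the right value is something to tune locally.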
Hi, would you be able to share one of the random genomes and the command line that triggers the problem? Thanks, Guillaume
Hi Guillaume, sorry for the delay. I have a 47MB archive I could upload somewhere, containing one of multiple example runs that I just confirmed aborts at the Hashtable.cpp:96 assertion. Alternatively, you can clone my SAM file simulation repo and generate lots of failing examples with the bash script below, after editing the FIXMEs.
DRAGMAP='FIXME/dragen-os'
BOOST='FIXME/lib/boost/lib'
SIMSAM="FIXME/simsam.py"
REFDIR="${PWD}/genome"
GENOME="${REFDIR}/genome.fa"
MASKED="${REFDIR}/mask.bed"
HASHDIR="${PWD}/dragen-ht"
DECOYS="FIXME/decoys.fa"
SAM='result.sam'
CHECK='FIXME/verify-mapping-coords.py'
SIMSAM_LOG='simsam.log'
HASHER_LOG='hasher.log'
MAPPER_LOG='mapper.log'
ITER=0
SEQS=23
while true; do
    echo "Iteration ${ITER}: generating simulated data"
    sleep 1
    rm -rf "${HASHDIR}" "${REFDIR}"
    if ! python3 "${SIMSAM}" \
        -s E09 \
        --platform-unit XFASD42D_L2 \
        -C "${SEQS}" \
        -d 5 \
        --snvs 1000 \
        --mask-ends 0.05 \
        --mask-internal 0.05 \
        -T 700,50 \
        -W "${REFDIR}" \
        --seed "${RANDOM}" > "${SIMSAM_LOG}"; then
        break
    fi
    echo "Iteration ${ITER}: Dragen building hashtables"
    mkdir -p "${HASHDIR}"
    if ! LD_LIBRARY_PATH="${BOOST}" "${DRAGMAP}" --build-hash-table true \
        --num-threads 16 \
        --ht-reference "${GENOME}" \
        --output-directory "${HASHDIR}" \
        --ht-decoys "${DECOYS}" \
        --ht-mask-bed="${MASKED}" &> "${HASHER_LOG}"; then
        break
    fi
    # --ht-decoys "${PWD}/tiny-decoy.fa" \
    echo "Iteration ${ITER}: Dragen mapping simulated data"
    if ! LD_LIBRARY_PATH="${BOOST}" "${DRAGMAP}" \
        -r "${HASHDIR}" \
        -1 "${REFDIR}/E09_XFASD42D_L2.1.fq" \
        -2 "${REFDIR}/E09_XFASD42D_L2.2.fq" \
        --num-threads 16 1> "${SAM}" 2> "${MAPPER_LOG}"; then
        break
    fi
    if ! [ -s "${SAM}" ]; then
        echo "${DRAGMAP} crashed since ${SAM} is empty"
        break
    fi
    # Halt (preserving the offending data!) when we encounter a mapping coordinate problem.
    if ! python3 "${CHECK}" < "${SAM}" 1> ok.sam 2> bad.sam; then
        echo "python3 ${CHECK} < ${SAM} failed"
        break
    fi
    echo "Iteration ${ITER}: OK"
    ITER=$((ITER+1))
    SEQS=$((SEQS+10))
done
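Since the loop stops with the offending data still on disk, a failing iteration can be bundled for sharing along these lines. This is only a sketch: `bundle_failure` and the archive name are hypothetical, and the file list simply mirrors the variables defined in the script above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pack whatever artifacts of a failed run exist into one archive.
# Usage: bundle_failure ARCHIVE.tar.gz path [path...]
bundle_failure() {
    local out="$1"; shift
    local keep=()
    local path
    for path in "$@"; do
        # Skip anything the failed phase never got around to creating.
        if [ -e "${path}" ]; then
            keep+=("${path}")
        fi
    done
    if [ "${#keep[@]}" -eq 0 ]; then
        echo "bundle_failure: nothing to archive" >&2
        return 1
    fi
    tar -czf "${out}" "${keep[@]}"
}
```

After the loop exits, it could be called as `bundle_failure "failure-iter${ITER}.tar.gz" "${REFDIR}" "${HASHDIR}" "${SAM}" "${SIMSAM_LOG}" "${HASHER_LOG}" "${MAPPER_LOG}"`.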
Hi, it seems I cannot access your GitLab repo. Would you be able to contact me by email (the address is on my profile) so that I can send you an upload link?