Multiple issues with commit c87d93a

Open · biork opened this issue 2 years ago · 5 comments

I have been rerunning my torture test now that the issues I posted earlier appear to have been addressed. The test:

  1. generates random genomes and simulates reads from them, with many tunable parameters
  2. runs dragen-os to build the hashmaps
  3. runs dragen-os to map the simulated reads
  4. verifies the results

The dragen-os executable passes the test more often than not, but it fails roughly 10% of the time in one of two ways:

  1. infinite loops in the hashtable building phase
  2. a reproducible crash from an assertion macro, always at lib/reference/Hashtable.cpp:96 (even in release builds)

FWIW, the random genomes are small, and unrealistic at least in that sense, but they're not pathologically small, and I see both failures across a range of simulation parameters. The hangs and crashes both seem to be more common in simulated genomes with many contigs (hundreds) than in those with tens, but I have not tried to quantify this.
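The hang in the hash-building phase can stall an unattended loop indefinitely. A minimal sketch of one way to bound it, assuming GNU coreutils `timeout` (which exits with status 124 when its limit expires) and a hypothetical helper name `run_bounded`; the time limit is whatever the caller chooses:

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command under a wall-clock limit.
# Assumes GNU coreutils `timeout` is on PATH; it exits with 124 on expiry.
run_bounded() {
	local limit="$1"; shift
	local status=0
	timeout "${limit}" "$@" || status=$?
	if [ "${status}" -eq 124 ]; then
		# Report the likely hang separately from ordinary failures.
		echo "hang: exceeded ${limit}s" >&2
	fi
	return "${status}"
}
```

For example, `run_bounded 600 env LD_LIBRARY_PATH="${BOOST}" "${DRAGMAP}" --build-hash-table true ...` would flag a hashing run that takes longer than ten minutes. Passing the library path through `env` keeps it attached to the DRAGMAP process itself.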

biork · Mar 11 '22 20:03

Hi, and thanks for the update. Just to make sure: are the crashes you see always in the hash-building phase, rather than during mapping?

rizkg · Mar 11 '22 21:03

No, the crashes (at Hashtable.cpp:96) are always in the mapping phase, and the (apparent) infinite loop is always in the hash building phase.
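Since the two failure modes show up in different phases, it may help for the loop to record how each step failed instead of only breaking. A minimal sketch with a hypothetical helper `describe_status`, relying on two shell conventions: bash reports a child killed by signal N as exit status 128+N, and GNU `timeout` exits with 124 on expiry. Whether the Hashtable.cpp:96 assertion ends in a signal such as SIGABRT or in an ordinary nonzero exit is an assumption to check against the logs:

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn a command's exit status into a short label.
describe_status() {
	local status="$1"
	if [ "${status}" -eq 0 ]; then
		echo "ok"
	elif [ "${status}" -eq 124 ]; then
		# GNU timeout's status when the time limit expires.
		echo "timeout"
	elif [ "${status}" -gt 128 ]; then
		# The shell reports death by signal N as 128+N; `kill -l N` names it.
		echo "signal SIG$(kill -l $((status - 128)))"
	else
		echo "error ${status}"
	fi
}
```

Capturing the status immediately after a step, e.g. `status=$?; echo "mapper: $(describe_status "${status}")"`, would separate a crash from a hang in the iteration log.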

biork · Mar 11 '22 22:03

Hi, would you be able to share one of the random genomes, along with the command line that reproduces the problem? Thanks, Guillaume

rizkg · Mar 15 '22 19:03

Hi Guillaume, sorry for the delay. I have a 47MB archive, which I could upload somewhere, containing one of several example runs that I just confirmed aborts at the Hashtable.cpp:96 assertion. Alternatively, you can clone my SAM file simulation repo and generate lots of failing examples with the bash script below after editing the FIXMEs.

```bash
DRAGMAP='FIXME/dragen-os'
BOOST='FIXME/lib/boost/lib'
SIMSAM="FIXME/simsam.py"
REFDIR="${PWD}/genome"
GENOME="${REFDIR}/genome.fa"
MASKED="${REFDIR}/mask.bed"
HASHDIR="${PWD}/dragen-ht"
DECOYS="FIXME/decoys.fa"
SAM='result.sam'
CHECK='FIXME/verify-mapping-coords.py'

SIMSAM_LOG='simsam.log'
HASHER_LOG='hasher.log'
MAPPER_LOG='mapper.log'

declare -i ITER=0
SEQS=23

while true; do

	echo "Iteration ${ITER}: generating simulated data"

	sleep 1

	rm -rf "${HASHDIR}" "${REFDIR}"

	if ! python3 "${SIMSAM}" \
		-s E09 \
		--platform-unit XFASD42D_L2 \
		-C "${SEQS}" \
		-d 5 \
		--snvs 1000 \
		--mask-ends 0.05 \
		--mask-internal 0.05 \
		-T 700,50 \
		-W "${REFDIR}" \
		--seed "${RANDOM}" > "${SIMSAM_LOG}"; then
		break
	fi

	echo "Iteration ${ITER}: Dragen building hashtables"

	mkdir -p "${HASHDIR}"

	if ! LD_LIBRARY_PATH="${BOOST}" "${DRAGMAP}" --build-hash-table true \
		--num-threads 16 \
		--ht-reference "${GENOME}" \
		--output-directory "${HASHDIR}" \
		--ht-decoys "${DECOYS}" \
		--ht-mask-bed="${MASKED}" &> "${HASHER_LOG}"; then
		break
	fi

#	--ht-decoys "${PWD}/tiny-decoy.fa" \

	echo "Iteration ${ITER}: Dragen mapping simulated data"

	if ! LD_LIBRARY_PATH="${BOOST}" "${DRAGMAP}" \
		-r "${HASHDIR}" \
		-1 "${REFDIR}/E09_XFASD42D_L2.1.fq" \
		-2 "${REFDIR}/E09_XFASD42D_L2.2.fq" \
		--num-threads 16 1> "${SAM}" 2> "${MAPPER_LOG}"; then
		break
	fi

	if ! [ -s "${SAM}" ]; then
		echo "${DRAGMAP} crashed since ${SAM} is empty"
		break
	fi

	# Halt (preserving the offending data!) when we encounter a mapping coordinate problem.

	if ! python3 "${CHECK}" < "${SAM}" 1> ok.sam 2> bad.sam; then
		echo "python3 ${CHECK} < ${SAM} failed"
		break
	fi

	echo "Iteration ${ITER}: OK"

	ITER=$((ITER+1))
	SEQS=$((SEQS+10))
done
```
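Because the loop deliberately halts with the offending data still on disk, the failing run can then be packaged for sharing, as with the 47MB archive mentioned above. A minimal sketch with a hypothetical helper `bundle_repro` that compresses whichever of the given paths actually exist:

```bash
#!/usr/bin/env bash
# Hypothetical helper: bundle the reproduction files that exist into a
# gzip-compressed tarball, silently skipping paths that are missing.
bundle_repro() {
	local archive="$1"; shift
	local present=()
	local path
	for path in "$@"; do
		[ -e "${path}" ] && present+=("${path}")
	done
	if [ "${#present[@]}" -eq 0 ]; then
		echo "nothing to bundle" >&2
		return 1
	fi
	tar -czf "${archive}" "${present[@]}"
}
```

For example, `bundle_repro "repro-iter-${ITER}.tar.gz" genome dragen-ht simsam.log hasher.log mapper.log result.sam ok.sam bad.sam`, run from the loop's working directory. Relative names keep tar from storing the absolute `${PWD}` paths.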

biork · Mar 16 '22 21:03

Hi, it seems I cannot access your GitLab repo. Could you contact me by email (the address is on my profile) so that I can send you an upload link?

rizkg · Mar 17 '22 14:03