hagfish
ValueError: cannot convert float NaN to integer
I'm trying to run hagfish_extract and am getting the following error:
[smoss@biolserva pacbio_assembly]$ hagfish_extract pbreads_to_pbasm_blasr.sorted.bam
/home/smoss/.local/lib/python2.7/site-packages/numpy/core/_methods.py:59: RuntimeWarning: Mean of empty slice.
  warnings.warn("Mean of empty slice.", RuntimeWarning)
/home/smoss/.local/lib/python2.7/site-packages/numpy/core/_methods.py:71: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
  File "/home/smoss/tools/hagfish/hagfish_extract", line 643, in <module>
    stats = doStats(bamBase, seqInfo, readPairs)
  File "/home/smoss/tools/hagfish/hagfish_extract", line 293, in doStats
    label='Peak top (%d)' % int(topInsert))
ValueError: cannot convert float NaN to integer
I notice that topInsert is assigned as smids[top] in the code (where top holds the indices of the maximum values in the histogram). In tracing this issue back I took the liberty of printing smids and am left with a list of nan. Printing mids also returns a list of nan. Printing out insertSizes, hist, and edges returns an empty list, a list of zeros and a list of nan respectively.
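For anyone hitting the same traceback, here is a minimal sketch of the failure mode described above: taking the mean of an empty array gives NaN, and int(NaN) then raises the ValueError. The helper names summarize_inserts and peak_label are hypothetical and are not part of hagfish; they only illustrate one possible guard for the zero-pairs case.

```python
import math
import warnings

import numpy as np


def summarize_inserts(insert_sizes):
    """Return basic insert-size stats, or None when there are no read pairs.

    Hypothetical helper for illustration; not hagfish's own doStats.
    """
    sizes = np.asarray(insert_sizes, dtype=float)
    if sizes.size == 0:
        return None
    return {
        "nopairs": int(sizes.size),
        "average": float(np.mean(sizes)),
        "median": float(np.median(sizes)),
    }


def peak_label(top_insert):
    """Format the peak label, tolerating a missing or NaN peak position."""
    if top_insert is None or math.isnan(top_insert):
        return "Peak top (n/a)"
    return "Peak top (%d)" % int(top_insert)


# Reproduce the failure: the mean of an empty array is NaN
# (NumPy also emits the "Mean of empty slice" RuntimeWarning).
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    empty_mean = np.mean(np.array([], dtype=float))

# int(NaN) raises the ValueError seen in the traceback.
try:
    int(empty_mean)
    raised = False
except ValueError:
    raised = True
```

With a guard like this, an input with zero read pairs could be reported explicitly instead of reaching the plotting code with NaN values.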
I printed bamBase, seqInfo and readPairs and get the following:
pb_to_pb_blasr
{'scf7180000000002|quiver': {'length': 5350059}}
{'scf7180000000002|quiver': {'start2': array([], dtype=float64), 'start1': array([], dtype=float64), 'stop1': array([], dtype=float64), 'stop2': array([], dtype=float64)}}
Debug output here:
[smoss@biolserva pb_pbalign]$ hagfish_extract -vvv ../pb_to_pb_pbalign.bam
HAGFISH INFO processing bamfile pb_to_pb_pbalign
HAGFISH DEBUG get sequence info from ../pb_to_pb_pbalign.bam
HAGFISH INFO Reading cached seqInfo for pb_to_pb_pbalign
HAGFISH INFO discovered 1 sequences
HAGFISH INFO processing BAM file: ../pb_to_pb_pbalign.bam
HAGFISH INFO Basename pb_to_pb_pbalign
HAGFISH INFO Processing 1 sequences < 1000 nt (from a total of 1)
HAGFISH DEBUG executing samtools
HAGFISH DEBUG samtools view -f 67 ../pb_to_pb_pbalign.bam
HAGFISH INFO discovered 0 readpairs (insert < 20000 nt) out of a total of 0
HAGFISH INFO wroted data for 1 sequences with zero pairs
HAGFISH INFO total no readpairs: 0
/home/smoss/.local/lib/python2.7/site-packages/numpy/core/_methods.py:59: RuntimeWarning: Mean of empty slice.
  warnings.warn("Mean of empty slice.", RuntimeWarning)
/home/smoss/.local/lib/python2.7/site-packages/numpy/core/_methods.py:71: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
HAGFISH DEBUG stats {'average': nan, 'nopairs': 0, 'median': nan}
HAGFISH DEBUG creating a histogram (0, nan)
HAGFISH INFO insert size tops at nan
HAGFISH INFO Estimating min ok insert size as nan
HAGFISH INFO Estimating max ok insert size as nan
HAGFISH INFO plotting normal figure
Traceback (most recent call last):
  File "/home/smoss/tools/hagfish/hagfish_extract", line 643, in <module>
    stats = doStats(bamBase, seqInfo, readPairs)
  File "/home/smoss/tools/hagfish/hagfish_extract", line 293, in doStats
    label='Peak top (%d)' % int(topInsert))
ValueError: cannot convert float NaN to integer
It seems to work fine with short-read data from Illumina that I have mapped to the PacBio (PB) assembly using bwa, but for PB to PB mapping using pbalign/blasr it fails. This seems to be down to the samtools step?
I changed the samFlag input flag to --samFlag=0 and now I am getting output. I'm not entirely sure how this impacts things downstream?
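A rough sketch of why the flag matters, assuming the samFlag value is passed straight to samtools view -f as the debug line above suggests. In the SAM specification, 67 is 0x1 + 0x2 + 0x40 (paired, mapped in proper pair, first in pair), and -f keeps only records with all of those bits set. Unpaired long-read alignments would normally have none of them set, which would explain the zero read pairs; that the pbalign/blasr records here are unpaired is an assumption, not something verified against this BAM file. The function names below are illustrative only.

```python
# Standard SAM flag bits, as defined in the SAM specification.
SAM_FLAG_BITS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "read fails quality checks",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag):
    """List the names of all bits set in a SAM flag value."""
    return [name for bit, name in sorted(SAM_FLAG_BITS.items()) if flag & bit]


def passes_required(read_flag, required):
    """Mimic `samtools view -f`: keep a record only if every required bit is set."""
    return (read_flag & required) == required


required_bits = decode_flag(67)

# An unpaired forward-strand record (flag 0) or reverse-strand record (flag 16)
# is dropped by -f 67, while a properly paired first mate (e.g. flag 99) is kept.
unpaired_kept = [passes_required(f, 67) for f in (0, 16)]
paired_kept = passes_required(99, 67)

# With a required flag of 0 nothing is demanded, so every record passes.
everything_kept = all(passes_required(f, 0) for f in (0, 16, 99))
```

So --samFlag=0 would let every alignment through the filter. Whether hagfish can then derive meaningful insert sizes from records that are not true read pairs is a separate question that this sketch does not answer.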
Dear @gawbul - sorry - I was (so it appears) not paying any attention to this page - is this still relevant?
@mfiers I'm not working on that project anymore, but it was still an issue, if I remember correctly. I'm not sure whether it was down to issues with the data, and I haven't had time to investigate since then.