HiNT
Error running CNV from .hic files with hg38
Hello,
Thank you so much for all of your hard work developing this wonderful tool! I have run HiNT successfully on the test data provided (hg19), but encountered the following error when trying to run it on my own data (hg38).
The command
hint cnv -m /data/test.hic \
-f juicer \
--refdir /data/HiNT_ref/refData/hg38 \
-r 50 \
-g hg38 \
-n TEST \
--bicseq /usr/local/apps/bicseq2/0.7.3/ \
-e HindIII \
-o /data/TEST_CNV
The error
From log.out:
[12:57:50] Argument List:
[12:57:50] Hi-C contact matrix = /data/test.hic
[12:57:50] Hi-C contact matrix format = juicer
[12:57:50] resolution = 50 kb
[12:57:50] Genome = hg38
[12:57:50] BICseq directory = /usr/local/apps/bicseq2/0.7.3/
[12:57:50] Name = TEST
[12:57:50] Output directory = /data/TEST_CNV
HiC version: 8
One of the chromosomes wasn't found in the file. Check that the chromosome name matches the genome.
From log.err:
Traceback (most recent call last):
File "/usr/local/apps/hint/2.2.7/bin/hint", line 201, in <module>
main()
File "/usr/local/apps/hint/2.2.7/bin/hint", line 194, in main
cnvrun(argparser)
File "/usr/local/Anaconda/envs_app/hint/2.2.7/lib/python3.6/site-packages/HiNT/runhint.py", line 79, in cnvrun
rowSumFilesInfo = getGenomeRowSums(opts.resolution, opts.matrixfile, chromlf, opts.outdir,opts.name)
File "/usr/local/Anaconda/envs_app/hint/2.2.7/lib/python3.6/site-packages/HiNT/getGenomeRowSumsFromHiC.py", line 69, in getGenomeRowSums
sumInfo = getSumPerChrom(i, j, hicfile, binsize, chroms, chromInfo, sumInfo)
File "/usr/local/Anaconda/envs_app/hint/2.2.7/lib/python3.6/site-packages/HiNT/getGenomeRowSumsFromHiC.py", line 20, in getSumPerChrom
result = straw('NONE', hicfile, str(chr1), str(chr2), 'BP', binsize)
File "/usr/local/Anaconda/envs_app/hint/2.2.7/lib/python3.6/site-packages/HiNT/straw.py", line 471, in straw
master=list1[0]
TypeError: 'int' object is not subscriptable
Potential issue
The issue seems to come from lines 18-19 of the getGenomeRowSumsFromHiC.py script. These lines trim the "chr" string from the chromosome names before passing them to the straw function.
However, for hg38 (at least for the .hic file I am working with), straw only works when the "chr" string is included. For example, straw("NONE", "/data/test.hic", "chr1", "chr1", "BP", 50000) works and returns data, while straw("NONE", "/data/test.hic", "1", "1", "BP", 50000) fails with the same error seen when launching HiNT CNV.
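To check which spelling a given .hic file expects without editing HiNT, both variants can be tried in turn. The following is only a minimal sketch: `chrom_name_variants` and `first_working_name` are hypothetical helpers, not part of HiNT, and `query` stands in for any one-argument wrapper around the straw call shown above. It assumes a failed lookup either raises an exception or returns `None` or an integer.

```python
def chrom_name_variants(name):
    """Return the name as given, followed by its other spelling ('chr1' <-> '1')."""
    name = str(name)
    if name.startswith("chr"):
        return [name, name[3:]]
    return [name, "chr" + name]


def first_working_name(name, query):
    """Return the first spelling of `name` for which `query` succeeds, else None.

    `query` is a hypothetical one-argument callable, for example:
        lambda c: straw("NONE", "/data/test.hic", c, c, "BP", 50000)
    """
    for candidate in chrom_name_variants(name):
        try:
            result = query(candidate)
        except Exception:
            # e.g. the TypeError shown in log.err
            continue
        # Assumption: None or an integer return value signals a failed lookup.
        if result is None or isinstance(result, int):
            continue
        return candidate
    return None
```

With the file described above, `first_working_name("1", query)` would be expected to return `"chr1"`, since only the prefixed name returns data.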
Possible solution?
One solution would be to remove these lstrip calls from the script. However, this might cause issues for other genome builds (e.g. hg19). If these chromosome names are taken from the hg19.len and hg38.len files, then this solution could still work with hg19 by removing the "chr" strings there, although I am not sure whether that would affect other steps.
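An alternative that would not depend on the genome build is to match each reference chromosome name against the names actually stored in the .hic file, and pass whichever spelling is present. This is a rough sketch only: `map_to_file_names` is a hypothetical helper, and it assumes the chromosome names in the .hic header have already been obtained by some other means.

```python
def map_to_file_names(ref_names, file_names):
    """Map each reference chromosome name to the spelling used in the .hic file.

    ref_names:  names from the reference, e.g. read from a .len file
    file_names: names found in the .hic header (assumed to be available)
    """
    available = set(file_names)
    mapping = {}
    for name in ref_names:
        # Remove only a leading "chr" prefix; str.lstrip("chr") would instead
        # strip any leading run of the characters 'c', 'h' and 'r'.
        bare = name[3:] if name.startswith("chr") else name
        for candidate in (name, bare, "chr" + bare):
            if candidate in available:
                mapping[name] = candidate
                break
        else:
            raise KeyError(f"chromosome {name!r} not found in the .hic file")
    return mapping
```

For example, `map_to_file_names(["chr1", "chrX"], ["1", "X"])` gives `{"chr1": "1", "chrX": "X"}`, while the same call against a file that stores `"chr1"` and `"chrX"` leaves the names unchanged.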
I completely understand if that is too disruptive of a change to make. I wanted to still post this regardless in case any other users are experiencing similar difficulties.