CNV segmentation crash
Hi @XiaoTaoWang
When running the segment-cnv module, I keep running into the error below.
When running at 10kb, the CNV profile was full of 0's, and I got this error on the first chromosome.
With the same cool file, but rebinned at 100kb, the calculate-cnv module ran fine and the CNV profile looked alright, but I got the following traceback at chromosome 19. Any idea why this happens?
Thanks
NOTE: I truncated the log at [...], but all chromosomes in between have normal INFO logs
root INFO @ 06/09/21 22:02:15:
# ARGUMENT LIST:
# CNV Profile = ../neoloopfinder/detected_cnv.txt
# Ploidy = 2
# Bin Size = 100000
# Output Path = ../neoloopfinder/segmented_cnv.txt
# Number of Processes = 12
# Log file name = cnv-seg.log
root INFO @ 06/09/21 22:02:17: Loading CNV profile ...
root INFO @ 06/09/21 22:02:17: Perform segmentation ...
neoloop.cnv.segcnv INFO @ 06/09/21 22:02:17: Segmenting Chromosome 1 ...
neoloop.cnv.segcnv INFO @ 06/09/21 22:02:22: Estimated HMM state number: 5 (log scale)
...
neoloop.cnv.segcnv INFO @ 06/09/21 22:07:57: Segmenting Chromosome 19 ...
neoloop.cnv.segcnv INFO @ 06/09/21 22:08:00: Estimated HMM state number: 4 (log scale)
Traceback (most recent call last):
File "/home/cmatthey/anaconda3/bin/segment-cnv", line 87, in run
work.segment()
File "/home/cmatthey/anaconda3/lib/python3.8/site-packages/neoloop/cnv/segcnv.py", line 139, in segment
sig, hmm_seg, scale = self._segment(sig) # pure HMM-based segmentation
File "/home/cmatthey/anaconda3/lib/python3.8/site-packages/neoloop/cnv/segcnv.py", line 301, in _segment
seq = np.r_[[s.name for i, s in model.viterbi(np.log2(queue))[1][1:]]]
TypeError: 'NoneType' object is not subscriptable
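For context on this traceback: the failing line indexes the result of `model.viterbi(np.log2(queue))`, so the `TypeError` means either the call itself or the decoded path came back as `None`, i.e. the HMM could not find a valid state sequence. Given the many 0s mentioned above, one plausible trigger is `np.log2` turning zero bins into `-inf` before decoding. Below is a minimal numpy-only sketch of that failure mode and a defensive guard; this is illustrative only, not NeoLoopFinder's actual code, and the variable names are hypothetical.

```python
import numpy as np

# Hypothetical CNV signal with empty (zero) bins, as reported at 10kb.
queue = np.array([0.8, 1.1, 0.0, 0.0, 2.1])

# np.log2(0) yields -inf; feeding -inf observations to an HMM decoder
# (e.g. pomegranate's model.viterbi in segcnv.py) can leave it with no
# valid path, so it may return None instead of a state sequence.
with np.errstate(divide="ignore"):
    log_sig = np.log2(queue)
print(np.isneginf(log_sig).any())  # True: -inf values are present

# One defensive option: mask zero bins before taking the log.
valid = queue > 0
log_valid = np.log2(queue[valid])
print(np.isfinite(log_valid).all())  # True
```

This does not fix the underlying sparsity, but it shows why a very sparse 10kb profile can crash while a denser 100kb profile decodes fine.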
Hello @XiaoTaoWang, I am having the same problem as @cmdoret.
Hi @cmdoret and @andyjslee, I couldn't reproduce this error with any of the samples I have. Just curious, are you running calculate-cnv and segment-cnv on some extremely sparse matrix (since @cmdoret mentioned there are many 0s at 10kb)?
The matrix is indeed a bit sparse: it has 258M contacts. Below are pictures of the region chr3:160Mb-195Mb at 10kb and 100kb resolution, respectively. The 100kb matrix looks alright to me.
I encountered the errors after running the following commands:
calculate-cnv -g hg19 -H PM25_100kb.cool --output ../neoloopfinder/detected_cnv.txt -e Arima --cachefolder ../neoloopfinder
segment-cnv --cnv-file ../neoloopfinder/detected_cnv.txt --binsize 100000 --output ../neoloopfinder/segmented_cnv.txt --nproc 12

@cmdoret Interesting ... I think 258M contacts are more than enough for calculate-cnv and segment-cnv to be run at 10kb (and your 10kb map looks good to me), so the matrix sparsity shouldn't be the issue.
If you don't mind, can you share the CNV profiles (of a chromosome where CNV segmentation failed, maybe chr1 at 10kb and chr19 at 100kb) with me ([email protected]), just for debugging? Thanks!
Hello @XiaoTaoWang, thank you for your prompt response. My sample has around 670M contacts.
Thanks for your help @XiaoTaoWang, I sent you the files along with the matrix :)
Thanks @cmdoret, I have downloaded your files; the funny thing, however, is that I still couldn't reproduce your error when I ran segment-cnv with the file "detected_cnv_100kb.txt" as input. I suspect the only remaining explanation is minor differences between package versions. Can you double-check the versions of all the dependent packages (especially pomegranate) to make sure they match the ones listed in https://github.com/XiaoTaoWang/NeoLoopFinder#installation?
Below are details of my test:
command I used:
$ segment-cnv --cnv-file detected_cnv_100kb.txt --ploidy 2 --binsize 100000 --output test.cnv-seg.100k.txt --nproc 4
CNV segments of chr19:
$ cat test.cnv-seg.100k.txt | grep chr19
chr19 0 700000 1
chr19 700000 3600000 2
chr19 3600000 4000000 1
chr19 4000000 14000000 2
chr19 14000000 16300000 3
chr19 16300000 22900000 2
chr19 22900000 23100000 3
chr19 23100000 23700000 2
chr19 23700000 24000000 6
chr19 24000000 24400000 3
chr19 24400000 27800000 1
chr19 27800000 50400000 2
chr19 50400000 54000000 1
chr19 54000000 54700000 2
chr19 54700000 55400000 1
chr19 55400000 59200000 2
Thanks for trying! I have included a dump of my conda environment below. I did not notice an obvious problem with my package versions, but I agree it does look like a version compatibility bug. Just throwing some ideas out here, sorry if I'm wrong (I'm not familiar with pomegranate):
- Is there a specific reason why you require these packages to be installed separately, rather than putting them in a requirements.txt file and letting pip handle the dependencies? Maybe it could make things easier.
- I could try setting up a Dockerfile to have a fully isolated and predictable environment, but that seems overkill for a Python package ^^'.
- I tried installing pomegranate==0.10.0 via pip and it would not let me (I can reproduce it in a brand new conda environment):
conda create -n debug
conda activate debug
pip install pomegranate==0.10.0
Perhaps there is an issue with this version, and allowing a more recent version would be safer (and more future-proof)?
channels:
- conda-forge
- bioconda
- defaults
- r
dependencies:
- cooler=0.8.6
- scipy=1.3.1
- networkx=1.11
- pybigwig=0.3.17
- scikit-learn=0.20.2
- joblib=0.13.2
- matplotlib=3.1.1
- python=3.7.1
- cython=0.29.13
- pyensembl=1.8.0
- numpy=1.17.2
- rpy2=2.9.4
- r=3.5.1
- r-mgcv
- ipython
- pomegranate=0.10.0
Hi @cmdoret I just released a new NeoLoopFinder version (https://github.com/XiaoTaoWang/NeoLoopFinder#release-notes). According to my test, this version is more stable and is compatible with the latest versions of dependent packages. Hope this update solves this issue!
Hello @XiaoTaoWang, I just wanted to note that I'm also seeing this error during segmentation after a fresh install of the environment. I will be troubleshooting as well!
Edit: I can confirm that I have at least one calculate-cnv output file that runs to completion with segment-cnv, and several other resolutions that do not: the only difference is bin size. In case you would like to see them, I've attached 5 files: the CNV bedGraphs, the segment-cnv log files, and the resulting segment file for the successful run. segment-cnv succeeds with the 1MB bin size CNV bedGraph and fails with the 100k bin size file. I have also tried 10k, 50k, 5MB, and 10MB bin sizes, which appear to fail as well.
1MB_cnv.bedGraph.txt 1MB_segment.log 1MB_segments.bedGraph.txt 100k_cnv.bedGraph.txt 100k_segment.log
Hi @AndrewC160, sorry for the late response. Thanks for sharing these files with me; they were very helpful for troubleshooting the instability of the original CNV segmentation algorithm. I finally solved this problem by changing the training dataset of the HMM from chromosome-wide CNV signals to genome-wide CNV signals.
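The idea of that fix (learn shared states from the pooled genome-wide signal, then label each chromosome against them, so a sparse chromosome like chr19 no longer has to support its own model) can be illustrated with a small sketch. This is not NeoLoopFinder's actual implementation; all names are hypothetical, and a simple quantile-based clustering stands in for the real HMM fit.

```python
import numpy as np

def fit_states_genomewide(signals_by_chrom, n_states=3):
    """Estimate shared state centers from the pooled signal of all chromosomes."""
    pooled = np.concatenate(list(signals_by_chrom.values()))
    pooled = pooled[pooled > 0]  # ignore empty bins
    # quantile-based centers as a stand-in for HMM emission means
    qs = np.linspace(0, 1, n_states + 2)[1:-1]
    return np.quantile(np.log2(pooled), qs)

def label_chromosome(signal, centers):
    """Assign each nonzero bin to the nearest shared state center."""
    labels = np.full(signal.shape, -1, dtype=int)  # -1 marks empty bins
    nz = signal > 0
    dist = np.abs(np.log2(signal[nz])[:, None] - centers[None, :])
    labels[nz] = dist.argmin(axis=1)
    return labels

signals = {
    "chr1": np.array([1.0, 1.1, 0.9, 2.0, 2.1, 0.0]),
    "chr19": np.array([0.0, 0.0, 1.0, 4.0, 4.2]),  # sparse chromosome
}
centers = fit_states_genomewide(signals)
# Even a sparse chromosome can be labeled, because the states were
# learned from the whole genome rather than from that chromosome alone.
for chrom, sig in signals.items():
    print(chrom, label_chromosome(sig, centers))
```

The key design point is the single `fit_states_genomewide` call: a per-chromosome fit can fail (or return degenerate states) on a chromosome with too few informative bins, while the pooled fit always sees the full genome's signal.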
As a result, I released a new version 0.4.2 (https://pypi.org/project/neoloop/). Please upgrade your NeoLoopFinder to this version and try again. For your information, here is the command I used to perform the segmentation on the file "100k_cnv.bedGraph.txt":
$ segment-cnv --cnv-file 100k_cnv.bedGraph.txt --binsize 100000 --output 100k_cnv_seg.bedGraph.txt --num-of-states 3 --min-segment-size 3 --cbs-pvalue 5e-2 --max-dist 1
Note that I've also made key parameters of the segmentation algorithm tunable. The results will look like this.

Let me know if you have any further questions! Thanks!
Xiaotao