
Problems with --disable_hmms

Open stsergbg opened this issue 5 years ago • 6 comments

Hi,

I have encountered problems running ViFi on the EBV genome with --disable_hmms. A possible bug leads to 0 clusters in output.clusters.txt and output.clusters.txt.range for several samples that clearly have traces of EBV integration. The output.trans.bam files contain hundreds of reads, though, so ViFi seems to have identified the integrations successfully.

This is a possible duplicate of issue #4, but that issue doesn't seem to have been answered explicitly. The traceback indicates that the script merge_viral_reads.py is run regardless of the --disable_hmms option, yet it seems to require the file tmp/temp/reduced.csv, which can only be generated by run_hmms.py.

The actual command and the lines of traceback:

python ${VIFI_DIR}/scripts/run_vifi.py -f ${FQ1} -r ${FQ2} -o ${vifi_output_dir} -v ebv --cpus 8 --disable_hmms 1

4017.630011 45100000 reads done: #(Trans reads) = 995
38 D7ZQJ5M1:683:C4BGFACXX:6:2315:14661:71501 D7ZQJ5M1:683:C4BGFACXX:6:2315:14367:71714
4026.487438 45200000 reads done: #(Trans reads) = 998
38 D7ZQJ5M1:683:C4BGFACXX:6:2316:9952:35091 D7ZQJ5M1:683:C4BGFACXX:6:2316:9757:35047
Traceback (most recent call last):
  File "/home/scripts/get_trans_new.py", line 238, in <module>
    miscFile.write(b)
AttributeError: 'NoneType' object has no attribute 'write'
[Finished identifying chimeric reads]: 6156.258875
[Cluster and identify integration points]: 6156.258919
    scores = read_scores_file(args.reducedName[0])
Traceback (most recent call last):
  File "/home/scripts/merge_viral_reads.py", line 128, in <module>
IOError: [Errno 2] No such file or directory: 'tmp/temp/reduced.csv'
  File "/home/scripts/merge_viral_reads.py", line 21, in read_scores_file
    input = open(hmm_file, 'r')
0
[Finished cluster and identify integration points]: 6158.720271
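
The second traceback comes down to a missing input file. A minimal sketch of a pre-flight check follows, assuming the merge step needs the two paths passed to it via --reduced and --map; the helper names `missing_inputs` and `can_run_merge` are hypothetical and not part of ViFi.

```python
import os


def missing_inputs(paths):
    """Return the subset of paths that do not exist as regular files."""
    return [p for p in paths if not os.path.isfile(p)]


def can_run_merge(reduced_csv="tmp/temp/reduced.csv",
                  unmapped_map="tmp/temp/unmapped.map"):
    """Report whether the HMM-derived inputs of the merge step are present.

    The default paths are the ones that appear in this thread; whether
    both files are strictly required is an assumption.
    """
    missing = missing_inputs([reduced_csv, unmapped_map])
    for path in missing:
        print("merge step would fail, missing input: %s" % path)
    return not missing
```

Calling `can_run_merge()` from the working directory of a run before the merge step would show whether the IOError above is expected for that run.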

Thank you in advance, Sergei

stsergbg, Aug 21 '19 11:08

Sorry to bother you, but is there any update? It seems that this kind of error makes ViFi hardly usable for EBV integration analysis.

stsergbg, Sep 18 '19 13:09

Not sure if this is still helpful, but I ran into the same issue. It looks like even if HMMs are disabled, ViFi still looks for some files that would only be produced if they were enabled. I changed lines 155 and 156 of run_vifi.py, which are:

os.system("python %s/scripts/merge_viral_reads.py --unknown %s.unknown.bam --trans %s.trans.bam --reduced tmp/temp/reduced.csv --map tmp/temp/unmapped.map --output %s.fixed.trans.bam" % (vifi_dir, options.prefix, options.prefix, options.prefix))
os.system("samtools sort -m 2G -@ %d %s.fixed.trans.bam > %s.fixed.trans.cs.bam" % (options.cpus, options.prefix, options.prefix))

To the conditional statement:

if options.disable_hmms is False:
	os.system("python %s/scripts/merge_viral_reads.py --unknown %s.unknown.bam --trans %s.trans.bam --reduced tmp/temp/reduced.csv --map tmp/temp/unmapped.map --output %s.fixed.trans.bam" % (vifi_dir, options.prefix, options.prefix, options.prefix))
	os.system("samtools sort -m 2G -@ %d %s.fixed.trans.bam > %s.fixed.trans.cs.bam" % (options.cpus, options.prefix, options.prefix))
else:        
	os.system("samtools sort -m 2G -@ %d %s.trans.bam > %s.fixed.trans.cs.bam" % (options.cpus, options.prefix, options.prefix))

This resolves the error, but I can't vouch for the correctness - perhaps @namphuon could comment?
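
One detail worth checking with the patch above: the commands in this thread pass the flag as `--disable_hmms 1` and `--disable_hmms True`. An identity test such as `options.disable_hmms is False` only matches the actual boolean `False`, so a value like the string `"False"` or the integer `0` would fall through to the `else` branch. How run_vifi.py declares the option has not been checked here. The sketch below is a hypothetical parser (`str2bool` and `build_parser` are illustrative names, not ViFi code) that turns the common spellings into a real boolean.

```python
import argparse


def str2bool(value):
    """Convert common command-line spellings of true/false to a bool."""
    text = str(value).strip().lower()
    if text in ("1", "true", "t", "yes", "y"):
        return True
    if text in ("0", "false", "f", "no", "n"):
        return False
    raise argparse.ArgumentTypeError("expected a boolean, got %r" % value)


def build_parser():
    # Hypothetical parser holding only the option discussed in this thread.
    parser = argparse.ArgumentParser()
    parser.add_argument("--disable_hmms", type=str2bool, default=False)
    return parser


if __name__ == "__main__":
    options = build_parser().parse_args()
    if not options.disable_hmms:
        print("HMM-dependent merge step would run")
    else:
        print("HMM-dependent merge step would be skipped")
```

With a real boolean in hand, the condition can be written as `if not options.disable_hmms:` rather than comparing against `False`.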

szsctt, Sep 16 '20 01:09

Got the same issue as well. I also ran into it with the @sara-javadzadeh fork. Any advice, @sara-javadzadeh, @namphuon?

brownmp, Jan 19 '22 21:01

Hi Everyone,

Thanks @brownmp for mentioning me. Could you please provide more information on the exact command you used and the output when you run ViFi? Could you also please check the content of output.fixed.trans.bam (output is the default keyword, unless you changed it when running)? Does it include any reads?
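
For the read check, a minimal sketch is shown here, assuming samtools is installed and on the PATH. It relies on `samtools view -c`, which prints the number of records in the file; the helper names are hypothetical.

```python
import subprocess


def count_command(bam_path):
    """Build the samtools command that prints the record count of a BAM."""
    return ["samtools", "view", "-c", bam_path]


def parse_count(output):
    """Parse the single integer that `samtools view -c` prints."""
    if isinstance(output, bytes):
        output = output.decode()
    return int(output.strip())


def count_reads(bam_path):
    """Run samtools and return the number of records in bam_path."""
    return parse_count(subprocess.check_output(count_command(bam_path)))
```

For example, `count_reads("output.fixed.trans.bam")` returns the record count. If samtools exits with an error, which is likely for a zero-byte or truncated file, `subprocess.check_output` raises `CalledProcessError`.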

What @szsctt's comment suggests is definitely a step in the right direction, but it might not fix the issue entirely. I'll work on this once you provide more information on the ViFi input arguments and output. Just a heads-up that I will be fixing this issue on my forked repository: https://github.com/sara-javadzadeh/ViFi.

In the meantime, I would like to share that we are working on a faster version of ViFi: https://github.com/sara-javadzadeh/FastViFi. It could be more convenient to run FastViFi on large datasets. Note that we are still working on the FastViFi manuscript and on making the FastViFi repository easier to use. FastViFi uses ViFi, more specifically the forked repo, which I'll be maintaining.

We have EBV HMMs available in the forked ViFi repo, in case you would like to use them but were not able to generate them yourself.

jsara72, Jan 21 '22 01:01

Hello @jsara72!

Thank you so much for the response. I am running your forked updated version. The command I used is as follows:

docker run --rm -it \
-v /Users/mbrown/CTAT/CTAT_VIF/OtherPipelines/ViFi:/usr/local/src/fastq1/ \
-v /Users/mbrown/CTAT/CTAT_VIF/OtherPipelines/ViFi:/usr/local/src/fastq2/ \
-e READ1=reads_1.fastq.gz \
-e READ2=reads_2.fastq.gz \
-v /Users/mbrown/CTAT/CTAT_VIF/OtherPipelines/ViFi/Virus_DB:/usr/local/src/repo/data \
-v /Users/mbrown/CTAT/CTAT_VIF/OtherPipelines/ViFi/OUTPUT:/usr/local/src/output/    \
-e REFERENCE_REPO=/usr/local/src/repo/data \
-v /Users/mbrown/CTAT/CTAT_VIF/OtherPipelines/ViFi/data_repo:/usr/local/src/data_repo/ \
-v /Users/mbrown/CTAT/CTAT_VIF/OtherPipelines/ViFi/ViFi/scripts/run_vifi.py:/usr/local/src/scripts/run_vifi.py \
-v /Users/mbrown/CTAT/CTAT_VIF/OtherPipelines/ViFi/ViFi/scripts/get_trans_new.py:/usr/local/src/scripts/get_trans_new.py \
-v /Users/mbrown/CTAT/CTAT_VIF/OtherPipelines/ViFi/ViFi/scripts/cluster_trans_new.py:/usr/local/src/scripts/cluster_trans_new.py \
-v /Users/mbrown/CTAT/CTAT_VIF/OtherPipelines/ViFi/ViFi/scripts/run_hmms.py:/usr/local/src/scripts/run_hmms.py \
-v /Users/mbrown/CTAT/CTAT_VIF/OtherPipelines/ViFi/ViFi/scripts/merge_viral_reads.py:/usr/local/src/scripts/merge_viral_reads.py \
-v /Users/mbrown/CTAT/CTAT_VIF/OtherPipelines/ViFi/ViFi/scripts:/usr/local/src/scripts \
-e VIFI_DIR=/usr/local/src/ViFi \
-e AA_DATA_REPO=/usr/local/src/data_repo \
brownmp/vifi:devel \
python /usr/local/src/scripts/run_vifi.py \
-f /usr/local/src/fastq1/reads_1.fastq.gz \
-r /usr/local/src/fastq2/reads_2.fastq.gz \
--virus all \
-c 4 --threshold 0.020000 \
-o /usr/local/src/output/ \
-p output \
--disable_hmms True

I am using my own virus fasta file located in the directory all.

The file output.fixed.trans.cs.bam is empty, as the failure occurs during the merge_viral_reads.py step.

~ @brownmp

brownmp, Jan 21 '22 15:01

Hi @brownmp

I pushed a commit to my forked ViFi repository: https://github.com/sara-javadzadeh/ViFi/commit/50356630581cb376782f04e3d72c9a29f43a7fb8

Please fetch the code and try again. Let me know if it works for you, and please mention my ID. Thanks!

Sara

sara-javadzadeh, Jan 26 '22 03:01