MoMI-G icon indicating copy to clipboard operation
MoMI-G copied to clipboard

Error running vcf2xg.sh on c. elegans

Open moldach opened this issue 5 years ago • 22 comments

First off, I have to say great job on the tool! I've played around with the practice data and now I'm very excited to try this tool out and show it to my laboratory. I've been evaluating ~10 tools for looking at SVs and yours by-far looks the most promising!

I'm getting an error trying to convert a .VCF to .xg using the vcf2xg.sh script on the reference genome (C. elegans) used for alignment.

(base):~/TEST-MOMIG$ sudo bash ~/MoMI-G/scripts/vcf2xg.sh \
    ~/projects/maddog/CNVnator/470.vcf  \
    momig-vcf-test-output ~/vg \
    c_elegans.PRJNA13758.WS265.genomic.fa.gz

Traceback (most recent call last):
	3: from /home/MoMI-G/scripts/vcf2gfa/vcf2pcf.rb:10:in `<main>'
	2: from /home/MoMI-G/scripts/vcf2gfa/vcf2pcf.rb:10:in `each'
	1: from /home/MoMI-G/scripts/vcf2gfa/vcf2pcf.rb:89:in `block in <main>'
/home/MoMI-G/scripts/vcf2gfa/vcf2pcf.rb:89:in `+': no implicit conversion of nil into String (TypeError)
+ input=.//momig-vcf-test-output.pcf
+ reference=c_elegans.PRJNA13758.WS265.genomic.fa.gz
+ pref=.//momig-vcf-test-output.pcf
+ sort -t, -k 7n .//momig-vcf-test-output.pcf
+ sed -e 's/[ ]*//g'
+ awk -F '[,:]' '{print $1,"\t",$2,"\n",$4,"\t",$5}' /dev/fd/63
+ sort -k 1,1 -k 2,2n
+ uniq
++ sed 1d .//momig-vcf-test-output.pcf
+ '[' '!' -s c_elegans.PRJNA13758.WS265.genomic.fa.gz.fai ']'
++ dirname /home/mtg/MoMI-G/scripts/vcf2gfa/pcf2gfa.sh
+ ruby /home/mtg/MoMI-G/scripts/vcf2gfa/json_to_breakpoint_list.rb .//momig-vcf-test-output.pcf.bp.tsv c_elegans.PRJNA13758.WS265.genomic.fa.gz
+ cat .//momig-vcf-test-output.pcf.bp.tsv breakpoint_list_tmp.tsv
+ sort -k 1,1 -k 2,2n
+ uniq
++ dirname /home/MoMI-G/scripts/vcf2gfa/pcf2gfa.sh
+ ruby /home/MoMI-G/scripts/vcf2gfa/gfa_generator.rb .//momig-vcf-test-output.pcf.bp.merged.tsv .//momig-vcf-test-output.pcf.output.pcf c_elegans.PRJNA13758.WS265.genomic.fa.gz
[W::fai_get_val] Reference :0-270000000 not found in FASTA file, returning empty sequence
[faidx] Failed to fetch sequence in :0-270000000
ERROR: Signal 11 occurred. VG has crashed. Run 'vg bugs --new' to report a bug.
Stack trace path: /tmp/vg_crash_YNr9xC/stacktrace.txt
Please include the stack trace file in your bug report!
error[VPKG::load_one]: Correct input type not found while loading handlegraph::MutablePathDeletableHandleGraph
error[VPKG::load_one]: Correct input type not found while loading handlegraph::HandleGraph

I tried another call set (Delly) to see if it's something specific to the format of VCF but got a similar error:

(base) mtg@mtg-ThinkPad-P53:~/TEST-MOMIG$ sudo bash ~/MoMI-G/scripts/vcf2xg.sh ~/projects/maddog/delly/delly.vcf  momig-vcf-test-output ~/vg c_elegans.PRJNA13758.WS265.genomic.fa.gz
+ input=.//momig-vcf-test-output.pcf
+ reference=c_elegans.PRJNA13758.WS265.genomic.fa.gz
+ pref=.//momig-vcf-test-output.pcf
+ sort -t, -k 7n .//momig-vcf-test-output.pcf
+ sed -e 's/[ ]*//g'
+ awk -F '[,:]' '{print $1,"\t",$2,"\n",$4,"\t",$5}' /dev/fd/63
+ sort -k 1,1 -k 2,2n
+ uniq
++ sed 1d .//momig-vcf-test-output.pcf
+ '[' '!' -s c_elegans.PRJNA13758.WS265.genomic.fa.gz.fai ']'
++ dirname /home/MoMI-G/scripts/vcf2gfa/pcf2gfa.sh
+ ruby /home/MoMI-G/scripts/vcf2gfa/json_to_breakpoint_list.rb .//momig-vcf-test-output.pcf.bp.tsv c_elegans.PRJNA13758.WS265.genomic.fa.gz
+ cat .//momig-vcf-test-output.pcf.bp.tsv breakpoint_list_tmp.tsv
+ sort -k 1,1 -k 2,2n
+ uniq
++ dirname /home/MoMI-G/scripts/vcf2gfa/pcf2gfa.sh
+ ruby /home/MoMI-G/scripts/vcf2gfa/gfa_generator.rb .//momig-vcf-test-output.pcf.bp.merged.tsv .//momig-vcf-test-output.pcf.output.pcf c_elegans.PRJNA13758.WS265.genomic.fa.gz
[faidx] Truncated sequence: I:15066834-270000000
[faidx] Truncated sequence: II:15124909-270000000
[faidx] Truncated sequence: III:13778981-270000000
[faidx] Truncated sequence: IV:17493322-270000000
[faidx] Truncated sequence: MtDNA:13794-270000000
[faidx] Truncated sequence: V:20916455-270000000
[faidx] Truncated sequence: X:17629941-270000000
ERROR: Signal 11 occurred. VG has crashed. Run 'vg bugs --new' to report a bug.
Stack trace path: /tmp/vg_crash_Vnjilw/stacktrace.txt
Please include the stack trace file in your bug report!
error[VPKG::load_one]: Correct input type not found while loading handlegraph::MutablePathDeletableHandleGraph
error[VPKG::load_one]: Correct input type not found while loading handlegraph::HandleGraph

The stack-trace from Delly is:

Crash report for vg v1.23.0 "Lavello"
Stack trace (most recent call last):
#8    Object "/home/vg", at 0x4dd279, in _start
#7    Object "/home/vg", at 0x1bb8ec8, in __libc_start_main
#6    Object "/home/vg", at 0x40aed7, in main
#5    Object "/home/vg", at 0x9e77c7, in vg::subcommand::Subcommand::operator()(int, char**) con>
#4    Object "/home/vg", at 0x9f6a61, in main_view(int, char**)
#3    Object "/home/vg", at 0xa6c4ab, in vg::get_input_file(std::__cxx11::basic_string<char, std>
#2    Object "/home/vg", at 0x9f2ad5, in std::_Function_handler<void (std::istream&), main_view(>
#1    Object "/home/vg", at 0xbc103b, in vg::gfa_to_graph(std::istream&, vg::VG*, bool)
#0    Object "/home/vg", at 0x12e3f20, in stPinchThread_getLast

Any idea what could be happening here? One thought is that chromosome in H. sapiens are usuallly labeled chr1 but in C. elegans they are labeled (I, II, III, IV, V, X, MtDNA) - not sure if this information helps?

As for the Reference :0-270000000 part I have no idea of the context of this number: 270000000? It seems like a very round number? The entire genome of C. elegans is ~100286401bp.

moldach avatar May 01 '20 23:05 moldach

Thank you for the feedback! As you said, we regarded the input as H. sapiens and thus we used 270000000 as the maximum length as temporally, but, actually we can calculate the actual length. I'll fix it. The latter issue needs further investigation because I have not used Delly as an input. I will tell you if I know something.

6br avatar May 02 '20 02:05 6br

I fixed the truncated reference problem. For the latter data, the gfa file seems to be generated at least. https://github.com/sjackman/gfalint can be used for validation of gfa file.

6br avatar May 05 '20 05:05 6br

Okay I've remove MoMI-G and pulled in the new version with git clone and then re-ran for Delly.

(base) mtg@mtg-ThinkPad-P53:~/TEST-MOMIG$ sudo bash ~/MoMI-G/scripts/vcf2xg.sh ~/mtg-lab-projects/maddog/delly/delly.vcf  momig-vcf-test-output ~/vg c_elegans.PRJNA13758.WS265.genomic.fa.gz
+ input=.//momig-vcf-test-output.pcf
+ reference=c_elegans.PRJNA13758.WS265.genomic.fa.gz
+ pref=.//momig-vcf-test-output.pcf
+ sort -t, -k 7n .//momig-vcf-test-output.pcf
+ sed -e 's/[ ]*//g'
+ awk -F '[,:]' '{print $1,"\t",$2,"\n",$4,"\t",$5}' /dev/fd/63
+ sort -k 1,1 -k 2,2n
++ sed 1d .//momig-vcf-test-output.pcf
+ uniq
+ '[' '!' -s c_elegans.PRJNA13758.WS265.genomic.fa.gz.fai ']'
++ dirname /home/mtg/MoMI-G/scripts/vcf2gfa/pcf2gfa.sh
+ ruby /home/mtg/MoMI-G/scripts/vcf2gfa/json_to_breakpoint_list.rb .//momig-vcf-test-output.pcf.bp.tsv c_elegans.PRJNA13758.WS265.genomic.fa.gz
+ cat .//momig-vcf-test-output.pcf.bp.tsv breakpoint_list_tmp.tsv
+ sort -k 1,1 -k 2,2n
+ uniq
++ dirname /home/mtg/MoMI-G/scripts/vcf2gfa/pcf2gfa.sh
+ ruby /home/mtg/MoMI-G/scripts/vcf2gfa/gfa_generator.rb .//momig-vcf-test-output.pcf.bp.merged.tsv .//momig-vcf-test-output.pcf.output.pcf c_elegans.PRJNA13758.WS265.genomic.fa.gz
ERROR: Signal 11 occurred. VG has crashed. Run 'vg bugs --new' to report a bug.
Stack trace path: /tmp/vg_crash_A71OY2/stacktrace.txt
Please include the stack trace file in your bug report!
error[VPKG::load_one]: Correct input type not found while loading handlegraph::MutablePathDeletableHandleGraph
error[VPKG::load_one]: Correct input type not found while loading handlegraph::HandleGraph

and the stack-trace:

Crash report for vg v1.23.0 "Lavello"
Stack trace (most recent call last):
#8    Object "/home/mtg/vg", at 0x4dd279, in _start
#7    Object "/home/mtg/vg", at 0x1bb8ec8, in __libc_start_main
#6    Object "/home/mtg/vg", at 0x40aed7, in main
#5    Object "/home/mtg/vg", at 0x9e77c7, in vg::subcommand::Subcommand::operator()(int, char**) const
#4    Object "/home/mtg/vg", at 0x9f6a61, in main_view(int, char**)
#3    Object "/home/mtg/vg", at 0xa6c4ab, in vg::get_input_file(std::__cxx11::basic_string<char, std:>
#2    Object "/home/mtg/vg", at 0x9f2ad5, in std::_Function_handler<void (std::istream&), main_view(i>
#1    Object "/home/mtg/vg", at 0xbc103b, in vg::gfa_to_graph(std::istream&, vg::VG*, bool)
#0    Object "/home/mtg/vg", at 0x12e3f20, in stPinchThread_getLast

It's doesn't look like it generated the .gfa file (although there is a .ggf?)

I have the following call sets if you are more familiar with one of them we can try that: CNVnator, DeepVariant, Delly, GRIDSS, Lumpy, Manta, MindTheGap, NGSep, Pindel, Tardis.

moldach avatar May 06 '20 15:05 moldach

We defined .ggf as a subset of .gfa, so gfalint <output.ggf works. Sorry I am not familiar with these SV callers since they are for Illumina.

6br avatar May 07 '20 10:05 6br

I've posted a follow up in the gfalint to show the output of MoMI-G. If you'd like a more detailed error message I can try a couple other tools:

"You may wish to try another tool that may give a better error message, such as perhaps Bandage if it's GFA1, or Gfapy for either GFA1 or GFA2."

Which format is this in?

I tried Gfapy just now but ran into an issue

Any idea why this odd output is being produced by MoMI-G?

moldach avatar May 21 '20 01:05 moldach

Is it possible to try it again on the latest MoMI-G master? MoMI-G tools output is compatible with GFA1 format. I'm sorry that I have no idea why gfapy does not work.

6br avatar May 21 '20 04:05 6br

Okay making some progress now.

I've tried again off of the latest MoMI-G master (git clone). There is no warning issued if the user forgets to install vg (I did). However, I think I've now done that properly.

cd ~/projects/TEST-MOMIG
wget https://github.com/vgteam/vg/releases/download/v1.24.0/vg
chmod +x vg

And test it's working:

(base) mtg@mtg-ThinkPad-P53:~/projects/TEST-MOMIG$ ./vg
vg: variation graph tool, version v1.24.0 "Montieri"

usage: ./vg <command> [options]

main mapping and calling pipeline:
  -- construct     graph construction
  -- index         index graphs or alignments for random access or mapping
  -- map           MEM-based read alignment
  -- augment       augment a graph from an alignment
  -- pack          convert alignments to a compact coverage index
  -- call          call or genotype VCF variants
  -- help          show all subcommands

For more commands, type `vg help`.
For technical support, please visit: https://www.biostars.org/t/vg/

Now I run MoMI-G:

(base) mtg@mtg-ThinkPad-P53:~/projects/TEST-MOMIG$ sudo bash ~/bin/MoMI-G/scripts/vcf2xg.sh ~/projects/maddog/CNVnator/470.vcf momig-test-output ./vg c_elegans.PRJNA13758.WS265.genomic.fa.gz
Traceback (most recent call last):
	3: from /home/mtg/bin/MoMI-G/scripts/vcf2gfa/vcf2pcf.rb:10:in `<main>'
	2: from /home/mtg/bin/MoMI-G/scripts/vcf2gfa/vcf2pcf.rb:10:in `each'
	1: from /home/mtg/bin/MoMI-G/scripts/vcf2gfa/vcf2pcf.rb:90:in `block in <main>'
/home/mtg/bin/MoMI-G/scripts/vcf2gfa/vcf2pcf.rb:90:in `+': no implicit conversion of nil into String (TypeError)
	4: from /home/mtg/bin/MoMI-G/scripts/vcf2gfa/vcf2pcf.rb:10:in `<main>'
	3: from /home/mtg/bin/MoMI-G/scripts/vcf2gfa/vcf2pcf.rb:10:in `each'
	2: from /home/mtg/bin/MoMI-G/scripts/vcf2gfa/vcf2pcf.rb:89:in `block in <main>'
	1: from /home/mtg/bin/MoMI-G/scripts/vcf2gfa/vcf2pcf.rb:92:in `rescue in block in <main>'
/home/mtg/bin/MoMI-G/scripts/vcf2gfa/vcf2pcf.rb:92:in `+': no implicit conversion of Array into String (TypeError)
+ input=.//momig-test-output.pcf
+ reference=c_elegans.PRJNA13758.WS265.genomic.fa.gz
+ pref=.//momig-test-output.pcf
+ sort -t, -k 7n .//momig-test-output.pcf
+ sed -e 's/[ ]*//g'
+ awk -F '[,:]' '{print $1,"\t",$2,"\n",$4,"\t",$5}' /dev/fd/63
+ uniq
++ sed 1d .//momig-test-output.pcf
+ sort -k 1,1 -k 2,2n
+ '[' '!' -s c_elegans.PRJNA13758.WS265.genomic.fa.gz.fai ']'
++ dirname /home/mtg/bin/MoMI-G/scripts/vcf2gfa/pcf2gfa.sh
+ ruby /home/mtg/bin/MoMI-G/scripts/vcf2gfa/json_to_breakpoint_list.rb .//momig-test-output.pcf.bp.tsv c_elegans.PRJNA13758.WS265.genomic.fa.gz
+ cat .//momig-test-output.pcf.bp.tsv breakpoint_list_tmp.tsv
+ sort -k 1,1 -k 2,2n
+ uniq
++ dirname /home/mtg/bin/MoMI-G/scripts/vcf2gfa/pcf2gfa.sh
+ ruby /home/mtg/bin/MoMI-G/scripts/vcf2gfa/gfa_generator.rb .//momig-test-output.pcf.bp.merged.tsv .//momig-test-output.pcf.output.pcf c_elegans.PRJNA13758.WS265.genomic.fa.gz


(base) mtg@mtg-ThinkPad-P53:~/projects/TEST-MOMIG$ ll
total 430272
drwxrwxr-x 4 mtg  mtg       4096 May 26 11:21 ./
drwxrwxr-x 7 mtg  mtg       4096 May 22 10:49 ../
-rw-r--r-- 1 root root     21912 May 26 11:21 breakpoint_list_tmp.tsv
-rwxrwx--- 1 mtg  mtg  101957874 May  1 16:05 c_elegans.PRJNA13758.WS265.genomic.fa*
-rwxrwx--- 1 mtg  mtg        181 May  1 16:01 c_elegans.PRJNA13758.WS265.genomic.fa.fai*
-rw-r----- 1 mtg  mtg   30726020 May  1 16:00 c_elegans.PRJNA13758.WS265.genomic.fa.gz
-rw-r--r-- 1 root root       181 May  1 16:00 c_elegans.PRJNA13758.WS265.genomic.fa.gz.fai
-rw-r--r-- 1 root root     24984 May  1 16:00 c_elegans.PRJNA13758.WS265.genomic.fa.gz.gzi
drwxrwxr-x 6 mtg  mtg       4096 May  6 09:27 gfalint/
-rw-r--r-- 1 root root 100334394 May 26 11:21 momig-test-output.ggf
-rw-r--r-- 1 root root       112 May 26 11:21 momig-test-output.json
-rw-r--r-- 1 root root       113 May 26 11:21 momig-test-output.pcf
-rw-r--r-- 1 root root     21912 May 26 11:21 momig-test-output.pcf.bp.merged.tsv
-rw-r--r-- 1 root root         0 May 26 11:21 momig-test-output.pcf.bp.tsv
-rw-r--r-- 1 root root       113 May 26 11:21 momig-test-output.pcf.output.pcf
-rw-r--r-- 1 root root 104128225 May 26 11:21 momig-test-output.vg
-rw-r--r-- 1 root root  63365361 May 26 11:21 momig-test-output.xg
drwxrwxr-x 3 mtg  mtg       4096 May 20 19:35 src/
-rwxrwxr-x 1 mtg  mtg   39823464 May 11 16:12 vg*

Should I be concerned about the TypeError's?

moldach avatar May 26 '20 17:05 moldach

Because the momig-test-output.xg file is there it looks like vg worked. I'm now wondering about the next step.

I need to create a static/ directory in the current working directory and then move the momig-test-output.xg and momig-test-output.pcf into static/ correct?

What about the config.yaml: a configuration file? What is this supposed to look like?

moldach avatar May 26 '20 17:05 moldach

Actually there is a line in vcf that is not parsed by momig-tools, and I intend to output error message when it encounters such a line. But it won't work now due to a bug. I suppose the output pcf file might be insufficient. I'll fix it soon.

6br avatar May 27 '20 07:05 6br

After that, this configuration YAML file is needed to be loaded as the next step. https://momi-g.readthedocs.io/en/latest/configure_file.html The instruction is as follows. https://momi-g.readthedocs.io/en/latest/quick_start.html#how-to-use-your-own-custom-data

  1. git clone from https://github.com/MoMI-G/MoMI-G.
  2. mkdir static on the directory MoMI-G clones.
  3. Put your XG data in the static folder like static/data.xg.
  4. Add configure file on static/config.yaml.
  5. Modify Docker.backend file.
  6. Build backend server and run.

The fastest way is to copy this config file and change the file name of xg in the following config.yaml. https://github.com/MoMI-G/MoMI-G_backend/blob/master/sample/chm1/config.yaml

6br avatar May 27 '20 08:05 6br

Is it possible for you to try it on the latest master and tell me the output of MoMI-G script? I suppose the lines starting with Error line: would be output.

6br avatar May 27 '20 08:05 6br

Actually there is a line in vcf that is not parsed by momig-tools, and I intend to output error message when it encounters such a line. But it won't work now due to a bug. I suppose the output pcf file might be insufficient. I'll fix it soon ... Is it possible for you to try it on the latest master and tell me the output of MoMI-G script? I suppose the lines starting with Error line: would be output.

Sure I can try that:

(base) mtg@mtg-ThinkPad-P53:~/bin$ rm -rf MoMI-G/
(base) mtg@mtg-ThinkPad-P53:~/bin$ git clone https://github.com/MoMI-G/MoMI-G
Cloning into 'MoMI-G'...
remote: Enumerating objects: 43, done.
remote: Counting objects: 100% (43/43), done.
remote: Compressing objects: 100% (31/31), done.
remote: Total 918 (delta 28), reused 20 (delta 12), pack-reused 875
Receiving objects: 100% (918/918), 5.75 MiB | 1.12 MiB/s, done.
Resolving deltas: 100% (598/598), done.
(base) mtg@mtg-ThinkPad-P53:~/bin$ cd ~/projects/TEST-MOMIG
(base) mtg@mtg-ThinkPad-P53:~/projects/TEST-MOMIG$ sudo bash ~/bin/MoMI-G/scripts/vcf2xg.sh ~/projects/maddog/CNVnator/470.vcf momig-test-output ./vg c_elegans.PRJNA13758.WS265.genomic.fa.gz
[sudo] password for mtg: 
Traceback (most recent call last):
	3: from /home/mtg/bin/MoMI-G/scripts/vcf2gfa/vcf2pcf.rb:10:in `<main>'
	2: from /home/mtg/bin/MoMI-G/scripts/vcf2gfa/vcf2pcf.rb:10:in `each'
	1: from /home/mtg/bin/MoMI-G/scripts/vcf2gfa/vcf2pcf.rb:90:in `block in <main>'
/home/mtg/bin/MoMI-G/scripts/vcf2gfa/vcf2pcf.rb:90:in `+': no implicit conversion of nil into String (TypeError)
	3: from /home/mtg/bin/MoMI-G/scripts/vcf2gfa/vcf2pcf.rb:10:in `<main>'
	2: from /home/mtg/bin/MoMI-G/scripts/vcf2gfa/vcf2pcf.rb:10:in `each'
	1: from /home/mtg/bin/MoMI-G/scripts/vcf2gfa/vcf2pcf.rb:89:in `block in <main>'
/home/mtg/bin/MoMI-G/scripts/vcf2gfa/vcf2pcf.rb:92:in `rescue in block in <main>': Error line: I	4282001	470_CNVnator_dup_1		4295000	.	13000	DUP	GT:CN	./1:2 (RuntimeError)
+ input=.//momig-test-output.pcf
+ reference=c_elegans.PRJNA13758.WS265.genomic.fa.gz
+ pref=.//momig-test-output.pcf
+ sort -t, -k 7n .//momig-test-output.pcf
+ sed -e 's/[ ]*//g'
+ awk -F '[,:]' '{print $1,"\t",$2,"\n",$4,"\t",$5}' /dev/fd/63
+ sort -k 1,1 -k 2,2n
++ sed 1d .//momig-test-output.pcf
+ uniq
+ '[' '!' -s c_elegans.PRJNA13758.WS265.genomic.fa.gz.fai ']'
++ dirname /home/mtg/bin/MoMI-G/scripts/vcf2gfa/pcf2gfa.sh
+ ruby /home/mtg/bin/MoMI-G/scripts/vcf2gfa/json_to_breakpoint_list.rb .//momig-test-output.pcf.bp.tsv c_elegans.PRJNA13758.WS265.genomic.fa.gz
+ cat .//momig-test-output.pcf.bp.tsv breakpoint_list_tmp.tsv
+ sort -k 1,1 -k 2,2n
+ uniq
++ dirname /home/mtg/bin/MoMI-G/scripts/vcf2gfa/pcf2gfa.sh
+ ruby /home/mtg/bin/MoMI-G/scripts/vcf2gfa/gfa_generator.rb .//momig-test-output.pcf.bp.merged.tsv .//momig-test-output.pcf.output.pcf c_elegans.PRJNA13758.WS265.genomic.fa.gz

So this is the error output:

Error line: I 4282001 470_CNVnator_dup_1 4295000 . 13000 DUP GT:CN ./1:2 (RuntimeError)

moldach avatar May 27 '20 19:05 moldach

Okay so I now have a config.yaml file that looks like this:

bin:
  vg: "vg"
  vg_tmp: "vg"
  vg_volume_prefix: ""
  graphviz: "dot"
  fa22bit: "faToTwoBit"
  bigbed: "bedToBigBed"
reference:
  chroms: "static/GRCh.json"
  data:
    - name: "ce11"
      features:
        - name: 'gene_annotation'
          url: "static/c_elegans.PRJNA13758.WS265.annotations.gff3"
          chr_prefix: ""
data:
  - name: "maddog"
    desc: "2020-05-27"
    chr_prefix: ""
    ref_id: "ce11"
    source:
      xg: "static/Celegans.xg"
      csv: "static/Celegans.pcf"
      gam: ""
      gamindex: ""
    features: []
    static_files: []

However, I cannot find an example of GRCh.json file that includes the color and length for every chromosome. Could you please share an example so I can change for my species, thank you.

import json
data = {}
data['chromosome'] = []

data['chromosome'].append({
 'name': 'I",
 'length': 15072434,
 'color': '#4477AA'
})
data['chromosome'].append({
 'name': 'II",
 'length': ,15279421
 'color': '#66CCEE'
})

...

moldach avatar May 28 '20 00:05 moldach

Thank you for reporting an error. I'll check it. This is an example of chromosome length and color json file. https://raw.githubusercontent.com/MoMI-G/MoMI-G_backend/master/sample/chm1/GRCh.json

6br avatar May 28 '20 03:05 6br

I suspect this line is invalid. I 4282001 470_CNVnator_dup_1 4295000 . 13000 DUP GT:CN ./1:2

Usually, vcf file is described as a tab-delimited text as #CHROM POS ID REF ALT QUAL FILTER INFO, but the previous line does not have any REF field (because there are two tab characters between 470_CNVnator_dup_1 and 4295000), which makes the script fail. That's why MoMI-G tools cannot handle well in this VCF dialect. Do you know where is the specification of this VCF variant?

6br avatar May 28 '20 03:05 6br

I suspect this line is invalid. I 4282001 470_CNVnator_dup_1 4295000 . 13000 DUP GT:CN ./1:2

Luckily I have a number of variant call sets from a variety of tools so we can tell if this is unique to CNVnator or not:

  • Pindel: Error line: II5008800 . 5008801 . 1 GT:AD 0/0:32,2 (RuntimeError)
  • Delly: Another error (save this for another issue)
  • GRIDSS: Error line: I 643 gridss0b_1b 659.21 0 GT:ASQ:ASRP:ASSR:BANRP:BANRPQ:BANSR:BANSRQ:BAQ:BASRP:BASSR:BQ:BSC:BSCQ:BUM:BUMQ:BVF:CASQ:IC:IQ:QUAL:RASQ:REF:REFPAIR:RP:RPQ:SR:SRQ:VF .:0.00:0:0:0:0.00:0:0.00:0.00:0:0:659.21:0:0.00:26:659.21:26:0.00:0:0.00:0.00:0.00:42:40:0:0.00:0:0.00:0 (RuntimeError)
  • Lumpy: Error line: I 862 555_1 . 0 V:1105109] GT:SU:PE:SR ./.:7:7:0 (RuntimeError)
  • Error line: I 229193 MantaBND:1:0:1:0:0:0:0 . 0 IV:6006459] (RuntimeError)
  • MindTheGap: For the othervariants.vcf file - Error line: I 15759 bkpt1 . 0 GT 1/1 (RuntimeError)
  • NGSep: Error line: I 855 . 27 0 GT:PL:GQ:DP:BSDP:ACN 0/1:252,193,2481:27:64:0,6,58,0:1,1 (RuntimeError)
  • Tardis: Error line: I 230619 vh_del_2 230730 255 112 DEL GT:CNVL:RP:SR 0/1:0.053729:10:6 (RuntimeError)

For all of the VCF files I have I seem to get an error.

Do you know where is the specification of this VCF variant?

No, I cannot find it on their documentation

moldach avatar May 28 '20 18:05 moldach

These errors might be caused due to different reasons. MoMI-G tools does not support Illumina SV callers explicitly. Is it possible to normalize each vcf file with https://github.com/fritzsedlazeck/SURVIVOR before running vcf2xg? (e.g. run SURVIVOR merge file_list.txt 0.1 1 1 1 1 50, and file_list.txt just includes the file name of vcf).

6br avatar May 29 '20 08:05 6br

I was just working with SURVIVOR the other day so it was quick to generate:

From file_list.txt that looks like:

breakdancer.vcf
cnvnator.vcf
deepVariant.vcf
delly.vcf
gridss.vcf
lumpy.vcf
manta.vcf
mindTheGap.vcf
ngsep.vcf
pindel.vcf
tardis.vcf

Merge:

/bin/SURVIVOR-master/Debug/SURVIVOR merge file_list.txt 0.1 1 1 1 1 50 survivor_merged.vcf
merging entries: 59
merging entries: 14
merging entries: 197
merging entries: 1194
merging entries: 2180
merging entries: 762
merging entries: 920
merging entries: 203
merging entries: 0
merging entries: 4917
merging entries: 747

Lot of errors from this:

(base):~/projects/TEST-MOMIG$ sudo bash ~/bin/MoMI-G/scripts/vcf2xg.sh survivor_merged.vcf survivor-concensus ./vg c_elegans.PRJNA13758.WS265.genomic.fa.gz
+ input=.//survivor-concensus.pcf
+ reference=c_elegans.PRJNA13758.WS265.genomic.fa.gz
+ pref=.//survivor-concensus.pcf
+ sort -t, -k 7n .//survivor-concensus.pcf
+ sed -e 's/[ ]*//g'
+ awk -F '[,:]' '{print $1,"\t",$2,"\n",$4,"\t",$5}' /dev/fd/63
+ sort -k 1,1 -k 2,2n
++ sed 1d .//survivor-concensus.pcf
+ uniq
+ '[' '!' -s c_elegans.PRJNA13758.WS265.genomic.fa.gz.fai ']'
++ dirname /home/mtg/bin/MoMI-G/scripts/vcf2gfa/pcf2gfa.sh
+ ruby /home/mtg/bin/MoMI-G/scripts/vcf2gfa/json_to_breakpoint_list.rb .//survivor-concensus.pcf.bp.tsv c_elegans.PRJNA13758.WS265.genomic.fa.gz
+ cat .//survivor-concensus.pcf.bp.tsv breakpoint_list_tmp.tsv
+ sort -k 1,1 -k 2,2n
+ uniq
++ dirname /home/mtg/bin/MoMI-G/scripts/vcf2gfa/pcf2gfa.sh
+ ruby /home/mtg/bin/MoMI-G/scripts/vcf2gfa/gfa_generator.rb .//survivor-concensus.pcf.bp.merged.tsv .//survivor-concensus.pcf.output.pcf c_elegans.PRJNA13758.WS265.genomic.fa.gz
[W::fai_fetch] Reference chrI:0-838 not found in FASTA file, returning empty sequence
Failed to fetch sequence in chrI:0-838
[W::fai_fetch] Reference chrI:839-861 not found in FASTA file, returning empty sequence
Failed to fetch sequence in chrI:839-861
[W::fai_fetch] Reference chrI:862-990 not found in FASTA file, returning empty sequence
Failed to fetch sequence in chrI:862-990
[W::fai_fetch] Reference chrI:991-994 not found in FASTA file, returning empty sequence
Failed to fetch sequence in chrI:991-994
[W::fai_fetch] Reference chrI:995-1128 not found in FASTA file, returning empty sequence
Failed to fetch sequence in chrI:995-1128

Tried using SURVIVOR to just merge two files, delly and pindel, and I get the same error.

moldach avatar May 29 '20 18:05 moldach

SURVIVOR seems to add "chr" as a prefix of each reference genome to normalize, but the original reference genome is started with I, so chrI is failed to detect. Anyway I am looking for the way to normalize VCF file from various kinds of SV callers.

6br avatar May 30 '20 13:05 6br

Please update SURVIVOR.

I am not 100% sure which subcommand of SURIVOR is used, but I assume the merge . This behavior in merge has been updated a couple of months ago.

Thanks Fritz

fritzsedlazeck avatar Jun 12 '20 22:06 fritzsedlazeck

I also have similar error, like this:

+ input=.//genomes.23.pcf
+ reference=../westar.fa
+ pref=.//genomes.23.pcf
+ sort -t, -k 7n .//genomes.23.pcf
+ sed -e 's/[ ]*//g'
+ awk -F '[,:]' '{print $1,"\t",$2,"\n",$4,"\t",$5}' /dev/fd/63
+ sort -k 1,1 -k 2,2n
++ sed 1d .//genomes.23.pcf
+ uniq
+ '[' '!' -s ../westar.fa.fai ']'
++ dirname ./scripts/vcf2gfa/pcf2gfa.sh
+ ruby ./scripts/vcf2gfa/json_to_breakpoint_list.rb .//genomes.23.pcf.bp.tsv ../westar.fa
+ cat .//genomes.23.pcf.bp.tsv breakpoint_list_tmp.tsv
+ sort -k 1,1 -k 2,2n
+ uniq
++ dirname ./scripts/vcf2gfa/pcf2gfa.sh
+ ruby ./scripts/vcf2gfa/gfa_generator.rb .//genomes.23.pcf.bp.merged.tsv .//genomes.23.pcf.output.pcf ../westar.fa
./scripts/vcf2gfa/gfa_generator.rb:42: warning: Insecure world writable dir /public in PATH, mode 040777
[W::fai_get_val] Reference :0- not found in FASTA file, returning empty sequence
[faidx] Failed to fetch sequence in :0-
error:[vg view] Input GFA is not acceptable.
error:[gfa_to_handle_graph] Found edge record with missing source name
error[VPKG::load_one]: Correct input type not found while loading handlegraph::MutablePathDeletableHandleGraph
error[VPKG::load_one]: Correct input type not found while loading handlegraph::HandleGraph

And my vcf files were merged with bcftools merge.

Bobo22331 avatar Jul 14 '20 01:07 Bobo22331

Thank you for reporting. I assume some records in the vcf file are not supported in vcf2vg. I updated the way to report errors on ./scripts/vcf2gfa/gfa_generator.rb.

6br avatar Jul 14 '20 06:07 6br