vg autoindex needs a --buffer-size parameter for vg gbwt

brettChapman opened this issue 2 years ago • 86 comments

Hi

While running vg autoindex, I get complaints about the sequence length being too long. I had the same problem when running vg gbwt, so I set the buffer size to 1000 there.
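For reference, the manual workaround looks roughly like this (a sketch with placeholder file names, assuming GBWT construction from a VCF; the exact input flags depend on your pipeline):

# set the GBWT buffer size explicitly when building from a VCF (placeholder inputs)
vg gbwt -x graph.xg -o graph.gbwt --buffer-size 1000 -v variants.vcf.gz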

Could the same parameter be set with vg autoindex?

I'm running vg version 1.35.

Thanks.

brettChapman avatar Oct 20 '21 02:10 brettChapman

Is there a secret menu with autoindex where I can access these hidden parameters, as was mentioned here: https://github.com/vgteam/vg/issues/3303

Otherwise, how would I break down the steps in autoindex? Does autoindex provide a means to perform a dry run by outputting all the commands it will use?

brettChapman avatar Oct 27 '21 06:10 brettChapman

vg autoindex does some things that are not easily replicated manually, such as determining the number of parallel GBWT construction jobs based on estimated memory requirements and available CPU cores. Adding a buffer size parameter for GBWT construction (or estimating it from reference path lengths) would be straightforward, but it would break our current memory usage estimates.

If you want to build indexes for vg giraffe, the first step is building a GBZ file (graph + GBWT). This is usually the hardest part. The exact steps depend on your input and on whether you also want to build other graphs / indexes.

If you have a GFA file of a reasonable size with a reasonable number of haplotypes as P-lines and/or W-lines, you can do this with vg gbwt -g graph.gbz --gbz-format -G graph.gfa. This also determines a good buffer size automatically, chops the GFA segments into at most 1024 bp nodes, and stores a translation table between segment names and (ranges of) node ids in the GBZ file. Things are easiest when the GFA file has the reference genome as P-lines and other haplotypes as W-lines. If the GFA file is too large for a single GBWT construction job, there is no solution at the moment, as we have not seen such GFA files yet.

The GBZ graph can be converted into other graph formats with commands such as vg convert -x -Z graph.gbz > graph.xg.
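Putting the two commands above together, a minimal sketch of the baseline construction looks like this (placeholder file names):

# build the GBZ (graph + GBWT) directly from the GFA; a good buffer size is chosen automatically
vg gbwt -g graph.gbz --gbz-format -G graph.gfa
# derive other graph formats from the GBZ baseline, e.g. an XG index
vg convert -x -Z graph.gbz > graph.xg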

If you built the GBZ graph from GFA with vg gbwt, it now serves as the baseline graph. If you built the graph and the GBWT separately from another input type, those files are your baseline. You should never touch the input again, as other graphs / indexes built from the input may be incompatible with what you already have. All subsequent graphs and indexes should be descendants of the baseline.

Once you have the GBZ file, you can find snarls and build the distance index and the minimizer index with:

vg snarls -T graph.gbz > graph.snarls
vg index -s graph.snarls -j graph.dist graph.gbz
vg minimizer -o graph.min -d graph.dist graph.gbz

Snarl finding can use a lot of memory, while the other commands should have more reasonable requirements. You can reduce the memory usage by splitting the graph into multiple parts, each of them corresponding to one or more graph components. For example:

rm -f graph.snarls
for i in $(seq 1 22; echo X; echo Y); do
    vg snarls -T chr${i}.vg >> graph.snarls
done

Splitting the graph by component should be possible with vg chunk, but I have never used the command myself. You may also get a huge number of files if there are many small components/contigs in the graph.
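If you try it, splitting by component would presumably look something like this (untested on my side, so treat the flags as assumptions and check vg chunk --help):

# write one graph file per connected component, using an assumed output prefix
vg chunk -x graph.xg -C -b component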

Things get a bit more complicated if you have hundreds/thousands of haplotypes. You may then want to build a subsampled GBZ with vg gbwt -l and use that GBZ with vg minimizer and vg giraffe (but not in other commands).
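A rough sketch of that subsampling step (the -Z input flag and the file names are assumptions; -l selects the local haplotypes cover):

# build a subsampled GBZ from the baseline GBZ, to be used only with vg minimizer and vg giraffe
vg gbwt -Z graph.gbz -l -g graph.sampled.gbz --gbz-format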

jltsiren avatar Oct 28 '21 03:10 jltsiren

I tried generating a GBZ graph, generating snarls and then indexing. However my job gets killed at the indexing step:

line 25: 1527790 Killed                  singularity exec --bind ${PWD}:${PWD} ${VG_IMAGE} vg index -t 2 -b ${tmp_dir} -s ${SNARLS} -j ${DIST} ${PANGENOME_GBZ}
error[VPKG::load_one]: Could not open barley_pangenome_graph.dist while loading vg::MinimumDistanceIndex

I ran dmesg -T| grep -E -i -B100 'killed process' to see why, and I get this:

[Sat Oct 30 16:47:27 2021] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0-1,global_oom,task_memcg=/user.slice/user-1001.slice/session-36.scope,task=vg,pid=1527810,uid=1001
[Sat Oct 30 16:47:27 2021] Out of memory: Killed process 1527810 (vg) total-vm:2297177832kB, anon-rss:1576888580kB, file-rss:44kB, shmem-rss:0kB, UID:1001 pgtables:3401188kB oom_score_adj:0

It looks like it's requesting 2.2TB of virtual memory, when I only have 1.4TB of RAM and 2GB of swap space. Apart from increasing my swap space to over 1TB, is there another way around indexing which isn't so taxing on memory? Thanks.

brettChapman avatar Nov 01 '21 01:11 brettChapman

To provide context: I only have pseudomolecules, so no contigs, only chromosomes 1 to 7, and I only have 20 haplotypes in the graph.

brettChapman avatar Nov 01 '21 05:11 brettChapman

It looks like the issue may be with the graph itself. You may want to visualize the graph to see if the overall structure looks reasonable or if there are large tangled subgraphs that would make distance index construction expensive. The PGGB team has spent a lot of effort trying to build useful human pangenome graphs, and they may be able to help you with parameter choices if you try rebuilding the graph.

jltsiren avatar Nov 01 '21 20:11 jltsiren

To reduce the bubble complexity, we can either use a longer segment length in mapping (wfmash -s) or use the pruning tools in vg or odgi to remove complex regions. For humans, these regions are usually centromeric and easy to find, because they have very high depth in the graph.

ekg avatar Nov 02 '21 07:11 ekg

Thanks @ekg

I used 1 Mbp as the segment length, so it's already quite long.

Could I use vg prune as outlined in this old issue: https://github.com/vgteam/vg/issues/1879

Would it be advisable to run it on the entire graph (containing all 7 chromosomes)? I would prefer pruning over rerunning the entire PGGB workflow. Which vg prune parameters should be used? Should I just go with the defaults?

brettChapman avatar Nov 02 '21 07:11 brettChapman

I've increased the swap space on my machine now, so there should be 2.2TB of available virtual memory. I'm trying to index again and see how it goes.

I'm also pruning the graph using this strategy:

singularity exec --bind ${PWD}:${PWD} ${VG_IMAGE} vg ids -j -m mapping $(for i in $(seq 1 7); do echo barley_pangenome_graph_${i}H.pg; done)
cp mapping mapping.backup

for i in $(seq 1 7); do
   singularity exec --bind ${PWD}:${PWD} ${VG_IMAGE} vg prune -u -a -m mapping barley_pangenome_graph_${i}H.pg > barley_pangenome_graph_${i}H.pruned.vg
done

I'll compare performance between the pruned and non-pruned graphs. Generally, from experience, has anyone found pruning to be an essential step? And what is the cost of reducing the complexity of the graph for read alignment and variant calling? My downstream analysis will be genome read alignment for variant calling, and later generation of a splice graph for RNA-seq read alignment.

brettChapman avatar Nov 08 '21 07:11 brettChapman

In my experience, pruning is completely essential if you want to index with GCSA2 (used by vg map and vg mpmap), or else the exponential worst case can be a real killer. In those indexing pipelines, we only use the pruned graph for the indexing step, after which we discard it. In particular, we align to the full graph during read mapping. The only cost is that the GCSA2 index cannot query matches that cross some edges in complex regions, and I have never found a case where we generated an incorrect mapping because of this limitation.
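To make the workflow concrete, here is a rough sketch of how pruning fits into GCSA2 indexing (file names are placeholders; the key point is that the pruned graph is only an intermediate):

# prune only for GCSA2 construction; the pruned graph is discarded afterwards
vg prune graph.vg > graph.pruned.vg
vg index -g graph.gcsa graph.pruned.vg
rm graph.pruned.vg
# read mapping still uses the full graph
vg map -x graph.xg -g graph.gcsa -f reads.fq > aln.gam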

jeizenga avatar Nov 08 '21 21:11 jeizenga

Thanks @jeizenga for your input.

I tried indexing the pruned graph but came up with an error:

I first create an XG of the whole graph from the per-chromosome pruned packed graphs:

vg index -x barley_pangenome_graph.pruned.xg -b ${tmp_dir} $(for i in $(seq 1 7); do echo barley_pangenome_graph_${i}H.pruned.pg; done)

Then generate a GFA:

vg convert -t 2 barley_pangenome_graph.pruned.xg -f > barley_pangenome_graph.pruned.gfa 

Then I attempt to create a GBZ, but it gives me an error about there being no paths or walks in my pruned GFA:

vg gbwt -d ${tmp_dir} -g barley_pangenome_graph.pruned.gbz --gbz-format -G barley_pangenome_graph.pruned.gfa
check_gfa_file(): No paths or walks in the GFA file
error: [vg gbwt] GBWT construction from GFA failed

brettChapman avatar Nov 10 '21 05:11 brettChapman

I suspect you probably ran pruning without --restore-paths, which can lead it to remove edges that embedded paths take. When it does this, there's no unambiguous way for the graph to express the embedded path, so the path is dropped. The GBZ is really an index of paths, so without some source of paths it can't be constructed. However, one alternative is vg gbwt --path-cover, which tries to synthesize paths using local heuristics.

Is your end goal to use this GBZ for vg giraffe? If so, this might all be beside the point. Pruning is really intended for GCSA2 indexing, which vg giraffe doesn't use.
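For reference, rough sketches of both fixes (file names are placeholders, and the -x input flag is an assumption; --restore-paths and --path-cover are the options mentioned above):

# option 1: keep the embedded paths when pruning
vg prune --restore-paths graph.pg > graph.pruned.vg
# option 2: synthesize an artificial path cover when building the GBZ
vg gbwt --path-cover -x graph.pruned.xg -g graph.pruned.gbz --gbz-format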

jeizenga avatar Nov 10 '21 21:11 jeizenga

Thanks @jeizenga

Yes, my intent is to use vg giraffe for genomic read alignment and vg mpmap for RNA-seq read alignment.

The main reason I'm now attempting the pruning approach is that the indexing step used a huge amount of RAM. Should I be pruning or not to get around the indexing problem? I've increased the swap space on my machine, so I may be able to index without pruning now, but it may take a lot longer.

brettChapman avatar Nov 11 '21 01:11 brettChapman

vg prune is only intended for pruning the graph for GCSA construction. It deliberately drops all paths in the graph, because paths are not needed for kmer generation and maintaining them can be slow and complicated. I'm not sure if odgi prune can split the paths instead of dropping them.

jltsiren avatar Nov 11 '21 06:11 jltsiren

@ekg @jltsiren @jeizenga I've tried odgi prune using these parameters:

odgi prune -i barley_pangenome_graph_1H.og -o barley_pangenome_graph_1H.pruned.og -c 3 -C 345 -T

Then converted to GFA, then PG:

odgi view -i barley_pangenome_graph_${i}H.pruned.og -g > barley_pangenome_graph_${i}H.pruned.gfa
vg convert -g barley_pangenome_graph_${i}H.pruned.gfa -p > barley_pangenome_graph_${i}H.pruned.pg

Then tried to index all graphs into one XG file:

vg index -x barley_pangenome_graph.pruned.xg -b ${tmp_dir} $(for i in $(seq 1 7); do echo barley_pangenome_graph_${i}H.pruned.pg; done)

I then tried to produce snarls and then index, but it fails at the index stage, saying it can't read in the distance index.

singularity exec --bind ${PWD}:${PWD} ${VG_IMAGE} vg convert -t 2 ${PANGENOME_XG} -f > barley_pangenome_graph.pruned.gfa

singularity exec --bind ${PWD}:${PWD} ${VG_IMAGE} vg gbwt -d ${tmp_dir} -g ${PANGENOME_GBZ} --gbz-format -G ${PANGENOME_GFA}

singularity exec --bind ${PWD}:${PWD} ${VG_IMAGE} vg snarls -t 2 -T ${PANGENOME_GBZ} > ${SNARLS}
singularity exec --bind ${PWD}:${PWD} ${VG_IMAGE} vg index -t 2 -b ${tmp_dir} -s ${SNARLS} -j ${DIST} ${PANGENOME_GBZ}
singularity exec --bind ${PWD}:${PWD} ${VG_IMAGE} vg minimizer -t 2 -o ${MIN} -d ${DIST} ${PANGENOME_GBZ}

Regardless of whether I've pruned the graph or not, I've gotten this same error. It appears the process was killed by the OOM killer after requesting 9TB of virtual memory.

I came across vg simplify, but it's only mentioned in an old wiki page (https://github.com/vgteam/vg/wiki/Indexing-Huge-Datasets), so I'm not sure if it's something that should be used to radically reduce the complexity of the graph.

How am I supposed to prepare my graph for use with giraffe, apart from increasing my swap space to 10TB, which is an insane amount of swap space? Should I be even more stringent with the odgi prune parameters, perhaps reducing the -C value? Is there a more efficient distance indexing approach? I remember there was mention of another branch of vg that uses a different approach to distance indexing; it's mentioned here by @xchang1: https://github.com/vgteam/vg/issues/3303

brettChapman avatar Dec 06 '21 08:12 brettChapman

You can try building the distance index with this branch. The master branch won't recognize this distance index, so you'll have to run giraffe from this branch too: https://github.com/vgteam/vg/tree/for-brett

The command for building the distance index is the same, except that it no longer takes the snarl file as input. Instead, -s is the size limit for snarls. I'm guessing your problem is that the snarls are too big, and -s will tell the distance index not to build the whole index for big snarls. The default value is 500, but I haven't tried this on snarls that big so it might need some tuning.

If you also run out of memory with that branch, you can try the one I mentioned earlier: https://github.com/vgteam/vg/tree/distance-big-snarls. It would be better if you could get things working on the for-brett branch, but if it only works with the distance-big-snarls branch, then I can get that version working with giraffe too.

Do you know how big your snarls are? You can find out with vg stats -R; the net graph size is the value I'm interested in.
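In other words, roughly (file names and the -s value are placeholders; -s only has this meaning on the for-brett branch):

# report snarl statistics; the net graph size is the number to check
vg stats -R graph.xg
# on the for-brett branch, -s caps the snarl size for which the full distance index is built
vg index -j graph.dist -s 500 graph.gbz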

xchang1 avatar Dec 06 '21 18:12 xchang1

Thanks @xchang1

I've been trying to install the 'for-brett' branch but am struggling to get it built. I've usually installed vg from Docker.

I first clone the branch, cd into vg/ and then I install using the Dockerfile from the git repository.

I've been getting this error:

-type -std=c++14 -ggdb -g  -march=nehalem  -fopenmp -msse4.2 -MMD -MP -c -o obj/subcommand/join_main.o src/subcommand/join_main.cpp 
. ./source_me.sh && /usr/bin/g++ -I/vg/include -isystem /vg/include -I. -I/vg/src -I/vg/src/unittest -I/vg/src/subcommand -I/vg/include/dynamic -pthread -isystem /usr/include/cairo -isystem /usr/include/glib-2.0 -isystem /usr/lib/x86_64-linux-gnu/glib-2.0/include -isystem /usr/include/pixman-1 -isystem /usr/include/uuid -isystem /usr/include/freetype2 -isystem /usr/include/libpng16  -O3 -Werror=return-type -std=c++14 -ggdb -g  -march=nehalem  -fopenmp -msse4.2 -MMD -MP -c -o obj/subcommand/translate_main.o src/subcommand/translate_main.cpp 
. ./source_me.sh && /usr/bin/g++ -I/vg/include -isystem /vg/include -I. -I/vg/src -I/vg/src/unittest -I/vg/src/subcommand -I/vg/include/dynamic -pthread -isystem /usr/include/cairo -isystem /usr/include/glib-2.0 -isystem /usr/lib/x86_64-linux-gnu/glib-2.0/include -isystem /usr/include/pixman-1 -isystem /usr/include/uuid -isystem /usr/include/freetype2 -isystem /usr/include/libpng16  -O3 -Werror=return-type -std=c++14 -ggdb -g  -march=nehalem  -fopenmp -msse4.2 -MMD -MP -c -o obj/subcommand/giraffe_main.o src/subcommand/giraffe_main.cpp 
src/subcommand/giraffe_main.cpp:37:10: fatal error: valgrind/callgrind.h: No such file or directory
   37 | #include <valgrind/callgrind.h>
      |          ^~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
make: *** [Makefile:769: obj/subcommand/giraffe_main.o] Error 1
make: *** Waiting for unfinished jobs....
The command '/bin/sh -c . ./source_me.sh && CXXFLAGS="$(if [ -z "${TARGETARCH}" ] || [ "${TARGETARCH}" = "amd64" ] ; then echo " -march=nehalem "; fi)" make -j $((THREADS < $(nproc) ? THREADS : $(nproc))) objs' returned a non-zero code: 2

I'm running vg stats -R on my pruned XG graph now. I'll get back to you when I know the snarl size. Thanks.

brettChapman avatar Dec 07 '21 03:12 brettChapman

vg stats -R has finished running on the pruned XG graph. The largest snarl is 215479 in size.

brettChapman avatar Dec 07 '21 06:12 brettChapman

Oof, that's a big snarl. That is definitely causing you problems.

We should have built a Docker container for the branch automatically, but it failed some tests and didn't build properly. I'll rerun the build, and then you should be able to find the container here:

https://quay.io/repository/vgteam/vg?tab=tags

xchang1 avatar Dec 07 '21 18:12 xchang1

Thanks @xchang1

I checked the Docker repository but couldn't see the build there. Would it have the 'for-brett' tag?

I also ran vg stats -R on the original non-pruned graph, and the largest snarl is 611408.

brettChapman avatar Dec 09 '21 01:12 brettChapman

Can you try building it again now? I fixed the compilation problem, but I'm having trouble getting the tests to pass, and I can't get the static binary working either.

xchang1 avatar Dec 09 '21 02:12 xchang1

Sure, I'll try and build from the Dockerfile again. I'll let you know how it goes.

brettChapman avatar Dec 09 '21 02:12 brettChapman

The build failed:

/usr/bin/gcc -std=gnu11 -Wall -Wextra -Wsign-compare -Wundef -Wno-format-zero-length -Wpointer-arith -Wno-missing-braces -Wno-missing-field-initializers -pipe -g3 -fvisibility=hidden -Wimplicit-fallthrough -O3 -funroll-loops -I /vg/include -I /vg/include -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/witness.o src/witness.c
/usr/bin/g++ -std=c++14 -Wall -Wextra -g3 -fvisibility=hidden -Wimplicit-fallthrough -O3 -I /vg/include -I/vg/include/dynamic -O3 -Werror=return-type -std=c++14 -ggdb -g -march=nehalem -fopenmp -msse4.2 -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/jemalloc_cpp.o src/jemalloc_cpp.cpp
ar crus lib/libjemalloc_pic.a src/jemalloc.pic.o src/arena.pic.o src/background_thread.pic.o src/base.pic.o src/bin.pic.o src/bin_info.pic.o src/bitmap.pic.o src/buf_writer.pic.o src/cache_bin.pic.o src/ckh.pic.o src/counter.pic.o src/ctl.pic.o src/decay.pic.o src/div.pic.o src/ecache.pic.o src/edata.pic.o src/edata_cache.pic.o src/ehooks.pic.o src/emap.pic.o src/eset.pic.o src/exp_grow.pic.o src/extent.pic.o src/extent_dss.pic.o src/extent_mmap.pic.o src/fxp.pic.o src/hook.pic.o src/hpa.pic.o src/hpa_central.pic.o src/hpdata.pic.o src/inspect.pic.o src/large.pic.o src/log.pic.o src/malloc_io.pic.o src/mutex.pic.o src/mutex_pool.pic.o src/nstime.pic.o src/pa.pic.o src/pa_extra.pic.o src/pac.pic.o src/pages.pic.o src/peak_event.pic.o src/prof.pic.o src/prof_data.pic.o src/prof_log.pic.o src/prof_recent.pic.o src/prof_stats.pic.o src/prof_sys.pic.o src/psset.pic.o src/rtree.pic.o src/safety_check.pic.o src/sc.pic.o src/sec.pic.o src/stats.pic.o src/sz.pic.o src/tcache.pic.o src/test_hooks.pic.o src/thread_event.pic.o src/ticker.pic.o src/tsd.pic.o src/witness.pic.o src/jemalloc_cpp.pic.o
ar: `u' modifier ignored since `D' is the default (see `U')
/usr/bin/gcc -shared -Wl,-soname,libjemalloc.so.2  -o lib/libjemalloc.so.2 src/jemalloc.pic.o src/arena.pic.o src/background_thread.pic.o src/base.pic.o src/bin.pic.o src/bin_info.pic.o src/bitmap.pic.o src/buf_writer.pic.o src/cache_bin.pic.o src/ckh.pic.o src/counter.pic.o src/ctl.pic.o src/decay.pic.o src/div.pic.o src/ecache.pic.o src/edata.pic.o src/edata_cache.pic.o src/ehooks.pic.o src/emap.pic.o src/eset.pic.o src/exp_grow.pic.o src/extent.pic.o src/extent_dss.pic.o src/extent_mmap.pic.o src/fxp.pic.o src/hook.pic.o src/hpa.pic.o src/hpa_central.pic.o src/hpdata.pic.o src/inspect.pic.o src/large.pic.o src/log.pic.o src/malloc_io.pic.o src/mutex.pic.o src/mutex_pool.pic.o src/nstime.pic.o src/pa.pic.o src/pa_extra.pic.o src/pac.pic.o src/pages.pic.o src/peak_event.pic.o src/prof.pic.o src/prof_data.pic.o src/prof_log.pic.o src/prof_recent.pic.o src/prof_stats.pic.o src/prof_sys.pic.o src/psset.pic.o src/rtree.pic.o src/safety_check.pic.o src/sc.pic.o src/sec.pic.o src/stats.pic.o src/sz.pic.o src/tcache.pic.o src/test_hooks.pic.o src/thread_event.pic.o src/ticker.pic.o src/tsd.pic.o src/witness.pic.o src/jemalloc_cpp.pic.o  -lm -lstdc++ -pthread 
ln -sf libjemalloc.so.2 lib/libjemalloc.so
ar crus lib/libjemalloc.a src/jemalloc.o src/arena.o src/background_thread.o src/base.o src/bin.o src/bin_info.o src/bitmap.o src/buf_writer.o src/cache_bin.o src/ckh.o src/counter.o src/ctl.o src/decay.o src/div.o src/ecache.o src/edata.o src/edata_cache.o src/ehooks.o src/emap.o src/eset.o src/exp_grow.o src/extent.o src/extent_dss.o src/extent_mmap.o src/fxp.o src/hook.o src/hpa.o src/hpa_central.o src/hpdata.o src/inspect.o src/large.o src/log.o src/malloc_io.o src/mutex.o src/mutex_pool.o src/nstime.o src/pa.o src/pa_extra.o src/pac.o src/pages.o src/peak_event.o src/prof.o src/prof_data.o src/prof_log.o src/prof_recent.o src/prof_stats.o src/prof_sys.o src/psset.o src/rtree.o src/safety_check.o src/sc.o src/sec.o src/stats.o src/sz.o src/tcache.o src/test_hooks.o src/thread_event.o src/ticker.o src/tsd.o src/witness.o src/jemalloc_cpp.o
ar: `u' modifier ignored since `D' is the default (see `U')
make[1]: Leaving directory '/vg/deps/jemalloc'
Removing intermediate container 28c6291c21e2
 ---> c9e3ab6aa802
Step 21/42 : COPY include /vg/include
COPY failed: file not found in build context or excluded by .dockerignore: stat include: file does not exist

brettChapman avatar Dec 09 '21 02:12 brettChapman

Hmm, I'm not very familiar with Docker; I'll ask someone who is. In the meantime, did you clone the branch with --recursive? It looks like the /include directory isn't in the GitHub repo, so maybe you don't need it to compile vg initially and you can just remove that line from the Dockerfile.

xchang1 avatar Dec 09 '21 02:12 xchang1

Yeah, I clone the branch with --recursive

I do the following:

git clone --recursive --branch for-brett https://github.com/vgteam/vg.git
cd vg/
docker build -t local/vg .

brettChapman avatar Dec 09 '21 02:12 brettChapman

I'll try removing /include and run again

brettChapman avatar Dec 09 '21 02:12 brettChapman

It got further along now, but failed a series of tests:

graph: valid
graph: valid
graph: valid
graph: valid
t/53_clip.t ........... 
1..13
ok 1 - clipped graph is valid
ok 2 - every step in clipped graph belongs to reference path
ok 3 - clipped graph has same length as ref path
ok 4 - clipped graph is valid
ok 5 - Just one node filtered
ok 6 - clipped graph is valid
ok 7 - Just one edge filtered
ok 8 - clipped graph is valid
ok 9 - Just one node filtered
ok 10 - clipped graph is valid
ok 11 - clipping bad region changes nothing
ok 12 - clipped graph is valid
ok 13 - Just one node filtered
ok

Test Summary Report
-------------------
t/06_vg_index.t     (Wstat: 0 Tests: 55 Failed: 6)
  Failed tests:  50-55
t/33_vg_mpmap.t     (Wstat: 256 Tests: 19 Failed: 5)
  Failed tests:  15-19
  Non-zero exit status: 1
t/40_vg_gamcompare.t (Wstat: 0 Tests: 7 Failed: 1)
  Failed test:  4
t/46_vg_minimizer.t (Wstat: 0 Tests: 16 Failed: 2)
  Failed tests:  12-13
t/50_vg_giraffe.t   (Wstat: 0 Tests: 27 Failed: 8)
  Failed tests:  1-7, 10
t/52_vg_autoindex.t (Wstat: 0 Tests: 24 Failed: 8)
  Failed tests:  10-12, 14-18
Files=52, Tests=932, 286 wallclock secs ( 0.36 usr  0.06 sys + 521.47 cusr 110.05 csys = 631.94 CPU)
Result: FAIL
make: *** [Makefile:413: test] Error 1
The command '/bin/sh -c /bin/bash -e -c "export OMP_NUM_THREADS=$((THREADS < $(nproc) ? THREADS : $(nproc))); make test"' returned a non-zero code: 2

brettChapman avatar Dec 09 '21 03:12 brettChapman

Yeah, I haven't fully integrated some of my changes into vg yet, so it'll fail some tests. Giraffe should work OK; things just aren't in the format the tests expect. Can you run it without the tests?

xchang1 avatar Dec 09 '21 03:12 xchang1

Because the tests fail, the entire installation into Docker fails too. Is there a way to skip the tests?

brettChapman avatar Dec 09 '21 06:12 brettChapman

There's one line with make test in the Dockerfile. I'm skipping it now and rerunning.

brettChapman avatar Dec 09 '21 06:12 brettChapman

I managed to get it working by commenting out the COPY include line and the make test line in the Dockerfile.
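For anyone following along, the workaround was roughly this (the exact Dockerfile lines may differ slightly from this sketch):

# in the Dockerfile, comment out the failing COPY step and the test step:
#   COPY include /vg/include
#   RUN ... make test ...
# then rebuild:
docker build -t local/vg .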

I'm running vg index now with -s set to the maximum snarl size of 215479.

brettChapman avatar Dec 09 '21 07:12 brettChapman