Ragout Alternative alignment tool

Good day, I'm currently working on the assembly of a large mammalian genome. I would like to perform the reference-assisted scaffolding using ragout. However, I cannot run progressive cactus on the genomes involved due to computational constraints, and the assembly are simply too large to be efficiently processed with sibelia.

My question is: is there a way to create alignments in a way that is suitable for Ragout2?

For example: pairwise lastz alignments -> combined multi-species MAF file -> maf2hal -> ragout2

Thank you in advance, Andrea

Mar 08 '19 12:03 RenzoTale88

Hi Andrea,

It is possible to use .maf as input (I was planning to retire this functionality, but it is still in the code) - but the problem is that Ragout relies on properties of the Cactus alignments. In particular, alignment blocks should be non-intersecting - which is not true for most of the other aligners (since it is in fact a very hard problem to generate such alignments).

What are your constraints? Cactus should take ~120 CPU days per mammalian genome, which translates into a few actual days on a single cluster node.

There is a recent Sibelia update (https://github.com/medvedevgroup/SibeliaZ) that in fact should handle large genomes and is capable of producing the right maf alignment. I have not tested it yet - but you might try to use it to generate alignment and run Ragout with it.

Best, Mikhail

Mar 12 '19 20:03 mikolmogorov

Hi Mikhail, Thank you for your reply. My problem with cactus is not related to computational power/time, but to incompatibilities with toil and the cluster, that prevent it from running properly. I think that using sibeliaZ seems a good solution. I guess that you would proceed as follow: sibeliaZ > maf > hal > ragout

Am I right? Thank you in advance, Andrea

Mar 12 '19 21:03 RenzoTale88

You can use maf directly by adding an extra recipe parameter .maf = path_to_maf, and using -s maf switch.

Mar 13 '19 02:03 mikolmogorov

I've just tried to run Ragout on the alignments produced using SibeliaZ (maf format) using the genomes provided in example/E.Coli folder. I've changed the header of the respective fasta so that it shows the organism in the format "dh1.gi", "ms1655.seq1", etc. The maf file is properly parsed, even if raising errors of overlapping blocks, and finally crashes while refining the assembly graph. I though that the problem was due to the presence of self-alignments within the same assembly. But even after filtering those out from the MAF file, the result doesn't change, still crashing while detecting chimeric adjacencies. Below, you can find the log produced, the error, the configuration file, the two fai with the names used and the top 100 lines of the sibeliaZ maf file. I hope these information can be of use to understand the problem. Thank you in advance for your help, Andrea

[16:09:31] DEBUG: >>With block size: 5000 [16:09:31] DEBUG: Checking 2 target adjacencies [16:09:31] DEBUG: Found 1 target-specific adjacencies, 1 broken as chimeric [16:09:31] DEBUG: >>With block size: 500 [16:09:31] DEBUG: Checking 16 target adjacencies [16:09:31] DEBUG: Found 3 target-specific adjacencies, 1 broken as chimeric [16:09:31] DEBUG: >>With block size: 100 [16:09:31] DEBUG: Checking 105 target adjacencies [16:09:31] DEBUG: Found 11 target-specific adjacencies, 3 broken as chimeric [16:09:31] DEBUG: >>With block size: 100 [16:09:31] DEBUG: Checking 105 target adjacencies [16:09:31] DEBUG: Found 11 target-specific adjacencies, 3 broken as chimeric Traceback (most recent call last): File "/exports/applications/apps/community/roslin/ragout/2.1.1/bin/ragout", line 35, in sys.exit(main()) File "/exports/eddie3_apps_local/apps/community/roslin/ragout/2.1.1/ragout/main.py", line 290, in main _run_ragout(args) File "/exports/eddie3_apps_local/apps/community/roslin/ragout/2.1.1/ragout/main.py", line 198, in _run_ragout chim_detect = ChimeraDetector(raw_bp_graphs, run_stages, target_sequences) File "/exports/eddie3_apps_local/apps/community/roslin/ragout/2.1.1/ragout/breakpoint_graph/chimera_detector.py", line 26, in init self._make_hierarchical_breaks() File "/exports/eddie3_apps_local/apps/community/roslin/ragout/2.1.1/ragout/breakpoint_graph/chimera_detector.py", line 62, in _make_hierarchical_breaks break_pos = self._optimal_break(seq_name, *adjusted_break) File "/exports/eddie3_apps_local/apps/community/roslin/ragout/2.1.1/ragout/breakpoint_graph/chimera_detector.py", line 70, in _optimal_break seq = self.target_seqs[seq_name] KeyError: 'seq13'

ragout.log testRagout.rcp.txt dh1.fasta.fai.txt mg1655.fasta.fai.txt top100.maf.txt

Mar 13 '19 16:03 RenzoTale88

Looks like there are some caveats with using SibeliaZ. I will be testing it in the near future.

Mar 17 '19 01:03 mikolmogorov

Hi, The error found was due to the input fasta files provided. Basically, the input fasta that I provided included the species name in each sequence header (e.g. >spp1.chr1), but ragout needed them as simple sequence name (e.g. >seq1). After using this fasta file, the program worked without issues.

The only thing that concerns me is this warning:

Too few overlaps (18) between contigs were detected -- refine procedure will be useless

Is there a way to improve this? Thank you in advance,

Andrea

Mar 17 '19 13:03 RenzoTale88

Great, I'm glad that you got it working! Did it complain on the synteny block coverage?

Too few overlaps (18) between contigs were detected -- refine procedure will be useless

This step is optional - it only works good with certain assemblers (like SPAdes) that output contigs with overlapping ends. We usually don't use --refine option for large genomes.

Mar 18 '19 18:03 mikolmogorov

hi, I have similar issues. I was using sibeliaz as well for the alignment of large genomes and now I am trying to run ragout with the maf file. I was following the instructions above, adding the recipe parameter .maf and the -s maf switch, and I also changed the headers in a simple format like that

scaffold4594 but ragout always breaks after some time with the following error ERROR: permutation ids do not follow naming convention: 'genome.chromosome'

then I tried to change the headers in a format like

WB42.scaffold1 WB42.scaffold6 WB42.scaffold9 WB42.scaffold13

but then that error appeared:KeyError: 'scaffold830'

Do I name the headers still in a wrong way or does anybody have any ideas how to solve that problem? Thank you in advance

if helpful or necessary the code I used for ragout: ragout -o ragout_out -s maf -t 10 recipe_file_for_ragout.txt

Aug 06 '19 13:08 Mar-y

Hi,

It's a bit tricky - the MAF alignment should have the genome prefixes, but the FASTA files - should not. I think in your case it will be enough to remove the prefixes from fasta, also make sure that genome names are matching with ones provided in the recipe.

Let me know if this helps. Hopefully in the near future I will properly integrate SibeliaZ into the pipeline.

Aug 07 '19 23:08 mikolmogorov

Hi Mikhail,

thank you so much for your reply! Ragout finished successfully after removing the prefixes from the fasta files.

Best regards

Aug 08 '19 09:08 Mar-y

Dear Mikhail!

I'm dealing with number of green algae genome. Ragout w/ Sibelia (single-threaded) is running for days. I tried to use SibeliaZ and got MAF alignments. My refences are from NCBI (GCF* GCA* FASTA files), target assembly made with SPAdes.

What I need to do with input files (FASTA/alignment) to use them with Ragout?

Dec 17 '19 02:12 ipetrushin

How to understand following error (debug don't give a hint)? ERROR: No sequences read for genome c2. Check recipe for correctness. File exists, recipe looks like:

.references = c2
.target = s68
.maf = sz_395/alignment.maf

s68.fasta = algs68.fasta
c2.fasta = GCA_001021125.1_ASM102112v1_genomic.fna

Dec 17 '19 03:12 ipetrushin

Ok, it seems that you have found the .maf option :)

The sequence headers in MAF should be formatted as genome_name.contig_name, and I believe SibeliaZ just outputs contig_name. So you'd need to add genome names to MAF (according to the naming in the recipe).

Sorry for the inconvenience! SibeliaZ support is in the plans, but I didn't get a chance to work on that yet.

Dec 19 '19 03:12 mikolmogorov

Ragout Ragout copied to clipboard

Alternative alignment tool

Ragout
Ragout copied to clipboard