
Is it possible to use Atlas in nanopore sequencing data?

Open maithemagalhaes opened this issue 5 years ago • 14 comments

maithemagalhaes, Jan 10 '20 14:01

Thank you for the question. You can do a hybrid assembly (Illumina + Nanopore) using SPAdes.

I don't have experience with Nanopore-only assembly. Maybe we can adapt Atlas to also work with long reads alone.

For example, if you already have predicted MAGs, you can use Atlas to do the taxonomic and functional annotation.

SilasK, Jan 10 '20 15:01

Is it possible to run ATLAS with Illumina mate-pair reads?

botellaflotante, Sep 21 '20 20:09

Do you have only mate-pair libraries, or mate-pair libraries in combination with normal paired-end libraries? If you have only mate-pair libraries, as I understand it, they can be mapped like normal paired-end libraries. SPAdes seems to support mate-pair libraries, so with a small adaptation, Atlas could support them. However, I don't know whether they need some special quality control beforehand.

SilasK, Sep 22 '20 03:09

I actually have a single-end set of reads and also a small mate-pair set of reads for the same sample, but I don't know if I can change the config file to take these... I guess ATLAS is normally used for paired-end reads, right?

botellaflotante, Sep 23 '20 17:09

OK, then you could do it as follows:

Start Atlas with the single-end read library using atlas init.

Set spades_preset: normal in the config file to use standard SPAdes instead of metaSPAdes.

metaSPAdes accepts neither mate-pair nor single-end reads, but I've heard that standard SPAdes is almost as good as metaSPAdes for metagenome assembly.

You can pass extra arguments to SPAdes via the spades_extra keyword in the config file. See the SPAdes documentation for how to specify the input data: https://github.com/ablab/spades#input-data

For example, with mate pairs this would be something like:

spades_extra: " --mp1-1 path/to/matepair_R1.fastq --mp1-2 path/to/matepair_R2.fastq"
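For samples with several mate-pair libraries, the spades_extra string can be generated instead of typed by hand. Below is a minimal Python sketch, not part of Atlas: the helper build_spades_extra and the file paths are hypothetical, and the only assumption about SPAdes is its numbered --mp<N>-1/--mp<N>-2 mate-pair options from the input-data documentation linked above.

```python
def build_spades_extra(mate_pair_libraries):
    """Build a spades_extra value for SPAdes mate-pair libraries.

    Hypothetical helper: mate_pair_libraries is a list of
    (R1 path, R2 path) tuples; library N (counting from 1) is passed
    with the SPAdes options --mpN-1 and --mpN-2.
    """
    parts = []
    for number, (r1, r2) in enumerate(mate_pair_libraries, start=1):
        parts.append(f"--mp{number}-1 {r1} --mp{number}-2 {r2}")
    return " ".join(parts)


if __name__ == "__main__":
    extra = build_spades_extra(
        [("path/to/matepair_R1.fastq", "path/to/matepair_R2.fastq")]
    )
    # Print a line in the same form as the config example above
    print(f'spades_extra: "{extra}"')
```

Paths containing spaces would need quoting, which this sketch does not handle.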

SilasK, Sep 24 '20 08:09

It mostly worked fine, although there is an issue in the MaxBin step (the assembly and gene catalog worked fine, but I have no bins and no MAGs). I'm not sure where to find the exact problem, though. I think this may be because it is a sample of plasmid-enriched DNA from different bacteria, so no genome can be reconstructed.

botellaflotante, Sep 24 '20 22:09

Many users have encountered the problem that MaxBin doesn't produce bins. Maybe the assembly is too complicated. Did MetaBAT produce bins? If so, you can simply set final_binner: metabat.

In your particular case, it might also be because only the SE reads are used for mapping, so you have lower coverage for binning.
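When several projects need the same change, setting a key such as final_binner (or spades_preset) can be scripted. The following is a minimal Python sketch, not an Atlas feature: set_config_value is a hypothetical helper that rewrites an existing, unindented key: value line in the config text. It does not parse YAML, so nested keys are not handled and a trailing comment on the edited line is dropped.

```python
import re


def set_config_value(config_text, key, value):
    """Set an existing top-level 'key: value' line in Atlas config text.

    Hypothetical, line-based sketch: only unindented keys are matched,
    the rest of the line (including any trailing comment) is replaced,
    and YAML is not parsed.
    """
    pattern = re.compile(rf"^{re.escape(key)}:.*$", flags=re.MULTILINE)
    # A replacement function keeps backslashes in value from being interpreted
    new_text, count = pattern.subn(
        lambda _match: f"{key}: {value}", config_text, count=1
    )
    if count == 0:
        raise KeyError(f"{key!r} is not a top-level key in the config text")
    return new_text


if __name__ == "__main__":
    # Illustrative config text, not a real Atlas config file
    example = "spades_preset: meta\nfinal_binner: maxbin\n"
    print(set_config_value(example, "final_binner", "metabat"), end="")
```

To apply it to a file, read the config with pathlib.Path.read_text(), pass the text through set_config_value, and write it back with write_text().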

SilasK, Sep 25 '20 09:09

Yes, it worked. Thanks!

botellaflotante, Sep 25 '20 12:09

Hi @SilasK, I was just wondering if you could clarify how you specify both long- and short-read inputs during atlas init. There doesn't seem to be an option to differentiate between long- and short-read input (atlas v2.8.2, conda-forge). Cheers, Rhys

rhysnewell, Feb 15 '22 00:02

Atlas can handle long reads plus short reads (see the docs).

But I'm thinking about developing something for long reads only, is that what you want?

SilasK, Feb 16 '22 20:02

Cool, thanks. No, I was looking for hybrid assembly options, not long reads on their own.

I guess I was hoping for a command-line option to specify my input reads. I've got a large number of metagenomes to assemble that have both short and long reads, so setting up config files for each of them is going to get tedious. That's okay, thanks for your response.

rhysnewell, Feb 16 '22 22:02

Do your long read files contain the sample name in the filename?

SilasK, Feb 17 '22 20:02

No, but they are in a folder whose name contains the sample name, like so:

Interleaved Illumina:

(base) n10853499:muffin$ ls ../../short_read/2017.12.04_18.45.54_sample_0/reads/
anonymous_reads.fq.gz  reads_mapping.tsv.gz

PacBio:

(base) n10853499:muffin$ ls ../../pacbio/2018.01.23_11.53.11_sample_0/reads/
anonymous_reads.fq.gz  reads_mapping.tsv.gz

rhysnewell, Feb 17 '22 23:02

Hey @rhysnewell, I made a function that adds the long reads to the Atlas sample table.

You might need to install pathlib2 first:

mamba install -y pathlib2

Then, from within the Atlas folder, run: import_long_reads.py ../../pacbio

I ran a test and it worked for me. However, your sample names become very long and contain dots, so I suggest replacing them with something simpler.

Try it out; if it works, I'll add it to the init function.
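The renaming suggestion can be combined with pairing the short- and long-read folders by sample. Below is a minimal Python sketch, not the import_long_reads.py script itself: it assumes the folder layout listed earlier in this thread (<root>/<timestamp>_<sample>/reads/anonymous_reads.fq.gz) and that stripping the timestamp prefix gives matching names in both trees. All function names are hypothetical.

```python
import re
from pathlib import Path

# Matches a leading run-timestamp prefix such as "2018.01.23_11.53.11_"
TIMESTAMP_PREFIX = re.compile(r"^\d{4}\.\d{2}\.\d{2}_\d{2}\.\d{2}\.\d{2}_")


def simplify_sample_name(folder_name):
    """Drop the timestamp prefix and replace any remaining dots with '_'."""
    return TIMESTAMP_PREFIX.sub("", folder_name).replace(".", "_")


def find_reads(root, filename="anonymous_reads.fq.gz"):
    """Map simplified sample names to <root>/<folder>/reads/<filename>."""
    reads = {}
    for path in sorted(Path(root).glob(f"*/reads/{filename}")):
        sample = simplify_sample_name(path.parent.parent.name)
        if sample in reads:
            raise ValueError(f"two folders map to sample name {sample!r}")
        reads[sample] = path
    return reads


def pair_short_and_long(short_root, long_root):
    """Return {sample: (short_read_path, long_read_path)} for shared samples."""
    short_reads = find_reads(short_root)
    long_reads = find_reads(long_root)
    shared = sorted(short_reads.keys() & long_reads.keys())
    return {sample: (short_reads[sample], long_reads[sample]) for sample in shared}
```

With the folders listed above, pair_short_and_long("../../short_read", "../../pacbio") would pair both runs under the name sample_0. Writing the result into the Atlas sample table is left out, because the table's columns are not shown here.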

SilasK, Feb 21 '22 14:02

hey there, @SilasK :)

thanks for your work here!

I came across someone looking for a workflow suitable for Nanopore-only data and found my way to this issue. It looks like this may have stopped here for now due to a lack of need/priority, but I just want to check two quick questions:

  1. Has taking solely long reads as input been integrated into the main program yet, as discussed above?
  2. If yes, when only Nanopore or PacBio reads are provided, is an assembler (and I guess a read mapper too) used that is specifically designed for them and their potentially higher error rates (e.g., like the settings Flye has for assembly)?

thanks!
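For context on what "long-read-aware" settings look like, here is a minimal Python sketch, not part of Atlas, that only builds command lines: Flye selects its error model with --nano-raw, --nano-hq, --pacbio-raw, or --pacbio-hifi and has a --meta mode for metagenomes, and minimap2 has the map-ont, map-pb, and map-hifi presets for mapping reads back to contigs. The function and dictionary names and the read-type labels are hypothetical, and --nano-hq and map-hifi assume recent Flye and minimap2 releases.

```python
# Flye input option for each (hypothetical) read-type label
FLYE_READ_OPTION = {
    "nanopore_raw": "--nano-raw",
    "nanopore_hq": "--nano-hq",
    "pacbio_raw": "--pacbio-raw",
    "pacbio_hifi": "--pacbio-hifi",
}

# minimap2 preset (-x) for mapping each read type back to the contigs
MINIMAP2_PRESET = {
    "nanopore_raw": "map-ont",
    "nanopore_hq": "map-ont",
    "pacbio_raw": "map-pb",
    "pacbio_hifi": "map-hifi",
}


def long_read_commands(read_type, reads, out_dir, threads=8):
    """Return (flye_cmd, minimap2_cmd) argument lists; nothing is executed."""
    if read_type not in FLYE_READ_OPTION:
        raise ValueError(f"unknown read type: {read_type!r}")
    flye_cmd = [
        "flye", FLYE_READ_OPTION[read_type], reads,
        "--meta",  # metagenome / uneven-coverage mode
        "--out-dir", out_dir,
        "--threads", str(threads),
    ]
    assembly = f"{out_dir}/assembly.fasta"  # Flye's final contigs
    minimap2_cmd = [
        "minimap2", "-a",  # -a: write SAM output
        "-x", MINIMAP2_PRESET[read_type],
        "-t", str(threads),
        assembly, reads,
    ]
    return flye_cmd, minimap2_cmd
```

Each list can be passed to subprocess.run(..., check=True); the minimap2 SAM output goes to standard output, so it would need to be captured or redirected.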

AstrobioMike, Dec 12 '22 19:12

@AstrobioMike It is not implemented in the main workflow, but I would be happy to help make it happen.

As an alternative for now, I suggest MUFFIN.

SilasK, Dec 13 '22 10:12

Oh excellent, thanks for the note about muffin, I will pass that along 🙂

No specific pressure from me to integrate a long-read-specific path here, especially since you pointed out that MUFFIN seems to already have a way.

Actually, I just looked a bit, and it seems MUFFIN might require short reads as well. I've sent them a question to make sure, but if that is the case, there may still be a niche for a general workflow starting with long reads only, and it might be worth adding that capability here if you make the time/find the motivation.

Thanks again!

AstrobioMike, Dec 13 '22 18:12

Just to add a comment to this thread: I've been working on a Snakemake workflow for microbial genome assembly/annotation using Nanopore data. It can perform either a long-read-only or a hybrid long/short-read workflow. The basic framework could probably be adapted and added into ATLAS for long-read-only metagenome assembly, if there were interest. See https://github.com/jmtsuji/rotary (still in development!)

Sorry for generally being slow to reply these days! Thanks again for all your work on ATLAS!

jmtsuji, Dec 14 '22 01:12

There has been no activity for some time. I hope your issue has been solved in the meantime. This issue will automatically close soon if no further activity occurs.

Thank you for your contributions.

github-actions[bot], Apr 06 '23 13:04