
[question] Tutorial for Nanopore-only reads in the Bactopia documentation

Open • llk578496 opened this issue 3 years ago • 19 comments

Hi @rpetit3,

Thank you so much for upgrading Bactopia to v2, which includes support for Nanopore-only reads! We have been waiting for a very long time!

Is there a tutorial section in the Bactopia documentation that covers the analysis of Nanopore-only reads (e.g. running bactopia prepare on Nanopore-only reads)?

Thank you very much!

llk578496 • Dec 09 '21

Totally! I'll put this on my list for updating the docs

rpetit3 • Dec 09 '21

Thank you so much!

llk578496 • Dec 09 '21

Hi Robert,

Is there any update regarding ONT reads? I'm processing some metagenomic samples that were sequenced with ONT. Each time I run bactopia, the only output file generated is sample-genome-size.txt and the run fails. When I run bactopia on a single sample, I get the same output (barcode01-genome-size.txt) and the run fails again. The genome size is 14594200, which is lower than the default max.

Here is the command I used:

bactopia --sample barcode01 --SE barcode01.fastq.gz --datasets my_directory/datasets/ --outdir ${today}bactopia1Samp_output --max_cpus $SLURM_CPUS_PER_TASK

Note: on short-read samples, this command runs properly.

Thanks,
TJ

tbazilegith • Feb 22 '22

In the next version I will be improving the documentation, which will include better demos of Nanopore data.

rpetit3 • Feb 22 '22

Hi Robert,

I have been trying to run bactopia on hybrid samples from Illumina and MinION. All the samples are in the directory fastqs_hybrid (2 SE samples and the 2 corresponding paired samples, 6 files in total). I used the following commands:

bactopia prepare fastqs_hybrid/ > fastqs_hybrid.txt

bactopia --fastqs fastqs_hybrid.txt --hybrid --datasets my_directory/datasets/ --outdir ${today}bactopia_outputHybrid --max_cpus ${SLURM_CPUS_PER_TASK} --cleanup_workdir

However, I got these error messages:

ERROR: "CFI21000051" has paired and single-end FASTQs, please check.
ERROR: "CFI21000216" has paired and single-end FASTQs, please check.

  • Each SE sample has the same root name as the corresponding paired sample, e.g. CFI21000051.fastq.gz (with CFI21000051_1.fastq.gz and CFI21000051_2.fastq.gz). Could that be the problem? I had different names before, but the SE samples only produced genome-size.txt files.
  • Are there any other options I should add to the command?

Thanks,
TJ

tbazilegith • Mar 23 '22

Tack on --long_reads to the bactopia prepare command:

bactopia prepare fastqs_hybrid/  --long_reads > fastqs_hybrid.txt

Let me know how that works for you

rpetit3 • Mar 23 '22
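The error above comes from one root name matching both a paired-end pair and a single-end file. As a rough illustration only, the grouping behaviour described in this thread can be sketched as a shell function. This is a hypothetical helper (classify_sample is not part of Bactopia, and this is not Bactopia's actual code); it assumes the `<name>_1.fastq.gz`, `<name>_2.fastq.gz` and `<name>.fastq.gz` naming used above.

```shell
# Hypothetical sketch (not Bactopia's code): decide how FASTQs sharing one
# root name would be grouped, mirroring the behaviour described in this thread.
# Usage: classify_sample DIR SAMPLE [long_reads]
classify_sample() {
  local dir="$1" sample="$2" mode="${3:-}"
  local has_pe=0 has_se=0
  # Paired-end reads are assumed to be named <sample>_1.fastq.gz / <sample>_2.fastq.gz
  if [ -f "$dir/${sample}_1.fastq.gz" ] && [ -f "$dir/${sample}_2.fastq.gz" ]; then
    has_pe=1
  fi
  # A single-end (e.g. Nanopore) file is assumed to be <sample>.fastq.gz
  if [ -f "$dir/${sample}.fastq.gz" ]; then
    has_se=1
  fi
  if [ "$has_pe" -eq 1 ] && [ "$has_se" -eq 1 ]; then
    if [ "$mode" = "long_reads" ]; then
      echo "hybrid"    # the single-end file is treated as the long reads
    else
      # Same wording as the error reported in this thread
      echo "ERROR: \"$sample\" has paired and single-end FASTQs, please check." >&2
      return 1
    fi
  elif [ "$has_pe" -eq 1 ]; then
    echo "paired-end"
  elif [ "$has_se" -eq 1 ]; then
    if [ "$mode" = "long_reads" ]; then echo "long-read"; else echo "single-end"; fi
  else
    echo "ERROR: no FASTQs found for \"$sample\"" >&2
    return 1
  fi
}
```

With the CFI21000051 files from this thread, classify_sample fastqs_hybrid CFI21000051 prints the error and returns non-zero, while classify_sample fastqs_hybrid CFI21000051 long_reads prints hybrid.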

Speaking of this, would you ever be interested in running hybrid assemblies where the ONT reads are assembled first, then polished with Illumina reads?

rpetit3 • Mar 23 '22

Your point really makes sense. So, does bactopia allow that (polishing after the assembly)? Indeed, my goal is to compare the results between hybrid and homogeneous assemblies. That's why I am investigating this. Thanks

tbazilegith • Mar 23 '22

Not at the moment. Bactopia only supports hybrid assembly via Unicycler (assemble with SPAdes, then polish with long reads).

However, I recently added support for the reverse (assemble with long reads, then polish with short reads) in Dragonflye (https://github.com/rpetit3/dragonflye/releases/tag/v1.0.9). I was planning to float this support up to Bactopia as well.

Was just curious on your end if it's something you'd like to see added.

rpetit3 • Mar 23 '22

I welcome the recommendation. Part of my work will be assembling complete or near-complete genomes for the organisms I am working on. So, when we do hybrid assembly, does polishing become unnecessary? Or are they two options we can choose from? I'm building experience...

Thanks!

tbazilegith • Mar 23 '22

In hybrid assemblies, you are either polishing with short reads or with long reads. Polishing also happens in standard assemblies. If you use Shovill or Dragonflye (Bactopia uses both), all assemblies are polished by default.

So yeah, you would have two options for hybrid assembly:

  1. assemble with short reads, polish with long reads
  2. assemble with long reads, polish with short reads

I think you might have a fun little experiment on your hands! You could compare the outcomes of the two approaches to see whether one leads to better results or they are similar.

Currently, Bactopia's --hybrid assembles with short reads first, then polishes with long reads (this is done using Unicycler). But if you are interested, I'll get you a dev version that allows the opposite (long reads first, then polish with short reads).

rpetit3 • Mar 23 '22
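One way to set up the comparison suggested above is to run each strategy into its own output directory. The sketch below only builds and prints the two command lines (a dry run). It assumes Bactopia v2.1.0 or later, where --short_polish selects long-read assembly with short-read polishing (per the v2.1.0 release noted at the end of this thread), and that --short_polish is passed in place of --hybrid; hybrid_cmd and the bactopia_compare directory layout are hypothetical, so confirm the flags with bactopia --help for your version.

```shell
# Hypothetical dry-run helper: print the bactopia command for one hybrid strategy.
# Assumes --hybrid (Unicycler, short reads first) and --short_polish
# (long reads first, then short-read polishing; Bactopia >= v2.1.0) are alternatives.
hybrid_cmd() {
  local strategy="$1" fofn="$2" outdir="$3" flag
  case "$strategy" in
    short-first) flag="--hybrid" ;;        # assemble short reads, polish with long reads
    long-first)  flag="--short_polish" ;;  # assemble long reads, polish with short reads
    *) echo "unknown strategy: $strategy" >&2; return 1 ;;
  esac
  # Each strategy writes to its own sub-directory so results can be compared.
  echo "bactopia --fastqs $fofn $flag --outdir $outdir/$strategy"
}

for s in short-first long-first; do
  hybrid_cmd "$s" fastqs_hybrid.txt bactopia_compare
done
```

This prints one command per strategy; piping the output to sh would run them, and other options used in this thread (--datasets, --max_cpus) can be appended as needed.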

Oh, yes! This kind of hybrid assembly (long reads first, then polish with short reads) sounds interesting. We are getting more and more long reads to process. I appreciate you getting back to me so quickly. Thanks a lot!

tbazilegith • Mar 23 '22

Awesome! I'll be in touch in a few days with an update. Excited to see what conclusions you come to!

rpetit3 • Mar 23 '22

Hi Robert,

When you run bactopia on one hybrid sample (mysample.fastq, mysample_1.fastq and mysample_2.fastq), how many output directories should I expect?

  • I imagine just one (bactopia_output/mysample).
  • But I would like to make sure I am right.

Thanks,
TJ

tbazilegith • Mar 24 '22

Yeah just one should be right

rpetit3 • Mar 24 '22
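To confirm this after a run, one option is to check that every sample listed in the FOFN has its own directory under the output folder. This is a minimal sketch with a hypothetical helper (check_outputs); it assumes the FOFN written by bactopia prepare is tab-separated with a header row and the sample name in the first column, and that each sample's results land in a directory named after it under --outdir.

```shell
# Hypothetical helper: report samples listed in a FOFN that have no
# <outdir>/<sample> directory. Assumes a tab-separated FOFN with a header row
# and the sample name in column 1. Returns non-zero if any sample is missing.
check_outputs() {
  local fofn="$1" outdir="$2"
  # Skip the header row and take the sample name from column 1.
  tail -n +2 "$fofn" | cut -f1 | {
    missing=0
    while read -r sample; do
      [ -z "$sample" ] && continue           # ignore blank lines
      if [ ! -d "$outdir/$sample" ]; then
        echo "missing: $sample"
        missing=$((missing + 1))
      fi
    done
    [ "$missing" -eq 0 ]                     # pipeline status becomes the return value
  }
}
```

For a hybrid sample like mysample, check_outputs fastqs_hybrid.txt bactopia_output should print nothing and succeed once bactopia_output/mysample exists.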

In the documentation, I noticed we can run bactopia on assembly files (.fasta). Is it a good idea to run bactopia on the assembly files (in the assembly sub-directory) generated by bactopia itself? Thanks!

tbazilegith • Mar 24 '22

I would say no. This feature was really meant for samples from NCBI Assembly, or for when the FASTQs aren't available but an assembly is (e.g. some older studies/projects).

rpetit3 • Mar 24 '22

Hi Robert,

Is there any update about running bactopia on assembly files? I ran bactopia to process some local .fasta files, but got no output. The error log file is empty, though.

Here is the command:

for s in $(cat samples.txt); do
  bactopia --sample ${s} --assembly assemblies_zipped/${s}.fasta.gz --assembly_pattern '*.fasta.gz' --datasets /color/my_directory/datasets/ --outdir ${today}bactopia_output --max_cpus $SLURM_CPUS_PER_TASK --cleanup_workdir
done

Thanks!
TJ

tbazilegith • Mar 29 '22
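Since this run finished without any output, a quick pre-flight check that every sample in samples.txt has a matching, non-empty assembly file can rule out path problems before launching bactopia. This is a minimal sketch with a hypothetical helper (missing_assemblies); it only checks the assemblies_zipped/<sample>.fasta.gz paths used in the loop above and does not diagnose Bactopia itself.

```shell
# Hypothetical pre-flight check: print sample names from a samples file whose
# assembly (<dir>/<sample>.fasta.gz) is missing or empty. Returns non-zero if any are.
missing_assemblies() {
  local samples="$1" dir="$2" status=0 s
  while read -r s; do
    [ -z "$s" ] && continue                 # skip blank lines
    if [ ! -s "$dir/$s.fasta.gz" ]; then    # -s: file exists and is non-empty
      echo "$s"
      status=1
    fi
  done < "$samples"
  return "$status"
}
```

For example, missing_assemblies samples.txt assemblies_zipped lists any sample that the loop would pass to bactopia without a usable assembly.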

@tbazilegith I released v2.1.0 (https://github.com/bactopia/bactopia/releases/tag/v2.1.0), which now allows long-read assembly with short-read polishing (--short_polish). Let me know if there are any questions or issues!

Cheers, Robert

rpetit3 • Jun 08 '22