Add alternative long-read (nanopore) preprocessing tools

Open sofstam opened this issue 2 years ago • 17 comments

Description of feature

Since Porechop is no longer supported, it may be useful to investigate Porechop_ABI as an alternative.

Or https://github.com/epi2me-labs/pychopper

sofstam avatar Oct 13 '22 14:10 sofstam

Would need to investigate putting it on conda...

jfy133 avatar Oct 14 '22 16:10 jfy133

https://anaconda.org/bioconda/porechop_abi

There is a conda package. I think this can be added in the next release. What do you think?

sofstam avatar Oct 17 '22 07:10 sofstam

Oh my bad, I misunderstood their installation instructions 🤦‍♀️

This is fine for inclusion in the first release while we wait for the final Bracken/KrakenUniq!

jfy133 avatar Oct 17 '22 07:10 jfy133

https://github.com/bonsai-team/Porechop_ABI/issues/6

sofstam avatar Nov 01 '22 08:11 sofstam

I was wondering whether we should support only porechop_abi and drop porechop altogether.

sofstam avatar Nov 01 '22 13:11 sofstam

Is it equivalent? I don't have any feeling either way as I don't use the data, so I'm happy to let you make the call :)

jfy133 avatar Nov 01 '22 13:11 jfy133

According to their GitHub page: "Note that Porechop_ABI is not designed to handle barcoded sequences adapters. Demultiplexing should be done using standard Porechop commands or other appropriate tools." So it is not equivalent, as it does not perform demultiplexing. However, demultiplexing is supported by Guppy and it is currently preferred. @Midnighter have you worked with Nanopore data?

sofstam avatar Nov 01 '22 13:11 sofstam

That's fine, we don't support demultiplexing either.

jfy133 avatar Nov 01 '22 14:11 jfy133

Hi, I'm the main developer of porechop_abi. If you have any questions about the project, feel free to ask me directly; I will be glad to answer.

Regarding the "equivalent or not" part:

We "only" added a step between the creation of the adapter database object and the adapter search in the reads.

[image: porechop_abi]

The code base of porechop was left as close to the original as possible, and all commands that used to run on porechop are unchanged. The behavior of the two tools is identical as long as you don't use the -abi or -go options. Whatever porechop can do, porechop_abi can do too. The part we can't handle (at least for now) is inferring barcoded adapter sequences from the reads alone (hence the quoted sentence in your previous comment).

What could make a difference is the name of the executable. We had to change it to avoid installation conflicts, so it may need to be updated in pipelines if you want to use our version.

TL;DR: It can work the same, but the name is different.

On the demultiplexing part: using porechop for demultiplexing is pretty much obsolete. Even the dedicated tool (Deepbinner) developed by Ryan Wick (the original author of porechop) is now deemed too old. Guppy (the Nanopore basecaller) seems to be the "current standard" for demultiplexing.

qbonenfant avatar Nov 01 '22 15:11 qbonenfant
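To make the point about the executable name concrete, here is a minimal Python sketch of how a wrapper might build the trimming command for either tool. The function name `build_trim_command` and the file names are hypothetical, invented for illustration. The only flags used are `-i`, `-o`, and the `-abi` option mentioned above, and the sketch assumes both tools accept them in this form.

```python
import shlex


def build_trim_command(tool, reads_in, reads_out, ab_initio=False):
    """Return the argument list for an adapter-trimming call.

    Hypothetical helper: only the executable name and the optional
    -abi flag differ between the two tools.
    """
    if tool not in ("porechop", "porechop_abi"):
        raise ValueError(f"unknown trimming tool: {tool}")
    if ab_initio and tool != "porechop_abi":
        # Ab initio adapter inference exists only in porechop_abi.
        raise ValueError("-abi is only available in porechop_abi")

    cmd = [tool]
    if ab_initio:
        cmd.append("-abi")
    cmd += ["-i", reads_in, "-o", reads_out]
    return cmd


if __name__ == "__main__":
    # Same arguments, different executable name.
    print(shlex.join(build_trim_command("porechop", "reads.fastq.gz", "trimmed.fastq.gz")))
    print(shlex.join(build_trim_command("porechop_abi", "reads.fastq.gz", "trimmed.fastq.gz", ab_initio=True)))
```

Without `ab_initio=True`, the two command lines differ only in their first element, which is the "it can work the same, but the name is different" case.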

"However, demultiplexing is supported by Guppy and it is currently preferred. @Midnighter have you worked with Nanopore data?"

I've only worked with FASTQ files that are the result of running guppy so far. I agree with @jfy133 that we should not include demultiplexing in this pipeline. We don't do it for short reads either.

Midnighter avatar Nov 01 '22 15:11 Midnighter

"I've only worked with FASTQ files that are the result of running guppy so far. I agree with @jfy133 that we should not include demultiplexing in this pipeline. We don't do it for short reads either."

I agree. I was just trying to list the differences between porechop and porechop_abi, not proposing to add demultiplexing steps.

sofstam avatar Nov 01 '22 15:11 sofstam

@qbonenfant Thank you for the detailed description 👍

sofstam avatar Nov 01 '22 16:11 sofstam

Or https://github.com/epi2me-labs/pychopper

jfy133 avatar Apr 20 '23 13:04 jfy133

Or https://github.com/wdecoster/chopper

jfy133 avatar Apr 20 '23 13:04 jfy133

It seems that Pychopper supports only ONT long reads, whereas chopper is compatible with both PacBio and ONT reads. It might therefore be better to use chopper here.

LilyAnderssonLee avatar Aug 11 '23 13:08 LilyAnderssonLee

Pychopper is for cDNA reads so not useful for us.

I will have to test first, but to my understanding we can use porechop_abi as an optional step for adapter trimming and drop porechop. As for chopper, it can be added as an alternative to filtlong.

What do you think?

sofstam avatar Nov 22 '23 16:11 sofstam
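The proposal above (optional adapter trimming with porechop_abi, then filtering with either filtlong or chopper) could be sketched roughly as follows. This is only an illustration of the step ordering: `LongReadOptions`, `plan_steps`, and the field names are hypothetical and are not taxprofiler's actual parameters.

```python
from dataclasses import dataclass

# Filtering tools discussed in this thread.
FILTER_TOOLS = ("filtlong", "chopper")


@dataclass
class LongReadOptions:
    """Hypothetical user options for long-read preprocessing."""
    trim_adapters: bool = True      # run porechop_abi before filtering
    filter_tool: str = "filtlong"   # or "chopper"


def plan_steps(opts):
    """Return the ordered list of tools to run for the given options."""
    if opts.filter_tool not in FILTER_TOOLS:
        raise ValueError(f"unknown filter tool: {opts.filter_tool}")
    steps = []
    if opts.trim_adapters:
        # Adapter trimming is optional and comes first.
        steps.append("porechop_abi")
    steps.append(opts.filter_tool)
    return steps


if __name__ == "__main__":
    print(plan_steps(LongReadOptions()))
    print(plan_steps(LongReadOptions(trim_adapters=False, filter_tool="chopper")))
```

Keeping the trimming step independent of the filter choice means either filter can be swapped in without touching the adapter-trimming logic.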

So I am going to add porechop_abi to taxprofiler now as an alternative tool for adapter trimming of long reads

LilyAnderssonLee avatar Jul 11 '24 12:07 LilyAnderssonLee