taxprofiler
Add alternative long-read (nanopore) preprocessing tools
Description of feature
Since Porechop is no longer supported, it may be useful to investigate Porechop_ABI as an alternative.
Or https://github.com/epi2me-labs/pychopper
Would need to investigate putting it on conda...
https://anaconda.org/bioconda/porechop_abi
There is a conda package for it. I think this can be added in the next release, what do you think?
Oh my bad, I misunderstood their installation instructions 🤦♀️
This is fine for inclusion for first release while we wait for final Bracken/KrakenUniq!
https://github.com/bonsai-team/Porechop_ABI/issues/6
I was wondering whether we should support only porechop_abi and drop porechop altogether.
Is it equivalent? I don't have any feeling either way as I don't use the data, so I'm happy to let you make the call :)
According to their GitHub page: "Note that Porechop_ABI is not designed to handle barcoded sequences adapters. Demultiplexing should be done using standard Porechop commands or other appropriate tools."
It is not equivalent as it does not perform demultiplexing. However, demultiplexing is supported by Guppy and it is currently preferred. @Midnighter have you worked with Nanopore data?
That's fine, we don't support demultiplexing either.
Hi, I'm the main developer of porechop_abi.
If you have any questions about the project, feel free to ask me directly, I will be glad to answer.
Regarding the "equivalent or not" part:
We "only" added a step between the creation of the adapter database object and the adapter search in the reads.
The code base of porechop was left as close to the original as possible, and all commands that used to run on porechop are unchanged. The behavior of the two programs is identical as long as you don't use the -abi or -go options.
What porechop can do, porechop_abi can do too.
The part we can't handle (at least for now) is inferring barcoded adapter sequences from the reads alone (hence the quoted sentence in your previous comment).
What could make a difference is the name of the executable. We had to change it to avoid installation conflicts and it may need to be changed in pipelines if you want to use our version.
TL;DR: It can work the same, but the name is different.
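To illustrate the point about the executable name, here is a minimal Python sketch of how a wrapper script might build the trimming command for either tool. The helper `build_trim_command` and its parameters are hypothetical and not part of taxprofiler or either tool. `-i` and `-o` are assumed to be the usual Porechop input and output options, and `-abi` is only appended when porechop_abi is selected.

```python
def build_trim_command(tool, reads_in, reads_out, ab_initio=False):
    """Build an argument list for porechop or porechop_abi (hypothetical helper)."""
    # The two tools share their command line; only the executable name differs.
    executables = {"porechop": "porechop", "porechop_abi": "porechop_abi"}
    if tool not in executables:
        raise ValueError(f"unknown trimming tool: {tool}")

    # Assumption: -i/-o are the input/output options of the original Porechop.
    cmd = [executables[tool], "-i", reads_in, "-o", reads_out]

    if ab_initio:
        # The -abi option only exists in porechop_abi.
        if tool != "porechop_abi":
            raise ValueError("-abi is only available in porechop_abi")
        cmd.append("-abi")
    return cmd
```

Without `ab_initio=True`, the two resulting commands differ only in their first element, which is the only change a pipeline would need in that case.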
On the Demultiplexing part:
Using porechop is pretty much obsolete for demultiplexing.
Even the dedicated tool (Deepbinner) developed by Ryan Wick (the original author of porechop) is now deemed too old.
Guppy (Nanopore basecaller) seems to be the "current standard" for demultiplexing.
However, demultiplexing is supported by Guppy and it is currently preferred. @Midnighter have you worked with Nanopore data?
I've only worked with FASTQ files that are the result of running guppy so far. I agree with @jfy133 that we should not include demultiplexing in this pipeline. We don't do it for short reads either.
I agree, I was just trying to list the differences between porechop and porechop_abi, not adding demultiplexing steps.
@qbonenfant Thank you for the detailed description 👍
Or https://github.com/epi2me-labs/pychopper
Or https://github.com/wdecoster/chopper
It seems that Pychopper supports only ONT long reads, whereas chopper is compatible with both PacBio and ONT reads. It might therefore be more advantageous to use chopper in this context.
Pychopper is for cDNA reads, so it is not useful for us.
I will have to test first, but to my understanding we can use porechop_abi as an optional step for adapter trimming and drop porechop.
Regarding chopper, it can be added as an alternative to filtlong.
What do you think?
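For context on what such a filtering step does, here is a minimal Python sketch of a length and mean-quality filter for a single read. It is an illustration only and not how filtlong or chopper are implemented. The function names and the default thresholds are hypothetical, and Phred+33 quality encoding is assumed.

```python
import math


def mean_phred(quality_string, offset=33):
    """Mean read quality, averaged over error probabilities (assumes Phred+33)."""
    if not quality_string:
        return 0.0
    # Convert each quality character to an error probability before averaging,
    # then convert the mean probability back to a Phred score.
    probs = [10 ** (-(ord(c) - offset) / 10) for c in quality_string]
    return -10 * math.log10(sum(probs) / len(probs))


def keep_read(sequence, quality_string, min_length=1000, min_mean_q=10.0):
    """Return True if the read passes both thresholds (arbitrary example values)."""
    if len(sequence) < min_length:
        return False
    return mean_phred(quality_string) >= min_mean_q
```

Averaging error probabilities rather than raw Phred scores is one common choice, since a few very poor bases then pull the mean down more strongly.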
So I am going to add porechop_abi to taxprofiler now as an alternative tool for adapter trimming of long reads.