v0.10.0b not available in conda

Open kevfengler227 opened this issue 3 months ago • 10 comments

Can you add the latest version of chopper to conda? I have been looking for a tool that is capable of trimming high-quality segments from ONT reads. There is a nefarious type of ONT read where the pore goes to sleep for a while and creates a read with high-quality ends and a poor-quality segment of low-complexity garbage in the middle. Sometimes these reads will get incorporated into hifiasm assemblies in --ont mode. It can be challenging to find and fix these later, so it is much better to remove them up front. How do you think chopper will handle this type of read?

Thanks, KF

The following NEW packages will be INSTALLED:

  binutils_impl_lin~  pkgs/main/linux-64::binutils_impl_linux-64-2.40-h5293946_0
  chopper             bioconda/linux-64::chopper-0.10.0-hcdda2d0_0
  clang               conda-forge/linux-64::clang-21.1.0-default_h36abe19_1
  clang-21            conda-forge/linux-64::clang-21-21.1.0-default_h99862b1_1
  kernel-headers_li~  conda-forge/noarch::kernel-headers_linux-64-3.10.0-he073ed8_18
  libclang-cpp21.1    conda-forge/linux-64::libclang-cpp21.1-21.1.0-default_h99862b1_1
  libgcc-devel_linu~  conda-forge/noarch::libgcc-devel_linux-64-15.1.0-h4c094af_105
  libllvm21           conda-forge/linux-64::libllvm21-21.1.0-hecd9e04_0
  sysroot_linux-64    conda-forge/noarch::sysroot_linux-64-2.17-h0157908_18

kevfengler227 avatar Sep 12 '25 14:09 kevfengler227

Hi,

I think v0.10.0b is functionally identical to v0.10.0. There was just an issue when releasing the version and pushing the tag, and I had to retrigger GitHub Actions. Or are you missing functionality with v0.10.0?

Regarding your specific question, I don't think that chopper can currently do this out of the box. There is code to select the best segment from a read, but not to remove a low-quality segment. It is possible to support that, though. Do you know how low the quality scores get in such a low-quality segment? And are you proposing to split that read into two higher-quality parts?

Wouter

wdecoster avatar Sep 17 '25 19:09 wdecoster

Yes, chopper 0.10.0 from conda does not have the trimming options for the various --trim-approach values.

Usage: chopper [OPTIONS]

Options:
  -q, --quality <MINQUAL>      Sets a minimum Phred average quality score [default: 0]
      --maxqual <MAXQUAL>      Sets a maximum Phred average quality score [default: 1000]
  -l, --minlength <MINLENGTH>  Sets a minimum read length [default: 1]
      --maxlength <MAXLENGTH>  Sets a maximum read length
      --headcrop <HEADCROP>    Trim N nucleotides from the start of a read [default: 0]
      --tailcrop <TAILCROP>    Trim N nucleotides from the end of a read [default: 0]
  -t, --threads <THREADS>      Use N parallel threads [default: 4]
  -c, --contam <CONTAM>        Filter contaminants against a fasta
      --inverse                Output the opposite of the normal results
  -i, --input <INPUT>          Input filename [default: read from stdin]
      --maxgc <MAXGC>          Filter max GC content [default: 1]
      --mingc <MINGC>          Filter min GC content [default: 0]
      --trim <TRIM>            Set a q-score cutoff to trim read ends
  -h, --help                   Print help
  -V, --version                Print version

kevfengler227 avatar Sep 18 '25 14:09 kevfengler227

Yes, splitting the read into two high-quality segments would be ideal, but as long as it splits the read and returns one high-quality segment, that would work too. In such cases of pore stuttering, the low-quality segments have very low scores (<5). So I could envision functionality that keeps all segments above the minimum cut-off score, which are then subjected to minimum length filtering.
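For illustration, the behaviour requested above could be sketched roughly as follows. This is not chopper's code (chopper is a separate tool and its implementation is not shown in this thread); the function name `split_by_quality` and its parameters `min_qual` and `min_len` are hypothetical, and per-base Phred scores are assumed to be already decoded into integers.

```python
from typing import List, Tuple


def split_by_quality(
    seq: str, quals: List[int], min_qual: int = 5, min_len: int = 500
) -> List[Tuple[str, List[int]]]:
    """Keep every run of bases with quality >= min_qual that is at least
    min_len bases long. Hypothetical helper, not part of chopper."""
    segments = []
    start = None  # start index of the current high-quality run, if any
    for i, q in enumerate(quals):
        if q >= min_qual:
            if start is None:
                start = i
        elif start is not None:
            # A low-quality base closes the current run.
            if i - start >= min_len:
                segments.append((seq[start:i], quals[start:i]))
            start = None
    # Close a run that extends to the end of the read.
    if start is not None and len(quals) - start >= min_len:
        segments.append((seq[start:], quals[start:]))
    return segments
```

With a stalled-pore read, this would return the two flanking segments and drop the middle, provided each flank passes the length filter.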

Thanks, KF

kevfengler227 avatar Sep 18 '25 14:09 kevfengler227

Ah, I need to make a new release altogether. v0.10.0b also doesn't have those new changes. I'll start with that and then look into implementing the read splitting approach.

wdecoster avatar Sep 19 '25 10:09 wdecoster

Copilot and I implemented your request and added a new trimming mode. Could you test if this solves your problem? I will release v0.11.0 now.

wdecoster avatar Sep 19 '25 11:09 wdecoster

Thanks! This feature is working as intended, but the issue appears more complicated than I originally thought. I had never really visualized the quality scores of an ONT read before. It turns out that the quality scores are more variable than I imagined. Thus, this feature needs a minimum window length parameter to be useful. Otherwise the read gets split up into hundreds of subreads. For example, for the read below, I would want to split this read where the quality score dips below 15 for 500 or 1000 bp in order to excise this low-quality segment and output 2 high-quality subreads.
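One way to express the window idea is sketched below, purely as an illustration and not as chopper's implementation. The names `low_quality_windows` and `split_read` and the parameters `cutoff`, `window`, and `min_len` are hypothetical. As a simplifying assumption, the sketch takes the arithmetic mean of Phred scores per window rather than averaging error probabilities.

```python
from itertools import accumulate
from typing import List, Tuple


def low_quality_windows(
    quals: List[int], cutoff: float, window: int
) -> List[Tuple[int, int]]:
    """Return merged [start, end) regions covered by any window whose
    mean quality is below cutoff. Hypothetical helper."""
    n = len(quals)
    if window <= 0 or n < window:
        return []
    prefix = [0] + list(accumulate(quals))  # prefix sums for O(1) window sums
    regions: List[List[int]] = []
    for i in range(n - window + 1):
        if prefix[i + window] - prefix[i] < cutoff * window:
            if regions and i <= regions[-1][1]:
                regions[-1][1] = i + window  # extend the previous region
            else:
                regions.append([i, i + window])
    return [(s, e) for s, e in regions]


def split_read(
    seq: str, quals: List[int], cutoff: float = 15,
    window: int = 500, min_len: int = 1000
) -> List[str]:
    """Excise low-quality windows and keep the remaining pieces that are
    at least min_len bases long."""
    kept = []
    pos = 0
    # A zero-length sentinel region at the end flushes the final piece.
    for s, e in low_quality_windows(quals, cutoff, window) + [(len(seq), len(seq))]:
        if s - pos >= min_len:
            kept.append(seq[pos:s])
        pos = e
    return kept
```

Because a whole window must average below the cutoff, an isolated low-scoring base no longer splits the read. The trade-off in this sketch is that a few good bases flanking the excised region are trimmed along with it.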

kevfengler227 avatar Sep 19 '25 17:09 kevfengler227

Image

kevfengler227 avatar Sep 19 '25 18:09 kevfengler227

Right, that makes sense. I will look into it.

wdecoster avatar Sep 19 '25 19:09 wdecoster

This idea is interesting. Is it integrated into the latest release and ready for testing? Thanks.

Edit: I guess it was integrated in this release: https://github.com/wdecoster/chopper/releases/tag/v0.11.0

colindaven avatar Nov 04 '25 15:11 colindaven

It needs a little more work: not just a quality cutoff but also a window length.

wdecoster avatar Nov 04 '25 15:11 wdecoster