
How many barcodes can cutadapt handle at once?

Open zuoyichen opened this issue 6 months ago • 2 comments

When using cutadapt to demultiplex barcodes from single-cell full-length sequencing, I encounter issues when trying to process a large number of barcodes (such as 17,000 barcodes).

[Errno 24] Too many open files
This is cutadapt 5.1 with Python 3.12.11
Command line parameters: --revcomp -g file:sgRNA_scaffold.fasta -o ../fasta/{name}.fasta.gz --rename={header};sgRNA={adapter_name} -j 20 -e 0.2 -O 30 --action=none --untrimmed-output untrimmed.fasta.gz ../../raw_data/revio/refine1/P20ED251426326-1_r84069_20250514_081123_1_D01.hifi_reads.flnc.corrected.sorted.dup.fasta
Processing single-end reads on 20 cores ...

zuoyichen avatar Jun 10 '25 12:06 zuoyichen

This has been reported earlier in #320, but I thought it had been fixed.

  • Can you please add --debug to your command and re-run it?
  • When you do this, do you see the message Too many open files, attempting to raise soft limit in the output?
  • Are you on Linux or macOS? How did you install Cutadapt?
  • What does ulimit -n output?
  • What does ulimit -H -n output?
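The two ulimit values asked for above can also be read from inside Python, which makes it easy to compare them against the number of barcodes. The following is a minimal sketch using only the standard library; it is not part of Cutadapt, the helper names are hypothetical, and it assumes that demultiplexing with {name} in the output path needs roughly one open output file per adapter in the FASTA file.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limits on open files for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe_limit(value):
    """Render a limit the way ulimit does, with 'unlimited' for no limit."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def count_fasta_records(path):
    """Count FASTA records, i.e. lines that start with '>'."""
    with open(path) as fasta:
        return sum(1 for line in fasta if line.startswith(">"))


def report(adapter_fasta=None):
    """Print both limits and, optionally, the number of adapters in a FASTA file."""
    soft, hard = open_file_limits()
    print("soft limit (ulimit -n):   ", describe_limit(soft))
    print("hard limit (ulimit -H -n):", describe_limit(hard))
    if adapter_fasta is not None:
        # Assumption: about one output file per adapter is kept open.
        print("adapters in FASTA:        ", count_fasta_records(adapter_fasta))


if __name__ == "__main__":
    report()
```

Calling report("sgRNA_scaffold.fasta") with the adapter file from the command above would print the adapter count next to both limits.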

marcelm avatar Jun 10 '25 14:06 marcelm

I'm having this issue as well. I'm trying to run demultiplexing on a single core with a file that has ~79000 barcodes with anchored 5' adaptors. I'm on macOS and installed cutadapt using Anaconda; I'm running cutadapt 5.1 with Python 3.12.9.

The following was observed in a run with the soft limit at 256 and the hard limit at "unlimited". I also tried raising the soft limit to 10000000, but that didn't seem to change the point at which the error was thrown.

Based on the debug log, the "attempting to raise soft limit" message first appears after cutadapt has opened the output fastq.gz files for some number of barcodes (plausibly 256?), and then appears again after every 8 barcodes thereafter. At some point, an attempt to raise the limit is followed by "DEBUG: Command line error", after which cutadapt can no longer open more files and throws the [Errno 24]. It seems to get stuck opening the file for the same barcode each time, which appears to be about 3/4 of the way through the FASTA file that contains the adaptors/barcodes, assuming they are processed in that order.

Happy to provide more information if needed!
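For reference, raising the soft limit from within a Python process generally looks like the sketch below. This is not Cutadapt's actual code; raise_soft_limit is a hypothetical helper that only illustrates the mechanism. It shows that a request can be refused by the operating system even when the hard limit reads "unlimited", for example if the system enforces a separate per-process ceiling. Whether that is what happens here is an assumption, not something the log confirms.

```python
import resource


def raise_soft_limit(wanted):
    """Try to raise the soft open-files limit to at least `wanted`.

    Returns the soft limit that is in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY or soft >= wanted:
        return soft  # already high enough, nothing to do

    # Never ask for more than the hard limit allows.
    if hard == resource.RLIM_INFINITY:
        target = wanted
    else:
        target = min(wanted, hard)

    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    except (ValueError, OSError):
        # The OS refused the request; the old soft limit stays in effect.
        pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    before = resource.getrlimit(resource.RLIMIT_NOFILE)[0]
    after = raise_soft_limit(100_000)
    print("soft limit before:", before)
    print("soft limit after: ", after)
```

If the value printed after the call is lower than the number of barcodes, the process cannot hold one output file open per barcode, whatever ulimit reports for the hard limit.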

bri-zhong avatar Jun 20 '25 05:06 bri-zhong