UMI '--rename' regex/utility

Open: joegeorgeson opened this issue 1 year ago • 1 comment

Hi cutadapt devs,

I want to use cutadapt --rename to add only the meaningful portion of the UMI to the read name, while still trimming the entire UMI from the read. If this is currently possible, can you let me know how I might change the command below to achieve it? If it's not possible, could you add this as a feature in a future version? TIA!

A simple example: I want to cut the first 7 bp from R1 but add only the first 6 bp to the read name:

cutadapt -u 7 --rename='{id} {comment} $(echo {cut_prefix} | cut -c1-6)'

What I'm hoping to get is: 1:2101:13928:1000 1:N:0:GTCGCCTT+AAA/ACTAATT NTTTAT

But what is returned is: 1:2101:13928:1000 1:N:0:GTCGCCTT+AAA/ACTAATT $(echo NTTTATT | cut -c1-6)

joegeorgeson, Oct 21 '23 01:10

Hi, I agree this would be nice to have, but it’s currently not possible.

For the moment, you will have to postprocess your read names. Maybe something like this:

cutadapt -u 7 --rename '{header} {cut_prefix}' input.fastq.gz | \
  awk 'NR%4==1 {$3=substr($3,1,6)};1' | \
  gzip > output.fastq.gz

marcelm, Oct 21 '23 07:10
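
For anyone who would rather do the postprocessing step in Python than awk, here is a minimal sketch of the same idea. It is not part of cutadapt and not from the reply above. The function name `trim_umi_in_headers` and the `keep` parameter are made up for illustration. It assumes standard 4-line FASTQ records and that `--rename '{header} {cut_prefix}'` has put the cut prefix as the last space-separated field of each header line.

```python
import io


def trim_umi_in_headers(lines, keep=6):
    """Yield FASTQ lines, truncating the last header field to `keep` characters.

    Hypothetical helper: assumes 4-line FASTQ records and that the UMI
    (the cut prefix) is the last space-separated field of the header.
    """
    for i, line in enumerate(lines):
        if i % 4 == 0:  # first line of each 4-line record is the header
            line = line.rstrip("\n")
            head, sep, umi = line.rpartition(" ")
            if sep:  # only modify headers that actually contain a space
                line = head + sep + umi[:keep]
            yield line + "\n"
        else:
            # sequence, "+" separator, and quality lines pass through unchanged
            yield line


# Small in-memory example using the read name from the question
example = io.StringIO(
    "@1:2101:13928:1000 1:N:0:GTCGCCTT+AAA/ACTAATT NTTTATT\n"
    "ACGTACGT\n"
    "+\n"
    "FFFFFFFF\n"
)
result = list(trim_umi_in_headers(example))
print(result[0], end="")
# prints: @1:2101:13928:1000 1:N:0:GTCGCCTT+AAA/ACTAATT NTTTAT
```

Unlike `$3` in the awk one-liner, this takes the last field of the header, so it should still work if the original comment contains more than one space-separated part. To use it in a pipeline, you could iterate over `sys.stdin` and write each yielded line to `sys.stdout`.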