cutadapt
UMI '--rename' regex/utility
Hi cutadapt devs,
I want to use cutadapt --rename to copy only the meaningful portion of the UMI into the read name, while still trimming the entire UMI from the read. If this is currently possible, could you let me know how I might change the command below to achieve it? If it's not possible, could you add it as a feature in a future version? TIA!
A simple example: I want to cut the first 7 bp from R1 but add only the first 6 bp to the read name:
cutadapt -u 7 --rename='{id} {comment} $(echo {cut_prefix} | cut -c1-6)'
What I'm hoping to get is:
1:2101:13928:1000 1:N:0:GTCGCCTT+AAA/ACTAATT NTTTAT
But what is returned is:
1:2101:13928:1000 1:N:0:GTCGCCTT+AAA/ACTAATT $(echo NTTTATT | cut -c1-6)
Hi, I agree this would be nice to have, but it’s currently not possible.
For the moment, you will have to postprocess your read names. Maybe something like this:
cutadapt -u 7 --rename '{header} {cut_prefix}' input.fastq.gz | \
awk 'NR%4==1 {$3=substr($3,1,6)};1' | \
gzip > output.fastq.gz
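For what it's worth, here is a slightly more defensive sketch of the same postprocessing idea. It shortens the last field of each header line (`$NF`) rather than the third, so it should also cope with reads that have no comment or a comment containing spaces. It assumes four-line FASTQ records and that `{cut_prefix}` is the last item in the rename template. `trim_umi` is a hypothetical helper name, and the record below is made up from the read name in the question (sequence and qualities are invented) so that the snippet runs without cutadapt.

```shell
# Hypothetical helper: on every FASTQ header line (lines 1, 5, 9, ...),
# keep only the first 6 characters of the last whitespace-separated field.
# Note: awk rejoins the header fields with single spaces.
trim_umi() {
    awk '
        NR % 4 == 1 { $NF = substr($NF, 1, 6) }  # header line: shorten the UMI
        { print }                                # print every line
    '
}

# Made-up record imitating cutadapt output after
# --rename "{header} {cut_prefix}" (sequence and qualities are invented).
printf '%s\n' \
    '@1:2101:13928:1000 1:N:0:GTCGCCTT+AAA/ACTAATT NTTTATT' \
    'ACGTACGTAC' \
    '+' \
    'IIIIIIIIII' | trim_umi
# First output line:
# @1:2101:13928:1000 1:N:0:GTCGCCTT+AAA/ACTAATT NTTTAT
```

In a real pipeline, `trim_umi` would take the place of the awk step between cutadapt and gzip.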