vsearch
Implement relabel @
The --relabel @ option will relabel sequences with a prefix constructed from the initial part of the file name, truncated at the first underscore or period.
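As a rough illustration of the truncation rule described above, here is a minimal shell sketch. The helper name label_prefix is hypothetical, and stripping the directory part before truncating is an assumption (the description only mentions the file name).

```shell
# Hypothetical helper: derive the label prefix from a file name,
# keeping only the text before the first underscore or period.
label_prefix() {
    local name="${1##*/}"            # assumption: ignore any directory part
    printf '%s\n' "${name%%[_.]*}"   # drop everything from the first '_' or '.'
}

label_prefix "sampleA_R1.fastq"             # prints: sampleA
label_prefix "reads/sampleB.merged.fasta"   # prints: sampleB
```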
Hi, has this been implemented in any release yet? We use vsearch pipelines for amplicon processing, but still rely on USEARCH for the merge step alone, because its --relabel @ option labels the sequences with the truncated file name in one step (prefix.1, prefix.2, and so on, where the prefix is the file name truncated at the first underscore). An update on this would be great. Thanks.
Sorry for the late reply. No, this option has not been implemented yet. I hope to get more time for development from June and onwards.
Revisiting this potential new feature, I have a few questions:
How to deal with data streams (pipes or process substitutions)?
vsearch --derep_fulllength <(printf ">s1\nA\n") --minseqlength 1 --relabel @ --quiet --output -
Here the name of the input stream is /dev/fd/63. One possibility would be to assign a default label string when the input file is a stream ("stream", for instance). Another would be to use an empty string.
How to deal with empty strings? Input file names can yield empty strings once truncated (if the file name starts with a dot or an underscore). Should vsearch allow empty label strings? (I think it should.) Should vsearch do that silently?
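To make the two questions concrete, here is a small shell sketch of one possible behaviour. It is not how vsearch behaves, only an illustration of the options raised above. The function name relabel_prefix, the "stream" default, and the list of paths treated as streams are all assumptions.

```shell
# Hypothetical sketch: pick a label prefix, with a fallback for streams.
relabel_prefix() {
    local path="$1" name
    case "$path" in
        # assumption: these paths denote a pipe or process substitution
        -|/dev/stdin|/dev/fd/*) printf 'stream\n'; return ;;
    esac
    name="${path##*/}"               # drop the directory part
    printf '%s\n' "${name%%[_.]*}"   # may legitimately be empty
}

relabel_prefix /dev/fd/63        # prints: stream
relabel_prefix .hidden.fasta     # prints an empty line (empty prefix allowed)
```

With an empty prefix, sequences would end up labelled with the bare counter (1, 2, ...), which is why the question of doing this silently matters.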
In the meantime, that feature can be emulated with a pattern substitution (bash):
INPUT="myfile.fasta"
printf ">s1\nA\n" > "${INPUT}"
vsearch \
    --derep_fulllength "${INPUT}" \
    --minseqlength 1 \
    --relabel "${INPUT/[_.]*/}" \
    --quiet \
    --output -
rm "${INPUT}"
>myfile1
A
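For the merge use case mentioned earlier in the thread, the same substitution could be wrapped in a small function. This is a sketch under a few assumptions: the function name merge_with_label, the file names, and the output naming are hypothetical, and the trailing dot in the --relabel string is there to get labels of the form prefix.1, prefix.2 rather than prefix1. The directory part is stripped first, because the plain substitution above would yield an empty string for a path such as ./myfile.fasta (the first period is the leading one). The block is a dry run by default and only prints the command.

```shell
# Dry run by default; set RUN="" to actually execute vsearch (assumed on PATH).
RUN="${RUN-echo}"

# Hypothetical wrapper: merge a read pair and label by truncated file name.
merge_with_label() {
    local fwd="$1" rev="$2" name prefix
    name="${fwd##*/}"           # strip directories so "./x" or "data/x" work
    prefix="${name%%[_.]*}"     # truncate at the first '_' or '.'
    ${RUN} vsearch \
        --fastq_mergepairs "${fwd}" \
        --reverse "${rev}" \
        --relabel "${prefix}." \
        --fastqout "${prefix}.merged.fastq"
}

merge_with_label "data/S1_R1.fastq" "data/S1_R2.fastq"
```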