
Implement relabel @

Open torognes opened this issue 7 years ago • 3 comments

The --relabel @ option will relabel sequences with a prefix constructed from the initial part of the file name, truncated at the first underscore or period.

torognes avatar Sep 22 '16 12:09 torognes

Hi, has this been implemented in any release yet? We use vsearch pipelines for amplicon processing, but still rely on USEARCH for the merge step alone, because its --relabel @ option labels the sequences with the truncated file name in a single step (the file name truncated at the first underscore, followed by .1, .2, and so on). An update on this would be great. Thanks.

bioinfo17 avatar Apr 28 '20 01:04 bioinfo17

Sorry for the late reply. No, this option has not been implemented yet. I hope to get more time for development from June and onwards.

torognes avatar May 15 '20 13:05 torognes

Revisiting this potential new feature, I have a few questions:

How to deal with data streams (pipes or process substitutions)?

vsearch --derep_fulllength <(printf ">s1\nA\n") --minseqlength 1 --relabel @ --quiet --output -

Here the name of the input stream is /dev/fd/63. One possibility would be to assign a default label string when the input is a stream ("stream", for instance). Another would be to use an empty string.

How to deal with empty strings? Input file names can yield an empty string once truncated (if the file name starts with a period or an underscore). Should vsearch allow empty label strings? (I think it should.) Should vsearch do that silently?
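To make the two questions above concrete, here is a minimal bash sketch of one possible policy. It is not vsearch behavior: the function name `derive_prefix`, the default label "stream", and the warning on empty prefixes are all assumptions chosen for illustration.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not part of vsearch): derive a label prefix
# from an input path, with a fallback label for data streams.
derive_prefix() {
    local input="$1"
    local fallback="${2-stream}"   # assumed default; pass "" to get an empty label
    local name prefix

    # Pipes and process substitutions appear as "-", /dev/stdin or /dev/fd/N.
    case "${input}" in
        -|/dev/stdin|/dev/fd/*)
            printf '%s\n' "${fallback}"
            return 0
            ;;
    esac

    name="${input##*/}"        # drop directory components
    prefix="${name%%[_.]*}"    # keep the text before the first underscore or period

    # An empty prefix is allowed here, but reported on stderr (not silent).
    if [[ -z "${prefix}" ]]; then
        printf 'warning: empty label prefix for "%s"\n' "${input}" >&2
    fi
    printf '%s\n' "${prefix}"
}
```

With this sketch, `derive_prefix sample1_R1.fastq` prints `sample1`, `derive_prefix /dev/fd/63` prints `stream`, and `derive_prefix .hidden.fasta` prints an empty line plus a warning on stderr. Dropping the warning would correspond to the "silent" option raised above.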

In the meantime, that feature can be emulated with a pattern substitution (bash):

INPUT="myfile.fasta"
printf ">s1\nA\n" > "${INPUT}"
vsearch \
    --derep_fulllength "${INPUT}" \
    --minseqlength 1 \
    --relabel "${INPUT/[_.]*/}" \
    --quiet \
    --output -
rm "${INPUT}"

>myfile1
A
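One caveat with the substitution above: it is applied to the whole string, so a path such as `./run_01/sample1_R1.fastq` is cut at its very first character and yields an empty prefix. A small variation, sketched below with a hypothetical path, strips the directory part first. Appending a period to the prefix should give labels of the form `sample1.1`, `sample1.2`, which is the style described earlier in this thread, assuming vsearch simply appends its counter to the string given to `--relabel` (as the `myfile1` output above suggests).

```bash
#!/usr/bin/env bash

INPUT="./run_01/sample1_R1.fastq"   # hypothetical path, for illustration only

NAME="${INPUT##*/}"        # strip directories: sample1_R1.fastq
PREFIX="${NAME/[_.]*/}"    # cut at the first underscore or period: sample1
LABEL="${PREFIX}."         # trailing period, to separate prefix and counter

printf '%s\n' "${LABEL}"
```

The resulting value can then be passed as `--relabel "${LABEL}"` in the vsearch command shown above.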

frederic-mahe avatar Dec 12 '23 14:12 frederic-mahe