Renaming with key doesn't work

Open desmodus1984 opened this issue 1 month ago • 1 comments

Hi, I am trying to rename the sequences of a fasta file using a key-value file, and seqkit is reading the kv-file but not renaming the sequences. This is the original file:

> seqkit fx2tab -n -l LW-HiC.500k.fasta
LW.HiC.1        192520351
LW.HiC.2        178484844
LW.HiC.3        176502795
LW.HiC.4        170925829
LW.HiC.5        160064494
LW.HiC.6        138427787
LW.HiC.7        134806113
LW.HiC.8        131171632
LW.HiC.9        126580679
LW.HiC.10       122984188
LW.HiC.11       106638979
LW.HiC.12       95179498
LW.HiC.13       93457989
LW.HiC.14       81675979
LW.HiC.15       80788035
LW.HiC.16       51349449
LW.HiC.17       50579338

I want to rename the sequences from LW.HiC.11 onward, because this sequence is chromosome X, and then adjust the numbering of the smaller fragments that follow, so I made a kv file:

LW.HiC.11       LW.HiC.X
LW.HiC.17       LW.HiC.16
LW.HiC.16       LW.HiC.15
LW.HiC.14       LW.HiC.14
LW.HiC.15       LW.HiC.13
LW.HiC.13       LW.HiC.12
LW.HiC.12       LW.HiC.11

and after I tried renaming,

seqkit replace -p ' (.+)$' -r ' {kv}' -k LW-HiC.500k.rename LW-HiC.500k.fasta > LW-HiC.500k.ren.fasta

there is no change

seqkit fx2tab -n -l LW-HiC.500k.ren.fasta
LW.HiC.1        192520351
LW.HiC.2        178484844
LW.HiC.3        176502795
LW.HiC.4        170925829
LW.HiC.5        160064494
LW.HiC.6        138427787
LW.HiC.7        134806113
LW.HiC.8        131171632
LW.HiC.9        126580679
LW.HiC.10       122984188
LW.HiC.11       106638979
LW.HiC.12       95179498
LW.HiC.13       93457989
LW.HiC.14       81675979
LW.HiC.15       80788035
LW.HiC.16       51349449
LW.HiC.17       50579338

I even tried with --keep-key, and still nothing changed.

seqkit replace -p ' (.+)$' -r ' {kv}' -k LW-HiC.500k.rename LW-HiC.500k.fasta --keep-key > LW-HiC.500k.ren.fasta
seqkit fx2tab -n -l LW-HiC.500k.ren.fasta

LW.HiC.1        192520351
LW.HiC.2        178484844
LW.HiC.3        176502795
LW.HiC.4        170925829
LW.HiC.5        160064494
LW.HiC.6        138427787
LW.HiC.7        134806113
LW.HiC.8        131171632
LW.HiC.9        126580679
LW.HiC.10       122984188
LW.HiC.11       106638979
LW.HiC.12       95179498
LW.HiC.13       93457989
LW.HiC.14       81675979
LW.HiC.15       80788035
LW.HiC.16       51349449
LW.HiC.17       50579338

Any reason why the renaming is not working? I first used a space as the separator in the kv file, then switched to a tab. seqkit accepted the new file, but no change was made.

Thanks

desmodus1984, Dec 08 '25 18:12

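The blank (" ") in the regular expression causes the error. The pattern ' (.+)$' only matches text that follows a space, but the IDs do not have any leading spaces, so nothing matches and nothing is replaced. Anchor the pattern at the start of the name instead and drop the space from the replacement (-K is the short form of --keep-key, which leaves names without an entry in the kv file unchanged):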

$ seqkit replace -p '^(.+)$' -r '{kv}' -K -k LW-HiC.500k.rename LW-HiC.500k.fasta | seqkit seq -n
[INFO] read key-value file: LW-HiC.500k.rename
[INFO] 7 pairs of key-value loaded
LW.HiC.1
LW.HiC.2
LW.HiC.3
LW.HiC.4
LW.HiC.5
LW.HiC.6
LW.HiC.7
LW.HiC.8
LW.HiC.9
LW.HiC.10
LW.HiC.X
LW.HiC.11
LW.HiC.12
LW.HiC.14
LW.HiC.13
LW.HiC.15
LW.HiC.16

shenwei356, Dec 09 '25 08:12
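
The difference between the two patterns can be checked without seqkit. The following is a minimal sketch, not part of the original thread: it assumes a POSIX shell with `grep -E`, uses three of the IDs from the file as sample input, and the variable names are made up for illustration. It only counts how many header lines each pattern matches; it does not perform the key-value replacement.

```shell
# Sample headers as they appear in the FASTA file: the ID only, no description.
headers='LW.HiC.10
LW.HiC.11
LW.HiC.12'

# Original pattern: requires a literal space before the captured text.
# grep -c exits non-zero when the count is 0, hence the "|| true".
with_space=$(printf '%s\n' "$headers" | grep -cE ' (.+)$' || true)

# Corrected pattern: anchored at the start of the header, no space needed.
anchored=$(printf '%s\n' "$headers" | grep -cE '^(.+)$' || true)

echo "matches with leading space: $with_space"
echo "matches when anchored: $anchored"
```

With these sample headers the first count is 0 and the second is 3, which is consistent with the unchanged output in the question and the renamed output in the answer.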