bedtools2
something wrong in bedtools getfasta -name
BEDtools version 2.26.0. Example as follows:
mm10-NlaIII.bed:
chr1 0 3000185 HIC_chr1_1@chr1:0-3000185
chr1 3000185 3000316 HIC_chr1_2@chr1:3000185-3000316
chr1 3000316 3000850 HIC_chr1_3@chr1:3000316-3000850
chr1 3000850 3001659 HIC_chr1_4@chr1:3000850-3001659
Command one, with the -name parameter:
$ bedtools getfasta -fi mm10.fasta -bed mm10-NlaIII.bed -name | fold -w 70
Resulting FASTA headers:
HIC_chr1_7512::chr1:4664567-4666090
HIC_chr1_7511::chr1:4664466-4664567
Command two, without the -name parameter:
$ bedtools getfasta -fi mm10.fasta -bed mm10-NlaIII.bed | fold -w 70
Resulting FASTA headers:
chr1:5006220-5006371
chr1:5006142-5006220
Can you help me with this question? Best wishes.
I see exactly the same thing with bedtools v2.29.2; it seems the behaviour of -name
has changed. Is it now the expected behaviour that the FASTA header should be, e.g.:
chr1.tRNA1-ValCAC-::chr1:16725515-16725688(+)
instead of what it used to be (I can't now remember which version of the old software I was using):
chr1.tRNA1-ValCAC-(+)
It's true that the behaviour of -name has changed since 2.26.0. The output no longer matches what the getfasta documentation shows:
$ cat test.fa
>chr1
AAAAAAAACCCCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG
$ cat test.bed
chr1 5 10 myseq
$ bedtools getfasta -fi test.fa -bed test.bed -name
>myseq
AAACC
I am using 2.25.0 and it works as shown above. I really think the old behaviour is exactly what we need. However, I am not sure whether the change is a bug or intentional; if it is a bug, please fix it.
Sincere thanks!
It can be fixed by piping through sed:
$ bedtools getfasta -fi test.fa -bed test.bed -name | sed 's/::.*//'
but I would prefer not to have to do the extra step.
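A slightly more defensive variant of the sed workaround, as a sketch only: it edits header lines alone (those starting with ">") and keeps a trailing strand suffix such as (+), as in the tRNA example earlier in the thread. The function name strip_coords is made up for illustration, and it assumes that feature names contain neither "::" nor "(".

```shell
# Hypothetical helper: strip the "::chrom:start-end" part that newer
# bedtools versions append to the name in FASTA headers.
strip_coords() {
  # Only edit header lines; drop everything from "::" up to (not including)
  # an opening parenthesis, so a strand suffix like "(+)" survives.
  sed -E '/^>/ s/::[^(]*//'
}

# Demo on canned getfasta-style output (no bedtools needed to try it).
printf '>myseq::chr1:5-10\nAAACC\n' | strip_coords
printf '>chr1.tRNA1-ValCAC-::chr1:16725515-16725688(+)\nACGT\n' | strip_coords
```

In a real pipeline this would presumably sit at the end, e.g. bedtools getfasta -fi test.fa -bed test.bed -name | strip_coords.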
It seems that, at least as of v2.30.0, the flags behave as follows:
-name now gives the name and coordinates together
-nameOnly does what -name used to do and gives just the name indicated in the .bed
See the man page: https://bedtools.readthedocs.io/en/latest/content/tools/getfasta.html
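For scripts that have to run against several bedtools versions, one option is to probe the help text for -nameOnly rather than hard-coding a version number. This is only a sketch: pick_name_flag is a made-up helper, and it assumes that the output of bedtools getfasta -h lists the flag whenever it is supported. On a version that lacks -nameOnly but already appends coordinates under -name, the sed filter from earlier in the thread would still be needed.

```shell
# Hypothetical helper: read getfasta help text on stdin and print the flag
# that should yield name-only headers.
pick_name_flag() {
  if grep -q -e '-nameOnly'; then
    echo "-nameOnly"
  else
    echo "-name"
  fi
}

# Probe the installed bedtools, if any (help may go to stderr, hence 2>&1).
if command -v bedtools >/dev/null 2>&1; then
  flag=$(bedtools getfasta -h 2>&1 | pick_name_flag)
  echo "use: bedtools getfasta -fi test.fa -bed test.bed $flag"
fi
```

Probing the help text avoids having to know exactly which release introduced -nameOnly, which this thread does not pin down.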