RagTag
RagTag copied to clipboard
`updategff` produces wrong output if parts of the scaffolds are used in the AGP file
Hi!
Not sure if that was the indended use of the updategff
command (or even the AGP format), but I was trying to update annotation based on an AGP file where I introduced a break into the chromosome and got corrupted GFF file. Minimal example follows:
echo $'## gff-version 3.1\nbibaboba\ttest\tgene\t5\t6\t.\t+\t.\tID=sas' > example.gff
echo $'## agp-version 2.1\nbiba\t1\t4\t1\tW\tbibaboba\t1\t4\t+\nboba\t1\t4\t1\tW\tbibaboba\t5\t8\t+' > example.agp
$cat example.gff
## gff-version 3.1
bibaboba test gene 5 6 . + . ID=sas
$cat example.agp
## agp-version 2.1
biba 1 4 1 W bibaboba 1 4 +
boba 1 4 1 W bibaboba 5 8 +
Obviously, positions 5 and 6 of bibaboba
should then be converted into positions 1 and 2 of boba
. But what I get is
$ ragtag.py updategff example.gff example.agp
Fri Sep 13 11:10:44 2024 --- VERSION: RagTag v2.1.0
Fri Sep 13 11:10:44 2024 --- CMD: ragtag.py updategff example.gff example.agp
## gff-version 3.1
boba test gene 5 6 . + . ID=sas
Fri Sep 13 11:10:44 2024 --- INFO: Goodbye
The coordinates do not get transformed.
For completeness, I attach the solution that leaves me with the correct coordinates (for BED files though):
#!/usr/bin/gawk -f
BEGIN{
help="\
This script updates a BED file using the transformation defined by an AGP file. \n\
That is, BED in coordinates of contigs gets transformed to coordinates of scaffolds. \n\
Pleaze bgzip and tabix the BED file before operation. \n\
EXAMPLE: updategff.awk file.agp file.bed.gz > file_transform.bed"
if (ARGC < 2) {print help; exit 1}
OFS="\t"
}
# Exploit tabix to query bed files
NR==FNR && !/^#/ && $5=="W" {
cmd="tabix " ARGV[2] " " $6":"$7"-"$8
while ( cmd | getline bedline > 0 ) {
split(bedline, bed)
if (bed[2] < $7 || bed[3] > $8) { next }
if ($9=="+"){
printf "%s\t%s\t%s", $1, bed[2]-$7+$2, bed[3]-$7+$2
} else {
printf "%s\t%s\t%s", $1, $8-bed[3]+$2, $8-bed[2]+$2 }
if (length(bed) > 3) {
for (i=4;i<=length(bed);i++) {
printf "\t%s", bed[i]
print ""
}
}
}
close(cmd)
}