Shaun Jackman
The sequences output by `gt extractfeat` are named `>gene_$i`. Could they instead be named after the `ID` or `Name` attribute of the GFF entry?
@yeban Eep. `-retainids` was pretty obvious. Sorry that I didn't read the documentation more closely. That worked perfectly.
`-seqid` is a nice feature. Could it also include the coordinates of the feature, and possibly the orientation? i.e.

```
>gene_1 [seqid 1:501-1000]
```
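To make the request concrete, here's a rough sketch of the headers I have in mind, generated straight from the GFF with awk. This is not something `gt extractfeat` does today: the `gff_headers` function name, the trailing strand field, and the assumption that features are numbered in file order are all mine, for illustration only.

```sh
# Hypothetical helper: print one FASTA-style header per feature of the given
# type (default: gene), including seqid, start-end coordinates and strand.
# Assumes tab-separated GFF3 on stdin and numbers features in file order.
gff_headers() {
  awk -v type="${1:-gene}" '
    BEGIN { FS = "\t" }
    # Skip comment/pragma lines; keep only features of the requested type.
    !/^#/ && $3 == type {
      i++
      printf ">%s_%d [seqid %s:%d-%d %s]\n", type, i, $1, $4, $5, $7
    }'
}
```

Usage would be something like `gff_headers gene <in.gff`, which only prints the headers and does not touch the sequences.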
Speaking of orientation, I don't see an option to choose whether the orientation of the extracted sequence should agree with the FASTA file or the GFF file. [`bedtools getfasta`](http://bedtools.readthedocs.org/en/latest/content/tools/getfasta.html) has...
I can take a stab at this, but it won't be until next week.
Sorry. No time right now. Want to take a stab at a PR yourself, Patrice?
I think supporting both installations is a good idea: simply unzipping the source and running it, or running `make install`, and the latter should install the data in `$prefix/share/genometools`.
See [bedtools flank](http://bedtools.readthedocs.org/en/latest/content/tools/flank.html) for comparison.
I'd like to simply expand the coordinates of existing features. `bedtools flank` adds flanking features, which is useful too.
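In the meantime, something like the following awk sketch does what I mean by expanding the coordinates of existing features. The `expand_gff` name is hypothetical, and the sketch assumes tab-separated GFF3 on stdin. It clamps the start at 1, but it does not know the sequence lengths, so the end coordinate can run past the end of the sequence.

```sh
# Hypothetical helper: widen every feature by N bases on each side.
# Usage: expand_gff N <in.gff >out.gff
expand_gff() {
  awk -v n="$1" '
    BEGIN { FS = OFS = "\t" }
    # Pass comment and pragma lines through unchanged.
    /^#/ { print; next }
    # Feature lines have nine columns; column 4 is start, column 5 is end.
    NF >= 9 {
      $4 = ($4 - n < 1) ? 1 : $4 - n   # clamp the start at 1
      $5 = $5 + n                      # end is NOT clamped (length unknown)
    }
    { print }'
}
```

For example, `expand_gff 100 <in.gff` turns a feature at 501-1000 into one at 401-1100.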
Here's the hacky sed script I'm using:

```sh
gsed -E '/^##|\tgene\t/!d;s/ID=[^;]*;//;s/Name=/ID=/' in.gff \
  | gt extractfeat -type gene -matchdescstart -retainids -seqid -seqfile in.fa - >out.fa
```