Full Fasta info object without index building

Open • oschwengers opened this issue 9 months ago • 1 comment

Hi, and thanks a lot for this super-fast Python library. We'd like to use it in our tools, such as Bakta and Platon. Maybe I've overlooked something, but we need to parse FASTA files as fast as possible, i.e. without building an index, while still having access to the sequence ID, description and sequence.

According to the README, there is:

import pyfastx
for name, seq in pyfastx.Fasta('test.fa.gz', build_index=False):
    print(name, seq)

and:

import pyfastx
for seq in pyfastx.Fasta('test.fa.gz'):
    print(seq.name)
    print(seq.seq)
    print(seq.description)

But what we actually need is:

import pyfastx
for seq in pyfastx.Fasta('test.fa.gz', build_index=False):
    print(seq.name)
    print(seq.seq)
    print(seq.description)

Also, it would be best if the description already excluded the FASTA ID. I think this use case would be interesting for many other users as well.
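To make the request concrete, here is a minimal sketch in plain Python of the record shape we have in mind. It is not pyfastx API: `Record`, `split_header` and `to_records` are hypothetical names used only for illustration, and the sketch assumes the input pairs carry the full header line (without the leading `>`), which may or may not match what the non-indexed iterator yields.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class Record(NamedTuple):
    """Hypothetical record type: ID, description without the ID, sequence."""
    name: str
    description: str
    seq: str


def split_header(header: str) -> Tuple[str, str]:
    """Split a FASTA header (without '>') at the first run of whitespace.

    Returns (id, description); the description is empty if the header
    consists of the ID only.
    """
    parts = header.strip().split(None, 1)
    name = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return name, description


def to_records(pairs: Iterable[Tuple[str, str]]) -> Iterator[Record]:
    """Wrap (full_header, sequence) pairs into Record objects lazily."""
    for header, seq in pairs:
        name, description = split_header(header)
        yield Record(name, description, seq)


if __name__ == "__main__":
    # Example data standing in for whatever a non-indexed parser yields.
    example = [("contig_1 hypothetical protein", "ACGT"), ("contig_2", "GGCC")]
    for rec in to_records(example):
        print(rec.name, rec.description, rec.seq)
```

With something like this built in, the loop above could access `seq.name`, `seq.description` and `seq.seq` without an index ever being written.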

Thanks again and best regards!

oschwengers, Sep 19 '23 07:09