
MSA format as 'fasta' string

Open rtviii opened this issue 2 years ago • 4 comments

Am I blind, or is there no method to output an MSA as a FASTA-formatted string? If there is a method on the class, it is not obvious. I had to resort to:

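from prody import MSA

def format_msa(msa: MSA) -> str:
    # Build one FASTA record (header line plus sequence line) per aligned sequence.
    fasta = ''
    for seq in msa:
        fasta += '>{}\n{}\n'.format(seq.getLabel(), seq)
    return fasta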

The purpose is to pipe a computed prody.MSA to another process that expects a FASTA string or file. It would be nice to have.
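For illustration, the piping step might look roughly like the sketch below. It deliberately avoids ProDy so that it runs on its own: `records` stands in for the `(seq.getLabel(), str(seq))` pairs of a real MSA, `formatFasta` is a hypothetical helper name, and the consumer is just a child Python process that counts header lines.

```python
import subprocess
import sys


def formatFasta(records):
    """Return (label, sequence) pairs as one FASTA-formatted string."""
    return ''.join('>{}\n{}\n'.format(label, seq) for label, seq in records)


# Stand-in for (seq.getLabel(), str(seq)) pairs taken from a prody.MSA.
records = [('seq1', 'ACDE-'), ('seq2', 'AC-EF')]
fasta = formatFasta(records)

# Stand-in consumer (an assumption): a child process that reads FASTA
# from stdin and prints the number of header lines it saw.
consumer = [sys.executable, '-c',
            "import sys; print(sum(l.startswith('>') for l in sys.stdin))"]
result = subprocess.run(consumer, input=fasta, text=True,
                        capture_output=True, check=True)
header_count = int(result.stdout)
```

Passing the string through `input=` avoids writing a temporary file when the downstream tool can read from stdin.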

rtviii avatar May 03 '23 06:05 rtviii

You’re right. We don’t have anything like that and it would be nice.

Ideally, we’d make one for each file format.

Thanks
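One possible shape for "one per file format" is a small dispatch table, sketched below under assumptions: the format names (FASTA, SELEX, Stockholm), the helper names and the simplified single-block layouts are illustrative only and are not taken from ProDy's code. Input is again plain `(label, sequence)` pairs rather than an MSA object.

```python
def _fasta(records):
    # One header line and one sequence line per record.
    return ''.join('>{}\n{}\n'.format(label, seq) for label, seq in records)


def _selex(records):
    # One "label sequence" line per record, labels padded to equal width.
    width = max((len(label) for label, _ in records), default=0)
    return ''.join('{} {}\n'.format(label.ljust(width), seq)
                   for label, seq in records)


def _stockholm(records):
    # Simplified single-block Stockholm: header, aligned rows, terminator.
    return '# STOCKHOLM 1.0\n' + _selex(records) + '//\n'


# Hypothetical registry mapping a format name to its formatter.
FORMATTERS = {'fasta': _fasta, 'selex': _selex, 'stockholm': _stockholm}


def formatRecords(records, format='fasta'):
    """Format (label, sequence) pairs as a string in the requested format."""
    try:
        formatter = FORMATTERS[format.lower()]
    except KeyError:
        raise ValueError('unknown format: {!r}'.format(format))
    return formatter(list(records))
```

A table like this keeps each format in its own small function, so adding a format does not touch the others.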

jamesmkrieger avatar May 03 '23 07:05 jamesmkrieger

I'll try to throw some PRs together then. I didn't find the converse path (from_string) intuitive either.

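```python
    # Fragment from inside a function; assumes `import numpy as np`,
    # `from io import StringIO` and `from prody import MSA, MSAFile`.
    out = ... # external process -> str
    msafile = MSAFile(StringIO(out), format="fasta")
    # _iterFasta is a private method, so this may break between releases.
    seqs, descs = zip(*msafile._iterFasta())

    # One row of single-byte characters per sequence
    # (np.fromstring is deprecated for this use).
    sequences    = [np.array(list(x), dtype='S1') for x in seqs]
    descriptions = [*descs]
    chararr      = np.array(sequences).reshape(len(sequences), len(sequences[0]))

    return MSA(chararr, labels=descriptions, title="Class {} profile extended.".format(poly_class))
```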
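A version of the converse path that does not rely on private methods could start from a plain parser like the one sketched here. `parseFastaString` is a hypothetical name, not an existing ProDy function, and the sketch uses only the standard library.

```python
def parseFastaString(text):
    """Split a FASTA string into (labels, sequences), checking equal lengths."""
    labels, seqs, chunks = [], [], None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith('>'):
            if chunks is not None:
                seqs.append(''.join(chunks))  # close the previous record
            labels.append(line[1:])
            chunks = []
        elif chunks is None:
            raise ValueError('sequence data before first header')
        else:
            chunks.append(line)  # sequences may span several lines
    if chunks is not None:
        seqs.append(''.join(chunks))
    if len(set(map(len, seqs))) > 1:
        raise ValueError('sequences differ in length; not an alignment')
    return labels, seqs
```

The resulting rows could then be turned into the character array used above with something like `np.array([list(s) for s in seqs], dtype='S1')`, assuming the sequences are ASCII.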

rtviii avatar May 04 '23 00:05 rtviii

Thanks! That would be great

Only change what you need to change unless you’re very sure, though. We have some C++ extensions for the sequence file parsing and writing, and we have to be careful not to break the connection to them.

jamesmkrieger avatar May 04 '23 06:05 jamesmkrieger

Please also call your function formatMSA, not format_msa, to be consistent with how we name functions and methods in ProDy.

jamesmkrieger avatar May 04 '23 06:05 jamesmkrieger