biopandas
Stream support for exporting pdbs not working with OTHERS record
Describe the bug
When exporting PDB data that contains both ATOM and OTHERS entries with .to_pdb_stream, I always get a pandas.errors.IntCastingNaNError (cf. Steps/Code to Reproduce).
I need to keep the TER markers in the resulting PDB data, so the content of the OTHERS frame is required.
Writing directly to a PDB file with .to_pdb does not have this issue. One possible fix would be a shared base function used by both methods. Another would be to let to_pdb take the desired output target (i.e. file or stream) as an argument, as mentioned in #108.
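Because .to_pdb works, a temporary workaround might be to write to a temporary file and read the result back into an in-memory stream. The sketch below assumes only that the object exposes to_pdb(path=..., records=..., gz=...), as PandasPdb does; the helper name pdb_to_stringio is hypothetical and not part of biopandas.

```python
import io
import os
import tempfile


def pdb_to_stringio(ppdb, records=("ATOM", "OTHERS")):
    """Hypothetical helper: round-trip through to_pdb and return a StringIO.

    `ppdb` is expected to be a PandasPdb-like object with a
    to_pdb(path, records, gz) method.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "out.pdb")
        # Write an uncompressed file so it can be read back as plain text.
        ppdb.to_pdb(path=path, records=list(records), gz=False)
        with open(path) as handle:
            return io.StringIO(handle.read())
```

With the example from this report, that would be called as pdb_to_stringio(pdb_df, records=('ATOM', 'OTHERS')). It costs a disk round trip, so it is only a stopgap until the stream method handles OTHERS.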
Steps/Code to Reproduce
Example:
from biopandas.pdb import PandasPdb
pdb_df = PandasPdb().fetch_pdb('1ou5')
out_string = pdb_df.to_pdb_stream(records=('ATOM', 'OTHERS'))
Expected Results
Stream containing the specified records in pdb format.
Actual Results
A pandas.errors.IntCastingNaNError stemming from Line 909 in pandas_pdb.py
df.residue_number = df.residue_number.astype(int)
which is executed on the entire concatenated DataFrame. As the OTHERS frame doesn't contain residue number entries, these cells are always NaN after concatenating.
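The failure can be reproduced without biopandas. The sketch below uses two small stand-in frames (hypothetical, with far fewer columns than the real ATOM and OTHERS frames) to show that concatenation leaves NaN in residue_number, and that casting only the non-null rows avoids the error. The guard shown is one possible approach, not the actual biopandas fix.

```python
import pandas as pd

# Hypothetical stand-ins for the ATOM and OTHERS frames.
atom = pd.DataFrame({
    "record_name": ["ATOM", "ATOM"],
    "residue_number": [1, 2],
    "line_idx": [0, 1],
})
others = pd.DataFrame({
    "record_name": ["TER"],
    "entry": ["placeholder TER content"],
    "line_idx": [2],
})

# OTHERS has no residue_number column, so its row gets NaN after concat
# and the column is upcast from int64 to float64.
combined = pd.concat([atom, others], sort=False, ignore_index=True)

try:
    combined["residue_number"].astype(int)
    cast_failed = False
except ValueError:
    # pandas.errors.IntCastingNaNError is a subclass of ValueError.
    cast_failed = True

# Possible guard: cast only the rows that actually carry a residue number.
has_residue = combined["residue_number"].notna()
residue_as_int = combined.loc[has_residue, "residue_number"].astype(int)
```

An alternative would be pandas' nullable integer dtype, astype("Int64"), which keeps missing values as <NA>; whether that fits the downstream string formatting in to_pdb_stream would need checking.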
Versions
biopandas 0.5.0dev
Linux-5.4.0-91-generic-x86_64-with-glibc2.31
Python 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:40:32) [GCC 12.3.0]
Scikit-learn 1.3.0
NumPy 1.23.5
SciPy 1.11.1