pdb2pqr

How to handle charge and radii in mmCIF

Status: Open. intendo opened this issue 4 years ago · 16 comments

We can use CIF files as input to PDB2PQR, but how should we handle atomic charges and radii?

Using the mmcif_pdbx package, we can load PDB (atom_site) data from a CIF file using something like the following:

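# Example code of how to get the atom_site container from a mmCIF file
from pathlib import Path

import pytest
from pdbx.reader import PdbxReader

# Assumed location of the test data files; adjust to the real test layout.
DATA_DIR = Path(__file__).parent / "data"


@pytest.mark.parametrize("input_cif", ["1kip.cif", "1ffk.cif"], ids=str)
def test_data_file(input_cif):
    """Test data file input."""
    input_path = DATA_DIR / input_cif
    data_list = []
    with open(input_path, "rt") as input_file:
        reader = PdbxReader(input_file)
        # read() must run while the file handle is still open
        reader.read(data_list)
    for item in data_list:
        atom_site = item.get_object("atom_site")
        assert atom_site is not None
        # print_it() writes the category itself; wrapping it in print() would also print None
        atom_site.print_it()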

Other categories of the mmCIF dictionary also carry radius and charge information.

For example, the mmCIF dictionary (https://www.iucr.org/__data/iucr/cifdic_html/2/cif_mm.dic/index.html) defines _chem_comp_atom.charge (integer) and _chem_comp_atom.partial_charge (float).

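The open question is how to link the _atom_site records to those other categories: _chem_comp_atom.atom_id would be matched against _atom_site.label_atom_id, together with _chem_comp_atom.comp_id against _atom_site.label_comp_id, since atom names are only unique within a given residue type.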
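A minimal sketch of that join, assuming both categories have already been read into lists of plain dicts keyed by item name (for example, extracted from the mmcif_pdbx containers). The function name `attach_partial_charges` and the charge values are hypothetical placeholders for illustration, not real dictionary data or a PDB2PQR API.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical, pre-parsed rows; in practice these would come from the
# atom_site and chem_comp_atom categories of a CIF data block.
ATOM_SITE = [
    {"id": "1", "label_comp_id": "HIS", "label_atom_id": "ND1", "label_seq_id": "10"},
    {"id": "2", "label_comp_id": "HIS", "label_atom_id": "NE2", "label_seq_id": "10"},
    {"id": "3", "label_comp_id": "HIS", "label_atom_id": "ND1", "label_seq_id": "42"},
]
CHEM_COMP_ATOM = [
    {"comp_id": "HIS", "atom_id": "ND1", "partial_charge": "-0.38"},  # placeholder value
    {"comp_id": "HIS", "atom_id": "NE2", "partial_charge": "-0.57"},  # placeholder value
]


def attach_partial_charges(
    atom_site: List[Dict[str, str]], chem_comp_atom: List[Dict[str, str]]
) -> List[Tuple[str, Optional[float]]]:
    """Return (atom_site.id, partial charge) pairs; None when no match exists."""
    # Index the per-component table by its two-part key (comp_id, atom_id).
    lookup = {
        (row["comp_id"], row["atom_id"]): float(row["partial_charge"])
        for row in chem_comp_atom
    }
    # Match each atom_site row on (label_comp_id, label_atom_id).
    return [
        (site["id"], lookup.get((site["label_comp_id"], site["label_atom_id"])))
        for site in atom_site
    ]


print(attach_partial_charges(ATOM_SITE, CHEM_COMP_ATOM))
```

This prints `[('1', -0.38), ('2', -0.57), ('3', -0.38)]`. Note that both HIS residues (seq 10 and 42) receive the same ND1 charge: because the value is attached to the residue type rather than to the individual atom, two residues with the same name cannot carry different charge assignments through this route.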

intendo, Jan 25 '21

@speleo3 and @orbeckst -- do you see any use cases where PQR-like information would be useful in mmCIF format? If not, we'll probably treat this as low priority. Thanks!

sobolevnrm, Jan 26 '21

I'd be all for deprecating PQR and only using something mmCIF based instead. The use case would be that we could abandon PQR parsers :-)

That was my original request in https://github.com/Electrostatics/pdb2pqr/issues/34

Such a file could be a 100% valid mmCIF file with added radius and charge columns. I'm not sure, though, whether the _chem_comp_atom properties are a good fit: they would require, for example, different residue names for two HIS residues with different charge configurations. It would be much easier to add two custom columns to the _atom_site table, and/or propose adding such columns to one of the official dictionaries.
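A minimal sketch of what "two custom columns in _atom_site" could look like on output. The tag names `_atom_site.pqr_charge` and `_atom_site.pqr_radius`, the `PqrAtom` record, and `format_atom_site_loop` are all hypothetical (no names had been agreed on), and the loop is trimmed to a handful of standard items rather than a complete _atom_site category.

```python
from typing import List, NamedTuple


class PqrAtom(NamedTuple):
    """Minimal per-atom record; fields are illustrative, not PDB2PQR's API."""
    atom_id: int
    atom_name: str
    comp_id: str
    seq_id: int
    x: float
    y: float
    z: float
    charge: float
    radius: float


# A few standard _atom_site items followed by two HYPOTHETICAL custom items.
TAGS = [
    "_atom_site.group_PDB",
    "_atom_site.id",
    "_atom_site.label_atom_id",
    "_atom_site.label_comp_id",
    "_atom_site.label_seq_id",
    "_atom_site.Cartn_x",
    "_atom_site.Cartn_y",
    "_atom_site.Cartn_z",
    "_atom_site.pqr_charge",  # hypothetical name, not in any official dictionary
    "_atom_site.pqr_radius",  # hypothetical name, not in any official dictionary
]


def format_atom_site_loop(block_name: str, atoms: List[PqrAtom]) -> str:
    """Render a data block whose _atom_site loop also carries charge and radius."""
    lines = [f"data_{block_name}", "loop_"]
    lines.extend(TAGS)
    for a in atoms:
        # One whitespace-separated row per atom, in the same order as TAGS.
        lines.append(
            f"ATOM {a.atom_id} {a.atom_name} {a.comp_id} {a.seq_id} "
            f"{a.x:.3f} {a.y:.3f} {a.z:.3f} {a.charge:.4f} {a.radius:.4f}"
        )
    return "\n".join(lines) + "\n"


print(format_atom_site_loop("example", [
    PqrAtom(1, "N", "ALA", 1, 1.0, 2.0, 3.0, -0.3, 1.85),
]))
```

The sketch does not apply CIF quoting, so atom names containing whitespace or starting with a reserved character would need to be quoted in a real writer.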

speleo3, Jan 26 '21

Oops -- sorry about that! I re-opened the original issue.

sobolevnrm, Jan 27 '21

@speleo3 I think we could add the custom fields to the _atom_site table, but I wasn't sure whether that would create non-standard mmCIF files that other mmCIF parsers, such as https://github.com/biopython/biopython/blob/master/Bio/PDB/MMCIFParser.py, could not parse.

That is why I was wondering whether there is another section (category) that could hold the charge and radius, so that they would be accessible to the mmcif_pdbx parser without breaking other parsers.

My concern would be that a user would use APBS or PDB2PQR and end up creating a mmCIF output file that would be incompatible with other mmCIF parsers in their chaining/pipeline processing.
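One relevant property here is that CIF loops are self-describing: the tag header names every column, so a reader that looks columns up by tag name can skip items it does not know. The toy reader below (a hypothetical `read_loop_columns`, not Biopython's or mmcif_pdbx's implementation) illustrates this; whether a particular parser actually behaves this way should be checked against that parser. The `pqr_charge`/`pqr_radius` tags are again hypothetical.

```python
from typing import Dict, List


def read_loop_columns(text: str, wanted: List[str]) -> List[Dict[str, str]]:
    """Toy reader for a single unquoted loop_: returns only the wanted tags.

    Columns are located by tag name from the loop header, so extra
    (unknown) tags are simply skipped. Quoting, multi-line values and
    multiple loops are NOT handled.
    """
    tags: List[str] = []
    rows: List[Dict[str, str]] = []
    in_header = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("data_"):
            continue
        if line == "loop_":
            in_header = True
            continue
        if in_header and line.startswith("_"):
            tags.append(line.split()[0])
            continue
        in_header = False  # first data row ends the tag header
        values = line.split()
        if len(values) != len(tags):
            raise ValueError(f"expected {len(tags)} values, got {len(values)}")
        record = dict(zip(tags, values))
        rows.append({tag: record[tag] for tag in wanted if tag in record})
    return rows


SAMPLE = """data_example
loop_
_atom_site.id
_atom_site.label_atom_id
_atom_site.pqr_charge
_atom_site.pqr_radius
1 N -0.3000 1.8500
2 CA 0.0700 1.9000
"""

print(read_loop_columns(SAMPLE, ["_atom_site.id", "_atom_site.label_atom_id"]))
```

A consumer that instead assumed fixed column positions would be the kind of parser that extra _atom_site columns could break.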

intendo, Feb 02 '21

What's the status of the CIF output file?

danny305, Mar 17 '21

I am working on it as quickly as I can. Would you like to help?

sobolevnrm, Mar 17 '21

Yes. I can probably start dedicating some serious time mid next week.

Can y'all catch me up over the next few days on the status, implementation design, and what needs to be done?

danny305, Mar 17 '21

Sure! I have some initial code that I'll post in a few days. The PDB -> CIF translation works well but I was holding off releasing it to get the CIF -> PDB part done. I'll remedy that shortly.

Thanks!

sobolevnrm, Mar 17 '21

Awesome. Let me know when you post the code so I can start familiarizing myself with it.

I'll let you know when I finish what I'm working on and can transition over to this, in the next week or so.

danny305, Mar 17 '21

For clarification, this is "few days" in COVID time: I'm still working on the code. I wrote most of it and then found a better way to do it so...

sobolevnrm, Mar 23 '21

I am preparing slides/code for a talk this Friday.

I am also implementing CIF file writing in another library we depend on (freesasa).

So quite honestly, sometime next week will probably be more realistic on my end.

Writing a PQR CIF file is the last loose end in our tech stack so I definitely want to hammer this out in the near future.

Glad we are openly communicating our timelines.

danny305, Mar 23 '21

Ready to start contributing. I'm guessing it's the nathan/cif branch?

danny305, Apr 28 '21

@speleo3 @sobolevnrm did we ever decide on the two custom field names in the _atom_site table for the charge and radii?

intendo, Apr 29 '21

No, but we should probably address this in https://github.com/Electrostatics/pdb2cif.

@danny305 -- I was going to redirect you over there as well for this thread.

sobolevnrm, Apr 29 '21

Why don't we just use Gemmi to convert between the two?

danny305, Apr 29 '21

Let's move this discussion to the other repo. Can you provide a description of what Gemmi does over there? Thanks.

sobolevnrm, Apr 29 '21