
Fix incorrect node lookup in distance-based edge generation Fixes #418

Open elementare opened this issue 7 months ago • 1 comment

Previously, the row and column indices of the distance matrix were used to look up rows in the full PDB DataFrame (G.graph["pdb_df"]), on the assumption that the two were aligned. They are not aligned whenever the filtered DataFrame used to compute distances has a different row order or contains only a subset of the residues, and in those cases the lookup produced incorrect residue pairings.

This patch introduces an explicit mapping from filtered DataFrame indices back to the original node IDs, ensuring that edges are created between the correct residues in the correct chains.

This resolves issues where edges were created between spatially distant residues or between unrelated chains.
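
To make the failure mode concrete, here is a minimal sketch of the mismatch and of the explicit mapping described above. It is not the graphein code: the toy DataFrame, the `node_id` and coordinate column names, the filter, and the 3.5 Å threshold are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for G.graph["pdb_df"]; column names are assumed.
pdb_df = pd.DataFrame(
    {
        "node_id": ["A:MET:1", "A:HOH:2", "A:GLY:3", "B:ALA:1"],
        "x_coord": [0.0, 50.0, 3.0, 40.0],
        "y_coord": [0.0, 0.0, 0.0, 0.0],
        "z_coord": [0.0, 0.0, 0.0, 0.0],
    }
)

# Distances are computed on a filtered subset, so position k in the
# distance matrix is no longer row k of pdb_df.
filtered = pdb_df[pdb_df["node_id"] != "A:HOH:2"]
coords = filtered[["x_coord", "y_coord", "z_coord"]].to_numpy()
dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

# Pairs (i < j) of matrix positions within the illustrative threshold.
threshold = 3.5
i_idx, j_idx = np.triu_indices(len(filtered), k=1)
close = dists[i_idx, j_idx] <= threshold
pairs = list(zip(i_idx[close].tolist(), j_idx[close].tolist()))

# Buggy lookup: matrix positions are applied to the full DataFrame.
full_ids = pdb_df["node_id"].tolist()
buggy_edges = [(full_ids[i], full_ids[j]) for i, j in pairs]

# Fixed lookup: explicit mapping from matrix position to node ID,
# taken from the same filtered DataFrame that produced the matrix.
pos_to_node = filtered["node_id"].tolist()
edges = [(pos_to_node[i], pos_to_node[j]) for i, j in pairs]

print("buggy:", buggy_edges)
print("fixed:", edges)
```

In this toy case the buggy lookup pairs A:MET:1 with a row that is 50 Å away, while the mapped lookup returns the pair that is actually 3 Å apart.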

Reference Issues/PRs

Fixes #418

What does this implement/fix? Explain your changes

Fixes a mismatch between the distance matrix indices and the full PDB DataFrame (G.graph["pdb_df"]) during edge creation. Ensures that spatial edges are added between correct residue pairs (same chain, correct distance) by mapping filtered indices back to their original node IDs.

What testing did you do to verify the changes in this PR?

I manually validated the fix using the Titin protein as a test case. For randomly selected residues (such as residue 0), I inspected their neighbors in ChimeraX and compared them against the neighbors returned by the updated code. All observed interactions were spatially coherent and within the specified threshold of 7 Å. I repeated this process for several residues and found no incorrect long-distance interactions, which indicates that the fix produces physically valid edges for the residues checked.
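
The manual check above could also be approximated programmatically. The following is a rough sketch under stated assumptions, not a test from this PR: `edges_beyond_threshold` is a hypothetical helper, and the coordinates and edges are made-up toy data rather than values from Titin.

```python
import numpy as np


def edges_beyond_threshold(coords, edges, threshold=7.0):
    """Return (u, v, distance) for every edge longer than `threshold` (in Å).

    `coords` maps a node ID to its (x, y, z) position; `edges` is an
    iterable of (u, v) node ID pairs. Both are hypothetical inputs here.
    """
    violations = []
    for u, v in edges:
        d = float(np.linalg.norm(np.asarray(coords[u]) - np.asarray(coords[v])))
        if d > threshold:
            violations.append((u, v, d))
    return violations


# Toy data: one plausible contact and one edge that should not exist.
coords = {
    "A:MET:1": (0.0, 0.0, 0.0),
    "A:GLY:2": (3.8, 0.0, 0.0),
    "A:LYS:90": (60.0, 0.0, 0.0),
}
edges = [("A:MET:1", "A:GLY:2"), ("A:MET:1", "A:LYS:90")]

bad = edges_beyond_threshold(coords, edges)
print(bad)
```

An empty result for a real graph would mean that no edge exceeds the distance threshold, which is the property the ChimeraX inspection was checking by eye.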

Pull Request Checklist

  • [x] Ran python -m pytest tests/ and made sure that all unit tests pass
  • [x] Confirmed that the bug fix does not affect unrelated modules
  • [x] Verified that no incorrect edges are created across chains

elementare, Apr 09 '25 03:04