
OntologyGraph trivial hash using object count

Status: Open • RemyLau opened this issue 2 years ago • 0 comments

A recent fix, #155, resolves the problem of being unable to handle (i.e., propagate attributes on) multiple large ontology graphs (#128). It does so by clearing the lru_cache after each "context session". Such a context session is intended for static ontology graphs, whose nodes and edges are not modified within the session (modifying node attributes is fine).
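A minimal sketch of the context-session idea, assuming the cached work is a module-level function wrapped in functools.lru_cache. StaticGraph, propagate, and this cache_on_static body are hypothetical illustrations, not the package's actual code.

```python
import contextlib
from functools import lru_cache


class StaticGraph:
    """Hypothetical stand-in for an OntologyGraph whose nodes/edges stay fixed."""

    def __init__(self, name):
        self.name = name


@lru_cache(maxsize=None)
def propagate(graph):
    """Hypothetical expensive, cached computation on a graph."""
    return f"propagated:{graph.name}"


@contextlib.contextmanager
def cache_on_static():
    """Sketch of a context session: results are cached while it is open,
    and the cache is cleared once it closes so old graphs do not pile up."""
    try:
        yield
    finally:
        propagate.cache_clear()


with cache_on_static():
    g = StaticGraph("go")
    propagate(g)
    propagate(g)  # second call is served from the cache
    print(propagate.cache_info().hits)  # → 1
print(propagate.cache_info().currsize)  # → 0
```

Clearing on exit keeps the cache small, but every caller has to remember to open the session.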

The fundamental problem when the cache is not cleared is that the trivial hash causes hash collisions in lru_cache, which is then forced to call __eq__ to compare two OntologyGraph instances. That comparison is inherited from idhandler and is quite costly, because it has to go through all the node properties, including node attributes, node names, etc.

https://github.com/krishnanlab/NetworkLearningEval/blob/e575e5da572926d7ac2d56a71e28df651e2c7d15/src/NLEval/util/idhandler.py#L307-L323
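To see why the collisions hurt, here is a small sketch assuming the trivial hash is a constant (the actual OntologyGraph hash may be defined differently). TrivialHashGraph and num_nodes are hypothetical names; the __eq__ below only stands in for the idhandler comparison and counts how often lru_cache has to fall back to it.

```python
from functools import lru_cache


class TrivialHashGraph:
    """Hypothetical graph whose hash is constant, so all instances collide."""

    eq_calls = 0  # class-level counter of __eq__ invocations

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def __hash__(self):
        return 0  # assumption: the "trivial hash" is identical for every instance

    def __eq__(self, other):
        # Stand-in for the costly comparison over all node properties.
        TrivialHashGraph.eq_calls += 1
        return isinstance(other, TrivialHashGraph) and self.nodes == other.nodes


@lru_cache(maxsize=None)
def num_nodes(graph):
    return len(graph.nodes)


graphs = [TrivialHashGraph(range(i)) for i in range(1, 6)]
for g in graphs:
    num_nodes(g)
print(TrivialHashGraph.eq_calls)
```

Because every key has the same hash, each new lookup is compared against the entries already cached, so the number of __eq__ calls grows with the number of cached graphs.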

New thoughts

The approach in #155 does reduce the overhead caused by hash collisions, but it adds the mental overhead of remembering to use the cache_on_static context. Another idea that might also solve the collision issue without requiring the cache_on_static context is to improve the trivial_hash by basing it on a running count of the OntologyGraph instances created. That way, each OntologyGraph instance would have a unique trivial hash.
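A minimal sketch of the instance-count idea, assuming the counter lives on the class and is drawn from itertools.count each time an instance is created. CountedHashGraph, _trivial_hash, and count_nodes are hypothetical names, not the package's API.

```python
import itertools
from functools import lru_cache


class CountedHashGraph:
    """Hypothetical graph whose trivial hash is a per-instance creation counter."""

    _instance_counter = itertools.count()  # shared by all instances
    eq_calls = 0  # counts how often __eq__ is invoked

    def __init__(self, nodes):
        self.nodes = list(nodes)
        # Each new instance takes the next integer, so no two instances share a hash.
        self._trivial_hash = next(CountedHashGraph._instance_counter)

    def __hash__(self):
        return self._trivial_hash

    def __eq__(self, other):
        CountedHashGraph.eq_calls += 1
        return isinstance(other, CountedHashGraph) and self.nodes == other.nodes


@lru_cache(maxsize=None)
def count_nodes(graph):
    return len(graph.nodes)


graphs = [CountedHashGraph(range(i)) for i in range(1, 6)]
for g in graphs:
    count_nodes(g)
count_nodes(graphs[0])  # cache hit on the same instance
print(CountedHashGraph.eq_calls)  # → 0
```

Repeated lookups with the same instance still hit the cache, and since no two instances share a hash, lru_cache never needs __eq__ in this example. One trade-off to keep in mind: if __eq__ stays content-based, two distinct but equal graphs get different hashes, which breaks Python's rule that objects comparing equal must hash equal; in practice, a lookup with an equal but different instance simply misses the cache.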

Useful readings

  • https://www.asmeurer.com/blog/posts/what-happens-when-you-mess-with-hashing-in-python/

RemyLau, Mar 10 '22 16:03