linkml-runtime icon indicating copy to clipboard operation
linkml-runtime copied to clipboard

Optimize `get_classes_by_slot` round 2

Open sneakers-the-rat opened this issue 11 months ago • 8 comments

Waiting for tests to run on the array branch and browsing through issues and PRs, noticed there was one for optimizing get_classes_by_slot https://github.com/linkml/linkml-runtime/pull/300 and any time there's an induced_slot perf thing i gotta check it out now.

For non-induced slots, #300 is a perf wash (for 100 rounds through all slots in biolink, 6.77 before and 6.81 after), but when i went to profile it with induced slots i was surprised at how long the estimated time for a single round through was supposed to take - roughly 3 hours (~160,000x slower vs non-induced slots).

Using sets is a great idea for removing the step of making something unique at the end. we can take advantage of that further by breaking after the first set addition (since future set additions will do nothing).

the problem is that we are still making basically the most expensive call you can do with schemaview, which is to call induced_slot on every combination of slot and class for a schema. induced_slot is expensive in itself, but it's usually fine to do because you're only calling it in the context of a few classes and the caching can handle how slow it is, but since the call is every slot in the context of every class, no caching can be done.

The fix is relatively simple - since get_classes_by_slot only cares about the existence of the slot on the class, not on its full induced form, there is no need to use induced_slot at all. class_induced_slots iterates over the slots given by class_slots: https://github.com/linkml/linkml-runtime/blob/7c311d975422ddf990ae524fe6d969326c1b9261/linkml_runtime/utils/schemaview.py#L1368

so any slots that class_induced_slots knows about, class_slots knows about. The implementation in this PR also avoids iterating over the classes twice, although that's also a perf wash because iterating through a list and comparing strings takes basically no time.

So for the simple timing function

from linkml_runtime.utils.schemaview import SchemaView
import cProfile

def get_slot_classes():
    sv = SchemaView('biolink.yaml')
    for i in range(100):
        for slot in sv.all_slots(imports=True).values():
            _ = sv.get_classes_by_slot(slot, include_induced=False)

def get_induced_slot_classes():
    sv = SchemaView('biolink.yaml')
    for i in range(1):
        for i, slot in enumerate(sv.all_slots(imports=True).values()):
            _ = sv.get_classes_by_slot(slot, include_induced=True)
            if i >= 4:
                break

cProfile.run('get_slot_classes()', 'before_stats.prof')
cProfile.run('get_induced_slot_classes()', 'before_induced_stats.prof')

where i break out of getting the induced slots after 5 slots because it takes so long,

you get:

version regular per-call induced per-call (s)
before 300 8.234e-05 21.59
after 300 8.092e-05 21.85
this PR 6.672e-05 0.01946

So i'll call the regular difference in the noise and the induced slot call roughly 1100x faster. A full run over biolink takes 12s vs ~3h.

attached profiling results where beforest is before 300, before is 300 and after is this PR.

profiling-get_classes_by_slot.zip

sneakers-the-rat avatar Mar 05 '24 04:03 sneakers-the-rat