flowpipe Slow call `get_hash` in `NodeEncoder`

trafficstars

Is your feature request related to a problem? Please describe. In our project some graphs are time critical, when profiling the code the NodeEncoder arose as a bottleneck.

Describe the solution you'd like Can we think of something else to serialize nodes without using the sha256 of the hashlib?

Describe alternatives you've considered Maybe we can use the __repr__, this method describe the object in a single line.

Do you guys have alternatives?

Jun 30 '20 06:06 Onandon11

It should not be too hard to make the hashing function used there configurable. There is no specific reason I picked the sha256 except for it being quite common and available in the python standard library. To me, one advantage of the neat representations of flowpipe graphs is to easily tell when to graphs are the same (after execution), so a simple __repr__ which falls back to the in-memory location of python objects that didn't define it would not fit the bill. But it could make sense as a default, since every object supports it!

The NodeEncoder is used to provide something json compatible from generic python objects. A quick look at the code shows me that it should only get called in the Node.list_repr() or Graph.list_repr() methods. I might have missed something, though.

Can you maybe elaborate a little bit on the circumstance in which you find the slowdown from the NodeEncoder? Maybe this is a pointer to something else that needs improvement.

Jun 30 '20 09:06 neuneck

I agree on why you picked the sha256 and I also agree that this is a fine way because it indeed shows if objects are the same.

If it should only be called by Node.list_repr() and Graph.list_repr() than indeed something is going on. I inly use the function when I'm debugging, but thereafter I never call them again. However the NodeEncoder shows up many times in my profiler when running whitout these calls.

I will look why they are called and where :) I keep you updated!

Jul 14 '20 11:07 Onandon11

So I profiled my code and saw that it's being called in:

https://github.com/PaulSchweizer/flowpipe/blob/125e5bddc7e2edcd14a28af20334bdf69f54db1b/flowpipe/plug.py#L60-L66

This function is called every time a value is updated. I don't think this is expected behavior isn't it?

Jul 14 '20 11:07 Onandon11

Oh, yes it is. This piece of logic makes sure that a plug/node stays clean if you set the same input again. Thay way, when you re-run the graph and only parts of the inputs change, you can simply set all of the values, but only the altered part of the graph re-runs.

Would a way to override the hash function used here help you out? Alternatively, I think you might be able to get a dirty quick fix by monkey-patching the utilities.get_hash function before importing anything else from fowpipe

import flowpipe.utilities
def null_hash(*args, **kwargs):
    return None
flowpipe.utilities.get_hash = null_hash

import flowpipe....

I haven't tried it though, it might not work at all.

Jul 14 '20 12:07 neuneck

Hmm, again, you're right :p I don't use it that frequently (updating an existing graph), so that's why I didn't thought about it I think.. Maybe it's due to my other issue #135, were I frequently re-create Graphs because they are inside Nodes.

Jul 14 '20 13:07 Onandon11

It would be easy to make that hash function a constant in the flowpipe.utilities package and make use of python's lazy evaluation for convenient overriding. I'm talking about

## flowpipe/utilitied.py
HASH_FUNC = hash_func
....
def hash_func(...):
    ...

and

## flowpipe/plug.py
import .utilities
...
    def _update_value(self, value):
        old_hash = utilities.HASH_FUNC(self.value)
        new_hash = utilities.HASH_FUNC(value)
...

This way, users like you, that are slowed down by this evaluation (or work with objects where the current hashing function doesn't work), can override the hash function conveniently, by just setting

flowpipe.utilities.HASH_FUNC = my_hash_func
... # rest of the code

Do you think that will help you? It feels pythonic to me, since it's easy to explain and explicit. Dynamically generating graphs is fine, and should be fine, too. Not nesting them in nodes will not help with this issue, at least not as far as I can see.

Jul 14 '20 14:07 neuneck

flowpipe flowpipe copied to clipboard

Slow call `get_hash` in `NodeEncoder`

flowpipe
flowpipe copied to clipboard