v6d icon indicating copy to clipboard operation
v6d copied to clipboard

Improve the implementation of `find_most_precise_match`

Open sighingnow opened this issue 3 years ago • 3 comments

Describe your problem

Improve the implementation of find_most_precise_match using a trie-like data structure, rather than traverse a sorted dict in reverse order every time.

https://github.com/v6d-io/v6d/blob/43537d34a3bd598da66ac5fc28d5d400503a1594/python/vineyard/core/utils.py?_pjax=%23js-repo-pjax-container%2C%20div%5Bitemtype%3D%22http%3A%2F%2Fschema.org%2FSoftwareSourceCode%22%5D%20main%2C%20%5Bdata-pjax-container%5D#L20-L36

Additional context

Add any other context about the problem here.

sighingnow avatar Dec 23 '21 06:12 sighingnow

Maybe I can try this. After this week of final exam, ε=ε=ε=┏(゜ロ゜;)┛

Y-jiji avatar Jun 27 '22 13:06 Y-jiji

Maybe I can try this. After this week of final exam, ε=ε=ε=┏(゜ロ゜;)┛

Hi @Y-jiji Thanks for being interested in this issue! I would add more detail about the issue later today.

sighingnow avatar Jun 28 '22 01:06 sighingnow

The find_most_precise_match in the following way:

Image we have two object types vineyard::python::Tensor and vineyard::python::DataFrame (note that they have the same prefix/namespace vineyard::python). Now we have a generic python value resolver that has the following signature:

def python_value_resolver(obj: vineyard.Object) -> object:

And has been registered as

get_resolver_context().register('vineyard::python', python_value_resolver)

Now we have a specialized resolver which takes a vineyard::python::Tensor and resolves it to pytorch.Tensor, then we could have the following resolver:

def pytorch_tensor_resolver(obj: vineyard.Object) -> object:

get_resolver_context().register('vineyard::python::Tensor', pytorch_tensor_resolver)

Consider we have a vineyard object with type vineyard::python::Tensor, then we need to find the most specified resolver for it (the pytorch_tensor_resolver in this case). We currently achieve the goal by use a sorted dict as the registry, and traverse the dict in the reversed order to find the first matched resolver (the resolver key is the prefix of the object type name), that is exactly how current find_most_precise_match works.

You can see that there would be many redundant comparison in the process and it can be optimzied by some string processing/search tricks to make it faster.

sighingnow avatar Jun 29 '22 07:06 sighingnow