`GroupBy` to Server

Open Ethan-DeBandi99 opened this issue 2 years ago • 5 comments

@pierce314159 and I discussed this while working on #1681. SegArray initializes with a grouping property that is not currently being moved server side because GroupBy does not exist on the server. However, most of the initialization for GroupBy is already there within UniqueMsg. I believe it would be worthwhile to look into making GroupBy a server-side object.

@pierce314159 do you have any further thoughts on this? @reuster986 do you have any input?

Ethan-DeBandi99 avatar Sep 01 '22 00:09 Ethan-DeBandi99

I think this is a good idea, and definitely necessary for moving SegArray to the server as it stands today. I also want to link #1737, which is moving Categorical to the server to aid in this transition.

stress-tess avatar Sep 01 '22 14:09 stress-tess

Sorry for the delayed response! Yes, in principle I support making GroupBy a server-side object, with an important caveat: I think the pdarray attributes -- .permutation, .segments, and .unique_keys -- should be accessible from Python and not opaque like the segments and bytes are for Strings. Making these attributes opaque would break a ton of code, and I think it's worth the extra complexity of exposing them to the user.

reuster986 avatar Sep 14 '22 19:09 reuster986

@reuster986 - I completely agree. I believe there are a few ways we could achieve this. One would be similar to how SegArray currently functions (due to the object being server side and most functionality still on the client), where we init the object as if everything is on the server. This uses lazy initialization on the values and segments properties and adds a @property function to fetch them from the server when called. The fetch happens only once because the resulting pdarray is cached. An alternative would be to update the attribute response from the server so that it returns the create message for the GroupBySymEntry as well as the pdarrays listed above. This is something I am also looking at as an option for SegArray.
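To make the first (lazy) option concrete, here is a minimal sketch of the pattern. It is not arkouda code: `FakeServer`, `LazyGroupBy`, and the `fetch` method are hypothetical stand-ins, and plain Python lists take the place of pdarrays.

```python
class FakeServer:
    """Hypothetical stand-in for the server; counts attribute fetches."""

    def __init__(self, objects):
        self._objects = objects
        self.calls = 0

    def fetch(self, name, attr):
        # Each call models one client-to-server round trip.
        self.calls += 1
        return list(self._objects[name][attr])


class LazyGroupBy:
    """Client-side handle that pulls attributes only on first access."""

    def __init__(self, server, name):
        self._server = server
        self.name = name          # server-side symbol name (assumed)
        self._segments = None     # cache, filled on first access
        self._permutation = None

    @property
    def segments(self):
        if self._segments is None:
            self._segments = self._server.fetch(self.name, "segments")
        return self._segments

    @property
    def permutation(self):
        if self._permutation is None:
            self._permutation = self._server.fetch(self.name, "permutation")
        return self._permutation


server = FakeServer({"gb_1": {"segments": [0, 2], "permutation": [0, 2, 1]}})
gb = LazyGroupBy(server, "gb_1")
calls_before_access = server.calls   # nothing fetched at construction
first = gb.segments                  # triggers one fetch
second = gb.segments                 # served from the cache
calls_after_access = server.calls
```

With this shape the attributes stay visible from Python, and the cost is one round trip per attribute, paid only if that attribute is actually used.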

Ethan-DeBandi99 avatar Sep 15 '22 12:09 Ethan-DeBandi99

I think I'm tracking with those two general approaches (although I'm hazy on the details). Would it make sense to standardize the way we handle all "complex" objects? I.e. make Strings, SegArray, and GroupBy use the same general patterns for initialization and attribute access?

reuster986 avatar Sep 15 '22 13:09 reuster986

That would be my plan. As I am moving SegArray to the server, I am brainstorming approaches. There are advantages/disadvantages to both avenues. Personally, I am in favor of sending back the details on the initial build to eliminate later server calls. With that said, I am 100% on board with standardizing. I am hoping to get some time with @pierce314159 to discuss the more intricate details today as I am currently working through some of this in SegArray. Once we do that, we can provide a detailed plan here for everyone to review before moving forward.
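For comparison, a rough sketch of the "send everything back on the initial build" option. The JSON reply layout, `build_groupby_reply`, and `EagerGroupBy` are all assumptions made for illustration, not the actual server message format, and lists again stand in for pdarrays.

```python
import json


def build_groupby_reply(name, permutation, segments, unique_keys):
    """Hypothetical server reply: the object name plus every attribute."""
    return json.dumps({
        "name": name,
        "permutation": permutation,
        "segments": segments,
        "unique_keys": unique_keys,
    })


class EagerGroupBy:
    """Client-side object populated from a single build reply."""

    def __init__(self, name, permutation, segments, unique_keys):
        self.name = name
        self.permutation = permutation
        self.segments = segments
        self.unique_keys = unique_keys

    @classmethod
    def from_reply(cls, reply):
        # One parse of the build reply; no further server calls needed.
        fields = json.loads(reply)
        return cls(
            fields["name"],
            fields["permutation"],
            fields["segments"],
            fields["unique_keys"],
        )


reply = build_groupby_reply("gb_1", [0, 2, 1], [0, 2], [10, 20])
gb_eager = EagerGroupBy.from_reply(reply)
```

The trade-off is a larger initial reply in exchange for no later round trips, which is the advantage described above.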

Ethan-DeBandi99 avatar Sep 15 '22 13:09 Ethan-DeBandi99

The initial move of the GroupBy object to the server has been completed.

Ethan-DeBandi99 avatar Nov 21 '22 21:11 Ethan-DeBandi99