arkouda
`GroupBy` to Server
@pierce314159 and I discussed this while working on #1681. `SegArray` initializes with a `grouping` property that is not currently being moved server side because `GroupBy` does not exist on the server. However, most of the initialization for `GroupBy` is already there within `UniqueMsg`. I believe it would be worthwhile to look into moving `GroupBy` to a server-side object.
@pierce314159 do you have any further thoughts on this? @reuster986 do you have any input?
I think this is a good idea, and definitely necessary for moving `SegArray` to the server as it stands today. I also want to link #1737, which moves `Categorical` to the server, to aid in this transition.
Sorry for the delayed response! Yes, in principle I support making `GroupBy` a server-side object, with an important caveat: I think the pdarray attributes (`.permutation`, `.segments`, and `.unique_keys`) should be accessible from Python and not opaque like the segments and bytes are for `Strings`. Making these attributes opaque would break a ton of code, and I think it's worth the extra complexity of exposing them to the user.
@reuster986 - I completely agree. I believe there are a few ways that we could achieve this. One would be similar to how `SegArray` currently functions (due to the object being server side and most functionality still on the client), where we init the object as if everything is on the server. This uses lazy initialization on the `values` and `segments` properties and adds a `@property` function to access these from the server when called. This is only done once, as the resulting `pdarray` is cached. An alternative method would be to update the `attrib` return from the server to return the create response for the `GroupBySymEntry` as well as the `pdarray`s listed above. This is something I am also looking at as an option for `SegArray`.
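The first approach (lazy fetch with caching) can be sketched roughly as follows. This is only an illustration in plain Python, not arkouda code: `FakeServer`, `get_attribute`, and `LazyGroupBy` are hypothetical names, and plain lists stand in for `pdarray` objects.

```python
from typing import Dict, List, Optional


class FakeServer:
    """Hypothetical stand-in for the server; counts attribute requests."""

    def __init__(self, attributes: Dict[str, List[int]]) -> None:
        self._attributes = attributes
        self.calls = 0

    def get_attribute(self, name: str) -> List[int]:
        # Each call models one client/server round trip.
        self.calls += 1
        return list(self._attributes[name])


class LazyGroupBy:
    """Client-side handle whose arrays are fetched only on first access."""

    def __init__(self, server: FakeServer) -> None:
        self._server = server
        # Nothing is fetched at construction time.
        self._segments: Optional[List[int]] = None
        self._permutation: Optional[List[int]] = None

    @property
    def segments(self) -> List[int]:
        if self._segments is None:
            self._segments = self._server.get_attribute("segments")
        return self._segments

    @property
    def permutation(self) -> List[int]:
        if self._permutation is None:
            self._permutation = self._server.get_attribute("permutation")
        return self._permutation


server = FakeServer({"segments": [0, 2], "permutation": [1, 0, 2]})
grouping = LazyGroupBy(server)
calls_before_access = server.calls
first = grouping.segments
second = grouping.segments  # served from the cache, no new request
calls_after_access = server.calls
```

In this sketch, constructing the handle costs no round trips, and reading `segments` twice costs exactly one. The attributes stay visible from Python, at the price of one extra request per attribute the first time each is used.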
I think I'm tracking with those two general approaches (although I'm hazy on the details). Would it make sense to standardize the way we handle all "complex" objects? I.e. make `Strings`, `SegArray`, and `GroupBy` use the same general patterns for initialization and attribute access?
That would be my plan. As I am moving `SegArray` to the server, I am brainstorming approaches. There are advantages/disadvantages to both avenues. Personally, I am in favor of sending back the details on the initial build to eliminate later server calls. With that said, I am 100% on board with standardizing. I am hoping to get some time with @pierce314159 to discuss the more intricate details today as I am currently working through some of this in `SegArray`. Once we do that, we can provide a detailed plan here for everyone to review before moving forward.
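For comparison, the "send everything back on the initial build" option might look something like the sketch below. It is a toy model under stated assumptions, not the arkouda protocol: the JSON reply format, `fake_build_reply`, and `EagerGroupBy` are all hypothetical, and the reply carries plain lists where a real reply would presumably reference server-side arrays.

```python
import json
from dataclasses import dataclass
from typing import List


def fake_build_reply(keys: List[int]) -> str:
    """Hypothetical server build step: one reply describing every attribute."""
    # Stable argsort of the keys plays the role of the permutation.
    permutation = sorted(range(len(keys)), key=keys.__getitem__)
    sorted_keys = [keys[i] for i in permutation]
    # A segment starts wherever the sorted key changes.
    segments = [
        i for i, k in enumerate(sorted_keys)
        if i == 0 or k != sorted_keys[i - 1]
    ]
    unique_keys = [sorted_keys[s] for s in segments]
    return json.dumps({
        "name": "groupby_0",  # hypothetical server-side symbol name
        "permutation": permutation,
        "segments": segments,
        "unique_keys": unique_keys,
    })


@dataclass(frozen=True)
class EagerGroupBy:
    """Client-side object populated entirely from the build reply."""

    name: str
    permutation: List[int]
    segments: List[int]
    unique_keys: List[int]

    @classmethod
    def from_reply(cls, reply: str) -> "EagerGroupBy":
        fields = json.loads(reply)
        return cls(
            name=fields["name"],
            permutation=fields["permutation"],
            segments=fields["segments"],
            unique_keys=fields["unique_keys"],
        )


grouping = EagerGroupBy.from_reply(fake_build_reply([3, 1, 3, 2]))
```

Here all attributes arrive with the single build reply, so later attribute access needs no further server calls. The trade-off is a larger initial reply, even for attributes the caller never reads.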
The initial move of the `GroupBy` object to the server has been completed.