ibis icon indicating copy to clipboard operation
ibis copied to clipboard

refactor: fix terminology used in internal operator APIs

Open cpcloud opened this issue 4 years ago • 4 comments

There are a number of places where the terminology of objects with respect to the tree structure in Ibis's internal API isn't correct, particularly around parent/child and root/non-root objects.

  1. root_tables: typically these are referring to leaf tables. The root of an expression tree is the output expression and the leaves are the data sources (tables, literals, etc). For example, in t.a + 1, the root is the + expression. This should be renamed to children.
    1. Similarly when constructing a new expression, it might make sense to rename the table Arg in operations to child to indicate the direction in which the expression tree is being built.
  2. parent: I haven't found a case in the library where parent is actually referring to the parent of any node, it's always referring to a child expression.

I think it's important to fix this so that it becomes easier to understand what the relationship between the objects inside of ibis is.

cpcloud avatar Sep 03 '21 21:09 cpcloud

@cpcloud thanks for opening this. Honestly this is one of the most confusing thing in ibis that "parent" refers to the "child" table (in the tree sense). And I spent quite long time to understand what root tables are so I agree this is an issue.

However, I am less excited about this change now because many of our variable naming, documentation, comments are around the existing root_tables and parent terminology and I am concerned about the potential confusion to change this now (and I don't know how other advanced user of ibis feel about this change).

Another point I'd like to make is that the root_tables and parent terminology does make sense in the sense that the parent is the table expr that the current expr is build on top of, so from a logical point of view, it feels more natural. The tree structure is reversed and forces people to think backwards.

IMHO it's more valuable to document things like root_tables than to change the name, I think the fact that there is no formal definition of what is root table is probably larger problem then the names themselves.

icexelloss avatar Sep 08 '21 19:09 icexelloss

@icexelloss I get what you're saying, but I'm a bit stuck on this bit:

the table expr that the current expr is build on top of

To me, the "on top of" invokes an image of something like this:

image

where join is on top of table1 and table2.

Would you agree that join is the parent of table1 and table2, and that those are the children of join?

The tree structure is reversed and forces people to think backwards.

Isn't this is something entirely different? You can't reverse the arrows and retain the same meaning of the words root and parent. Since ibis is producing trees, I think moving to industry-standard terminology at least with respect to trees is the way forward.

I think we should avoid incorrect and non-standard terminology where possible, even if it requires some annoying changes for developers (users aren't exposed to these APIs).

In the long run, I think this will reduce contributor friction because folks will not have to unlearn anything to understand the data structures in ibis.

I don't think this is high priority for the project, but I wanted to write it down before forgetting :sweat_smile:

cpcloud avatar Sep 08 '21 20:09 cpcloud

I wonder if there's a less disruptive way to execute this renaming? Maybe a short guide from a -> b in the contributing docs?

cpcloud avatar Sep 08 '21 20:09 cpcloud

Yeah I think what you saying is correct. Just concerns about the disruption that's all.

icexelloss avatar Sep 09 '21 13:09 icexelloss

I'm going to close this out for now. I'd like to address this at some point, but keeping an issue open for it is just noise.

cpcloud avatar Sep 18 '22 11:09 cpcloud