refactor: deprecate `NA`/`fillna`/`dropna` in favor of name that uses `null` instead of `na`
Ibis has a few APIs that make use of the name "na".
-
Table.dropnafor dropping rows containing null values -
Table.fillnafor replacing null values with specific values -
ibis.NAas a null-value expression singleton
Nowhere else in ibis do we make use of the term "NA". Also, since some backends (e.g. pandas) conflate NaN and NULL, some users may expect Table.fillna to also replace NaN values in all backends (it does not). IMO we should deprecate these APIs in favor of ones that use the word "null" instead.
A few options:
Table.dropna
-
Table.dropnull? -
Table.drop_null? -
Table.drop_nulls? (this is the namepolarsuses)
My preference would be for Table.drop_null.
Table.fillna
-
Table.fillnull? -
Table.fill_null? (this is the namepolarsuses) -
Table.fill_nulls? -
Table.ifnull? (since we haveColumn.nullif)
My preference would be for Table.fill_null.
ibis.NA
This one I'm tempted to remove entirely.
- None should be valid as an input everywhere
ibis.NAis - There's always
ibis.null()for the rare cases where you need to construct a null value expression - If we still want a constant null, we could name it
ibis.NULLinstead?
I am definitely in favor of removing ibis.NA and moving away from the na name in the fill_ and drop_ method.
I'm good with drop_null and fill_null -- I am not on board with having one of them be plural and the other singular.
related: #7147
I agree with everything you have here. I would lean towards fillnull and dropnull, there is plenty of existing precedent within ibis for names without underscores, and it is easier to type and readable enough. But if you went with underscores I wouldn't be furious :)
+1 on consistency with plurals, no strong opinions on the separating underscore. I do find drop_null ever so slightly easier to read, but I'd also be okay with dropnull. Whoever implements this can have the final say :)
I'm also in favor of the deprecations and have a slight preference towards underscore-less names.
Folks, looking to get started with this. Couple of question on how to go about a deprecation cycle.
- Do we basically mark the existing API with de
@deprecated(as_of="10.0", instead="use new api message")and copy the behavior with the new names? - Do we duplicate the testing portion too?
- I think so, yeah
- In the past we haven't duplicated the tests (just ported them to use the new APIs), but then added a single expression-level test (in
ibis/tests/expr/...) that checks the expressions are equivalent and the deprecated one raises a warning.