ibis
ibis copied to clipboard
docs: Case Statement Tutorial Lacks Clarity
In the following documentation section: https://ibis-project.org/docs/3.2.0/tutorial/04-More-Value-Expressions/#case-if-then-else-expressions, it is shown that if a case falls through all cases and an else, a NULL/None will be given as a result. However, to show this, there is first a statement with comprehensive cases, i.e. requiring no else(), having an else() that does nothing, followed by an else()-less expression that lacks one case. Thus, it can be difficult to see why removing else() has any effect, because it does nothing in the first expression, and would have in the second.
Alternatives could include lacking the Antarctica case in both expressions, using an else() to catch it in the first and offering no else() in the second. Regardless, as is, this is unclear.
I think I finally understand what you're saying:
...there is first a statement with comprehensive cases, i.e. requiring no else(), having an else() that does nothing...
you mean this one, where else_(...)
does nothing/has no effect:
expr = (
countries.continent.case()
.when('AF', 'Africa')
.when('AN', 'Antarctica')
.when('AS', 'Asia')
.when('EU', 'Europe')
.when('NA', 'North America')
.when('OC', 'Oceania')
.when('SA', 'South America')
.else_(countries.continent)
.end()
.name('continent_name')
)
expr.value_counts()
followed by an else()-less expression that lacks one case
you mean this one, that doesn't have an else_(...)
AND that doesn't include the Antarctica case:
expr = (
countries.continent.case()
.when('AF', 'Africa')
.when('AS', 'Asia')
.when('EU', 'Europe')
.when('NA', 'North America')
.when('OC', 'Oceania')
.when('SA', 'South America')
.end()
.name('continent_name_with_nulls')
)
expr.value_counts()
And what you're suggesting is to _remove the Antarctica case FROM THE FIRST statement so that else_(...)
has some effect.
So you're suggesting the FIRST statement to look something like this:
expr = (
countries.continent.case()
.when('AF', 'Africa')
# .when('AN', 'Antarctica')
.when('AS', 'Asia')
.when('EU', 'Europe')
.when('NA', 'North America')
.when('OC', 'Oceania')
.when('SA', 'South America')
.else_(countries.continent)
.end()
.name('continent_name')
)
expr.value_counts()
Tutorials are important, but for me the case statement tutorial seemed clear - although I admit I didn't pay much attention to what those statements returned, I mainly looked at the code snippet.
Side-note: given what Pyodide made possible and that ibis is pure python, IMO tutorials (at least for the pandas backend) should be also offered directly in the browser via https://github.com/jupyterlite or something similar -> like to be able to try it in 20 seconds... even from your phone.
Hey @ksuarez1423, I think this would be helpful any which way. Do you mind putting together a PR with the change in the notebook and get the CI build to pass?
At least personally, reviewing outputs and the meanings thereof is important to me when consuming a tutorial.
I'll work on a PR in the next week, and figure out how I would want to fix it -- probably just removing Antartica from the first case, but I want to give the due diligence of sitting down and thinking about it.
Let's leave Antarctica in for now. It's possible the data have some other data in there that are "unknown" that need to be handled (I know they don't in this example, but in the real world this happens with great regularity).
That said, @ksuarez1423 if you want to put up a PR that adjusts to notebook to call attention to these issues we would happily accept it!