ibis icon indicating copy to clipboard operation
ibis copied to clipboard

docs: Case Statement Tutorial Lacks Clarity

Open ksuarez1423 opened this issue 2 years ago • 2 comments

In the following documentation section: https://ibis-project.org/docs/3.2.0/tutorial/04-More-Value-Expressions/#case-if-then-else-expressions, it is shown that if a case falls through all cases and an else, a NULL/None will be given as a result. However, to show this, there is first a statement with comprehensive cases, i.e. requiring no else(), having an else() that does nothing, followed by an else()-less expression that lacks one case. Thus, it can be difficult to see why removing else() has any effect, because it does nothing in the first expression, and would have in the second.

Alternatives could include lacking the Antarctica case in both expressions, using an else() to catch it in the first and offering no else() in the second. Regardless, as is, this is unclear.

ksuarez1423 avatar Sep 16 '22 19:09 ksuarez1423

I think I finally understand what you're saying:

...there is first a statement with comprehensive cases, i.e. requiring no else(), having an else() that does nothing...

you mean this one, where else_(...) does nothing/has no effect:

expr = (
    countries.continent.case()
    .when('AF', 'Africa')
    .when('AN', 'Antarctica')
    .when('AS', 'Asia')
    .when('EU', 'Europe')
    .when('NA', 'North America')
    .when('OC', 'Oceania')
    .when('SA', 'South America')
    .else_(countries.continent)
    .end()
    .name('continent_name')
)

expr.value_counts()

followed by an else()-less expression that lacks one case

you mean this one, that doesn't have an else_(...) AND that doesn't include the Antarctica case:

expr = (
    countries.continent.case()
    .when('AF', 'Africa')
    .when('AS', 'Asia')
    .when('EU', 'Europe')
    .when('NA', 'North America')
    .when('OC', 'Oceania')
    .when('SA', 'South America')
    .end()
    .name('continent_name_with_nulls')
)

expr.value_counts()

And what you're suggesting is to _remove the Antarctica case FROM THE FIRST statement so that else_(...) has some effect.

So you're suggesting the FIRST statement to look something like this:

expr = (
    countries.continent.case()
    .when('AF', 'Africa')
    # .when('AN', 'Antarctica')
    .when('AS', 'Asia')
    .when('EU', 'Europe')
    .when('NA', 'North America')
    .when('OC', 'Oceania')
    .when('SA', 'South America')
    .else_(countries.continent)
    .end()
    .name('continent_name')
)

expr.value_counts()

Tutorials are important, but for me the case statement tutorial seemed clear - although I admit I didn't pay much attention to what those statements returned, I mainly looked at the code snippet.

Side-note: given what Pyodide made possible and that ibis is pure python, IMO tutorials (at least for the pandas backend) should be also offered directly in the browser via https://github.com/jupyterlite or something similar -> like to be able to try it in 20 seconds... even from your phone.

ams1 avatar Sep 20 '22 05:09 ams1

Hey @ksuarez1423, I think this would be helpful any which way. Do you mind putting together a PR with the change in the notebook and get the CI build to pass?

saulpw avatar Sep 20 '22 20:09 saulpw

At least personally, reviewing outputs and the meanings thereof is important to me when consuming a tutorial.

I'll work on a PR in the next week, and figure out how I would want to fix it -- probably just removing Antartica from the first case, but I want to give the due diligence of sitting down and thinking about it.

ksuarez1423 avatar Sep 23 '22 17:09 ksuarez1423

Let's leave Antarctica in for now. It's possible the data have some other data in there that are "unknown" that need to be handled (I know they don't in this example, but in the real world this happens with great regularity).

That said, @ksuarez1423 if you want to put up a PR that adjusts to notebook to call attention to these issues we would happily accept it!

cpcloud avatar Jan 30 '23 15:01 cpcloud