ibis
feat(repr): add `show_count` option to interactive repr
Resolves #10231
Doesn't have tests yet. I want to get feedback on the semantics before I do that. I assume CI is going to fail all over the place because all the docstrings etc are going to change.
CHANGELOG
- left-justified the dimensions per review suggestion
It currently looks like this:
import ibis
ibis.options.repr.interactive.max_rows = 3
ibis.options.repr.interactive.show_count
# True
t = ibis.memtable({"foo": range(1_000), "bar": range(1_000), "baz": range(1_000)})
t
# 3 cols by 1_000 rows
# ┏━━━━━━━┳━━━━━━━┳━━━━━━━┓
# ┃ foo   ┃ bar   ┃ baz   ┃
# ┡━━━━━━━╇━━━━━━━╇━━━━━━━┩
# │ int64 │ int64 │ int64 │
# ├───────┼───────┼───────┤
# │     0 │     0 │     0 │
# │     1 │     1 │     1 │
# │     2 │     2 │     2 │
# │     … │     … │     … │
# └───────┴───────┴───────┘
t.preview(show_count=False)
# 3 cols by … rows
# ┏━━━━━━━┳━━━━━━━┳━━━━━━━┓
# ┃ foo   ┃ bar   ┃ baz   ┃
# ┡━━━━━━━╇━━━━━━━╇━━━━━━━┩
# │ int64 │ int64 │ int64 │
# ├───────┼───────┼───────┤
# │     0 │     0 │     0 │
# │     1 │     1 │     1 │
# │     2 │     2 │     2 │
# │     … │     … │     … │
# └───────┴───────┴───────┘
t.foo
# 1_000 rows
# ┏━━━━━━━┓
# ┃ foo   ┃
# ┡━━━━━━━┩
# │ int64 │
# ├───────┤
# │     0 │
# │     1 │
# │     2 │
# │     … │
# └───────┘
t.foo.preview(show_count=False)
# … rows
# ┏━━━━━━━┓
# ┃ foo   ┃
# ┡━━━━━━━┩
# │ int64 │
# ├───────┤
# │     0 │
# │     1 │
# │     2 │
# │     … │
# └───────┘
I made the config default show_count=True. I can change this if there is pushback, but this is what I would start with. I would love to brainstorm a few benchmark test cases to run to see the perf difference. Some ideas:
- ibis.duckdb.connect("mydb.db").table("my_table") (I expect ~0 difference)
- ibis.duckdb.connect().read_parquet("t.pq") (I expect ~0 difference)
- ibis.duckdb.connect().read_csv("thousand_rows.csv") (I expect a small difference)
- ibis.duckdb.connect().read_csv("billion_rows.csv") (I expect a large difference)
- ibis.duckdb.connect().read_csv("thousand_rows.csv").some_expensive_computation() (I expect a difference; its size depends on the semantics of the function)
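To compare the cases above, a small timing harness along these lines might be enough. This is only a sketch: `best_of` and `compare` are hypothetical helper names, not part of ibis, and the callables below are stand-ins rather than real ibis expressions.

```python
import time
from typing import Callable, Dict


def best_of(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the fastest wall-clock time in seconds over `repeats` calls."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def compare(
    with_count: Callable[[], object],
    without_count: Callable[[], object],
    repeats: int = 5,
) -> Dict[str, float]:
    """Time both variants and report the extra cost of computing the count."""
    a = best_of(with_count, repeats)
    b = best_of(without_count, repeats)
    return {"with_count": a, "without_count": b, "overhead": a - b}


# Stand-in workloads; a real benchmark would render the same expression
# with show_count on and off instead.
result = compare(lambda: sum(range(100_000)), lambda: None, repeats=3)
print(result)
```

For a real run, each callable would presumably wrap something like t.preview(show_count=True) and t.preview(show_count=False) for one of the tables listed above, assuming that calling preview is enough to trigger the query.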
I considered adding a .repr_options attribute to expressions as described in https://github.com/ibis-project/ibis/issues/10231#issuecomment-2462961711, but I decided that was too complicated.
I considered showing the table name in the repr, eg with table.get_name(), but that is a separate question.
Currently, the option has the semantics of show_count: bool, and we ALWAYS show the column count. I considered other encodings such as show_shape: Literal["rows", "cols", "both", None], but I thought that was overkill.
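As a rough illustration of the show_count: bool semantics, the dimensions line could be built by a helper like the one below. This is a sketch only, not the code in this PR; `format_dims` and its parameters are hypothetical names.

```python
from typing import Optional


def format_dims(ncols: Optional[int], nrows: Optional[int], show_count: bool) -> str:
    """Build the dimensions line shown above the table (hypothetical helper).

    ncols is None for a column expression, which shows only the row count.
    nrows is ignored when show_count is False, so no count query is needed.
    """
    # Use "_" as the thousands separator; fall back to "…" when the count is hidden.
    rows = f"{nrows:_}" if show_count and nrows is not None else "…"
    if ncols is None:
        return f"{rows} rows"
    return f"{ncols} cols by {rows} rows"


print(format_dims(3, 1_000, show_count=True))     # 3 cols by 1_000 rows
print(format_dims(3, None, show_count=False))     # 3 cols by … rows
print(format_dims(None, 1_000, show_count=True))  # 1_000 rows
```

The column count is always shown because it is known from the schema, while the row count is the only part that may require running a query.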
I considered adding the row count to the bottom of the table, eg something like
┏━━━━━━━┓
┃ foo   ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     0 │
│     1 │
│     2 │
│     … │
│ 1_000 total rows│
└───────┘
but then if you set a high max_rows it would be hard to see. Plus then this info would need to be repeated in every column. Anyway, if you have other ideas on the graphic design of where to present the counts, I'm all ears.
This looks fantastic!
I tried it out in a Jupyter kernel with a fairly wide table, which appears as a horizontally scrollable output using VSCode's interactive window. The dimensions appear but are center-justified, so the user has to scroll horizontally quite a bit to see them. I would suggest making it left-justified.
@NickCrews I'll try and fix this up and get it into 10.6
This didn't make it into 10.6, but it will be in the next release.
In 92922e3 (#10518) I added an optional feature that I can remove if you don't like it: the first time you call to_rich_table(), if show_count is False, we append a help message of " (set `ibis.options.repr.interactive.show_count=True` to show)" to the end of the "6 cols by ... rows" line.
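The show-once behavior described above could look roughly like this. It is a sketch under assumptions, not the code from that commit: the `DimsHeader` class and its `render` method are hypothetical, and only the hint text is taken from the description.

```python
HINT = " (set `ibis.options.repr.interactive.show_count=True` to show)"


class DimsHeader:
    """Appends the help message to the dimensions line at most once (hypothetical)."""

    def __init__(self) -> None:
        self._hint_shown = False

    def render(self, dims: str, show_count: bool) -> str:
        # The hint is only relevant while the count is hidden,
        # and only the first time it would be displayed.
        if show_count or self._hint_shown:
            return dims
        self._hint_shown = True
        return dims + HINT


header = DimsHeader()
print(header.render("6 cols by … rows", show_count=False))  # first call: with hint
print(header.render("6 cols by … rows", show_count=False))  # later calls: without hint
```

Keeping the flag on an object rather than in a module-level global makes the behavior easy to reset in tests.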
@cpcloud if I get a +1 to this behavior and implementation, I can do the grunt work of transitioning all the docstrings.
If you know of a way to automate that migration for those few hundred failures, I'm all ears lol