
feat(repr): add `show_count` option to interactive repr

Open NickCrews opened this issue 1 year ago • 2 comments

Resolves #10231

Doesn't have tests yet. I want to get feedback on the semantics before I do that. I assume CI is going to fail all over the place because all the docstrings etc are going to change.

CHANGELOG

  1. left-justified the dimensions per review suggestion

Currently looks like

import ibis

ibis.options.repr.interactive.max_rows = 3
ibis.options.repr.interactive.show_count
# True

t = ibis.memtable({"foo": range(1_000), "bar": range(1_000), "baz": range(1_000)})
t
# 3 cols by 1_000 rows
# ┏━━━━━━━┳━━━━━━━┳━━━━━━━┓
# ┃ foo   ┃ bar   ┃ baz   ┃
# ┡━━━━━━━╇━━━━━━━╇━━━━━━━┩
# │ int64 │ int64 │ int64 │
# ├───────┼───────┼───────┤
# │     0 │     0 │     0 │
# │     1 │     1 │     1 │
# │     2 │     2 │     2 │
# │     … │     … │     … │
# └───────┴───────┴───────┘
t.preview(show_count=False)
# 3 cols by … rows
# ┏━━━━━━━┳━━━━━━━┳━━━━━━━┓
# ┃ foo   ┃ bar   ┃ baz   ┃
# ┡━━━━━━━╇━━━━━━━╇━━━━━━━┩
# │ int64 │ int64 │ int64 │
# ├───────┼───────┼───────┤
# │     0 │     0 │     0 │
# │     1 │     1 │     1 │
# │     2 │     2 │     2 │
# │     … │     … │     … │
# └───────┴───────┴───────┘
t.foo
# 1_000
# rows
# ┏━━━━━━━┓
# ┃ foo   ┃
# ┡━━━━━━━┩
# │ int64 │
# ├───────┤
# │     0 │
# │     1 │
# │     2 │
# │     … │
# └───────┘
t.foo.preview(show_count=False)
# … rows
# ┏━━━━━━━┓
# ┃ foo   ┃
# ┡━━━━━━━┩
# │ int64 │
# ├───────┤
# │     0 │
# │     1 │
# │     2 │
# │     … │
# └───────┘

I made the config default be show_count=True. I can change this with pushback, but this is what I would start with. I would love to brainstorm a few benchmark test cases to run to see the perf difference. Some ideas:

  1. ibis.duckdb.connect("mydb.db").table("my_table") (I expect ~0 difference)
  2. ibis.duckdb.connect().read_parquet("t.pq") (I expect ~0 difference)
  3. ibis.duckdb.connect().read_csv("thousand_rows.csv") (I expect small difference)
  4. ibis.duckdb.connect().read_csv("billion_rows.csv") (I expect large difference)
  5. ibis.duckdb.connect().read_csv("thousand_rows.csv").some_expensive_computation() (I expect a difference, size depends on semantics of the function)
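
To compare these cases, something like the following timing harness could work. It is only a sketch using the standard library: `time_repr` and `compare` are hypothetical helper names, not part of ibis, and the two callables passed in at the bottom are cheap stand-ins rather than real queries.

```python
import statistics
import time
from typing import Callable, Dict


def time_repr(make_repr: Callable[[], object], repeats: int = 5) -> float:
    """Return the median wall-clock seconds of calling `make_repr` `repeats` times."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        make_repr()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare(
    with_count: Callable[[], object],
    without_count: Callable[[], object],
    repeats: int = 5,
) -> Dict[str, float]:
    """Time both variants and report the slowdown of `with_count` as a ratio."""
    a = time_repr(with_count, repeats)
    b = time_repr(without_count, repeats)
    return {
        "with_count": a,
        "without_count": b,
        "ratio": a / b if b > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Stand-ins for "render the repr with/without the row count".
    # On this branch the real callables might look like
    #   lambda: t.preview(show_count=True)
    #   lambda: t.preview(show_count=False)
    # (an assumption: it relies on preview() executing the query eagerly).
    result = compare(lambda: sum(range(200_000)), lambda: sum(range(1_000)))
    print(result)
```

Using the median rather than the mean keeps a single cold-cache run from dominating the comparison, which matters for the file-backed cases (3) and (4).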

I considered adding a .repr_options attribute to expressions as described in https://github.com/ibis-project/ibis/issues/10231#issuecomment-2462961711, but I decided that was too complicated.

I considered showing the table name in the repr, eg with table.get_name(), but that is a separate question.

Currently, the option has the semantics of show_count: bool, and we ALWAYS show the column count. I considered other encodings such as show_shape: Literal["rows", "cols", "both", None], but I thought that was overkill.
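
As a rough illustration of those semantics (not the code in this PR), the dimensions line could be built like this. `format_dims` is a hypothetical helper; passing `n_rows=None` stands for `show_count=False`, and `n_cols=None` stands for a column expression, which has no column count to show.

```python
from typing import Optional


def format_dims(n_rows: Optional[int], n_cols: Optional[int] = None) -> str:
    """Build the dimensions line shown above the table.

    n_rows=None means the row count was not computed (show_count=False),
    so an ellipsis is shown in its place. n_cols=None means we are
    formatting a single column, which only reports rows.
    """
    # The "_" format spec groups digits with underscores: 1000 -> "1_000".
    rows = "…" if n_rows is None else f"{n_rows:_}"
    if n_cols is None:
        return f"{rows} rows"
    return f"{n_cols:_} cols by {rows} rows"


print(format_dims(1_000, 3))  # table, show_count=True
print(format_dims(None, 3))   # table, show_count=False
print(format_dims(1_000))     # column, show_count=True
print(format_dims(None))      # column, show_count=False
```

The column count is always cheap because it comes from the schema, which is why only the row count sits behind the flag.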

I considered adding the row count to the bottom of the table, eg something like

┏━━━━━━━┓
┃ foo   ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     0 │
│     1 │
│     2 │
│     … │
│ 1_000 total rows│
└───────┘

but then if you set a high max_rows it would be hard to see. Plus then this info would need to be repeated in every column. Anyway, if you have other ideas on the graphic design of where to present the counts, I'm all ears.

NickCrews avatar Nov 21 '24 15:11 NickCrews

This looks fantastic!

I tried it out in a Jupyter kernel with a fairly wide table, which appears as a horizontally scrollable output using VSCode's interactive window. The dimensions appear but are center-justified, so the user has to scroll horizontally quite a bit to see them. I would suggest making it left-justified.

lboller-pwbm avatar Nov 21 '24 18:11 lboller-pwbm

@NickCrews I'll try and fix this up and get it into 10.6

cpcloud avatar Jun 15 '25 10:06 cpcloud

This didn't make it into 10.6, but it will be in the next release.

cpcloud avatar Jun 25 '25 16:06 cpcloud

In 92922e3 (#10518) I added an optional feature that I can remove if you don't like it: the first time you call to_rich_table() with show_count set to False, we append a help message, " (set `ibis.options.repr.interactive.show_count=True` to show)", to the end of the "6 cols by ... rows" line.
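
The show-once behavior can be sketched roughly as below. This is an illustration and not the code in 92922e3: `dims_line` and the module-level `_hint_shown` flag are hypothetical names, and only the hint text is taken from the description above.

```python
HINT = " (set `ibis.options.repr.interactive.show_count=True` to show)"

# Module-level flag so the hint is appended at most once per process.
_hint_shown = False


def dims_line(n_cols: int, n_rows: int, show_count: bool) -> str:
    """Return the dimensions line, adding the hint only on the first hidden count."""
    global _hint_shown
    if show_count:
        return f"{n_cols} cols by {n_rows:_} rows"
    line = f"{n_cols} cols by … rows"
    if not _hint_shown:
        _hint_shown = True
        line += HINT
    return line
```

With this shape, the first `dims_line(6, 1_000, show_count=False)` call ends with the hint and later calls return the bare line, so the hint does not clutter every repr in a long session.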

NickCrews avatar Oct 17 '25 04:10 NickCrews

@cpcloud if I get a +1 to this behavior and implementation, I can do the grunt work of transitioning all the docstrings.

If you know of a way to automate that migration for those few hundred failures, I'm all ears lol

NickCrews avatar Oct 17 '25 04:10 NickCrews