skrub icon indicating copy to clipboard operation
skrub copied to clipboard

Discussion: apply column filter to similarity table in TableReport

Open jeromedockes opened this issue 1 year ago • 8 comments

As @lesteve rightly pointed out, the fact that the column filter doesn't apply to the table of column similarities is surprising. This PR filters the similarity table: the row containing the similarity of column A and column B only appears if either A or B matches the column filter. There is also a special case for the "Columns with high similarity" filter: only rows corresponding to the high similarities appear

I also noted in the code there is a mix of "interactions", "associations" and "similarities" as our preferred term for this has evolved, I'll address that in a separate PR

jeromedockes avatar Jul 24 '24 06:07 jeromedockes

I think that I would prefer graying out the selector on this tab than applying it. The reason is that I can see someone moving to the tab and not seeing columns that are highly similar

On Jul 24, 2024, 08:00, at 08:00, "Jérôme Dockès" @.***> wrote:

As @lesteve rightly pointed out, the fact that the column filter doesn't apply to the table of column similarities is surprising. This PR filters the similarity table: the row containing the similarity of column A and column B only appears if either A or B matches the column filter. There is also a special case for the "Columns with high similarity" filter: only rows corresponding to the high similarities appear

I also noted in the code there is a mix of "interactions", "associations" and "similarities" as our preferred term for this has evolved, I'll address that in a separate PR You can view, comment on, or merge this pull request online at:

https://github.com/skrub-data/skrub/pull/1009

-- Commit Summary --

  • apply column filter to similarity table

-- File Changes --

M skrub/_reporting/_data/templates/dataframe-columns.css (2) M skrub/_reporting/_data/templates/dataframe-interactions.html (93) M skrub/_reporting/_data/templates/no-filter-matches.html (4) M skrub/_reporting/_data/templates/skrub-report.js (46) M skrub/_reporting/_summarize.py (2) M skrub/_reporting/js_tests/cypress/e2e/column-filters.cy.js (14)

-- Patch Links --

https://github.com/skrub-data/skrub/pull/1009.patch https://github.com/skrub-data/skrub/pull/1009.diff

-- Reply to this email directly or view it on GitHub: https://github.com/skrub-data/skrub/pull/1009 You are receiving this because you are subscribed to this thread.

Message ID: @.***>

GaelVaroquaux avatar Jul 24 '24 06:07 GaelVaroquaux

@GaelVaroquaux is this what you have in mind?

I'm not sure which I like better, from the user's point of view it's not obvious why the filter they chose would be deactivated when they visit this tab.

OTOH applying the filter isn't super useful, I don't really see a use case for "all similarities involving a numeric column" for example

not seeing columns that are highly similar

I see what you mean but arguably that's precisely the goal of the filter: not to see columns that don't match it. so it might make sense to have it behave consistently across all 3 tabs. Note the default is to show all columns and this only changes if the user manually goes and selects something else.

maybe yet another option could be to gray out the columns that don't match the filter. I'm not sure I like it either: it might not be immediately obvious why some cells are grayed out, and we lose the benefit of saving space by not showing them

:thinking:

jeromedockes avatar Jul 24 '24 08:07 jeromedockes

In the future we might add more tabs for which filtering columns doesn't make sense; that would be an argument for the "gray-out rather than apply" option.

jeromedockes avatar Jul 24 '24 08:07 jeromedockes

also, instead of graying it out we could hide it altogether, that might hint better that it doesn't apply (rather than is disabled), and also if users don't see the dropdown they are less likely to notice it and wonder why it's disabled

jeromedockes avatar Jul 24 '24 08:07 jeromedockes

instead of graying it out we could hide it altogether

you can check #1011 to see what that option would look like

here is the example.

I think this is my preferred option of the 3. WDYT @GaelVaroquaux ?

jeromedockes avatar Jul 24 '24 08:07 jeromedockes

Personally as a very naive user that clicks around and see what happens, I agree that not having a dialog box at all (rather than one that is greyed out) seems the best option, i.e. https://github.com/skrub-data/skrub/pull/1011

lesteve avatar Jul 24 '24 14:07 lesteve

I like the option of having the filter, it could be useful for (i) users that want the same columns they were seeing in the "Table preview" and "Column summaries" tabs when they go to "Column similarities" and (ii) users that have dataframes with many columns. If we're concerned about a user missing very similar columns, we could add a "beware: some very similar columns are not shown here" button

If we go with the other option of disabling this though, I agree with @lesteve that I'd rather have nothing than something greyed out

TheooJ avatar Jul 24 '24 15:07 TheooJ

thanks for following up! ok let's merge the one that completely hides the dialog for now, and I'll leave this one open in case we want to discuss more / revisit when we add some more useful filters

jeromedockes avatar Jul 24 '24 15:07 jeromedockes

it seems we're fine with the current status (hiding the select when in this tab) so i'll close this issue

jeromedockes avatar Oct 23 '24 12:10 jeromedockes