Results 396 comments of Jérôme Dockès

it should be "if 'fit' not in mode" as you can see on the last if statement -- there are other modes like "score" or even "transform" where there must...

so it would be something like

```python
def _should_subsample(mode, environment):
    enable_subsampling = get_config()["enable_subsampling"]
    if "fit" not in mode:
        return False
    if enable_subsampling == "disable":
        return False
    if mode == "preview":...
```

this still doesn't plot regardless of the number of columns:

```python
import skrub
import polars as pl

df = pl.DataFrame({f'col_{i}': list(range(10)) for i in range(50)})
skrub.TableReport(df, max_plot_columns=None).open()
```

IMO "no...

thanks @JulietteBgl! For distinguishing between the default and None I suggest adding some sentinel values like this in the config or utils module:

```python
class Config(enum.Enum):
    max_plot_columns = enum.auto()
    max_association_columns...
```
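To illustrate the sentinel idea in general terms, here is a minimal self-contained sketch. It is not the skrub implementation: the names `Default`, `CONFIGURED_MAX_PLOT_COLUMNS` and `resolve_max_plot_columns` are hypothetical, the value 30 is an arbitrary placeholder, and treating an explicit `None` as "no limit" is an assumption made only for this example.

```python
import enum


class Default(enum.Enum):
    """Hypothetical sentinel meaning "the caller did not pass a value"."""

    MAX_PLOT_COLUMNS = enum.auto()


# Placeholder standing in for whatever the global configuration holds
# (the number 30 is an assumption, not a documented default).
CONFIGURED_MAX_PLOT_COLUMNS = 30


def resolve_max_plot_columns(value=Default.MAX_PLOT_COLUMNS):
    """Return the effective limit for a user-supplied argument.

    The sentinel is compared by identity, so it cannot be confused with
    None: an omitted argument falls back to the configured value, while
    an explicit None is passed through unchanged.
    """
    if value is Default.MAX_PLOT_COLUMNS:
        return CONFIGURED_MAX_PLOT_COLUMNS
    return value


print(resolve_max_plot_columns())      # argument omitted: configured value
print(resolve_max_plot_columns(None))  # explicit None is preserved
print(resolve_max_plot_columns(5))     # explicit number is preserved
```

Using an enum member rather than a bare `object()` gives the sentinel a readable repr in signatures and error messages, which is one reason this pattern is often preferred.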

Note ATM there is a circular import between the `_reporting` and `_config` modules, as just importing `_config` patches the pandas and polars displays, so the enum would probably have to...

> In general, is there a reason for using a threshold to decide which columns should use the missingness indicator, rather than using it everywhere? I am not familiar with...

> What do you think? @honghanhh @jeromedockes

I agree with your point about the distinction between cleaning / removing poorly-filled columns and encoding / finding a good representation for columns...

thanks @MarieSacksick, can you share the full example? The error message I get does suggest .skb might be missing:

```
>>> import numpy as np
>>> import pandas as...
```

> In the DataOps case instead, what I have in mind is a dataframe that is being modified, so I am just expecting to have access to the functions directly...

One thing that could be considered, if the attributes and methods added by skrub are accessed much more than those of the wrapped object, is to swap the roles, have...