Jérôme Dockès
yes: - a small change to be compatible with the current version of the tablereport (columns that match a filter must now be given by their indices, not by column names)...
yeah I think so; superseded by #1233
if the bug is in scikit-learn and this is not critical, could we just wait for it to be fixed in scikit-learn without vendoring the htmlmixin class? also once it...
@glemaitre could you do a ``` git commit --allow-empty -m '[doc build]' && git push ``` so we can see the new TableVectorizer display in the examples? thanks!
thanks @MarieSacksick !!
the ken embedding tests download 864M of data and take over a minute to run locally so we probably want to start with those. A good part of the data...
I am also wondering about the way files are downloaded: the remote file is opened with pyarrow, then chunks of it are loaded into pandas dataframes and written in separate...
probably the easiest is to have a `Hash` (no min) transformer that hashes the full entry with several seeds, then this can be aggregated with the 'min' operation in the...
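to make the idea above concrete, here is a minimal sketch of it, not the actual skrub implementation. All names (`seeded_hashes`, `min_aggregate`, `ngrams`, `minhash`) are hypothetical, using `blake2b` with the seed as salt is just one convenient way to get seeded hashes, and splitting the text into character n-grams before hashing is my assumption about where the entries to aggregate come from:

```python
import hashlib


def seeded_hashes(entry: str, n_seeds: int = 4) -> list[int]:
    """Hash the full entry once per seed (the 'Hash, no min' step)."""
    hashes = []
    for seed in range(n_seeds):
        digest = hashlib.blake2b(
            entry.encode("utf-8"),
            digest_size=8,
            salt=seed.to_bytes(8, "little"),  # the seed is passed as the salt
        ).digest()
        hashes.append(int.from_bytes(digest, "little"))
    return hashes


def min_aggregate(hash_rows: list[list[int]]) -> list[int]:
    """Aggregate several rows of hashes with 'min', one value per seed."""
    return [min(column) for column in zip(*hash_rows)]


def ngrams(text: str, n: int = 3) -> list[str]:
    """Character n-grams (assumption: these are the entries being hashed)."""
    return [text[i:i + n] for i in range(max(len(text) - n + 1, 1))]


def minhash(text: str, n_seeds: int = 4, n: int = 3) -> list[int]:
    """Compose the two steps: hash every n-gram, then take the min per seed."""
    return min_aggregate([seeded_hashes(gram, n_seeds) for gram in ngrams(text, n)])
```

keeping the hashing and the 'min' aggregation as separate steps means the hashing part can be reused on its own, for example `seeded_hashes("london")` for a whole entry and `minhash("london")` when the min over n-grams is wanted.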
thanks for reporting this bug. Indeed, InterpolationJoiner does not yet have support for polars, although that should be added soon. in the meantime it should be documented and provide a...
could there also be situations where this helps narrow down the nearest neighbor search and thus reduce computation & memory? in the example you give above we would only compute...