LeoGrin
Compute the min hash transform method in parallel, as suggested by @alexis-cvetkov. We no longer use the self.hash_dict attribute, so the fit method does nothing now.
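As a rough illustration of why a stateless transform parallelizes easily, here is a minimal sketch. It is not the code from this pull request: `minhash`, `transform_parallel`, and their parameters are hypothetical names, and the hashing scheme (salted BLAKE2b over character 3-grams) is an assumption chosen only to keep the example self-contained. Because no fitted dictionary is shared, each unique string can be hashed independently and the results mapped back to the input order.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def ngrams(s, n=3):
    """Return the set of character n-grams of a space-padded string."""
    padded = f" {s} "
    return {padded[i:i + n] for i in range(max(len(padded) - n + 1, 1))}


def minhash(s, n_components=4):
    """Hypothetical min-hash signature: for each seed, keep the smallest n-gram hash."""
    grams = ngrams(s)
    signature = []
    for seed in range(n_components):
        salt = seed.to_bytes(8, "little")  # one salted hash function per component
        signature.append(
            min(
                int.from_bytes(
                    hashlib.blake2b(g.encode(), digest_size=8, salt=salt).digest(),
                    "little",
                )
                for g in grams
            )
        )
    return tuple(signature)


def transform_parallel(values, n_jobs=2):
    """Hash each unique value once, in a worker pool, then restore the input order."""
    unique = sorted(set(values))
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        signatures = dict(zip(unique, pool.map(minhash, unique)))
    return [signatures[v] for v in values]
```

A thread pool is used here only to stay within the standard library; for CPU-bound hashing, a process-based pool would likely be the more effective choice.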
I was able to quickly upload datasets in the past, but now they seem to stay stuck "in preparation". Checking all uploaded datasets, it seems that those created since 26/06/2022...
Fix #1756. Use the `namespace` argument of `Pinecone.from_existing_index` to set the default value of `namespace` for other methods. Leads to more expected behavior and easier integration in chains. For the...
# Quick summary Using the `namespace` argument in the function `Pinecone.from_existing_index` has no effect. Indeed, it is passed to `pinecone.Index`, which has no `namespace` argument. # Steps to reproduce a...
Improve the metric we use to threshold matches in `fuzzy_join`, to make it easier to tune for the user, and more correlated with actual matches. Right now, the `match_score` we...
### Describe the bug I cannot run the Skrub examples using Jupyterlite (https://skrub-data.org/stable/lite/lab/) (#625 #633). It's stuck on the first cell I try to run, and the kernel status is...
- Random search - Simple tests for the monitor function - joblib support (I think this won't work for memory monitoring, but should work for time monitoring. Would love feedback...
### Describe the bug Using `deduplicate` on a list with not enough unique values makes it fail in an uninformative way. ### Steps/Code to Reproduce ``` from skrub import deduplicate...
### Problem Description Trying to better understand why the GapEncoder can be very slow (#342), I found that it is at its slowest when dealing with "id" columns, which contain a...