LeoGrin

Results 11 issues of LeoGrin

Compute the min hash transform method in parallel, as suggested by @alexis-cvetkov. We no longer use the self.hash_dict attribute, so the fit method does nothing now.
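
A minimal sketch of the idea, not the code from the pull request: when the transform no longer depends on a fitted dictionary, each distinct string can be hashed independently, so the work can be split across workers. The names `minhash_signature` and `minhash_transform`, the 3-gram tokenization, and the use of `blake2b` are all assumptions made for illustration. Threads keep the example stdlib-only; a real speedup for this CPU-bound work would likely need a process-based backend.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def ngrams(text, n=3):
    """Return the set of character n-grams of a space-padded string."""
    padded = f" {text} "
    return {padded[i:i + n] for i in range(max(1, len(padded) - n + 1))}


def seeded_hash(gram, seed):
    """Hash one n-gram to a 64-bit integer; the seed selects the hash function."""
    digest = hashlib.blake2b(
        gram.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(text, n_components=8):
    """For each seed, keep the minimum hash value over the string's n-grams."""
    grams = ngrams(text)
    return [min(seeded_hash(g, seed) for g in grams) for seed in range(n_components)]


def minhash_transform(values, n_components=8, max_workers=4):
    """Stateless transform: hash each distinct value once, in parallel."""
    unique = list(dict.fromkeys(values))  # de-duplicate, preserving order
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        signatures = list(
            pool.map(lambda v: minhash_signature(v, n_components), unique)
        )
    lookup = dict(zip(unique, signatures))
    return [lookup[v] for v in values]
```

Because nothing is stored between calls, a `fit` step has nothing to learn in this sketch, which matches the remark that `fit` becomes a no-op.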

I was able to quickly upload datasets in the past, but now they seem to stay stuck "in preparation". Checking all uploaded datasets, it seems that those created since 26/06/2022...

Fix #1756. Use the `namespace` argument of `Pinecone.from_existing_index` to set the default value of `namespace` for other methods. This leads to more predictable behavior and easier integration in chains. For the...
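
The pattern described here can be sketched as follows. This is a hypothetical stand-in, not the actual vector store class or the Pinecone client: `NamespacedStore`, `FakeIndex`, and the `query` signature are invented for illustration. The point is only that a namespace given at construction becomes the default for later calls, while an explicit argument still overrides it.

```python
class FakeIndex:
    """Hypothetical index that records the namespace used for each query."""

    def __init__(self):
        self.calls = []

    def query(self, text, namespace=None):
        self.calls.append((text, namespace))
        return []


class NamespacedStore:
    """Hypothetical wrapper that remembers a default namespace."""

    def __init__(self, index, namespace=None):
        self._index = index
        self._namespace = namespace

    @classmethod
    def from_existing_index(cls, index, namespace=None):
        # Store the namespace on the wrapper instead of passing it to the index.
        return cls(index, namespace=namespace)

    def similarity_search(self, query, namespace=None):
        # Fall back to the default set at construction time.
        if namespace is None:
            namespace = self._namespace
        return self._index.query(query, namespace=namespace)


index = FakeIndex()
store = NamespacedStore.from_existing_index(index, namespace="docs")
store.similarity_search("hello")                     # uses the default, "docs"
store.similarity_search("hello", namespace="other")  # explicit override
```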

# Quick summary Using the `namespace` argument in the function `Pinecone.from_existing_index` has no effect. Indeed, it is passed to `pinecone.Index`, which has no `namespace` argument. # Steps to reproduce a...

Improve the metric we use to threshold matches in `fuzzy_join`, to make it easier to tune for the user, and more correlated with actual matches. Right now, the `match_score` we...

enhancement

### Describe the bug I cannot run the Skrub examples using Jupyterlite (https://skrub-data.org/stable/lite/lab/) (#625 #633). It's stuck on the first cell I try to run, and the kernel status is...

bug

- Random search - Simple tests for the monitor function - joblib support (I think this won't work for memory monitoring, but should work for time monitoring. Would love feedback...

no changelog needed
benchmarks

### Describe the bug Using `deduplicate` on a list with not enough unique values makes it fail in an uninformative way. ### Steps/Code to Reproduce ``` from skrub import deduplicate...

bug

### Problem Description Trying to better understand why the GapEncoder can be very slow (#342), I found that it is at its slowest when dealing with "id" columns, which contain a...

enhancement