LeoGrin issues

Results 11 issues of LeoGrin

Min hash parallel

Compute the min hash `transform` method in parallel, as suggested by @alexis-cvetkov. The `self.hash_dict` attribute is no longer used, so the `fit` method now does nothing.
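A minimal sketch of why dropping the cached dictionary makes `fit` a no-op and `transform` easy to parallelize: each string's signature depends only on that string. This is not the library's implementation. The class `StatelessMinHash`, the helpers `char_ngrams` and `minhash`, and the seeded CRC32 hash are all assumptions made for illustration, and a thread pool is used only to keep the example standard-library only (a process-based backend would be the realistic choice for CPU-bound hashing).

```python
from concurrent.futures import ThreadPoolExecutor
import zlib


def char_ngrams(text, n=3):
    """Return the set of character n-grams of a space-padded string."""
    padded = f" {text} "
    return {padded[i:i + n] for i in range(max(len(padded) - n + 1, 1))}


def minhash(text, n_components=8, n=3):
    """One signature: for each seed, the minimum seeded CRC32 over the n-grams."""
    grams = [g.encode("utf-8") for g in char_ngrams(text, n)]
    return [min(zlib.crc32(g, seed) for g in grams) for seed in range(n_components)]


class StatelessMinHash:
    """Hypothetical encoder with no fitted state, so fit is a no-op."""

    def __init__(self, n_components=8, n_jobs=4):
        self.n_components = n_components
        self.n_jobs = n_jobs

    def fit(self, X, y=None):
        # Nothing to learn: each signature depends only on its input string.
        return self

    def transform(self, X):
        # Strings are hashed independently, so the work maps cleanly over a pool.
        with ThreadPoolExecutor(max_workers=self.n_jobs) as pool:
            return list(pool.map(lambda s: minhash(s, self.n_components), X))
```

Because no state is shared between calls, the parallel result matches a plain sequential loop over `minhash`, and identical inputs always get identical signatures.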

Datasets created since 26/06 are still in preparation

I was able to quickly upload datasets in the past, but now they seem to stay stuck "in preparation". Checking all uploaded datasets, it seems that those created since 26/06/2022...

use namespace argument in Pinecone constructor

Fix #1756. Use the `namespace` argument of `Pinecone.from_existing_index` to set the default value of `namespace` for other methods. This leads to more predictable behavior and easier integration in chains. For the...
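The pattern described here (a constructor-level `namespace` that other methods fall back to) can be sketched as follows. This is an illustration only, not the LangChain or Pinecone code: `NamespacedStore`, its in-memory storage, and the `add_texts` and `search` signatures are hypothetical stand-ins.

```python
class NamespacedStore:
    """Hypothetical in-memory stand-in for a vector store wrapper."""

    def __init__(self, namespace=None):
        self.namespace = namespace  # default used when a method gets no namespace
        self._data = {}

    @classmethod
    def from_existing_index(cls, namespace=None):
        # The namespace is kept on the instance instead of being dropped.
        return cls(namespace=namespace)

    def add_texts(self, texts, namespace=None):
        if namespace is None:
            namespace = self.namespace
        self._data.setdefault(namespace, []).extend(texts)

    def search(self, query, namespace=None):
        # An explicit argument still overrides the instance default.
        if namespace is None:
            namespace = self.namespace
        return [t for t in self._data.get(namespace, []) if query in t]
```

With this fallback, a caller that builds the store once with `namespace="docs"` does not have to repeat the argument on every call, which is what makes it easier to plug into code that never passes `namespace` explicitly.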

namespace argument not taken into account when creating Pinecone index

# Quick summary

Using the `namespace` argument in the function `Pinecone.from_existing_index` has no effect. Indeed, it is passed to `pinecone.Index`, which has no `namespace` argument.

# Steps to reproduce a...

Better threshold metric for fuzzy_join

Improve the metric we use to threshold matches in `fuzzy_join`, to make it easier to tune for the user, and more correlated with actual matches. Right now, the `match_score` we...

enhancement

Jupyterlite kernel fails to launch

### Describe the bug

I cannot run the Skrub examples using Jupyterlite (https://skrub-data.org/stable/lite/lab/) (#625 #633). It's stuck on the first cell I try to run, and the kernel status is...

bug

Benchmark improvements

- Random search
- Simple tests for the monitor function
- joblib support (I think this won't work for memory monitoring, but should work for time monitoring. Would love feedback...

no changelog needed

benchmarks
