Frédéric Branchaud-Charron
Example using the Cuthill-McKee algorithm. I put a threshold at 0.05, which seems to help. ```python from scipy.sparse import csr_matrix from scipy.sparse.csgraph import reverse_cuthill_mckee graph = csr_matrix(CONFUSION_MATRIX > 0.05) output =...
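To make the reordering idea concrete, here is a minimal, self-contained sketch. The 4-class confusion matrix is made up for illustration, and casting the thresholded matrix to integers and using `symmetric_mode=False` are assumptions of this sketch, not something the original snippet states.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import reverse_cuthill_mckee

# Hypothetical row-normalized confusion matrix:
# classes 0 and 2 are confused with each other, as are classes 1 and 3.
confusion = np.array([
    [0.80, 0.00, 0.20, 0.00],
    [0.00, 0.70, 0.00, 0.30],
    [0.25, 0.00, 0.75, 0.00],
    [0.00, 0.10, 0.00, 0.90],
])

# Keep only confusions above the 0.05 threshold as graph edges.
graph = csr_matrix((confusion > 0.05).astype(int))

# The thresholded matrix is not guaranteed to be symmetric,
# so let SciPy symmetrize it internally.
order = reverse_cuthill_mckee(graph, symmetric_mode=False)

# Apply the same permutation to rows and columns.
reordered = confusion[np.ix_(order, order)]
print(order)
print(reordered)
```

After the permutation, classes that are confused with each other sit next to each other, so the large off-diagonal entries cluster near the diagonal.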
Unfortunately, prettier messes up admonitions: ```bash !!! info "How to temporarily disable pre-commits?" - For a specific commit, you can avoid running the pre-commits with the flag `--no-verify`. If you...
Will probably get to it soon. I can't assign myself, but we could add the `self-assign` action from HuggingFace https://github.com/huggingface/datasets/blob/master/.github/workflows/self-assign.yaml
@christyler3030 I'm not an expert on NGINX, do you have some pointers? I'm not sure how our users will be able to enter "their own" backend. Assuming a proxy file:...
From our discussions, we are trying to solve two problems: 1. Map resources (Ex: local => `http://localhost:8091`, remote => `mycoolbackend.com`) 2. Use NGINX to handle HTTPS, malformed requests, probing attack...
Hello, could we leverage [`pandas.read_sql`](https://pandas.pydata.org/docs/reference/api/pandas.read_sql.html) for this? This would be basically the same as [`CSVBuilder`](https://github.com/huggingface/datasets/blob/7380140accf522a4363bb56c0b77a4190f49bed6/src/datasets/packaged_modules/csv/csv.py#L127), but using `pandas.read_sql(..., chunksize=1)` instead of `pandas.read_csv(..., iterator=True)`. I'm happy to work on this...
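As a rough illustration of the chunked-read idea, the sketch below streams rows from an in-memory SQLite table through `pandas.read_sql(..., chunksize=...)`. The table, the `iter_examples` helper, and the chunk size of 2 are hypothetical and only serve the example; this is not the builder implementation itself.

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory table standing in for a real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (id INTEGER, text TEXT)")
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?)",
    [(i, f"review {i}") for i in range(5)],
)
conn.commit()


def iter_examples(sql, con, chunksize=2):
    """Yield one dict per row without loading the whole result set."""
    # With chunksize set, read_sql returns an iterator of DataFrames.
    for chunk in pd.read_sql(sql, con, chunksize=chunksize):
        for record in chunk.to_dict(orient="records"):
            yield record


examples = list(iter_examples("SELECT id, text FROM reviews ORDER BY id", conn))
conn.close()
print(examples[0])
```

A larger chunk size than 1 would likely reduce per-chunk overhead, at the cost of holding more rows in memory at once.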
Hello, I'm also interested in this feature. Has there been progress on this issue? Could we use a similar trick as above, but with a better hashing algorithm like SHA?...
For reference, we can get a solution fairly easily if we assume that we can hold all unique values in memory. ```python from datasets import Dataset from itertools import cycle...
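The truncated snippet above is not reproduced here. As a generic sketch of the "hold all unique values in memory" assumption, and assuming the goal is to keep only the first row for each value of a column, a plain-Python filter could look like this (`keep_first_occurrence` and the sample rows are hypothetical):

```python
def keep_first_occurrence(rows, key):
    """Yield rows whose value under `key` has not been seen before."""
    seen = set()  # grows with the number of unique values
    for row in rows:
        value = row[key]
        if value not in seen:
            seen.add(value)
            yield row


rows = [
    {"id": 1, "text": "a"},
    {"id": 2, "text": "b"},
    {"id": 3, "text": "a"},
]
unique_rows = list(keep_first_occurrence(rows, key="text"))
print(unique_rows)
```

Memory use is proportional to the number of distinct values, which is exactly the assumption stated in the comment.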
Ah, CI runs with `pandas=1.3.5`, which doesn't return the number of rows inserted.
@lhoestq I'm getting an error in the integration tests, and I'm not sure if it's related to my PR. Any help would be appreciated :) ``` if not self._is_valid_token(token): > raise ValueError("Invalid token passed!")...