Sylvain Lesage
See https://huggingface.slack.com/archives/C039P47V1L5/p1713172703779839
> Am I correct in assuming that if you specify a "config" in a dataset, only the given config is downloaded, but if you specify a split, all...
It was useful for managing the NFS, then the EFS. But now it only accesses the parquet metadata EFS. Not sure we need it anymore, we can...
Reported here: https://huggingface.co/datasets/Asap7772/persona_gpt4_paired_margin1_allsplit/discussions/1#66187c367d8c2cd6fbce4a19

Error is:
```
{"error": "Could not read the parquet files: Some index in row_group_indices is 0, which is either < 0 or >= num_row_groups(0)"}
```
I think...
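Read literally, the message describes a bounds check that fails because the file reports zero row groups, so even index 0 is out of range. The sketch below reproduces that check in plain Python to make the failure mode concrete. Both function names are hypothetical and this is not the code of the dataset viewer or of any parquet library; it only assumes the check is `0 <= index < num_row_groups`, as the message suggests.

```python
def check_row_group_indices(row_group_indices, num_row_groups):
    """Raise if any requested row group index is out of range (hypothetical helper)."""
    for index in row_group_indices:
        if index < 0 or index >= num_row_groups:
            raise IndexError(
                f"Some index in row_group_indices is {index}, "
                f"which is either < 0 or >= num_row_groups({num_row_groups})"
            )


def in_range_row_group_indices(row_group_indices, num_row_groups):
    """Keep only the indices that exist; yields [] for a file with no row groups."""
    return [i for i in row_group_indices if 0 <= i < num_row_groups]
```

With `num_row_groups == 0`, the first function raises for index 0, while the second returns an empty list, which a caller could treat as "no rows to read" instead of an error.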
It would remove a single point of failure. Also: we could have one healthcheck per service (API, admin). Currently the `ELB-HealthChecker/2.0` healthcheck is answered by the API (/healthcheck), and it...
We need to change:
- here
- in moon-landing
- in datasets?
- in blog posts, observable, notion, google colabs?...
Now that we call the project "Dataset viewer", we should make it clear in the docs and READMEs that:
- the project powers the Hub's dataset viewer
- people...
Today, it's 8. Let's try increasing it and see if it speeds up the backfill job. The current throughput is 577 datasets/minute.
From https://huggingface.slack.com/archives/C04HZ32QV17/p1711698013265029
> Is there a time-out for webhooks? I take about 1 min to respond and I'm getting 500.
> yes 30 seconds
> you should respond immediately and...
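The quoted advice is truncated, so what follows "respond immediately" is not known. One common reading is to acknowledge the webhook right away and do the slow work afterwards. The sketch below illustrates that pattern with the standard library only, under that assumption. `handle_webhook`, `slow_processing`, and the 202 status are hypothetical names and choices, not part of the Hub's webhook API.

```python
import queue
import threading
import time

jobs = queue.Queue()   # payloads waiting to be processed
processed = []         # filled by the worker, for illustration only


def slow_processing(payload):
    # Stand-in for work that could exceed the 30-second timeout.
    time.sleep(0.2)
    processed.append(payload)


def worker():
    # Drain the queue forever; runs in a daemon thread.
    while True:
        payload = jobs.get()
        try:
            slow_processing(payload)
        finally:
            jobs.task_done()


threading.Thread(target=worker, daemon=True).start()


def handle_webhook(payload):
    # Enqueue and return at once, so the sender gets a response
    # well within its timeout regardless of processing time.
    jobs.put(payload)
    return 202, "accepted"
```

In a real service the handler would sit behind a web framework route and the queue would likely be persistent, so that jobs survive a restart.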
See the `schema` column on https://huggingface.co/datasets/motherduckdb/duckdb-text2sql-25k. Clicking on any of the 'classes' leads to an error.

The erroneous URL is: https://datasets-server.huggingface.co/filter?dataset=motherduckdb%2Fduckdb-text2sql-25k&config=default&split=train&offset=0&length=100&where=schema%3D%27CREATE+TABLE+%22venue%22+%28%0A++%22venueId%22+INTEGER+NOT+NULL%2C%0A++%22venueName%22+VARCHAR%28100%29%2C%0A++%22venueInfo%22+JSON%2C%0A++PRIMARY+KEY+%28%22venueId%22%29%0A%29%3B%0A%0ACREATE+TABLE+%22author%22+%28%0A++%22authorId%22+INTEGER+NOT+NULL%2C%0A++%22authorName%22+VARCHAR%2850%29%2C%0A++%22authorPublications%22+INT%5B%5D%2C%0A++PRIMARY+KEY+%28%22authorId%22%29%0A%29%3B%0A%0ACREATE+TABLE+%22dataset%22+%28%0A++%22datasetId%22+INTEGER+NOT+NULL%2C%0A++%22datasetName%22+VARCHAR%2850%29%2C%0A++%22datasetInfo%22+STRUCT%28v+VARCHAR%2C+i+INTEGER%29%2C%0A++PRIMARY+KEY+%28%22datasetId%22%29%0A%29%3B%0A%0ACREATE+TABLE+%22journal%22+%28%0A++%22journalId%22+INTEGER+NOT+NULL%2C%0A++%22journalName%22+VARCHAR%28100%29%2C%0A++%22journalInfo%22+MAP%28INT%2C+DOUBLE%29%2C%0A++PRIMARY+KEY+%28%22journalId%22%29%0A%29%3B%0A%0ACREATE+TABLE+%22keyphrase%22+%28%0A++%22keyphraseId%22+INTEGER+NOT+NULL%2C%0A++%22keyphraseName%22+VARCHAR%2850%29%2C%0A++%22keyphraseInfo%22+VARCHAR%2850%29%5B%5D%2C%0A++PRIMARY+KEY+%28%22keyphraseId%22%29%0A%29%3B%0A%0ACREATE+TABLE+%22paper%22+%28%0A++%22paperId%22+INTEGER+NOT+NULL%2C%0A++%22title%22+VARCHAR%28300%29%2C%0A++%22venueId%22+INTEGER%2C%0A++%22year%22+INTEGER%2C%0A++%22numCiting%22+INTEGER%2C%0A++%22numCitedBy%22+INTEGER%2C%0A++%22journalId%22+INTEGER%2C%0A++%22paperInfo%22+UNION%28num+INT%2C+str+VARCHAR%29%2C%0A++PRIMARY+KEY+%28%22paperId%22%29%2C%0A++FOREIGN+KEY%28%22journalId%22%29+REFERENCES+%22journal%22%28%22journalId%22%29%2C%0A++FOREIGN+KEY%28%22venueId%22%29+REFERENCES+%22venue%22%28%22venueId%22%29%0A%29%3B%0A%0ACREATE+TABLE+%22cite%22+%28%0A++%22citingPaperId%22+INTEGER+NOT+NULL%2C%0A++%22citedPaperId%22+INTEGER+NOT+NULL%2C%0A++%22citeInfo%22+INT%5B%5D%2C%0A++PRIMARY+KEY+%28%22citingPaperId%22%2C%22citedPaperId%22%29%2C%0A++FOREIGN+KEY%28%22citedpaperId%22%29+REFERENCES+%22paper%22%28%22paperId%22%29%2C%0A++FOREIGN+KEY%28%22citingpaperId%22%29+REFERENCES+%22paper%22%28%22paperId%22%29%0A%29%3B%0A%0ACREATE+TABLE+%22paperDataset%22+%28%0A++%22paperId%22+INTEGER%2C%0A++%22datasetId%22+INTEGER%2C%0A++%22paperDatasetInfo%22+JSON%2C%0A++PRIMARY+KEY+%28%22datasetId%22%2C+%22paperId%22%29%0A%29%3B%0A%0ACREATE+TABLE+%22paperKeyphrase%22+%28%0A++%22paperId%22+INTEGER%2C%0A++%22keyphraseId%22+INTEGER%2C%0A++%22paperKeyphraseInfo%22+JSON%2C%0A++PRIMARY+KEY+%28%22keyphraseId%22%2C%22paperId%22%29%2C%0A++FOREIGN+KEY%28%22paperId%22%29+REFERENCES+%22paper%22%28%22paperId%22%29%2C%0A++FOREIGN+KEY%28%22keyphraseId%22%29+REFERENCES+%22keyphrase%22%28%22keyphraseId%22%29%0A%29%3B%0A%0ACREATE+TABLE+%22writes%22+%28%0A++%22paperId%22+INTEGER%2C%0A++%22authorId%22+INTEGER%2C%0A++%22writesInfo%22+JSON%2C%0A++PRIMARY+KEY+%28%22paperId%22%2C%22authorId%22%29%2C%0A++FOREIGN+KEY%28%22paperId%22%29+REFERENCES+%22paper%22%28%22paperId%22%29%2C%0A++FOREIGN+KEY%28%22authorId%22%29+REFERENCES+%22author%22%28%22authorId%22%29%0A%29%3B%27

```json
{"error":"Parameter 'where' contains invalid symbols"}
```
It's because...
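To make the long URL easier to read, here is a minimal sketch of how such a `/filter` request can be assembled and decoded with the standard library. The endpoint and parameter names are taken from the URL above; `build_filter_url` is a hypothetical helper, and the `where` value is shortened to a single table for illustration. The sketch only shows the encoding and does not explain or work around the server-side "invalid symbols" rejection, whose cause is cut off in the report.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://datasets-server.huggingface.co/filter"


def build_filter_url(dataset, config, split, where, offset=0, length=100):
    """Build a /filter URL; urlencode percent-encodes quotes, newlines, etc."""
    params = {
        "dataset": dataset,
        "config": config,
        "split": split,
        "offset": offset,
        "length": length,
        "where": where,
    }
    return f"{BASE_URL}?{urlencode(params)}"


# Shortened stand-in for the multi-table schema string in the report.
schema_value = 'CREATE TABLE "venue" (\n  "venueId" INTEGER NOT NULL\n);'
where = f"schema='{schema_value}'"

url = build_filter_url(
    "motherduckdb/duckdb-text2sql-25k", "default", "train", where
)

# Decoding the query string recovers the original clause, which helps
# when checking which characters the server actually receives.
decoded_where = parse_qs(urlsplit(url).query)["where"][0]
```

Decoding the reported URL the same way shows that the clause contains double quotes, parentheses, semicolons, and newlines.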
Requires https://github.com/huggingface/datasets/issues/6438, to support GeoParquet. We could support more formats. Possibly requires geopandas as a dependency.