MLDatasets.jl

[Discussion] Moving to HuggingFace for some databases

Open · Dsantra92 opened this issue 5 months ago · 2 comments

Some of the (graph) datasets we are trying to support have one or more of the following problems:

  1. They are hosted on university servers or other untrusted sources that cannot provide adequate download speeds throughout the globe.
  2. They are not hosted anywhere and come with a license.
  3. They are stored in Python-specific formats.

HuggingFace now has a good set of community-maintained graph datasets. If a dataset has any of the issues above, we can try to add it to HF, then download it from HF and process it as required. I believe this will greatly reduce the code needed to integrate and test new datasets. I am not sure what support is planned for https://github.com/FluxML/HuggingFaceApi.jl, but this seems like a better idea to me than relying on links that can fail without warning.
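A minimal sketch of the "pull from HF" step, assuming the dataset files have already been uploaded to a Hugging Face dataset repository. It relies on the Hub's direct-download URL scheme `https://huggingface.co/datasets/<repo_id>/resolve/<revision>/<filename>` and tries several sources in order, so one dead link does not break the loader. `hf_dataset_url` and `download_with_fallback` are hypothetical helpers, not an existing MLDatasets.jl or HuggingFaceApi.jl API; the sketch uses Python's standard library only, but the same URL scheme can be fetched from Julia, e.g. with `Downloads.download`.

```python
import shutil
import urllib.parse
import urllib.request

HF_BASE = "https://huggingface.co"


def hf_dataset_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build the direct-download URL for one file in a Hugging Face dataset repo.

    Hypothetical helper for illustration only.
    """
    # quote() leaves "/" alone, so nested paths such as "raw/edges.csv" survive.
    return f"{HF_BASE}/datasets/{repo_id}/resolve/{revision}/{urllib.parse.quote(filename)}"


def download_with_fallback(urls: list[str], dest: str, timeout: float = 30.0) -> str:
    """Try each URL in order and return the first one that downloads successfully.

    Hypothetical helper for illustration only. Raises RuntimeError listing
    every failure if no source works.
    """
    errors = []
    for url in urls:
        try:
            # urlopen is entered first, so `dest` is only created once a
            # connection to this source has succeeded.
            with urllib.request.urlopen(url, timeout=timeout) as resp, open(dest, "wb") as out:
                shutil.copyfileobj(resp, out)
            return url
        except OSError as err:  # URLError and socket timeouts are OSError subclasses
            errors.append(f"{url}: {err}")
    raise RuntimeError("all sources failed:\n" + "\n".join(errors))
```

For example, passing `[hf_dataset_url(repo_id, filename), original_url]` tries the HF copy first and only falls back to the original host if the mirror is unreachable; reversing the list keeps the original source as the primary one.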

cc: @CarloLucibello

Dsantra92 · Aug 30 '24 16:08