Vortex should support reading from the Hugging Face Datasets API

Open · danking opened this issue 1 month ago · 1 comment

For example, the following should Just Work.

import vortex as vx
url = "hf://datasets/danking00/statpopgen-benchmark/10000/vortex-file-compressed/gnomad.genomes.v3.1.2.hgdp_tgp.chr21.vortex"

f = vx.open(url)
arrays = list(f.scan())

array = vx.io.read_url(url)

This works fine for Parquet files:

import pyarrow.dataset as ds
import pyarrow.parquet as pq

table = pq.read_table("hf://datasets/danking00/statpopgen-benchmark/10000/parquet/gnomad.genomes.v3.1.2.hgdp_tgp.chr21.parquet")

dataset = ds.dataset(
    "hf://datasets/danking00/statpopgen-benchmark/10000/parquet/gnomad.genomes.v3.1.2.hgdp_tgp.chr21.parquet",
    format="parquet",
)
scanned = dataset.to_table()
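
Until `hf://` URLs are understood natively, one possible workaround is to download the file through `huggingface_hub` and open the local copy. The sketch below is only an illustration under stated assumptions: `parse_hf_dataset_url` and `open_vortex_from_hf` are hypothetical helper names, the parser handles only the simple `hf://datasets/<namespace>/<repo>[@revision]/<path>` form, and it assumes `vx.open` accepts a local file path.

```python
from typing import NamedTuple, Optional


class HfDatasetPath(NamedTuple):
    repo_id: str             # "<namespace>/<repo>"
    filename: str            # path of the file inside the dataset repo
    revision: Optional[str]  # branch, tag, or commit; None means the default


def parse_hf_dataset_url(url: str) -> HfDatasetPath:
    """Split a simple hf://datasets/... URL into its parts (hypothetical helper).

    Simplification: revisions containing slashes are not handled.
    """
    prefix = "hf://datasets/"
    if not url.startswith(prefix):
        raise ValueError(f"not an hf://datasets/ URL: {url!r}")
    parts = url[len(prefix):].split("/", 2)
    if len(parts) < 3 or not all(parts):
        raise ValueError(f"expected hf://datasets/<namespace>/<repo>/<path>: {url!r}")
    namespace, name, filename = parts
    name, _, revision = name.partition("@")
    return HfDatasetPath(f"{namespace}/{name}", filename, revision or None)


def open_vortex_from_hf(url: str):
    """Download the file to the local Hugging Face cache, then open it with Vortex."""
    # Imported lazily so the parser above works without these packages installed.
    from huggingface_hub import hf_hub_download
    import vortex as vx

    parsed = parse_hf_dataset_url(url)
    local_path = hf_hub_download(
        repo_id=parsed.repo_id,
        filename=parsed.filename,
        repo_type="dataset",
        revision=parsed.revision,
    )
    # Assumption: vx.open accepts a local filesystem path.
    return vx.open(local_path)
```

This downloads the whole file before reading any of it, so it is a stopgap and not a substitute for the range reads that native `hf://` support would allow.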

See also

https://github.com/huggingface/datasets/issues/7863

danking · Nov 18 '25 15:11

Ideally this would also work with Polars, DataFusion & DuckDB.

danking · Nov 18 '25 15:11