
FEAT: how to export xorbits datasets to a JSON file

Open simplew2011 opened this issue 2 years ago • 5 comments

Is your feature request related to a problem? Please describe

  • How can a xorbits dataset be exported to a JSON file?
  • Currently, only the Arrow file format is supported.

Describe the solution you'd like

  • Add an interface for converting to a Hugging Face dataset
  • Or add an interface for exporting a xorbits dataset to JSON format

simplew2011 avatar Dec 13 '23 08:12 simplew2011

The following interfaces need to be implemented: xorbits.datasets.to_huggingface, xorbits.datasets.Dataset.from_dataframe, xorbits.datasets.export_json

simplew2011 avatar Dec 13 '23 11:12 simplew2011

I need to add a new column to dataset.Dataset to record intermediate values. How should this be done? I can only find __getitem__; __setitem__ is not implemented.

simplew2011 avatar Dec 13 '23 11:12 simplew2011

How can a dataset.Dataset be filtered?

Something similar to huggingface.dataset: https://github.com/huggingface/datasets/blob/ef0f986518bd252c5314a7e3a419dedcbb166630/src/datasets/arrow_dataset.py#L5061

simplew2011 avatar Dec 14 '23 04:12 simplew2011

@codingl2k1 Please take a look at this issue.

@simplew2011 Would you be interested in contributing?

qinxuye avatar Dec 15 '23 01:12 qinxuye

How can a dataset.Dataset be filtered?

Something similar to huggingface.dataset: https://github.com/huggingface/datasets/blob/ef0f986518bd252c5314a7e3a419dedcbb166630/src/datasets/arrow_dataset.py#L5061

Currently, a xorbits DataFrame can be exported to CSV, Parquet, and SQL, and DataFrame apply may be able to meet your needs. A xorbits dataset can map data and be converted to a DataFrame, but filtering is not implemented yet.
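
As a rough illustration of the three operations asked about in this thread (filtering rows, adding a column for intermediate values, and exporting to JSON), here is a minimal sketch in plain Python over a list of dict records. It does not use xorbits at all: filter_records, add_column, and export_json are hypothetical helper names chosen for this example, not existing xorbits APIs, and the output format (one JSON object per line) is an assumption.

```python
import json
from typing import Any, Callable, Dict, List

# A record is one row, represented as a plain dict (assumption for this sketch).
Record = Dict[str, Any]


def filter_records(records: List[Record],
                   predicate: Callable[[Record], bool]) -> List[Record]:
    """Keep only the rows for which predicate(row) is true."""
    return [row for row in records if predicate(row)]


def add_column(records: List[Record], name: str,
               fn: Callable[[Record], Any]) -> List[Record]:
    """Return new rows with an extra column computed from each row.

    The input rows are not modified; each output row is a copy.
    """
    return [{**row, name: fn(row)} for row in records]


def export_json(records: List[Record], path: str) -> int:
    """Write rows as JSON Lines (one object per line); return the row count."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for row in records:
            # ensure_ascii=False keeps non-ASCII text readable in the file.
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    rows = [{"text": "hello", "score": 1}, {"text": "world!", "score": 5}]
    kept = filter_records(rows, lambda r: r["score"] > 2)
    kept = add_column(kept, "length", lambda r: len(r["text"]))
    export_json(kept, "out.jsonl")
```

A real implementation would presumably express the same steps through the DataFrame route mentioned above (convert the dataset to a DataFrame, filter and add columns there, then export), but the exact calls depend on what xorbits supports.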

Could you provide some example code?

codingl2k1 avatar Dec 15 '23 03:12 codingl2k1