[Feature Request] Add load/save functions to StaticDataset
Required prerequisites
- [x] I have searched the Issue Tracker and Discussions that this hasn't already been reported. (+1 or comment there if it has.)
- [ ] Consider asking first in a Discussion.
Motivation
Requested for the Loong project: please add load and save functions for StaticDataset. Loading already appears to be supported, so the main gap is a save function that writes either to a local file or to the Hugging Face Hub. This would help the Loong teams align on a common dataset format. It may also be worth considering a standalone function, such as `load_dataset` or `load_loong_dataset`, that mimics the Hugging Face convention.
Please also take care of the JSON serialization problem (e.g. values that come back from loading as `datetime` objects, which `json.dumps` cannot serialize by default).
Solution
Possible reference code: https://github.com/camel-ai/loong_private/blob/master/domain/logic/loong_logic/data.py
Alternatives
No response
Additional context
No response
Hey @Lawhy, the link is private or unavailable.
Hi @JINO-ROHIT, thanks for mentioning that. I was thinking this should be tackled by someone in the Loong project. But anyway, I will share the relevant code here:
```python
import json

from datasets import load_dataset

from camel.datasets import StaticDataset


def load_loong_dataset(dataset_path: str) -> StaticDataset:
    """Load a Loong dataset.

    Args:
        dataset_path (str): Path to the dataset.

    Returns:
        StaticDataset: The loaded dataset.
    """
    # Note: this causes problems such as the `data_created` string entry
    # being parsed into a datetime object.
    return StaticDataset(load_dataset("json", data_files=dataset_path)["train"])


def save_loong_dataset(dataset: StaticDataset, dataset_path: str) -> None:
    """Save a Loong dataset as JSON Lines (one JSON object per line).

    Args:
        dataset (StaticDataset): The dataset to save.
        dataset_path (str): Path to save the dataset.
    """
    with open(dataset_path, "w") as f:
        for dp in dataset:
            # Convert each data point to a plain dict before serializing.
            dp = dp.to_dict() if not isinstance(dp, dict) else dp
            # load_loong_dataset turns some strings into datetime objects,
            # which need to be converted back before writing.
            # TODO: take care of the serialization problem
            f.write(json.dumps(dp) + "\n")
```
Oh, okay. No worries then.
Hi @JINO-ROHIT. You are welcome to join the project as well if you are interested in it.