How to make a custom LeRobotDataset with v2?
Hi folks, thanks for the amazing open source work!
I am trying to make a custom dataset to use with the LeRobotDataset format.
The README says to copy the example scripts here, which I've done, and I have a working formatting script of my own:
https://github.com/huggingface/lerobot/blob/8e7d6970eaf5a64b8af6ec45586d201b8ca9ef16/README.md?plain=1#L323
However, when it comes time to create the dataset, `push_dataset_to_hub.py` uses `LeRobotDataset.from_preloaded`, which is no longer supported in dataset v2:
https://github.com/huggingface/lerobot/blob/8e7d6970eaf5a64b8af6ec45586d201b8ca9ef16/lerobot/scripts/push_dataset_to_hub.py#L216
So I'm just wondering what the proper way of loading your own custom local dataset is?
Thank you in advance for your help!
Okay, I've found a workaround for now: I initialize an empty dataset, add frames to it, and then load it after calling `dataset.consolidate()`. If this is the proper way to do it, please let me know and I'll make a PR updating the docs.
Otherwise please let me know what the right way to do this is. Thank you! I'll update this issue with my code once I've cleaned it up.
I encountered the same issue.
@aliberts I also hit the same issue; the documentation on how to generate a custom dataset is out of date (the code no longer runs). Could you please update the instructions and the relevant scripts for custom dataset generation? Thanks!
Hey there,
Yes, all the push_to_hub scripts are deprecated in favor of scripts in `examples/port_datasets` (just one for now).
Basically, you need to create a new empty dataset using `LeRobotDataset.create()`, then add individual frames using `add_frame()`, then save the added frames into an episode using `save_episode()` (which actually writes the data).
Then, at the end, you need to call the `consolidate()` method to handle a few more things (we will try to get rid of this step in future iterations) before finally calling the `push_to_hub()` method.
You can find more info about the changes in this new API in the PR (#461).
We will remove the push_to_hub scripts in the future, after adding more equivalent scripts like the one mentioned above to the examples section. Hope this helps!
Will update the Readme soon!
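Putting the steps above together, here is a minimal sketch of the v2 workflow. Hedged heavily: the exact `create()`/`save_episode()` signatures, the feature-schema keys (`dtype`, `shape`, `names`), the `task` argument, and the repo id are assumptions based on the API described in this thread and PR #461, and may differ in your lerobot version:

```python
import numpy as np

# Describe the features your dataset will contain. The schema keys below
# ("dtype", "shape", "names") are assumptions about the v2 feature format.
FEATURES = {
    "observation.state": {"dtype": "float32", "shape": (7,), "names": None},
    "action": {"dtype": "float32", "shape": (7,), "names": None},
}


def build_frame(state, action):
    """Build one frame dict matching FEATURES (pure helper, no lerobot needed)."""
    return {
        "observation.state": np.asarray(state, dtype=np.float32),
        "action": np.asarray(action, dtype=np.float32),
    }


def main():
    # Imported lazily so the helpers above can be used without lerobot installed.
    from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

    # 1. Create a new, empty dataset.
    dataset = LeRobotDataset.create(
        repo_id="your_name/your_dataset",  # hypothetical repo id
        fps=30,
        features=FEATURES,
    )

    # 2. Add individual frames, then 3. save them as an episode
    #    (save_episode is what actually writes the buffered frames).
    for _ in range(10):
        dataset.add_frame(build_frame(np.zeros(7), np.zeros(7)))
    dataset.save_episode(task="pick up the cube")  # `task` arg is an assumption

    # 4. Consolidate, then 5. push (or skip the push to keep the dataset local).
    dataset.consolidate()
    dataset.push_to_hub()


if __name__ == "__main__":
    try:
        main()
    except ImportError:
        # Guarded so the sketch can still be run without lerobot installed.
        print("lerobot is not installed; frame helper output:")
        print(build_frame([0.0] * 7, [0.0] * 7))
```

The frame helper is separated from the lerobot calls so you can unit-test your own data massaging (dtypes, shapes) before touching the dataset API.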
The following script was generated by an AI agent to help reproduce the issue:
```python
# lerobot/reproduce.py
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset


def test_custom_lerobot_dataset():
    try:
        repo_id = "custom_repo"
        hf_dataset = None  # This should be replaced with an actual dataset object
        episode_data_index = None  # This should be replaced with an actual episode data index
        info = None  # This should be replaced with actual info
        videos_dir = "/path/to/videos"  # This should be replaced with an actual videos directory

        # Attempt to create a LeRobotDataset using the from_preloaded method
        lerobot_dataset = LeRobotDataset.from_preloaded(
            repo_id=repo_id,
            hf_dataset=hf_dataset,
            episode_data_index=episode_data_index,
            info=info,
            videos_dir=videos_dir,
        )
        raise AssertionError("Test failed: from_preloaded method did not throw an error as expected.")
    except AttributeError as e:
        # Expected on dataset v2: from_preloaded no longer exists
        raise AssertionError(e)


if __name__ == "__main__":
    test_custom_lerobot_dataset()
```
How to run:

```shell
python3 lerobot/reproduce.py
```
Expected result:

```
Traceback (most recent call last):
  File "lerobot/reproduce.py", line 14, in test_custom_lerobot_dataset
    lerobot_dataset = LeRobotDataset.from_preloaded(
AttributeError: type object 'LeRobotDataset' has no attribute 'from_preloaded'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "lerobot/reproduce.py", line 28, in <module>
    test_custom_lerobot_dataset()
  File "lerobot/reproduce.py", line 23, in test_custom_lerobot_dataset
    raise AssertionError(e)
AssertionError: type object 'LeRobotDataset' has no attribute 'from_preloaded'
```
@aliberts, I stumbled upon this problem and could not find it in the README. Could you make it more explicit there?
Can anybody share the code to do that with a static ALOHA dataset format?
@AbdElrahmanMostafaRifaat1432, I have not used the ALOHA format, but I created a script to convert datasets using the format from the diffusion policy codebase. The process for the ALOHA format should be very similar, I believe.
@tlpss I really appreciate your response, but I have found an easier solution: I took the script openpi uses to push ALOHA datasets.
You can find it here: convert_aloha_to_lerobot
I wonder if there is a way to convert data locally right away, instead of pushing it to Hugging Face and pulling it back down. That would actually help me a lot.
@AbdElrahmanMostafaRifaat1432, that is convenient!
> I wonder if there is a solution to convert data immediately locally instead of pushing it to hugging face and taking it again from hugging face.

If you set this variable to `False`, the dataset will only be stored locally. Don't forget to set the `LEROBOT_HOME` env variable to the directory where you want to store the dataset.
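For example, a sketch of the local-only setup (the `LEROBOT_HOME` variable and the CLI flags are taken from the comments in this thread and may differ across lerobot versions):

```shell
# Point LEROBOT_HOME at the directory where the converted dataset was written.
export LEROBOT_HOME=/data/lerobot_datasets

# Train directly from the local copy; --dataset.local_files_only=true tells
# lerobot not to fetch the dataset from the Hugging Face Hub.
python lerobot/scripts/train.py \
    --policy.type=pi0 \
    --dataset.repo_id=your_name/your_dataset \
    --dataset.local_files_only=true
```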
@tlpss thanks for your response, but the problem is that when I train the model, it cannot see the local files where the data is stored.
When I run:

```shell
python lerobot/scripts/train.py --policy.type=pi0 --dataset.repo_id=Abdorifaat/basket --dataset.local_files_only=true
```

I get the following error:

```
INFO 2025-02-10 03:21:53 ts/train.py:206 Creating dataset
Traceback (most recent call last):
  File "/home/SSD1/abdelrahman_refaat/.conda/envs/rifaatrobotics/lib/python3.10/site-packages/lerobot/lerobot/scripts/train.py", line 565, in
```

Could you try it please and tell me how to solve it?
Hi all, I have released a script for converting OpenX (RLDS) datasets to LeRobot v2.1, see this.
You can easily modify the code to make a custom LeRobotDataset: just change the source and everything is done. 🥳
This issue has been automatically marked as stale because it has not had recent activity (6 months). It will be closed if no further activity occurs. Thank you for your contributions.