How to make a custom LeRobotDataset with v2?

Open alik-git opened this issue 1 year ago • 13 comments

Hi folks, thanks for the amazing open source work!

I am trying to make a custom dataset to use with the LeRobotDataset format.

The readme says to copy the example scripts here which I've done, and I have a working format script of my own.

https://github.com/huggingface/lerobot/blob/8e7d6970eaf5a64b8af6ec45586d201b8ca9ef16/README.md?plain=1#L323

but when it comes time to create the dataset, the push_dataset_to_hub.py uses LeRobotDataset.from_preloaded which is no longer supported in dataset V2

https://github.com/huggingface/lerobot/blob/8e7d6970eaf5a64b8af6ec45586d201b8ca9ef16/lerobot/scripts/push_dataset_to_hub.py#L216

So I'm just wondering what the proper way of loading your own custom local dataset is?

Thank you in advance for your help!

alik-git avatar Dec 04 '24 08:12 alik-git

OK, so I've found a workaround for now. I initialize an empty dataset, add the frames to it, and can then load it after calling dataset.consolidate(). If this is a proper way to do it, please let me know and I'll make a PR with updates to the docs.

Otherwise please let me know what the right way to do this is. Thank you! I'll update this issue with my code once I've cleaned it up.

alik-git avatar Dec 04 '24 09:12 alik-git

I encountered the same issue.

Robert-hua avatar Dec 05 '24 02:12 Robert-hua

@aliberts I also hit the same issue. The documentation on how to generate a custom dataset is out of date (the code no longer runs). Could you please update the instructions and the relevant scripts for custom dataset generation? Thanks.

taochenshh avatar Dec 08 '24 01:12 taochenshh

Hey there. Yes, all the push_to_hub scripts are deprecated in favor of the scripts in examples/port_datasets (just one for now).

Basically, you need to create a new empty dataset using LeRobotDataset.create(), then add individual frames using add_frame(), then save the added frames into an episode using save_episode() (which actually saves data). Then at the end you need to call the consolidate() method to handle a few more things (we will try to get rid of this step in future iterations) before finally calling the push_to_hub() method.
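
The call order described above can be sketched as follows. This is only an illustration of the sequence, not the library's own code: `write_episodes` and `RecordingDataset` are hypothetical names, and the stand-in class merely records calls so the snippet runs without lerobot installed. With lerobot available, the object passed in would presumably come from LeRobotDataset.create(); its arguments, and the `task` keyword passed to save_episode(), are assumptions that may differ between versions.

```python
from typing import Any, Iterable, Mapping


class RecordingDataset:
    """Hypothetical stand-in that only records which methods were called."""

    def __init__(self) -> None:
        self.calls = []

    def add_frame(self, frame: Mapping[str, Any]) -> None:
        self.calls.append("add_frame")

    def save_episode(self, **kwargs: Any) -> None:
        self.calls.append("save_episode")

    def consolidate(self) -> None:
        self.calls.append("consolidate")

    def push_to_hub(self) -> None:
        self.calls.append("push_to_hub")


def write_episodes(dataset: Any, episodes: Iterable, push: bool = False) -> int:
    """Add frames episode by episode, then consolidate and optionally push.

    `episodes` yields (task, frames) pairs, where frames is an iterable of dicts.
    Returns the number of episodes saved.
    """
    count = 0
    for task, frames in episodes:
        for frame in frames:
            dataset.add_frame(dict(frame))  # buffer one frame
        # Assumption: save_episode accepts a `task` keyword in this version.
        dataset.save_episode(task=task)  # write the buffered frames as an episode
        count += 1
    dataset.consolidate()  # final bookkeeping step mentioned above
    if push:
        dataset.push_to_hub()
    return count


demo = RecordingDataset()
episodes = [("pick up the cube", [{"action": [0.0, 0.1]}, {"action": [0.2, 0.3]}])]
saved = write_episodes(demo, episodes)
print(saved, demo.calls)
```

Leaving `push=False` keeps everything local, which is convenient while checking that the frames are written correctly.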

You can find more info about the changes in this new API in the PR (#461).

We will remove push_to_hub.py scripts in the future after adding more equivalent scripts like the one mentioned above in the examples section. Hope this helps!

aliberts avatar Dec 11 '24 09:12 aliberts

Will update the Readme soon!

aliberts avatar Dec 11 '24 09:12 aliberts

The following script is generated by AI Agent to help reproduce the issue:

# lerobot/reproduce.py
import os
import pytest
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

def test_custom_lerobot_dataset():
    try:
        repo_id = "custom_repo"
        hf_dataset = None  # This should be replaced with actual dataset object
        episode_data_index = None  # This should be replaced with actual episode data index
        info = None  # This should be replaced with actual info
        videos_dir = "/path/to/videos"  # This should be replaced with actual videos directory

        # Attempt to create a LeRobotDataset using the from_preloaded method
        lerobot_dataset = LeRobotDataset.from_preloaded(
            repo_id=repo_id,
            hf_dataset=hf_dataset,
            episode_data_index=episode_data_index,
            info=info,
            videos_dir=videos_dir,
        )
        raise AssertionError("Test failed: from_preloaded method did not throw an error as expected.")
    except AttributeError as e:
        raise AssertionError(e)
    except Exception as e:
        raise AssertionError(e)

if __name__ == "__main__":
    test_custom_lerobot_dataset()

How to run:

python3 lerobot/reproduce.py

Expected Result:

Traceback (most recent call last):
  File "lerobot/reproduce.py", line 14, in test_custom_lerobot_dataset
    lerobot_dataset = LeRobotDataset.from_preloaded(
AttributeError: type object 'LeRobotDataset' has no attribute 'from_preloaded'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "lerobot/reproduce.py", line 28, in <module>
    test_custom_lerobot_dataset()
  File "lerobot/reproduce.py", line 23, in test_custom_lerobot_dataset
    raise AssertionError(e)
AssertionError: type object 'LeRobotDataset' has no attribute 'from_preloaded'

Thank you for your valuable contribution to this project and we appreciate your feedback! Please respond with an emoji if you find this script helpful. Feel free to comment below if any improvements are needed.

Best regards from an AI Agent!

reproduce-bot avatar Dec 26 '24 08:12 reproduce-bot

@aliberts, I stumbled upon this problem and could not find it in the readme. Could you make it more explicit?

tlpss avatar Feb 07 '25 13:02 tlpss

Can anybody give us the code to do that with a static aloha dataset format?

@AbdElrahmanMostafaRifaat1432, I have not used the aloha format, but I created a script to convert datasets using the format from the diffusion policy codebase. The process for the aloha format will be very similar, I believe.

tlpss avatar Feb 09 '25 16:02 tlpss

@tlpss I really appreciate your response but I have found an easier solution, I took the script used by openpi to push aloha dataset

you can find it here: convert_aloha_to_lerobot

I wonder if there is a solution to convert data immediately locally instead of pushing it to hugging face and taking it again from hugging face. this would help me a lot actually

@AbdElrahmanMostafaRifaat1432, that is convenient!

> I wonder if there is a solution to convert data immediately locally instead of pushing it to hugging face and taking it again from hugging face.

If you set this variable to False, the dataset will only be stored locally. Don't forget to set the LEROBOT_HOME env variable to the dir where you want to store the dataset.
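
A minimal sketch of the environment-variable part, assuming the value is read when lerobot is imported, so it has to be set first. `use_local_lerobot_home` is a hypothetical helper name, and the example path is a placeholder.

```python
import os
from pathlib import Path


def use_local_lerobot_home(directory) -> Path:
    """Create `directory` if needed and point LEROBOT_HOME at it."""
    home = Path(directory).expanduser().resolve()
    home.mkdir(parents=True, exist_ok=True)
    # Assumption: lerobot reads this variable at import time,
    # so call this before any `import lerobot...` statement.
    os.environ["LEROBOT_HOME"] = str(home)
    return home


# Example (placeholder path):
# use_local_lerobot_home("~/datasets/lerobot")
```

The same effect can be had by exporting LEROBOT_HOME in the shell before starting Python.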

tlpss avatar Feb 09 '25 17:02 tlpss

@tlpss thanks for your response

But the problem is that when I train the model, it cannot see the local files where the data is stored.

When I run the following command:

python lerobot/scripts/train.py --policy.type=pi0 --dataset.repo_id=Abdorifaat/basket --dataset.local_files_only=true

I get the following error:

INFO 2025-02-10 03:21:53 ts/train.py:206 Creating dataset
Traceback (most recent call last):
  File "/home/SSD1/abdelrahman_refaat/.conda/envs/rifaatrobotics/lib/python3.10/site-packages/lerobot/lerobot/scripts/train.py", line 565, in <module>
    train()
  File "/home/SSD1/abdelrahman_refaat/.conda/envs/rifaatrobotics/lib/python3.10/site-packages/lerobot/lerobot/configs/parser.py", line 120, in wrapper_inner
    response = fn(cfg, *args, **kwargs)
  File "/home/SSD1/abdelrahman_refaat/.conda/envs/rifaatrobotics/lib/python3.10/site-packages/lerobot/lerobot/scripts/train.py", line 207, in train
    offline_dataset = make_dataset(cfg)
  File "/home/SSD1/abdelrahman_refaat/.conda/envs/rifaatrobotics/lib/python3.10/site-packages/lerobot/lerobot/common/datasets/factory.py", line 86, in make_dataset
    ds_meta = LeRobotDatasetMetadata(cfg.dataset.repo_id, local_files_only=cfg.dataset.local_files_only,root=cfg.dataset.root_dir)
  File "/home/SSD1/abdelrahman_refaat/.conda/envs/rifaatrobotics/lib/python3.10/site-packages/lerobot/lerobot/common/datasets/lerobot_dataset.py", line 87, in __init__
    self.pull_from_repo(allow_patterns="meta/")
  File "/home/SSD1/abdelrahman_refaat/.conda/envs/rifaatrobotics/lib/python3.10/site-packages/lerobot/lerobot/common/datasets/lerobot_dataset.py", line 98, in pull_from_repo
    snapshot_download(
  File "/home/SSD1/abdelrahman_refaat/.conda/envs/rifaatrobotics/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/SSD1/abdelrahman_refaat/.conda/envs/rifaatrobotics/lib/python3.10/site-packages/huggingface_hub/_snapshot_download.py", line 216, in snapshot_download
    raise LocalEntryNotFoundError(
huggingface_hub.errors.LocalEntryNotFoundError: Cannot find an appropriate cached snapshot folder for the specified revision on the local disk and outgoing traffic has been disabled. To enable repo look-ups and downloads online, pass 'local_files_only=False' as input.

could you try it please and tell me how to solve it
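
One way to narrow this down is to check whether the metadata folder exists where the loader would look for it. The sketch below is a diagnostic only and rests on two assumptions: that the dataset is expected at <LEROBOT_HOME>/<repo_id>/meta (suggested by the allow_patterns="meta/" call in the traceback), and that the default home is ~/.cache/huggingface/lerobot when LEROBOT_HOME is unset. The helper names are hypothetical.

```python
import os
from pathlib import Path

# Assumed default location when LEROBOT_HOME is not set.
DEFAULT_HOME = "~/.cache/huggingface/lerobot"


def expected_meta_dir(repo_id: str, lerobot_home=None) -> Path:
    """Return the folder where local metadata is assumed to live."""
    home = lerobot_home or os.environ.get("LEROBOT_HOME", DEFAULT_HOME)
    return Path(home).expanduser() / repo_id / "meta"


def has_local_metadata(repo_id: str, lerobot_home=None) -> bool:
    """True if the assumed metadata folder exists and is not empty."""
    meta_dir = expected_meta_dir(repo_id, lerobot_home)
    return meta_dir.is_dir() and any(meta_dir.iterdir())


# Example: show where the files for the repo id from the command would be expected.
print(expected_meta_dir("Abdorifaat/basket"))
```

If `has_local_metadata` returns False for the repo id, the converted dataset was probably written somewhere else, which would explain why the loader falls back to the Hub and fails with local_files_only enabled.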

Hi all, I have released a script for converting OpenX (RLDS) to LeRobotV2.1; see this.

You can easily modify the code to make a custom LeRobotDataset, by just changing the source and everything is done. 🥳

Tavish9 avatar Feb 19 '25 10:02 Tavish9

This issue has been automatically marked as stale because it has not had recent activity (6 months). It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Oct 05 '25 02:10 github-actions[bot]