
How to merge multiple recorded datasets?

Open mydhui opened this issue 1 year ago • 8 comments

Hi, thank you so much for the automatic resume during data recording. Sometimes unstable cameras or other situations (e.g. not having enough time to finish recording) can cause the process to stop.

I was wondering whether there is any way to merge multiple recorded datasets. For instance, I have two datasets, 'cube grabbing' and 'cylinder grabbing', each recorded with 50 episodes in the same environment. Do you have a tutorial on how to merge them into one larger 100-episode dataset?

BTW, another reason for merging datasets is that storage usage is extremely high before video encoding, so recording a large dataset in one go can be limited by storage. Merging several already-encoded datasets would mitigate this problem.

Thanks

mydhui avatar Nov 28 '24 01:11 mydhui

For me, I simply put the episodes together and run dataset.consolidate().
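
To make "putting the episodes together" concrete, here is a minimal sketch in plain Python of the one step that is easy to get wrong: renumbering episode indices so they stay unique after the merge. The `Episode` record and `merge_episode_lists` helper are hypothetical names used only for illustration; they are not part of lerobot, and the real on-disk dataset layout (videos, metadata, statistics) needs more than this.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Episode:
    # Hypothetical in-memory record; lerobot's on-disk format is different.
    episode_index: int
    task: str
    num_frames: int


def merge_episode_lists(*datasets: List[Episode]) -> List[Episode]:
    """Concatenate episode lists, renumbering episode_index so it stays unique."""
    merged: List[Episode] = []
    for episodes in datasets:
        # Episodes from this dataset are appended after everything merged so far.
        offset = len(merged)
        ordered = sorted(episodes, key=lambda e: e.episode_index)
        for local_index, episode in enumerate(ordered):
            merged.append(replace(episode, episode_index=offset + local_index))
    return merged


# Two 50-episode recordings, as in the original question.
cube = [Episode(i, "cube grabbing", 100) for i in range(50)]
cylinder = [Episode(i, "cylinder grabbing", 120) for i in range(50)]
merged = merge_episode_lists(cube, cylinder)
print(len(merged), merged[50])
```

The second dataset's episodes 0 to 49 become 50 to 99 in the merged list, which is why any per-episode file names or metadata keyed by episode index would have to be updated in the same pass.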

Vacuame avatar Dec 02 '24 14:12 Vacuame

@Vacuame

For me, I simply put the episodes together and run dataset.consolidate()

I searched the repo for "consolidate" and can't find this. Can you be more specific? :)

philipbutler avatar Dec 13 '24 21:12 philipbutler

@mydhui I looked into lerobot/common/datasets/lerobot_dataset.py and see there's MultiLeRobotDataset(torch.utils.data.Dataset) class that can be instantiated with multiple LeRobot dataset repo IDs. Want to give that a try?

philipbutler avatar Dec 13 '24 21:12 philipbutler

Something simple like this should work:

    from lerobot.common.datasets.lerobot_dataset import MultiLeRobotDataset

    multi_dataset = MultiLeRobotDataset(
        repo_ids=["dataset1_repo_id", "dataset2_repo_id"],
        split="train",
        image_transforms=None,
        delta_timestamps=None,
    )
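
For anyone wondering what a multi-dataset wrapper of this kind does conceptually: it usually leaves the underlying datasets untouched and maps a global index onto the right one at load time. Below is a rough sketch of that idea in plain Python, along the same lines as `torch.utils.data.ConcatDataset`. The `ConcatView` class is a hypothetical helper for illustration and is not lerobot's implementation.

```python
import bisect
from itertools import accumulate
from typing import List, Sequence, Tuple


class ConcatView:
    """Read-only view presenting several sequences as one (hypothetical helper)."""

    def __init__(self, datasets: Sequence[Sequence]):
        self.datasets = list(datasets)
        # cumulative_sizes[i] is the total number of items in datasets[0..i].
        self.cumulative_sizes: List[int] = list(
            accumulate(len(d) for d in self.datasets)
        )

    def __len__(self) -> int:
        return self.cumulative_sizes[-1] if self.cumulative_sizes else 0

    def locate(self, index: int) -> Tuple[int, int]:
        """Map a global index to (dataset number, index inside that dataset)."""
        if not 0 <= index < len(self):
            raise IndexError(index)
        dataset_number = bisect.bisect_right(self.cumulative_sizes, index)
        start = self.cumulative_sizes[dataset_number - 1] if dataset_number else 0
        return dataset_number, index - start

    def __getitem__(self, index: int):
        dataset_number, local_index = self.locate(index)
        return self.datasets[dataset_number][local_index]


# Stand-ins for two recorded datasets.
cube_frames = [f"cube_{i}" for i in range(3)]
cylinder_frames = [f"cylinder_{i}" for i in range(2)]
view = ConcatView([cube_frames, cylinder_frames])
print(len(view), view[2], view[3])
```

Note that this is a view for training, not a merge on disk: nothing is copied or re-encoded, so it does not by itself produce a single 100-episode dataset that can be pushed to the hub.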

srik901 avatar Dec 16 '24 01:12 srik901

@philipbutler

For me, I simply put the episodes together and run dataset.consolidate()

I searched the repo for "consolidate" and can't find this. Can you be more specific? :)

Search for "dataset.consolidate" in control_robot.py.

Vacuame avatar Dec 25 '24 12:12 Vacuame

@Vacuame I have no idea how I missed that, thanks

philipbutler avatar Dec 25 '24 20:12 philipbutler

@Vacuame I have no idea how I missed that, thanks

Now I think MultiLeRobotDataset is the true solution, but I don't know how to use it.

Vacuame avatar Dec 27 '24 09:12 Vacuame

Hello, I also struggled with this issue for a long time, but I've resolved it. The solution is efficient and verified to work correctly. You can use the merge.py script from my PR (#924) to solve your problem. I hope this helps you, and wish you all the best.

TangGuohh avatar Apr 01 '25 07:04 TangGuohh

@TangGuohh unfortunately it doesn't work

richardrl avatar Sep 19 '25 11:09 richardrl

Resolved: https://github.com/search?q=repo%3Ahuggingface%2Flerobot%20MultiLeRobotDataset&type=code

thanks to HF team for releasing this!

richardrl avatar Sep 19 '25 12:09 richardrl

We will finish more dataset editing tools soon, which will then also be available from the hub

pkooij avatar Oct 08 '25 08:10 pkooij