lerobot
How to merge multiple recorded datasets?
Hi, thank you so much for the automatic resume during data recording. Sometimes unstable cameras or other situations (e.g. not having enough time to finish recording) cause the process to stop.
I was wondering, is there any way to merge multiple recorded datasets? For instance, I have two datasets, 'cube grabbing' and 'cylinder grabbing', each recorded with 50 episodes in the same environment. Do you have a tutorial on how to merge them into one larger 100-episode dataset?
BTW, another reason for merging datasets is that storage usage is extremely high before video encoding, so recording a large dataset in one go can be limited by storage. Merging several already-encoded datasets would mitigate this problem.
Thanks
For me, I simply put the episodes together and run dataset.consolidate()
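To make "put the episodes together" a bit more concrete: the main thing to take care of when combining two recordings is that episode indices stay unique and contiguous. Below is a minimal sketch of that re-indexing step only. The `Episode` record and `merge_episodes` helper are hypothetical names made up for illustration, not part of lerobot, and the frame counts are placeholder values. It does not touch any files on disk or replace dataset.consolidate().

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Episode:
    """Hypothetical in-memory record for one episode (not a lerobot type)."""
    episode_index: int
    task: str
    num_frames: int


def merge_episodes(*datasets: List[Episode]) -> List[Episode]:
    """Concatenate episode lists and renumber them 0..N-1 in order."""
    merged: List[Episode] = []
    for episodes in datasets:
        # Keep each dataset's internal order, then assign a fresh global index.
        for ep in sorted(episodes, key=lambda e: e.episode_index):
            merged.append(replace(ep, episode_index=len(merged)))
    return merged


# Two 50-episode recordings, as in the question (frame counts are placeholders).
cube = [Episode(i, "cube grabbing", 100) for i in range(50)]
cylinder = [Episode(i, "cylinder grabbing", 120) for i in range(50)]

merged = merge_episodes(cube, cylinder)
print(len(merged), merged[50].task)
```

In a real dataset, per-frame indices, video file names and statistics would also need to be updated consistently, which this sketch does not cover.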
@Vacuame
For me, I simply put the episodes together and run dataset.consolidate()
I searched the repo for "consolidate" and can't find this. Can you be more specific? :)
@mydhui I looked into lerobot/common/datasets/lerobot_dataset.py and see there's a MultiLeRobotDataset(torch.utils.data.Dataset) class that can be instantiated with multiple LeRobot dataset repo IDs.
Want to give that a try?
Something simple like this should work:
from lerobot.common.datasets.lerobot_dataset import MultiLeRobotDataset

multi_dataset = MultiLeRobotDataset(
    repo_ids=["dataset1_repo_id", "dataset2_repo_id"],
    split="train",
    image_transforms=None,
    delta_timestamps=None,
)
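For anyone wondering what a "multi" dataset does conceptually: a wrapper like this usually leaves the underlying datasets untouched and only maps a global frame index to a (dataset, local index) pair. Here is a rough sketch of that mapping in plain Python. `ConcatIndex` is a hypothetical helper written for illustration, and I have not verified that MultiLeRobotDataset is implemented exactly this way. The dataset lengths are placeholder numbers.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import Sequence, Tuple


class ConcatIndex:
    """Map a global index onto (dataset number, index inside that dataset)."""

    def __init__(self, lengths: Sequence[int]) -> None:
        if any(n < 0 for n in lengths):
            raise ValueError("dataset lengths must be non-negative")
        # Running totals, e.g. lengths [5000, 6000] -> [5000, 11000].
        self.cumulative = list(accumulate(lengths))

    def __len__(self) -> int:
        return self.cumulative[-1] if self.cumulative else 0

    def locate(self, idx: int) -> Tuple[int, int]:
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        # First dataset whose running total is strictly greater than idx.
        dataset_idx = bisect_right(self.cumulative, idx)
        start = self.cumulative[dataset_idx - 1] if dataset_idx > 0 else 0
        return dataset_idx, idx - start


# Placeholder frame counts for two datasets.
index = ConcatIndex([5000, 6000])
print(len(index), index.locate(4999), index.locate(5000))
```

This is why no files need to be copied for training on several datasets at once, although the result is a view over separate datasets rather than a single merged dataset on disk.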
@philipbutler
For me, I simply put the episodes together and run dataset.consolidate()
I searched the repo for "consolidate" and can't find this. Can you be more specific? :)
Search for "dataset.consolidate" in control_robot.py
@vacuame I have no idea how I missed that, thanks
Now I think MultiLeRobotDataset is the real solution, but I don't know how to use it
Hello, I also struggled with this issue for a long time, but I've resolved it. The solution is efficient and verified to work correctly. You can use the merge.py script from my PR (#924) to solve your problem. I hope this helps you, and wish you all the best.
@TangGuohh unfortunately it doesn't work
Resolved: https://github.com/search?q=repo%3Ahuggingface%2Flerobot%20MultiLeRobotDataset&type=code
Thanks to the HF team for releasing this!
We will finish more dataset editing tools soon, which will then also be available from the hub