
Is there any documentation to create a custom dataset?

Open HiroIshida opened this issue 1 year ago • 8 comments

lerobot/examples explains how to load the existing datasets from the Hugging Face Hub and train on them. Instead, I'd like to know how to turn self-collected data into a dataset, so I'd like to know if there is some documentation for that.

HiroIshida • Jul 04 '24

Same here. I have my data ready, but the dataset class seems rather complex to instantiate, so one example (or more, for instance depending on the number of cameras) would be very nice.

RochMollero • Jul 09 '24

Same here! I really need an example.

TheArtificialOutsider • Jul 09 '24

Same question!

zwbx • Jul 10 '24

@TheArtificialOutsider @zwbx @RochMollero Got it! We will address this issue very soon, and simplify stuff ;)

Any chance you could provide a very short sample of the datasets in the comment?

In the meantime, here are a few pointers and resources:

README:

  • https://github.com/huggingface/lerobot?tab=readme-ov-file#the-lerobotdataset-format

See how we use from_preloaded:

  • https://github.com/huggingface/lerobot/blob/01f8cede0b5f1c16330205b35f4391939e11cb3e/lerobot/scripts/control_robot.py#L336-L344 (see dataset.stats = stats to use it directly after)
  • https://github.com/huggingface/lerobot/blob/main/lerobot/scripts/push_dataset_to_hub.py#L247-L258
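The `dataset.stats = stats` step in control_robot.py attaches normalization statistics produced by lerobot's `compute_stats`. As a rough illustration of what such statistics contain, here is a stdlib-only sketch that computes per-dimension mean, std, min, and max for each vector feature over all frames. `feature_stats` is a hypothetical helper, and the key names and exact computation in lerobot may differ.

```python
import math


def feature_stats(frames, key):
    """Per-dimension mean/std/min/max of frames[i][key] (population std).

    Hypothetical helper for illustration; not lerobot's compute_stats.
    """
    vectors = [frame[key] for frame in frames]
    if not vectors:
        raise ValueError(f"no frames contain {key!r}")
    dim = len(vectors[0])
    n = len(vectors)
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    std = [
        math.sqrt(sum((v[d] - mean[d]) ** 2 for v in vectors) / n)
        for d in range(dim)
    ]
    return {
        "mean": mean,
        "std": std,
        "min": [min(v[d] for v in vectors) for d in range(dim)],
        "max": [max(v[d] for v in vectors) for d in range(dim)],
    }


# Two toy frames; key names follow the "observation.state" / "action" style.
frames = [
    {"observation.state": [0.0, 1.0], "action": [0.5, 0.0]},
    {"observation.state": [2.0, 3.0], "action": [1.5, 1.0]},
]
stats = {key: feature_stats(frames, key) for key in ("observation.state", "action")}
print(stats["observation.state"]["mean"])  # [1.0, 2.0]
```

Image features would need per-channel statistics instead, which this sketch does not cover.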

See the content of these files to instantiate the hf_dataset, encode the videos, or store frames, etc.

  • from hdf5: https://github.com/huggingface/lerobot/blob/main/lerobot/common/datasets/push_dataset_to_hub/aloha_hdf5_format.py#L209-L211
  • from zarr: https://github.com/huggingface/lerobot/blob/main/lerobot/common/datasets/push_dataset_to_hub/pusht_zarr_format.py#L257-L259
  • from parquet: https://github.com/huggingface/lerobot/blob/main/lerobot/common/datasets/push_dataset_to_hub/dora_parquet_format.py#L213-L215
  • from pickle: https://github.com/huggingface/lerobot/blob/main/lerobot/common/datasets/push_dataset_to_hub/xarm_pkl_format.py#L176-L178
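Besides `hf_dataset`, `from_preloaded` takes an `episode_data_index`. As we understand the current format, it maps each episode to the `from` (inclusive) and `to` (exclusive) row range of its frames in the flat per-frame `hf_dataset`. A minimal sketch that derives it from a per-frame `episode_index` column, assuming the frames of each episode are stored contiguously; lerobot keeps these as tensors, while plain lists are used here to stay dependency-free.

```python
def episode_data_index(episode_index_column):
    """Return {"from": [...], "to": [...]} row ranges, one entry per episode.

    `from` is inclusive and `to` is exclusive. Assumes rows of one episode are
    contiguous, which is how per-frame tables are usually written.
    """
    index = {"from": [], "to": []}
    current = None
    for row, episode in enumerate(episode_index_column):
        if episode != current:
            if current is not None:
                index["to"].append(row)  # previous episode ends before this row
            index["from"].append(row)  # new episode starts at this row
            current = episode
    if current is not None:
        index["to"].append(len(episode_index_column))
    return index


# Two episodes: 3 frames, then 2 frames.
print(episode_data_index([0, 0, 0, 1, 1]))
# {'from': [0, 3], 'to': [3, 5]}
```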

Cadene • Jul 10 '24

cc @michel-aractingi for visibility ;)

Cadene • Jul 10 '24

Hi, thanks for your attention to this matter. I am using the RLBench dataset. I have raw data containing image observations and actions. How can I organize it and convert it into an HF dataset?


zwbx • Jul 11 '24

It would be even better if there were tutorials on how to train using custom data from the gym simulation environment.

x2ss • Aug 27 '24

Hey there, we are still working on a simplification of the dataset class + upload to hub + tutorial! :) Sorry if it's taking some time!

Unfortunately, the only option now is to get familiar with: https://github.com/huggingface/lerobot/blob/main/lerobot/scripts/push_dataset_to_hub.py

See the example commands in its header. You can then adapt one of them to your dataset format. If you have trouble understanding the code, reach out to us in the #help channel on Discord.
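The format-specific converters used by that script essentially flatten episodes into one row per frame. A sketch of that step for raw data held as a list of episodes, where each episode is a list of `(state, action)` pairs recorded at a constant `fps`. The column names mirror the ones we believe the existing datasets use (`episode_index`, `frame_index`, `timestamp`, `next.done`, `index`), but `flatten_episodes` is a hypothetical helper, and images are left out for brevity.

```python
def flatten_episodes(episodes, fps):
    """Turn [[(state, action), ...], ...] into one dict per frame.

    Hypothetical helper: column names mirror LeRobotDataset conventions,
    and `timestamp` assumes frames were captured at a constant `fps`.
    """
    rows = []
    for episode_index, steps in enumerate(episodes):
        for frame_index, (state, action) in enumerate(steps):
            rows.append({
                "observation.state": list(state),
                "action": list(action),
                "episode_index": episode_index,
                "frame_index": frame_index,
                "timestamp": frame_index / fps,
                # True only on the last frame of each episode.
                "next.done": frame_index == len(steps) - 1,
                "index": len(rows),  # global row index across all episodes
            })
    return rows


raw = [
    [([0.0], [0.1]), ([0.1], [0.2])],  # episode 0: 2 frames
    [([1.0], [1.1])],                  # episode 1: 1 frame
]
rows = flatten_episodes(raw, fps=10)
print([(r["episode_index"], r["frame_index"], r["timestamp"], r["next.done"]) for r in rows])
# [(0, 0, 0.0, False), (0, 1, 0.1, True), (1, 0, 0.0, True)]
```

Deriving `timestamp` from `frame_index` keeps it consistent with the declared fps; if your recorder logged real timestamps, storing those instead may be preferable.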

Cadene • Aug 27 '24

Closing this as we now have a tutorial to easily record and push your own datasets. Feel free to reopen if need be ;)

aliberts • Sep 02 '24

The tutorial says “If you don't want to push to hub, use --push-to-hub 0.” Where should "--push-to-hub 0" be used? Should it replace "--repo-id ${HF_USER}/koch_test"?

Also, after reading the tutorial, I'm still puzzled about how to make a dataset with image observations and actions and use it to run lerobot.

It seems to me that a useful example would be something like ACT, where the dataset is saved in a clear way.

Thanks and best regards!

x2ss • Sep 03 '24

--push-to-hub 0 is an option of the lerobot/scripts/control_robot.py script. This is simply to deactivate uploading your dataset to the hub when using the record function.

As for the tutorial, it teaches you — amongst other things — how to record a LeRobotDataset with the Koch v1.1 arm (although it can be adapted to other robots, we are working on it).

> It seems to me that a useful example would be as ACT, where the dataset is saved in a clear way.

Could you elaborate? What other scenario do you have in mind?

aliberts • Sep 03 '24

Hi, have you figured out how to combine RLBench and LeRobot and organize the data into a LeRobot dataset? Thanks

yzzueong • Jun 11 '25

Hi! It seems that the tutorial file mentioned when this issue was closed is now missing. I'm still looking for a tutorial on how to record a custom dataset with robots that are not officially supported (specifically, a Franka Panda). Could you please suggest some material?

FANG-Zhiwei • Jun 18 '25

Hello @FANG-Zhiwei,

I'm not sure if the documents Record a dataset and Bring Your Own Hardware are what you're looking for, but you can check them out.

Hope this helps!

tc-huang • Jun 18 '25

I haven't tried it yet, but this openpi code may help: https://github.com/Physical-Intelligence/openpi/blob/main/examples/libero/convert_libero_data_to_lerobot.py#L46-L93
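As far as we can tell, the linked openpi script declares a `features` dict up front (a `dtype`, `shape`, and `names` entry per key) when creating the `LeRobotDataset`, then adds one frame per step and saves each episode. A frame that does not match that schema is an easy mistake when adapting the script to another robot, so here is a stdlib-only sketch of a pre-check. `frame_shape` and `check_frame` are hypothetical helpers, not LeRobot API; the feature names and shapes are illustrative, and real frames would usually be NumPy arrays rather than lists.

```python
def frame_shape(value):
    """Shape of a nested list, e.g. [[1, 2], [3, 4]] -> (2, 2); scalars -> ()."""
    shape = []
    while isinstance(value, (list, tuple)):
        shape.append(len(value))
        if not value:
            break
        value = value[0]
    return tuple(shape)


def check_frame(frame, features):
    """Raise ValueError if a frame is missing a feature or has the wrong shape.

    Only inspects the first element along each axis, so ragged lists are not caught.
    """
    for key, spec in features.items():
        if key not in frame:
            raise ValueError(f"frame is missing feature {key!r}")
        got = frame_shape(frame[key])
        if got != tuple(spec["shape"]):
            raise ValueError(f"{key!r}: expected shape {tuple(spec['shape'])}, got {got}")


# Illustrative schema; adjust names and shapes to your robot.
features = {
    "state": {"dtype": "float32", "shape": (8,), "names": ["state"]},
    "actions": {"dtype": "float32", "shape": (7,), "names": ["actions"]},
}
check_frame({"state": [0.0] * 8, "actions": [0.0] * 7}, features)  # passes silently
```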

HiroIshida • Jul 24 '25

Quoting @Cadene's comment above, whose from_preloaded pointer expands to lerobot/scripts/control_robot.py, lines 336 to 344 at commit 01f8ced:

```python
lerobot_dataset = LeRobotDataset.from_preloaded(
    repo_id=repo_id,
    hf_dataset=hf_dataset,
    episode_data_index=episode_data_index,
    info=info,
    videos_dir=videos_dir,
)
stats = compute_stats(lerobot_dataset) if run_compute_stats else {}
lerobot_dataset.stats = stats
```

(see dataset.stats = stats to use it directly after)

That comment also linked the conversion scripts for the hdf5, zarr, parquet, and pickle formats under lerobot/common/datasets/push_dataset_to_hub/.

These data format conversion scripts are no longer available. Could you give new URLs for them?

lin-whale • Aug 18 '25