TORCH_USE_CUDA_DSA on policy act
System Info
Windows, CUDA 12.8, torch nightly 2.8.0, using one so100 dataset that contains 3 episodes of a simple "pick and place" task.
Additional information about my full installation follows.
- System Info:
Output of python lerobot/scripts/display_sys_info.py & python -m torch.utils.collect_env & python -c "import torch; print(torch.cuda.is_available())" && nvcc -V:
- Hardware: 1× so100, fully assembled, from wowrobo; 2× USB-C and power adapters for the arms; 1× USB camera.
- `lerobot` version: 0.1.0
- Platform: Windows-10-10.0.22621-SP0
- Python version: 3.10.8
- Huggingface_hub version: 0.30.1
- Dataset version: 3.5.0
- Numpy version: 2.1.2
- PyTorch version (GPU?): 2.8.0.dev20250327+cu128 (True)
- Cuda version: 12080
PyTorch version: 2.8.0.dev20250327+cu128
Is debug build: False
CUDA used to build PyTorch: 12.8
OS: Microsoft Windows 11 Pro (10.0.22621 64-bit)
CMake version: version 4.0.0
Is CUDA available: True
Nvidia driver version: 572.83
cuDNN version: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8\bin\cudnn_ops64_9.dll
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==2.1.2
[pip3] torch==2.8.0.dev20250327+cu128
[pip3] torchaudio==2.6.0.dev20250401+cu128
[pip3] torchcodec==0.0.0.dev0
[pip3] torchvision==0.22.0.dev20250401+cu128
True
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Feb_21_20:42:46_Pacific_Standard_Time_2025
Cuda compilation tools, release 12.8, V12.8.93
Build cuda_12.8.r12.8/compiler.35583870_0
- Set up Python environments:
git clone https://github.com/huggingface/lerobot.git && cd lerobot
python -m venv venv && venv\Scripts\activate
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
pip install av poetry-core # Fix Windows
pip install -e ".[feetech]"
- Find motors bus port
(venv) C:\lerobot>python lerobot/scripts/find_motors_bus_port.py
Finding all available ports for the MotorsBus.
Ports before disconnecting: ['COM3', 'COM4']
Remove the USB cable from your MotorsBus and press Enter when done.
The port of this MotorsBus is 'COM4'
Reconnect the USB cable.
- Find camera index:
python lerobot/common/robot_devices/cameras/opencv.py --images-dir outputs/images_from_opencv_cameras
- Use the results of steps 3 and 4 to modify lerobot\common\robot_devices\robots\configs.py accordingly:
class So100RobotConfig(ManipulatorRobotConfig):
...
leader_arms: dict[str, MotorsBusConfig] = field(
...
+ port="COM3",
...
follower_arms: dict[str, MotorsBusConfig] = field(
...
+ port="COM4",
...
cameras: dict[str, CameraConfig] = field(
default_factory=lambda: {
"laptop": OpenCVCameraConfig(
+ camera_index=2,
...
)
}
)
- Follow the calibrate steps. To fix unexpected rotation, recalibrate a single arm by adding --control.arms=[\"main_leader\"]:
python lerobot/scripts/control_robot.py --robot.type=so100 --control.type=calibrate
- Verify calibration with teleoperate:
python lerobot/scripts/control_robot.py --robot.type=so100 --control.type=teleoperate
- Create a dataset with teleoperation. 7_get_started_with_real_robot.md recommends at least 50 episodes (10 episodes × 5 different starting locations). I stopped at 3:
python lerobot/scripts/control_robot.py --robot.type=so100 --control.type=record --control.fps=30 --control.single_task="Pick and place task" --control.repo_id=local/so100_dataset --control.root=C:/lerobot/training_data --control.push_to_hub=false --control.warmup_time_s=5 --control.episode_time_s=30 --control.reset_time_s=15 --control.num_episodes=50 --control.video=true
- Training a model works with diffusion policy, pi0, or tdmpc, but --policy.type=act does not work for me. Train for 200k steps or until the loss starts plateauing:
python lerobot/scripts/train.py --dataset.repo_id=local/so100_dataset --dataset.root=C:/lerobot/training_data --policy.type=diffusion --output_dir=C:/lerobot/model_diffusion --job_name=diffusion_training --policy.device=cuda --policy.use_amp=true --batch_size=8 --steps=200000 --num_workers=4 --wandb.enable=false --dataset.video_backend=pyav --seed=1234
- Evaluation/Run model:
python lerobot/scripts/control_robot.py --robot.type=so100 --control.type=record --control.fps=30 --control.single_task="Diffusion policy evaluation" --control.repo_id=local/eval_diffusion --control.root=C:/lerobot/diffusion_eval --control.push_to_hub=false --control.warmup_time_s=5 --control.episode_time_s=30 --control.reset_time_s=15 --control.num_episodes=10 --control.video=true --control.policy.path=C:/lerobot/model_diffusion/checkpoints/last/pretrained_model --control.num_image_writer_processes=1
Information
- [x] One of the scripts in the examples/ folder of LeRobot
- [x] My own task or dataset (give details below)
Reproduction
(venv) C:\lerobot>python lerobot/scripts/train.py --dataset.repo_id=local/so100_quick --dataset.root=C:/lerobot/quick_demo --policy.type=act --output_dir=C:/lerobot/quick_model_act --job_name=quick_demo_act --policy.device=cuda --policy.use_amp=true --batch_size=1 --steps=5000 --num_workers=4 --wandb.enable=false --dataset.video_backend=pyav --policy.chunk_size=10 --policy.n_action_steps=10 --policy.n_heads=4 --policy.dim_feedforward=1024 --policy.n_encoder_layers=2
INFO ts\train.py:111 {'batch_size': 1,
'dataset': {'episodes': None,
'image_transforms': {'enable': False,
'max_num_transforms': 3,
'random_order': False,
'tfs': {'brightness': {'kwargs': {'brightness': [0.8,
1.2]},
'type': 'ColorJitter',
'weight': 1.0},
'contrast': {'kwargs': {'contrast': [0.8,
1.2]},
'type': 'ColorJitter',
'weight': 1.0},
'hue': {'kwargs': {'hue': [-0.05,
0.05]},
'type': 'ColorJitter',
'weight': 1.0},
'saturation': {'kwargs': {'saturation': [0.5,
1.5]},
'type': 'ColorJitter',
'weight': 1.0},
'sharpness': {'kwargs': {'sharpness': [0.5,
1.5]},
'type': 'SharpnessJitter',
'weight': 1.0}}},
'repo_id': 'local/so100_quick',
'revision': None,
'root': 'C:/lerobot/quick_demo',
'use_imagenet_stats': True,
'video_backend': 'pyav'},
'env': None,
'eval': {'batch_size': 50, 'n_episodes': 50, 'use_async_envs': False},
'eval_freq': 20000,
'job_name': 'quick_demo_act',
'log_freq': 200,
'num_workers': 4,
'optimizer': {'betas': [0.9, 0.999],
'eps': 1e-08,
'grad_clip_norm': 10.0,
'lr': 1e-05,
'type': 'adamw',
'weight_decay': 0.0001},
'output_dir': 'C:\\lerobot\\quick_model_act',
'policy': {'chunk_size': 10,
'device': 'cuda',
'dim_feedforward': 1024,
'dim_model': 512,
'dropout': 0.1,
'feedforward_activation': 'relu',
'input_features': {},
'kl_weight': 10.0,
'latent_dim': 32,
'n_action_steps': 10,
'n_decoder_layers': 1,
'n_encoder_layers': 2,
'n_heads': 4,
'n_obs_steps': 1,
'n_vae_encoder_layers': 4,
'normalization_mapping': {'ACTION': <NormalizationMode.MEAN_STD: 'MEAN_STD'>,
'STATE': <NormalizationMode.MEAN_STD: 'MEAN_STD'>,
'VISUAL': <NormalizationMode.MEAN_STD: 'MEAN_STD'>},
'optimizer_lr': 1e-05,
'optimizer_lr_backbone': 1e-05,
'optimizer_weight_decay': 0.0001,
'output_features': {},
'pre_norm': False,
'pretrained_backbone_weights': 'ResNet18_Weights.IMAGENET1K_V1',
'replace_final_stride_with_dilation': False,
'temporal_ensemble_coeff': None,
'type': 'act',
'use_amp': True,
'use_vae': True,
'vision_backbone': 'resnet18'},
'resume': False,
'save_checkpoint': True,
'save_freq': 20000,
'scheduler': None,
'seed': 1000,
'steps': 5000,
'use_policy_training_preset': True,
'wandb': {'disable_artifact': False,
'enable': False,
'entity': None,
'mode': None,
'notes': None,
'project': 'lerobot',
'run_id': None}}
INFO ts\train.py:117 Logs will be saved locally.
INFO ts\train.py:127 Creating dataset
INFO ts\train.py:138 Creating policy
INFO ts\train.py:144 Creating optimizer and scheduler
INFO ts\train.py:156 Output dir: C:\lerobot\quick_model_act
INFO ts\train.py:159 cfg.steps=5000 (5K)
INFO ts\train.py:160 dataset.num_frames=2512 (3K)
INFO ts\train.py:161 dataset.num_episodes=3
INFO ts\train.py:162 num_learnable_params=27271942 (27M)
INFO ts\train.py:163 num_total_params=27271984 (27M)
INFO ts\train.py:202 Start offline training on a fixed dataset
C:\lerobot\venv\lib\site-packages\torch\autograd\graph.py:824: UserWarning: Ignoring invalid value for boolean flag CUDA_LAUNCH_BLOCKING: 1 valid values are 0 or 1. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\pytorch\c10\util\env.cpp:91.)
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
Traceback (most recent call last):
File "C:\lerobot\lerobot\scripts\train.py", line 288, in <module>
train()
File "C:\lerobot\lerobot\configs\parser.py", line 227, in wrapper_inner
response = fn(cfg, *args, **kwargs)
File "C:\lerobot\lerobot\scripts\train.py", line 212, in train
train_tracker, output_dict = update_policy(
File "C:\lerobot\lerobot\scripts\train.py", line 73, in update_policy
grad_scaler.scale(loss).backward()
File "C:\lerobot\venv\lib\site-packages\torch\_tensor.py", line 648, in backward
torch.autograd.backward(
File "C:\lerobot\venv\lib\site-packages\torch\autograd\__init__.py", line 353, in backward
_engine_run_backward(
File "C:\lerobot\venv\lib\site-packages\torch\autograd\graph.py", line 824, in _engine_run_backward
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: CUDA error: too many resources requested for launch
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
However, I do not think VRAM is the issue, since I can train with --scheduler.type diffuser just fine.
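A side note on the UserWarning in the log above: PyTorch ignored the CUDA_LAUNCH_BLOCKING value, so the run was probably not synchronous and the traceback may point at the wrong call. One possible cause (an assumption, not confirmed in this report) is stray whitespace in the value, which cmd.exe adds when the variable is set as `set CUDA_LAUNCH_BLOCKING=1 && python ...`. A minimal sketch that cleans the variable from Python before torch is imported; `normalize_bool_flag` is a hypothetical helper, not part of torch or lerobot:

```python
import os


def normalize_bool_flag(name: str, default: str = "0") -> str:
    """Strip whitespace from a 0/1 environment flag and write it back.

    Hypothetical helper for illustration only.
    """
    raw = os.environ.get(name, default)
    value = raw.strip()
    if value not in ("0", "1"):
        raise ValueError(f"{name} must be 0 or 1, got {raw!r}")
    os.environ[name] = value
    return value


# Simulate what `set CUDA_LAUNCH_BLOCKING=1 && ...` produces in cmd.exe:
# the value keeps the trailing space.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1 "

# Clean the flag first, and only import torch afterwards, so the
# corrected value is the one that gets read.
cleaned = normalize_bool_flag("CUDA_LAUNCH_BLOCKING")
print(repr(cleaned))
```

Placing the call at the very top of a small wrapper script, before any `import torch`, is the safest position. In cmd.exe itself, `set "CUDA_LAUNCH_BLOCKING=1"` avoids the trailing space.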
Expected behavior
I modified the act command to keep VRAM usage low (batch size 1, chunk size 10, 4 heads, feedforward dimension 1024, 2 encoder layers), but without success.
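For what it is worth, "too many resources requested for launch" is CUDA's launch-out-of-resources error: a kernel asked for more registers, shared memory, or threads per block than the GPU allows for a single launch. That is a different condition from running out of VRAM, which PyTorch reports as "CUDA out of memory", so shrinking the batch or the model may not help. A rough sketch for telling the two apart from the exception text; `classify_cuda_error` is a made-up helper and the matching is plain substring search:

```python
def classify_cuda_error(message: str) -> str:
    """Roughly bucket a CUDA error message (hypothetical helper).

    Returns "memory" for allocation failures, "launch-resources" for
    kernels that exceed per-launch limits, and "other" otherwise.
    """
    text = message.lower()
    if "out of memory" in text:
        return "memory"
    if "too many resources requested for launch" in text:
        return "launch-resources"
    return "other"


# The message from the traceback in this report.
reported = "RuntimeError: CUDA error: too many resources requested for launch"
print(classify_cuda_error(reported))
```

A "launch-resources" result points toward the kernel build for the GPU architecture (here, the torch nightly) rather than toward memory settings.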
I had the same issue as you. My GPU is an RTX 5080 and my torch version was 2.8.0.dev20250331+cu128. After upgrading torch, it worked:
pip install --upgrade --pre torch torchvision torchaudio torchcodec --index-url https://download.pytorch.org/whl/nightly/cu128
My current torch version is 2.8.0.dev20250407+cu128.
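To check whether an installed nightly is at least as new as the build reported working above, the build date can be read from the version string. A small sketch, assuming the `X.Y.Z.devYYYYMMDD+cuNNN` format seen in this thread; `nightly_date` is a hypothetical helper, and the "known good" date comes only from the single report above:

```python
import re
from typing import Optional


def nightly_date(version: str) -> Optional[int]:
    """Return the YYYYMMDD build date of a nightly version string, or None."""
    match = re.search(r"\.dev(\d{8})", version)
    return int(match.group(1)) if match else None


# Build reported as working in this thread (one user's result).
KNOWN_GOOD = nightly_date("2.8.0.dev20250407+cu128")

# Versions mentioned in this thread.
for version in (
    "2.8.0.dev20250327+cu128",
    "2.8.0.dev20250331+cu128",
    "2.8.0.dev20250407+cu128",
):
    date = nightly_date(version)
    is_new_enough = date is not None and date >= KNOWN_GOOD
    print(version, "->", "ok" if is_new_enough else "older than the reported working build")
```

With torch installed, the same check can be run on the live install via `nightly_date(torch.__version__)`.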