autoware.universe
autoware.universe copied to clipboard
Improve CI efficiency on build-and-test-differential jobs
Checklist
- [X] I've read the contribution guidelines.
- [X] I've searched other issues and no duplicate issues were found.
- [X] I've agreed with the maintainers that I can plan this task.
Description
We have so many build-and-test-differential
jobs queued up.
18 Jobs waiting in queue
We currently have 2 runners just dedicated to run them.
- ec2 c6a.xlarge: 4c4t (EPYC V3) 8gb
- leo-copper pc: 4c8t (7820HK) 32gb
And these machines are just dedicated to run this build-and-test-differential
and clang-tidy-differential
actions only.
clang-tidy-differential
is not a required job, maybe we can think about disabling it too, we can discuss this in another issue.
But still, they take so much time on build action.
A good example to analyze is this job: feat(crosswalk): suppress chattering of GO/STOP decision
Here, in Get modified packages the only package to be built is: behavior_velocity_crosswalk_module
It's the only modified package.
In the build job, it then runs colcon build --packages-above-and-dependencies behavior_velocity_crosswalk_module
This is a big problem because it will now build the following packages :astonished::
autoware$ colcon list --packages-above-and-dependencies behavior_velocity_crosswalk_module --names-only
186 packages
- ad_api_adaptors
- automatic_pose_initializer
- autoware_ad_api_specs
- autoware_adapi_v1_msgs
- autoware_adapi_version_msgs
- autoware_auto_common
- autoware_auto_control_msgs
- autoware_auto_geometry
- autoware_auto_geometry_msgs
- autoware_auto_mapping_msgs
- autoware_auto_perception_msgs
- autoware_auto_planning_msgs
- autoware_auto_system_msgs
- autoware_auto_tf2
- autoware_auto_vehicle_msgs
- autoware_cmake
- autoware_common_msgs
- autoware_control_msgs
- autoware_external_api_msgs
- autoware_iv_external_api_adaptor
- autoware_iv_internal_api_adaptor
- autoware_launch
- autoware_lint_common
- autoware_map_msgs
- autoware_perception_msgs
- autoware_planning_msgs
- autoware_point_types
- autoware_testing
- autoware_utils
- awapi_awiv_adapter
- behavior_path_planner
- behavior_velocity_blind_spot_module
- behavior_velocity_crosswalk_module
- behavior_velocity_detection_area_module
- behavior_velocity_intersection_module
- behavior_velocity_no_drivable_lane_module
- behavior_velocity_no_stopping_area_module
- behavior_velocity_occlusion_spot_module
- behavior_velocity_out_of_lane_module
- behavior_velocity_planner
- behavior_velocity_planner_common
- behavior_velocity_run_out_module
- behavior_velocity_speed_bump_module
- behavior_velocity_stop_line_module
- behavior_velocity_traffic_light_module
- behavior_velocity_virtual_traffic_light_module
- behavior_velocity_walkway_module
- compare_map_segmentation
- component_interface_specs
- component_interface_utils
- component_state_monitor
- costmap_generator
- crosswalk_traffic_light_estimator
- cuda_utils
- default_ad_api
- detected_object_feature_remover
- detected_object_validation
- detection_by_tracker
- dummy_perception_publisher
- eagleye_coordinate
- eagleye_fix2pose
- eagleye_gnss_converter
- eagleye_msgs
- eagleye_navigation
- eagleye_rt
- ekf_localizer
- emergency_handler
- euclidean_cluster
- external_cmd_converter
- external_cmd_selector
- external_velocity_limit_selector
- fake_test_node
- fault_injection
- freespace_planner
- freespace_planning_algorithms
- global_parameter_loader
- grid_map_utils
- ground_segmentation
- gyro_odometer
- image_projection_based_fusion
- image_transport_decompressor
- interpolation
- kalman_filter
- lane_departure_checker
- lanelet2_extension
- lidar_apollo_instance_segmentation
- lidar_centerpoint
- llh_converter
- map_based_prediction
- map_height_fitter
- map_loader
- map_tf_generator
- mission_planner
- motion_utils
- motion_velocity_smoother
- mpc_lateral_controller
- multi_object_tracker
- mussp
- ndt_omp
- ndt_scan_matcher
- object_merger
- object_range_splitter
- object_recognition_utils
- object_velocity_splitter
- obstacle_avoidance_planner
- obstacle_cruise_planner
- obstacle_stop_planner
- occupancy_grid_map_outlier_filter
- osqp_interface
- path_distance_calculator
- perception_utils
- pid_longitudinal_controller
- planning_evaluator
- planning_test_utils
- planning_validator
- pointcloud_preprocessor
- pointcloud_to_laserscan
- pose_initializer
- probabilistic_occupancy_grid_map
- pure_pursuit
- radar_crossing_objects_noise_filter
- radar_fusion_to_detected_object
- radar_object_clustering
- radar_object_tracker
- raw_vehicle_cmd_converter
- route_handler
- rtc_auto_mode_manager
- rtc_interface
- rtklib_msgs
- scenario_selector
- shape_estimation
- shift_decider
- signal_processing
- simple_planning_simulator
- surround_obstacle_checker
- system_error_monitor
- system_monitor
- tensorrt_classifier
- tensorrt_common
- tensorrt_yolo
- tensorrt_yolox
- tier4_api_msgs
- tier4_api_utils
- tier4_auto_msgs_converter
- tier4_autoware_api_launch
- tier4_autoware_utils
- tier4_control_launch
- tier4_control_msgs
- tier4_debug_msgs
- tier4_external_api_msgs
- tier4_localization_launch
- tier4_localization_msgs
- tier4_map_launch
- tier4_map_msgs
- tier4_pcl_extensions
- tier4_perception_launch
- tier4_perception_msgs
- tier4_planning_launch
- tier4_planning_msgs
- tier4_rtc_msgs
- tier4_sensing_launch
- tier4_simulation_msgs
- tier4_simulator_launch
- tier4_system_launch
- tier4_system_msgs
- tier4_v2x_msgs
- tier4_vehicle_launch
- tier4_vehicle_msgs
- topic_state_monitor
- traffic_light_arbiter
- traffic_light_classifier
- traffic_light_fine_detector
- traffic_light_map_based_detector
- traffic_light_multi_camera_fusion
- traffic_light_occlusion_predictor
- traffic_light_utils
- traffic_light_visualization
- trajectory_follower_base
- trajectory_follower_node
- vehicle_cmd_gate
- vehicle_info_util
- yabloc_common
- yabloc_image_processing
- yabloc_monitor
- yabloc_particle_filter
- yabloc_pose_initializer
And these 186 packages include the infamous behavior_path_planner
and almost all of the behavior_velocity_x_module
s.
This is a waste of time and we can avoid all these and only build the single package. I will explain the plan below.
Purpose
- To reduce the time wasted in CI machines
- To reduce the queue in CI
- To help users get feedback from CI faster
Possible approaches
It's simple.
Approach 1, upload and download the build artifact
After a commit is made to the main branch,
- Update the packages to whatever latest, similar to the build
- Build the entire Autoware at that time (basically build-and-test job)
- Upload the compressed
autoware
folder as an artifact. (These are stored for 90 days by default, we can also increase the duration.) - In this
build-and-test-differential
job, after pulling the build environment, we run download-workflow-artifact to download the artifact from that workflow. - We decompress the
autoware
folder and checkout to this repository's branch inautoware.universe
. - Now that we have everything already built,
- Instead of
colcon build --packages-above-and-dependencies
- We can run
colcon build --packages-select
- For the example given above, we will only build a single package.
- Instead of
Approach 2 (preferred), push the built version as a docker image
Another approach can be to build a docker image in the build-and-test
with following layers:
- base image
- the changes made to update the system
- the changes made to build the autoware And push it to the registry.
build-and-test-differential
job can use this image as the base image.
This is much simpler and can be achieved faster probably.
With these approaches, feat(crosswalk): suppress chattering of GO/STOP decision will take like 4m instead of 40m
Definition of done
Approach 1:
- [ ] Modify the build-and-test.yaml to upload the artifact after it successfully builds.
- [ ] Modify the build-and-test-differential.yaml to:
- [ ] Download the artifact using download-workflow-artifact
- [ ] Extract and use it as the
autoware
folder instead of pulling - [ ] Also pull the latest branch in the
autoware.universe
repository - [ ] Build only the packages modified, not their dependencies
Approach 2 (preferred):
- [ ] Modify the build-and-test.yaml to build and push the
prebuilt
image of theautoware
. - [ ] Modify the build-and-test-differential.yaml to:
- [ ] Pull the
prebuilt
image instead - [ ] Also pull the latest branch in the
autoware.universe
repository - [ ] Build only the packages modified, not their dependencies
- [ ] Pull the
cc. @mitsudome-r @shmpwk @esteve
I didn't take --packages-above into consideration.
--packages-above [PKG_NAME [PKG_NAME …]] Select the packages with the passed names as well as packages which recursively depend on them.
So we should use this parameter.
I will also check how many packages are there with --packages-above
, compared to --packages-above-and-dependencies
Edit:
$ colcon list --packages-above behavior_velocity_crosswalk_module --names-only
behavior_velocity_crosswalk_module
behavior_velocity_planner
behavior_velocity_walkway_module
There was only 3 packages. So, we would still be saving 186-3=183
packages by implementing this proposal.
Related: https://github.com/orgs/autowarefoundation/discussions/3469#discussioncomment-5965946
Thank you for the proposal @xmfcx. Let me ask a minor question that came up in my mind:
How do you think we can guarantee that all the build-and-test-differential CIs can access the latest build artifacts? I understood that both approach assumes that the build artifacts of the dependency packages are from the latest main
, and may fail if this assumption breaks (correct me if I am wrong). I am not sure how flexible we can manage the order of CIs in GitHub Actions, but I guess it may be difficult to guarantee above statement if we consider multiple PRs created and merged in arbitrary order.
@xmfcx thanks for the proposal. One issue I see with this approach is that we'd have to be very strict about API and ABI versioning when a module changes. In the example you provided, if the module changed its API or ABI, or if it were a messages package that changed a message definition, we would still need to build the downstream packages. AFAIK, neither API, ABI or messages have been formally stabilized, so any package is still free to change anything and the downstream packages would need to be updated.
@esteve let's consider following scenario:
graph TD
F:::cls_01 -->|"depends on"| D:::cls_03
D -->|"depends on"| E:::cls_02
D -->|"depends on"| C:::cls_04
C-->|"depends on"| B:::cls_05
B -->|"depends on"| A:::cls_06
classDef cls_00 fill:#DDDDDD,stroke:#999,stroke-width:1px;
classDef cls_01 fill:#FBF8CC,stroke:#999,stroke-width:1px;
classDef cls_02 fill:#FDE4CF,stroke:#999,stroke-width:1px;
classDef cls_03 fill:#FFCFD2,stroke:#999,stroke-width:1px;
classDef cls_04 fill:#F1C0E8,stroke:#999,stroke-width:1px;
classDef cls_05 fill:#CFBAF0,stroke:#999,stroke-width:1px;
classDef cls_06 fill:#A3C4F3,stroke:#999,stroke-width:1px;
Here the Package C is changed/updated package.
$ colcon list --packages-up-to C
A, B, C
$ colcon list --packages-above C
C, D, F
$ colcon list --packages-above-depth 1 C
C, D
$ colcon list --packages-above-and-dependencies C
A, B, C, D, E, F
Keeping these in mind, Let's assume everything is compiled with colcon build
Then, Package C is changed.
Now, recompiling Package C and Package D should suffice. (colcon list --packages-above-depth 1 C
)
No need to recompile Package F since it's not directly dependent on Package C.
But current implementation builds every package out there (A, B, C, D, E, F) since because the build artifacts from the latest build is not present.
My proposal just tells it to rebuild and test C and D, and that's it.
How is this related to API/ABI compatibility?
@kminoda one way is to store the build artifacts with the commit hash of the universe repository.
Then we can check if they match. (In the beginning of the build-and-test-differential
job.)
If they don't match, the job fails.
We can print a sensible message to retry the job once the main build artifact is present.
@xmfcx
Thanks for the explanation, I see it more clearly now. It may work if we only build C and D, however, transitve dependencies (e.g. _export
tags) and -extras.cmake
files may cause problems, though, but I think we barely use them and always declare all the dependencies in downstream packages.
How is this related to API/ABI compatibility?
If C changes its API/ABI and we don't rebuild D, we wouldn't catch any incompatibilities, but I see now that you're proposing to rebuild D as well.
@xmfcx
Even Approach 2 seems more elegant; we would need to create a new container, copy artifacts into the new container and save the container into registry to later use it as a container for build-test-differential
. Instead, we can just save our artifacts with the commit hash in the build-test
then download them into build-test-differential
and checking the commit hash to validate. Or am I missing something ?
This pull request has been automatically marked as stale because it has not had recent activity.
Will be hopefully solved by:
- https://github.com/autowarefoundation/autoware/issues/3999
Now that
- https://github.com/autowarefoundation/autoware/issues/3999
is completed, we can close this. ✨🎇
@xmfcx Looking at build-and-test-differential we are still doing rosdep install
and colcon build
on --packages-above-and-dependencies
and not --packages-above-depth 1
as proposed in https://github.com/autowarefoundation/autoware.universe/issues/4497#issuecomment-1695694729. Is that on purpose?
I think it works well in theory, although I am not convinced that all packages declare their direct dependencies. For example in the case of:
┌─┐
┌──┤C├──┐
│ └─┘ │
│ ▼
│ ┌─┐
│ │B│
│ └┬┘
│ ┌─┐ │
└─►│A│◄─┘
└─┘
I can perfectly see C
only listing B
and relying on its dependency to A
. In which case doing --packages-above-depth 1
on A
will not test the A
/C
interface. However, note that I didn't try to look for an example of this happening and it might be an irrelevant concern.
Either way, we should be able to at least remove the -and-dependencies
part to gain some time.
@ambroise-arm thanks for reminding this, It's been so long, I forgot the real premise of my proposal 😅
To recap:
- Our docker images were too big for public runners
- We started running these workflows on self hosted runners
- We didn't have many, jobs sometimes piled up and caused this problem
Then,
- https://github.com/autowarefoundation/autoware/issues/3999 was merged yesterday
- Images are small again, they now run on public runners
- They also now use ccache artifacts in the prebuilt image, speeding things up much more compared to what we had before
But looking at this again, @esteve has been fixing the dependencies for all the packages for a while now.
I think we can give --packages-above-depth 1
option a shot.
Even if we have some failures, we will detect them once the https://github.com/autowarefoundation/autoware.universe/actions/workflows/build-and-test.yaml is triggered (it runs everytime something is merged to universe main) and we will have time to update the incorrect dependencies.
I will create a PR to modify it that way.
$ colcon list --packages-up-to C A, B, C $ colcon list --packages-above C C, D, F $ colcon list --packages-above-depth 1 C C, D $ colcon list --packages-above-and-dependencies C A, B, C, D, E, F
@ambroise-arm but since we won't have the build artifacts as in ~build
and~ install
folders from most recent successful previous run, we will need to combine:
$ colcon list --packages-above-depth 1 C && colcon list --packages-up-to C
A, B, C, D
We still need to build dependencies under.
We would need to restructure colcon-build action from autoware-github-actions and sync it with autoware.universe repo.
- https://github.com/autowarefoundation/autoware-github-actions/blob/8ce979ef277058cd28af34ebecf35347e658b576/colcon-build/action.yaml#L53-L85
Currently this is too much work for relatively little benefit (compared to the time when I had opened up this issue), will mark this as a low priority issue.
I will un-assign myself and leave this open.
but since we won't have the build artifacts as in
build
andinstall
folders from most recent successful previous run, we will need to combine:$ colcon list --packages-above-depth 1 C && colcon list --packages-up-to C A, B, C, D
We still need to build dependencies under.
We have the install
directory in the prebuilt image. And I think that's enough. Doing source
on the install artifacts and checking out the codebase, I can then for example do:
root@f66d64e2081b:/autoware/autoware.universe# colcon build --packages-select pure_pursuit
Starting >>> pure_pursuit
[3.802s] WARNING:colcon.colcon_core.shell:The following packages are in the workspace but haven't been built:
- osqp_interface
- tier4_autoware_utils
- interpolation
- vehicle_info_util
- motion_utils
- trajectory_follower_base
They are being used from the following locations instead:
- /autoware/install/osqp_interface
- /autoware/install/tier4_autoware_utils
- /autoware/install/interpolation
- /autoware/install/vehicle_info_util
- /autoware/install/motion_utils
- /autoware/install/trajectory_follower_base
To suppress this warning ignore these packages in the workspace:
--packages-ignore osqp_interface tier4_autoware_utils interpolation vehicle_info_util motion_utils trajectory_follower_base
[Processing: pure_pursuit]
[Processing: pure_pursuit]
Finished <<< pure_pursuit [1min 7s]
Summary: 1 package finished [1min 10s]
EDIT: Oh, was "from most recent successful previous run" the important part? Yeah, that might be a problem.
EDIT: Oh, was "from most recent successful previous run" the important part? Yeah, that might be a problem.
Exactly, this requires a good orchestration effort. Uploading most recent successful main branch run artifacts and retrieving them in the build-and-test-differential jobs.
But looking at this again, @esteve has been fixing the dependencies for all the packages for a while now. I think we can give --packages-above-depth 1 option a shot.
Yeah, I agree, we can give it a try.
This pull request has been automatically marked as stale because it has not had recent activity.
Closing this in favor of:
- https://github.com/autowarefoundation/autoware.universe/issues/7501