autoware.universe icon indicating copy to clipboard operation
autoware.universe copied to clipboard

Improve CI efficiency on build-and-test-differential jobs

Open xmfcx opened this issue 1 year ago • 20 comments

Checklist

  • [X] I've read the contribution guidelines.
  • [X] I've searched other issues and no duplicate issues were found.
  • [X] I've agreed with the maintainers that I can plan this task.

Description

We have so many build-and-test-differential jobs queued up.

18 Jobs waiting in queue

Screenshot from 2023-08-01 18-16-31

We currently have 2 runners just dedicated to run them.

  1. ec2 c6a.xlarge: 4c4t (EPYC V3) 8gb
  2. leo-copper pc: 4c8t (7820HK) 32gb

And these machines are just dedicated to run this build-and-test-differential and clang-tidy-differential actions only.

clang-tidy-differential is not a required job, maybe we can think about disabling it too, we can discuss this in another issue.

But still, they take so much time on build action.

A good example to analyze is this job: feat(crosswalk): suppress chattering of GO/STOP decision

Here, in Get modified packages the only package to be built is: behavior_velocity_crosswalk_module It's the only modified package.

In the build job, it then runs colcon build --packages-above-and-dependencies behavior_velocity_crosswalk_module

This is a big problem because it will now build the following packages :astonished:: autoware$ colcon list --packages-above-and-dependencies behavior_velocity_crosswalk_module --names-only

186 packages
  1. ad_api_adaptors
  2. automatic_pose_initializer
  3. autoware_ad_api_specs
  4. autoware_adapi_v1_msgs
  5. autoware_adapi_version_msgs
  6. autoware_auto_common
  7. autoware_auto_control_msgs
  8. autoware_auto_geometry
  9. autoware_auto_geometry_msgs
  10. autoware_auto_mapping_msgs
  11. autoware_auto_perception_msgs
  12. autoware_auto_planning_msgs
  13. autoware_auto_system_msgs
  14. autoware_auto_tf2
  15. autoware_auto_vehicle_msgs
  16. autoware_cmake
  17. autoware_common_msgs
  18. autoware_control_msgs
  19. autoware_external_api_msgs
  20. autoware_iv_external_api_adaptor
  21. autoware_iv_internal_api_adaptor
  22. autoware_launch
  23. autoware_lint_common
  24. autoware_map_msgs
  25. autoware_perception_msgs
  26. autoware_planning_msgs
  27. autoware_point_types
  28. autoware_testing
  29. autoware_utils
  30. awapi_awiv_adapter
  31. behavior_path_planner
  32. behavior_velocity_blind_spot_module
  33. behavior_velocity_crosswalk_module
  34. behavior_velocity_detection_area_module
  35. behavior_velocity_intersection_module
  36. behavior_velocity_no_drivable_lane_module
  37. behavior_velocity_no_stopping_area_module
  38. behavior_velocity_occlusion_spot_module
  39. behavior_velocity_out_of_lane_module
  40. behavior_velocity_planner
  41. behavior_velocity_planner_common
  42. behavior_velocity_run_out_module
  43. behavior_velocity_speed_bump_module
  44. behavior_velocity_stop_line_module
  45. behavior_velocity_traffic_light_module
  46. behavior_velocity_virtual_traffic_light_module
  47. behavior_velocity_walkway_module
  48. compare_map_segmentation
  49. component_interface_specs
  50. component_interface_utils
  51. component_state_monitor
  52. costmap_generator
  53. crosswalk_traffic_light_estimator
  54. cuda_utils
  55. default_ad_api
  56. detected_object_feature_remover
  57. detected_object_validation
  58. detection_by_tracker
  59. dummy_perception_publisher
  60. eagleye_coordinate
  61. eagleye_fix2pose
  62. eagleye_gnss_converter
  63. eagleye_msgs
  64. eagleye_navigation
  65. eagleye_rt
  66. ekf_localizer
  67. emergency_handler
  68. euclidean_cluster
  69. external_cmd_converter
  70. external_cmd_selector
  71. external_velocity_limit_selector
  72. fake_test_node
  73. fault_injection
  74. freespace_planner
  75. freespace_planning_algorithms
  76. global_parameter_loader
  77. grid_map_utils
  78. ground_segmentation
  79. gyro_odometer
  80. image_projection_based_fusion
  81. image_transport_decompressor
  82. interpolation
  83. kalman_filter
  84. lane_departure_checker
  85. lanelet2_extension
  86. lidar_apollo_instance_segmentation
  87. lidar_centerpoint
  88. llh_converter
  89. map_based_prediction
  90. map_height_fitter
  91. map_loader
  92. map_tf_generator
  93. mission_planner
  94. motion_utils
  95. motion_velocity_smoother
  96. mpc_lateral_controller
  97. multi_object_tracker
  98. mussp
  99. ndt_omp
  100. ndt_scan_matcher
  101. object_merger
  102. object_range_splitter
  103. object_recognition_utils
  104. object_velocity_splitter
  105. obstacle_avoidance_planner
  106. obstacle_cruise_planner
  107. obstacle_stop_planner
  108. occupancy_grid_map_outlier_filter
  109. osqp_interface
  110. path_distance_calculator
  111. perception_utils
  112. pid_longitudinal_controller
  113. planning_evaluator
  114. planning_test_utils
  115. planning_validator
  116. pointcloud_preprocessor
  117. pointcloud_to_laserscan
  118. pose_initializer
  119. probabilistic_occupancy_grid_map
  120. pure_pursuit
  121. radar_crossing_objects_noise_filter
  122. radar_fusion_to_detected_object
  123. radar_object_clustering
  124. radar_object_tracker
  125. raw_vehicle_cmd_converter
  126. route_handler
  127. rtc_auto_mode_manager
  128. rtc_interface
  129. rtklib_msgs
  130. scenario_selector
  131. shape_estimation
  132. shift_decider
  133. signal_processing
  134. simple_planning_simulator
  135. surround_obstacle_checker
  136. system_error_monitor
  137. system_monitor
  138. tensorrt_classifier
  139. tensorrt_common
  140. tensorrt_yolo
  141. tensorrt_yolox
  142. tier4_api_msgs
  143. tier4_api_utils
  144. tier4_auto_msgs_converter
  145. tier4_autoware_api_launch
  146. tier4_autoware_utils
  147. tier4_control_launch
  148. tier4_control_msgs
  149. tier4_debug_msgs
  150. tier4_external_api_msgs
  151. tier4_localization_launch
  152. tier4_localization_msgs
  153. tier4_map_launch
  154. tier4_map_msgs
  155. tier4_pcl_extensions
  156. tier4_perception_launch
  157. tier4_perception_msgs
  158. tier4_planning_launch
  159. tier4_planning_msgs
  160. tier4_rtc_msgs
  161. tier4_sensing_launch
  162. tier4_simulation_msgs
  163. tier4_simulator_launch
  164. tier4_system_launch
  165. tier4_system_msgs
  166. tier4_v2x_msgs
  167. tier4_vehicle_launch
  168. tier4_vehicle_msgs
  169. topic_state_monitor
  170. traffic_light_arbiter
  171. traffic_light_classifier
  172. traffic_light_fine_detector
  173. traffic_light_map_based_detector
  174. traffic_light_multi_camera_fusion
  175. traffic_light_occlusion_predictor
  176. traffic_light_utils
  177. traffic_light_visualization
  178. trajectory_follower_base
  179. trajectory_follower_node
  180. vehicle_cmd_gate
  181. vehicle_info_util
  182. yabloc_common
  183. yabloc_image_processing
  184. yabloc_monitor
  185. yabloc_particle_filter
  186. yabloc_pose_initializer

And these 186 packages include the infamous behavior_path_planner and almost all of the behavior_velocity_x_modules.

This is a waste of time and we can avoid all these and only build the single package. I will explain the plan below.

Purpose

  • To reduce the time wasted in CI machines
  • To reduce the queue in CI
  • To help users get feedback from CI faster

Possible approaches

It's simple.

Approach 1, upload and download the build artifact

After a commit is made to the main branch,

  • Update the packages to whatever latest, similar to the build
  • Build the entire Autoware at that time (basically build-and-test job)
  • Upload the compressed autoware folder as an artifact. (These are stored for 90 days by default, we can also increase the duration.)
  • In this build-and-test-differential job, after pulling the build environment, we run download-workflow-artifact to download the artifact from that workflow.
  • We decompress the autoware folder and checkout to this repository's branch in autoware.universe.
  • Now that we have everything already built,
    • Instead of colcon build --packages-above-and-dependencies
    • We can run colcon build --packages-select
    • For the example given above, we will only build a single package.

Approach 2 (preferred), push the built version as a docker image

Another approach can be to build a docker image in the build-and-test with following layers:

  • base image
  • the changes made to update the system
  • the changes made to build the autoware And push it to the registry.

build-and-test-differential job can use this image as the base image.

This is much simpler and can be achieved faster probably.

With these approaches, feat(crosswalk): suppress chattering of GO/STOP decision will take like 4m instead of 40m

Definition of done

Approach 1:

  • [ ] Modify the build-and-test.yaml to upload the artifact after it successfully builds.
  • [ ] Modify the build-and-test-differential.yaml to:
    • [ ] Download the artifact using download-workflow-artifact
    • [ ] Extract and use it as the autoware folder instead of pulling
    • [ ] Also pull the latest branch in the autoware.universe repository
    • [ ] Build only the packages modified, not their dependencies

Approach 2 (preferred):

  • [ ] Modify the build-and-test.yaml to build and push the prebuilt image of the autoware.
  • [ ] Modify the build-and-test-differential.yaml to:
    • [ ] Pull the prebuilt image instead
    • [ ] Also pull the latest branch in the autoware.universe repository
    • [ ] Build only the packages modified, not their dependencies

xmfcx avatar Aug 01 '23 16:08 xmfcx

cc. @mitsudome-r @shmpwk @esteve

xmfcx avatar Aug 01 '23 16:08 xmfcx

I didn't take --packages-above into consideration.

--packages-above [PKG_NAME [PKG_NAME …]] Select the packages with the passed names as well as packages which recursively depend on them.

So we should use this parameter. I will also check how many packages are there with --packages-above, compared to --packages-above-and-dependencies

Edit:

$ colcon list --packages-above behavior_velocity_crosswalk_module --names-only
behavior_velocity_crosswalk_module
behavior_velocity_planner
behavior_velocity_walkway_module

There was only 3 packages. So, we would still be saving 186-3=183 packages by implementing this proposal.

xmfcx avatar Aug 01 '23 18:08 xmfcx

Related: https://github.com/orgs/autowarefoundation/discussions/3469#discussioncomment-5965946

xmfcx avatar Aug 03 '23 13:08 xmfcx

Thank you for the proposal @xmfcx. Let me ask a minor question that came up in my mind:

How do you think we can guarantee that all the build-and-test-differential CIs can access the latest build artifacts? I understood that both approach assumes that the build artifacts of the dependency packages are from the latest main, and may fail if this assumption breaks (correct me if I am wrong). I am not sure how flexible we can manage the order of CIs in GitHub Actions, but I guess it may be difficult to guarantee above statement if we consider multiple PRs created and merged in arbitrary order.

kminoda avatar Aug 28 '23 08:08 kminoda

@xmfcx thanks for the proposal. One issue I see with this approach is that we'd have to be very strict about API and ABI versioning when a module changes. In the example you provided, if the module changed its API or ABI, or if it were a messages package that changed a message definition, we would still need to build the downstream packages. AFAIK, neither API, ABI or messages have been formally stabilized, so any package is still free to change anything and the downstream packages would need to be updated.

esteve avatar Aug 28 '23 12:08 esteve

@esteve let's consider following scenario:

graph TD
    F:::cls_01 -->|"depends on"| D:::cls_03
    D -->|"depends on"| E:::cls_02
    D -->|"depends on"| C:::cls_04
    C-->|"depends on"| B:::cls_05
    B -->|"depends on"| A:::cls_06

classDef cls_00 fill:#DDDDDD,stroke:#999,stroke-width:1px;
classDef cls_01 fill:#FBF8CC,stroke:#999,stroke-width:1px;
classDef cls_02 fill:#FDE4CF,stroke:#999,stroke-width:1px;
classDef cls_03 fill:#FFCFD2,stroke:#999,stroke-width:1px;
classDef cls_04 fill:#F1C0E8,stroke:#999,stroke-width:1px;
classDef cls_05 fill:#CFBAF0,stroke:#999,stroke-width:1px;
classDef cls_06 fill:#A3C4F3,stroke:#999,stroke-width:1px;

Here the Package C is changed/updated package.

$ colcon list --packages-up-to C
A, B, C

$ colcon list --packages-above C
C, D, F

$ colcon list --packages-above-depth 1 C
C, D

$ colcon list --packages-above-and-dependencies C
A, B, C, D, E, F

Keeping these in mind, Let's assume everything is compiled with colcon build

Then, Package C is changed.

Now, recompiling Package C and Package D should suffice. (colcon list --packages-above-depth 1 C) No need to recompile Package F since it's not directly dependent on Package C.

But current implementation builds every package out there (A, B, C, D, E, F) since because the build artifacts from the latest build is not present.

My proposal just tells it to rebuild and test C and D, and that's it.

How is this related to API/ABI compatibility?

xmfcx avatar Aug 28 '23 13:08 xmfcx

@kminoda one way is to store the build artifacts with the commit hash of the universe repository.

Then we can check if they match. (In the beginning of the build-and-test-differential job.) If they don't match, the job fails.

We can print a sensible message to retry the job once the main build artifact is present.

xmfcx avatar Aug 28 '23 13:08 xmfcx

@xmfcx

Thanks for the explanation, I see it more clearly now. It may work if we only build C and D, however, transitve dependencies (e.g. _export tags) and -extras.cmake files may cause problems, though, but I think we barely use them and always declare all the dependencies in downstream packages.

How is this related to API/ABI compatibility?

If C changes its API/ABI and we don't rebuild D, we wouldn't catch any incompatibilities, but I see now that you're proposing to rebuild D as well.

esteve avatar Aug 29 '23 08:08 esteve

@xmfcx

Even Approach 2 seems more elegant; we would need to create a new container, copy artifacts into the new container and save the container into registry to later use it as a container for build-test-differential. Instead, we can just save our artifacts with the commit hash in the build-test then download them into build-test-differential and checking the commit hash to validate. Or am I missing something ?

oguzkaganozt avatar Oct 03 '23 15:10 oguzkaganozt

This pull request has been automatically marked as stale because it has not had recent activity.

stale[bot] avatar Dec 03 '23 14:12 stale[bot]

Will be hopefully solved by:

  • https://github.com/autowarefoundation/autoware/issues/3999

xmfcx avatar Feb 06 '24 15:02 xmfcx

Now that

  • https://github.com/autowarefoundation/autoware/issues/3999

is completed, we can close this. ✨🎇

xmfcx avatar Mar 06 '24 17:03 xmfcx

@xmfcx Looking at build-and-test-differential we are still doing rosdep install and colcon build on --packages-above-and-dependencies and not --packages-above-depth 1 as proposed in https://github.com/autowarefoundation/autoware.universe/issues/4497#issuecomment-1695694729. Is that on purpose?

I think it works well in theory, although I am not convinced that all packages declare their direct dependencies. For example in the case of:

   ┌─┐    
┌──┤C├──┐ 
│  └─┘  │ 
│       ▼ 
│      ┌─┐
│      │B│
│      └┬┘
│  ┌─┐  │ 
└─►│A│◄─┘ 
   └─┘    

I can perfectly see C only listing B and relying on its dependency to A. In which case doing --packages-above-depth 1 on A will not test the A/C interface. However, note that I didn't try to look for an example of this happening and it might be an irrelevant concern.

Either way, we should be able to at least remove the -and-dependencies part to gain some time.

ambroise-arm avatar Mar 07 '24 09:03 ambroise-arm

@ambroise-arm thanks for reminding this, It's been so long, I forgot the real premise of my proposal 😅

To recap:

  • Our docker images were too big for public runners
  • We started running these workflows on self hosted runners
  • We didn't have many, jobs sometimes piled up and caused this problem

Then,

  • https://github.com/autowarefoundation/autoware/issues/3999 was merged yesterday
    • Images are small again, they now run on public runners
    • They also now use ccache artifacts in the prebuilt image, speeding things up much more compared to what we had before

But looking at this again, @esteve has been fixing the dependencies for all the packages for a while now. I think we can give --packages-above-depth 1 option a shot.

Even if we have some failures, we will detect them once the https://github.com/autowarefoundation/autoware.universe/actions/workflows/build-and-test.yaml is triggered (it runs everytime something is merged to universe main) and we will have time to update the incorrect dependencies.

I will create a PR to modify it that way.

xmfcx avatar Mar 07 '24 10:03 xmfcx

$ colcon list --packages-up-to C
A, B, C

$ colcon list --packages-above C
C, D, F

$ colcon list --packages-above-depth 1 C
C, D

$ colcon list --packages-above-and-dependencies C
A, B, C, D, E, F

@ambroise-arm but since we won't have the build artifacts as in ~build and~ install folders from most recent successful previous run, we will need to combine:

$ colcon list --packages-above-depth 1 C && colcon list --packages-up-to C
A, B, C, D

We still need to build dependencies under.

xmfcx avatar Mar 07 '24 11:03 xmfcx

We would need to restructure colcon-build action from autoware-github-actions and sync it with autoware.universe repo.

  • https://github.com/autowarefoundation/autoware-github-actions/blob/8ce979ef277058cd28af34ebecf35347e658b576/colcon-build/action.yaml#L53-L85

Currently this is too much work for relatively little benefit (compared to the time when I had opened up this issue), will mark this as a low priority issue.

I will un-assign myself and leave this open.

xmfcx avatar Mar 07 '24 11:03 xmfcx

but since we won't have the build artifacts as in build and install folders from most recent successful previous run, we will need to combine:

$ colcon list --packages-above-depth 1 C && colcon list --packages-up-to C
A, B, C, D

We still need to build dependencies under.

We have the install directory in the prebuilt image. And I think that's enough. Doing source on the install artifacts and checking out the codebase, I can then for example do:

root@f66d64e2081b:/autoware/autoware.universe# colcon build --packages-select pure_pursuit
Starting >>> pure_pursuit
[3.802s] WARNING:colcon.colcon_core.shell:The following packages are in the workspace but haven't been built:
- osqp_interface
- tier4_autoware_utils
- interpolation
- vehicle_info_util
- motion_utils
- trajectory_follower_base
They are being used from the following locations instead:
- /autoware/install/osqp_interface
- /autoware/install/tier4_autoware_utils
- /autoware/install/interpolation
- /autoware/install/vehicle_info_util
- /autoware/install/motion_utils
- /autoware/install/trajectory_follower_base
To suppress this warning ignore these packages in the workspace:
--packages-ignore osqp_interface tier4_autoware_utils interpolation vehicle_info_util motion_utils trajectory_follower_base
[Processing: pure_pursuit]                         
[Processing: pure_pursuit]                                     
Finished <<< pure_pursuit [1min 7s]                             

Summary: 1 package finished [1min 10s]

EDIT: Oh, was "from most recent successful previous run" the important part? Yeah, that might be a problem.

ambroise-arm avatar Mar 07 '24 11:03 ambroise-arm

EDIT: Oh, was "from most recent successful previous run" the important part? Yeah, that might be a problem.

Exactly, this requires a good orchestration effort. Uploading most recent successful main branch run artifacts and retrieving them in the build-and-test-differential jobs.

xmfcx avatar Mar 07 '24 11:03 xmfcx

But looking at this again, @esteve has been fixing the dependencies for all the packages for a while now. I think we can give --packages-above-depth 1 option a shot.

Yeah, I agree, we can give it a try.

esteve avatar Mar 07 '24 12:03 esteve

This pull request has been automatically marked as stale because it has not had recent activity.

stale[bot] avatar May 06 '24 23:05 stale[bot]

Closing this in favor of:

  • https://github.com/autowarefoundation/autoware.universe/issues/7501

xmfcx avatar Jun 16 '24 09:06 xmfcx