Create a CI/CD workflow for ATOM using GitHub Actions
I've been working on integrating ATOM into a CI/CD pipeline in which simulations run in the cloud, using AWS as the cloud provider.
To this end, I am using Rigel as a tool to prepare ATOM for cloud simulations. Specifically, Rigel creates two Docker containers:
- a `simulation` application, which (for now) simply launches a Gazebo environment with a simple robot (two RGB cameras on a tripod);
- a `robot` application, which calibrates a dataset (for now).

Note that the names `simulation` and `robot` are used only because that's the nomenclature AWS uses. Generally, the `simulation` app includes the things we don't want to test, while the `robot` app contains everything we wish to test: in this case, the calibration process.
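For illustration, here is a rough compose-style sketch of that two-container split. This is NOT actual Rigel syntax, and the image names and launch file are hypothetical placeholders; it only shows how responsibilities are divided:

```yaml
# Hypothetical sketch, not real Rigel/AWS configuration.
services:
  simulation:   # things we do NOT want to test: Gazebo world + tripod robot
    image: atom-simulation:latest                  # placeholder image name
    command: roslaunch tripod_gazebo main.launch   # placeholder launch file
  robot:        # things under test: the ATOM calibration itself
    image: atom-robot:latest                       # placeholder image name
    command: rosrun atom_calibration calibrate -json $ATOM_DATASETS/t2rgb/dataset.json
```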
Rigel is a plugin-based tool, and so far I'm making use of four plugins, the first three of which are:
- `dockerfile`, which automatically creates a Dockerfile for building a Docker image that, in this case, contains ATOM and the robot we wish to calibrate;
- `build`, which actually builds said image;
- `test`, which tests the applications locally.
The fourth plugin is one I am currently developing, which performs an introspection of a process (in this case, the calibration process) by reading a results `.csv` file. The way this fits into my current pipeline is the following: my `robot` application not only runs the calibration, but also runs the calibration evaluation, outputting said `.csv` file (this is only a feature in @JorgeFernandes-Git's ATOM branch, if I'm not mistaken). The `test` plugin, after running this evaluation, extracts the produced `.csv` file from the container, which is then used by my `file_introspection` plugin.
You can see this working in this video:
local_testing_calibration_evaluation.webm
The next step in this project is to get these local tests and the introspection to be run on a CI/CD pipeline, using GitHub Actions. I am going to be working on my ATOM fork, found here.
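The starting point is a very simple workflow built around industrial_ci. A minimal sketch (the distro is Noetic, as used by ATOM; everything else follows industrial_ci's documented quick start):

```yaml
# .github/workflows/main.yml -- minimal industrial_ci sketch
name: CI
on: [push, pull_request]
jobs:
  industrial_ci:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: ros-industrial/industrial_ci@master
        env:
          ROS_DISTRO: noetic
```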
Your work looks awesome @Kazadhum. Congratulations
Great work @Kazadhum . Keep it up.
I'm running into an issue right at the beginning of trying to implement this. I've tried to figure it out, but finding answers is proving more complicated than I thought. Running a very simple workflow, I'm getting an error in the `install_target_dependencies` stage of the industrial_ci job. @rarrais, @MisterOwlPT, do you have any insights?
```
$ ( source /opt/ros/noetic/setup.bash && rosdep install -q --from-paths /root/target_ws/src --ignore-src -y | grep -E '(executing command)|(Setting up)' ; )
ERROR: the following packages/stacks could not have their rosdep keys resolved
to system dependencies:
atom_calibration: Cannot locate rosdep definition for [python3-graphviz-pip]
'( source /opt/ros/noetic/setup.bash && rosdep install -q --from-paths /root/target_ws/src --ignore-src -y | grep -E '(executing command)|(Setting up)' ; )' returned with 1
'install_target_dependencies' returned with code '1' after 0 min 1 sec
```
It seems that the issue is related to dependencies -- maybe a dependency of the `atom_calibration` package is not properly declared? Can you please paste here the link to the log so that we can have a look at what stage of the process this happens?
Let me also tag @sergiodmlteixeira here so that he can have a look as well.
Thank you, @rarrais, here's the log. I did try changing that dependency in the `atom_calibration` package from `python3-graphviz-pip` to `python3-graphviz`, which did work, but I wanted to see if I could do it without changing the ATOM dependencies.
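One idea I want to try, so that ATOM itself stays untouched, is adding a local rosdep rule that maps the missing key to the PyPI package. A sketch, assuming a standard rosdep setup inside the CI environment (the file name is mine):

```yaml
# /etc/ros/rosdep/custom-rules.yaml (hypothetical file name, not part of ATOM)
# Register it before running rosdep:
#   echo "yaml file:///etc/ros/rosdep/custom-rules.yaml" | sudo tee /etc/ros/rosdep/sources.list.d/10-custom.list
#   rosdep update
python3-graphviz-pip:
  ubuntu:
    pip:
      packages: [graphviz]
```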
For the record, here's the log from when I changed the dependency in `atom_calibration`.
In the meantime, I've been working on getting Rigel to run without using industrial_ci to build ATOM first (to save Actions usage time).
OK, so I've had a bit of trouble running Rigel in GitHub Actions.
First, I had issues with the Poetry environment, because I couldn't use the `poetry shell` command to "get into" the virtual environment. I fixed this by simply activating it directly:

```
. /home/runner/.cache/pypoetry/virtualenvs/rigel-u4E7_ENg-py3.10/bin/activate
```
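A more robust variant I might switch to (a sketch; it relies on `poetry env info --path`, which prints the virtualenv location, so the hashed directory name isn't hard-coded):

```yaml
- name: Activate the Poetry virtualenv and check the Rigel CLI
  run: |
    # Resolve the venv path instead of hard-coding the hashed directory name
    source "$(poetry env info --path)/bin/activate"
    rigel --help   # sanity check that the CLI is on PATH
```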
The problem I'm currently running into is the following: I can enter the virtual env and the `rigel run` command is recognized, but I'm getting 'unexpected extra arguments' for the job `dockerfile` part. Here's the log. The same thing happens when running `rigel run sequence deploy`.
Do you have any ideas, @rarrais, @MisterOwlPT, @sergiodmlteixeira? Thanks in advance!
Hi @rarrais, @MisterOwlPT and @miguelriemoliveira! I'm tagging you to let you know about my progress!
I've managed to circumvent this issue by simply installing both Rigel's develop branch and my file introspection plugin via `pip` in GitHub Actions, and I've managed to run `rigel run sequence deploy` (i.e. the Dockerfile generation and Docker image building) without issues. Here's the corresponding log and workflow file.
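The relevant steps look roughly like this (a sketch; the repository URLs are placeholders from memory, not verified):

```yaml
- name: Install Rigel (develop branch) and the file introspection plugin
  run: |
    # Placeholder URLs -- adjust to the actual repositories
    pip install git+https://github.com/rigel-ros/rigel.git@develop
    pip install git+https://github.com/Kazadhum/rigel_file_introspection.git
```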
Now, I'm going to try running the calibration, evaluation and introspection as well, so I can then work on getting several calibrations and introspections running in parallel.
I've run into an issue again. When running the testing and introspection plugins in GitHub Actions, the calibration evaluation results file `rgb_to_rgb_results.csv` is not found inside the container. Because of this, it isn't saved as an artifact and the introspection does not run. Here's the log.
This did not happen locally. I suspect this might be a problem with environment variables, but I'm not too sure about that.
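To narrow it down, I'm thinking of adding a debug step along these lines (a sketch; the container name comes from the Rigelfile, and the in-container path is a guess based on the `-sfr $HOME/` argument):

```yaml
- name: Inspect the evaluation container on failure
  if: failure()
  run: |
    docker ps -a                          # was the container created at all?
    docker logs calibration_evaluation    # did the evaluation actually run?
    # Guessed path: -sfr $HOME/ should resolve to /root inside the container
    docker cp calibration_evaluation:/root/rgb_to_rgb_results.csv . || echo "file not in container"
```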
Hi @Kazadhum, looking at the logs and at the workflow file, it seems to me that the issue might be the plugin not being able to access the file. Looking at the logs, it seems that the file is not at this path: `/home/runner/.rigel/archives/test/latest/calibration_evaluation/rgb_to_rgb_results.csv`.
Do we have a way to confirm where the calibration procedure is saving the file?
Hi @rarrais! By looking at line 95 of that same log, it looks like the file is not in that directory because it doesn't exist in the container at all. I'll run some debugging commits to see if I can ascertain the reason.
Looks like this is correct, and I've confirmed this by checking the directory structure, like this.
So it seems that the issue occurs inside the container generated by Rigel: Rigel cannot find the file inside the container to output as an artifact. The weird thing is that this does not happen when running locally. So I wanted to ask, @rarrais @MisterOwlPT @sergiodmlteixeira, because I can't find a clear answer online or in the Actions documentation: what exactly is running my workflow? Is it a container or a VM?
Thanks in advance :)
Hello @rarrais! Since we spoke yesterday, I've been working locally with act, and using it I can attach a terminal to the Rigel container inside the GitHub Actions container! I've found that the issue is that the `calibration_evaluation` container runs into the following error:

```
ImportError: this platform is not supported: ('failed to acquire X connection: Can\'t connect to display ":0": b\'No protocol specified\\n\'', DisplayConnectionError(':0', b'No protocol specified\n'))
Try one of the following resolutions:
* Please make sure that you have an X server running, and that the DISPLAY environment variable is set correctly
INFO - 2023-04-13 14:35:42,886 - core - signal_shutdown [atexit]
```
I'm trying to open up access to the X server with xhost as part of the workflow to see if that fixes it.
Hi @Kazadhum, good news on the progress. Do you know why/if `calibration_evaluation` actually needs an X server to run? As it is part of ATOM, maybe @miguelriemoliveira can help.
@rarrais It might have been a mistake on my end. It probably doesn't need it and I just put it in the Rigelfile in case it was needed. I'll run some local tests to check.
Hey @rarrais and @miguelriemoliveira. It seems I was wrong: it is in fact ATOM that needs the X server to run, so it wasn't my mistake.
Running the calibration locally without mounting the X11 volume and declaring the DISPLAY env variable results in the same error.
So it seems the question now is: how can I enable an X server inside the GitHub Actions container? Just running `xhost +` doesn't seem to work. I tried this using act and then I ran it on GitHub to show you the log.
Hi @Kazadhum,
The calibration evaluation runs some imshows from OpenCV, thus it needs an X server.
I think there is a mode, however, where no windows are launched.
How are you launching the script?
Hi @miguelriemoliveira! I'm using:

```
rosrun atom_evaluation rgb_to_rgb_evaluation -train_json $ATOM_DATASETS/t2rgb/atom_calibration.json -test_json $ATOM_DATASETS/t2rgb/dataset.json --sensor_source right_camera --sensor_target left_camera --show_images False -sfr $HOME/
```

Note that I have `show_images` set to false, but the same error happens. Thank you! :)
Hi @Kazadhum,
I think the show_images flag is a store_true flag, meaning you add it to set it true, and do not enter it to have it false.
Can you post the output of:

```
rosrun atom_evaluation rgb_to_rgb_evaluation -h
```
Hello @miguelriemoliveira! Here's the output you asked for:
```
usage: rgb_to_rgb_evaluation [-h] -train_json TRAIN_JSON_FILE -test_json TEST_JSON_FILE -ss
                             SENSOR_SOURCE -st SENSOR_TARGET [-si] [-sfr SAVE_FILE_RESULTS]

optional arguments:
  -h, --help            show this help message and exit
  -train_json TRAIN_JSON_FILE, --train_json_file TRAIN_JSON_FILE
                        Json file containing train input dataset.
  -test_json TEST_JSON_FILE, --test_json_file TEST_JSON_FILE
                        Json file containing test input dataset.
  -ss SENSOR_SOURCE, --sensor_source SENSOR_SOURCE
                        Source transformation sensor.
  -st SENSOR_TARGET, --sensor_target SENSOR_TARGET
                        Target transformation sensor.
  -si, --show_images    If true the script shows images.
  -sfr SAVE_FILE_RESULTS, --save_file_results SAVE_FILE_RESULTS
                        Output folder to where the results will be stored.
```
Thanks,
So you see the `[-si]`? It does not have a value in capitals after it.
That means that if you want images you use:

```
rosrun atom_evaluation rgb_to_rgb_evaluation ... --show_images
```

and if you do not, you run:

```
rosrun atom_evaluation rgb_to_rgb_evaluation ...
```
Hi @miguelriemoliveira! Thank you for the reply. The weird thing is, even without the `[-si]` option, it still returns the same error message and I don't really know why. @MisterOwlPT, have you had any similar experiences with GitHub Actions?
EDIT: I've found this Stack Overflow entry: https://stackoverflow.com/questions/63125480/running-a-gui-application-on-a-ci-service-without-x11. I'm trying to see if I can use this action (https://github.com/coactions/setup-xvfb) to get through this part.
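If I understood the action's README correctly, usage would be something like this (a sketch; whether wrapping the Rigel test sequence is the right level to apply it at is my assumption):

```yaml
- name: Run the Rigel test sequence under Xvfb
  uses: coactions/setup-xvfb@v1
  with:
    run: rigel run sequence test
```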
Let me do some experiments and get back to you ...
Hi @Kazadhum,
I looked into the script, and without the -si flag it should not require a running X server. Perhaps it is a problem with the docker/rigel stuff...
Hello! Thank you for running those tests; as you said, the problem is most likely on either Rigel's or Docker's side. I've since found someone who faced a similar problem with GitLab CI and solved it using Xvfb (https://forum.gitlab.com/t/run-things-that-need-a-glx-x-server-on-gitlab-ci/47440), so I'll try to reproduce their solution.
Though I wonder, @rarrais and @MisterOwlPT: when working with AWS and running the simulations online, this wouldn't be a problem, right?
Hello @Kazadhum,
I've been doing some tests and I think I found a solution to your problem.
I took your CI/CD workflow and executed every step manually inside an empty AWS EC2 instance.
I cloned your fork of ATOM and installed all dependencies as per the `main.yml` file inside the `.github/workflows` folder (i.e., Rigel, your plugin, system dependencies, ...).
I copied the file `rigelfiles/Rigelfile_1` to the root of the repository and renamed it to `Rigelfile`.
Executing the command `rigel run sequence test` and then `docker logs -f calibration_evaluation`, I was able to replicate the error.
Solution:
- I locally altered your image (`dvieira2001/atom:latest`) and installed `xvfb-run` (`apt install xvfb`) inside it. It allows you to run graphical applications without a "real" display (it creates one in memory);
- I altered the Docker execution command in the Rigelfile to `["/bin/bash", "-c", "xvfb-run rosrun atom_calibration calibrate ... && xvfb-run rosrun atom_evaluation rgb_to_rgb_evaluation ..."]`. Note that `xvfb-run` was added before every sub-command. This ensures everything can communicate with the virtual X server;
- I used X server `:99` (`export DISPLAY=:99`). This is the default server number used.

This way everything worked out perfectly.
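To double-check the image change in isolation before wiring it into Rigel, something like this should work (a sketch; it assumes the image's entrypoint sources the ROS environment, and the evaluation arguments are elided):

```yaml
- name: Sanity-check xvfb inside the updated image
  run: |
    # -h only prints the argparse help, so no ROS master is needed
    docker run --rm dvieira2001/atom:latest \
      /bin/bash -c "export DISPLAY=:99 && xvfb-run rosrun atom_evaluation rgb_to_rgb_evaluation -h"
```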
I saw that you tried to use `xvfb-run` without success before. Can you try one more time with these steps? Maybe you missed something.
Don't forget to update the image first! Consider adding the dependency to the Rigelfile and deploying it to the registry.
NOTE: since everything "graphical" is handled inside the container, I found it unnecessary to map `/tmp/.X11-unix -> /tmp/.X11-unix`.
Let me know if this message was useful and if your problem was solved 😃
PS: I found a typo in the field `command` of the `simulation_and_robot` component in the Compose plugin. Looking at the Dockerfile, I don't think it is that important, but still... you are using `bin/bash` instead of `/bin/bash`. This is causing the container to fail.
Hi @MisterOwlPT! Thank you so much for the testing you've done! I've been running some tests with the corrections you've made and, even though it's still not working properly, it's now failing for another reason!
Before, nothing ran in the `calibration_evaluation` container. Now the calibration procedure does run; the evaluation process, however, does not. I suspect it might be because of the command syntax, so I'll try some alternatives. But it seems the problem with the X server is indeed solved! I had tried to use `xvfb` before, but I hadn't installed it inside the container, so I assume that was the core issue.
Hello @miguelriemoliveira, @rarrais and @MisterOwlPT! I can confirm it works! Here's the log of the successful CI workflow run.
The problem:
It turns out that, for some reason, the last command wasn't being run. So, when I had:

```
command: ["/bin/bash", "-c", "xvfb-run rosrun atom_calibration calibrate -json $ATOM_DATASETS/t2rgb/dataset.json -v && xvfb-run rosrun atom_evaluation rgb_to_rgb_evaluation -train_json $ATOM_DATASETS/t2rgb/atom_calibration.json -test_json $ATOM_DATASETS/t2rgb/dataset.json --sensor_source right_camera --sensor_target left_camera -sfr $HOME/"]
```

in the Rigelfile, only `xvfb-run rosrun atom_calibration calibrate -json $ATOM_DATASETS/t2rgb/dataset.json` was being run.
The solution:
I replaced the command above with the following:

```
command: ["/bin/bash", "-c", "xvfb-run rosrun atom_calibration calibrate -json $ATOM_DATASETS/t2rgb/dataset.json -v && xvfb-run rosrun atom_evaluation rgb_to_rgb_evaluation -train_json $ATOM_DATASETS/t2rgb/atom_calibration.json -test_json $ATOM_DATASETS/t2rgb/dataset.json --sensor_source right_camera --sensor_target left_camera -sfr $HOME/ && cd $HOME/ && ls"]
```

This way, only the `ls` command is not run. Even though I have no clue as to why it works like this, it apparently does :smile:
Hi @miguelriemoliveira and @rarrais!
Happy to report that code coverage using Codacy is set up!
I wrote a couple of unit tests for the `naming.py` module in the `atom_core` package. Then, I used the coverage Python package to produce a coverage report.
By doing this inside a CI/CD workflow and using the `codacy-coverage-reporter` action in GitHub (a secret Codacy API token is needed for this), the code coverage report is shown in Codacy, updated every time a `push` event occurs.
Here's the workflow file I used and the log for the respective Actions job.
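The relevant part of the workflow looks roughly like this (a sketch; the coverage commands are what I'd expect to need, and the action inputs follow the codacy-coverage-reporter README):

```yaml
- name: Run unit tests with coverage
  run: |
    coverage run -m pytest    # picks up test_naming.py
    coverage xml              # writes coverage.xml for the reporter
- name: Upload coverage to Codacy
  uses: codacy/codacy-coverage-reporter-action@v1
  with:
    project-token: ${{ secrets.CODACY_PROJECT_TOKEN }}
    coverage-reports: coverage.xml
```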
As you can see, Codacy has identified a number of issues in its static code analysis that go against standard coding practices.
An example of the type of issues it encountered:
When it comes to test coverage, only two files are counted as covered: the `naming.py` module and the `test_naming.py` file containing the unit tests.
It doesn't really make much sense to write a lot more tests for ATOM, but this is a positive contribution to the CI/CD pipeline to be used for other ROS applications, especially if they are developed with testing in mind (perhaps using Test-Driven Development methods).
Hi @Kazadhum,
Nice progress, and good results for the thesis. I would say that testing is sufficient; however, I'm concerned about the image you posted: surely not 100% of the ATOM source code is covered by the tests you wrote. Could you please update that indication to a realistic percentage? It might be necessary to adjust the set of files taken into account when computing the coverage percentage, so that files are included in the analysis even if they have no unit tests.
Another suggestion, which you might want to come back to in the future if time allows, is to go over the style/security indications provided by Codacy and actually suggest changes in a pull request that would fix those existing issues. I believe the comparison between before and after the intervention would be a good outcome/result of your work.
Hey @rarrais!
These results will complement my thesis nicely, I agree.
About the code coverage, I've tried accounting for all Python scripts using Coverage.py, but I have been unsuccessful. Weirdly, the report produced only seems to account for the lines of code in scripts already covered by tests (and the test scripts themselves).
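One thing I still want to try (a suggestion from Coverage.py's documentation, not something I've confirmed on ATOM yet) is setting the `source` option, which makes never-imported files show up in the report at 0% instead of being omitted:

```yaml
- name: Measure coverage over the whole atom_core package
  run: |
    # --source lists files that were never executed at 0% coverage
    coverage run --source=atom_core -m pytest
    coverage xml
```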
I also agree that the comparison between the "before and after" of fixing these issues would be a good addition, and I'll definitely come back to it soon!