unity-actions
Containers often get stuck when an action is cancelled. Please provide some behaviour for cleaning up frozen containers in Docker.
Context
Cancelling a run from the Actions dashboard.
Suggested solution
I suggest checking all containers before starting the current one and doing something like what is described here: https://stackoverflow.com/questions/40744000/remove-all-docker-containers-except-one
We could check in a couple of ways:
- just by timeout (e.g. we know that a regular build runs for about 25 minutes, so if a container has been alive for more than 30–40, kill it)
- or kill all Unity containers that are currently running (this assumes I will never run two tasks on the same Docker host at the same time, so if I'm starting a new one and a couple of older ones are still around, they're all dead anyway)

I should also be able to provide a list of names of my own containers that should be left alone on this Docker host. Or maybe don't check the others at all and just kill all Unity containers.
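The "kill all Unity containers except a keep-list" idea could be sketched as a small shell script. This is only a sketch under two assumptions I'm introducing here: that GameCI containers are started with a hypothetical `gameci` label (so nothing else on the host is touched), and that the keep-list IDs below are placeholders:

```shell
#!/bin/sh
# Hypothetical IDs of containers that must survive the cleanup.
KEEP_IDS="abc123 def456"

# Turn the keep-list into a grep pattern like "abc123|def456".
PATTERN=$(echo "$KEEP_IDS" | tr ' ' '|')
echo "$PATTERN"

# Force-remove every gameci-labelled container that is not in the keep-list.
CLEANUP="docker ps -aq --filter label=gameci | grep -vE '$PATTERN' | xargs -r docker rm -f"
echo "$CLEANUP"

# Only execute when docker is actually present on this machine.
command -v docker >/dev/null 2>&1 && sh -c "$CLEANUP" || true
```

Scoping the cleanup by label (rather than `docker ps -a -q` with a bare `grep -v`) keeps unrelated containers on the host out of harm's way.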
Example of implementation:
```js
class Docker {
  static async build(buildParameters, silent = false) {
    const { path, dockerfile, baseImage } = buildParameters;
    const { version } = baseImage;
    const tag = ImageTag.createForAction(version);

    // Something like this should be added to Docker before running.
    // (`docker close` is not a real command; `docker rm -f` force-removes containers.)
    const cleanupCommand = `docker rm -f $(docker ps -a -q | grep -v "{my_other_containers_ids}")`;
    await exec(cleanupCommand, null, { silent });

    const command = `docker build ${path} \
      --file ${dockerfile} \
      --build-arg IMAGE=${baseImage} \
      --tag ${tag}`;
    await exec(command, null, { silent });

    return tag;
  }

  static async run(image, parameters, silent = false) {
    // ...
  }
}
```
It would be better to handle action cancellation and close the previous run properly, but I don't know how that works and can't suggest a solution for it. I've suggested a solution that I understand and know works.
I believe this is on GitHub's side, and they have improved the runner behaviour a lot over the past year.
Closing but feel free to reopen if this issue persists.
We just had a few Android builds that were stuck because Unity was stuck. They were then marked as cancelled in GitHub Actions based on the timeout set in the workflow file. However, when I checked a day or two later, the Docker containers were still running on the host at 100% CPU usage. So it seems that the Docker containers of timed-out runs are sometimes not properly cleaned up.
GitHub Actions has the `timeout-minutes: 30` option. I guess it would be nice if there were a way to pass it on to `docker run`, but I haven't found such an option for that command (`--stop-timeout` is for something else).
Other solutions would be to incorporate the timeout into the Docker call or the entrypoint. E.g. people use `timeout 3000 docker run --rm mycontainer entrypoint.sh` or `docker run --rm mycontainer timeout 3000 entrypoint.sh`.
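For reference, here is a docker-free demonstration of how the coreutils `timeout` command behaves in those two placements (the docker lines are kept as comments since they assume the hypothetical image `mycontainer`):

```shell
#!/bin/sh
# Variant 1: the host kills the docker client after the limit
#   timeout 3000 docker run --rm mycontainer entrypoint.sh
# Variant 2: the entrypoint is killed inside the container
#   (requires `timeout` to exist in the image)
#   docker run --rm mycontainer timeout 3000 entrypoint.sh

# `timeout` lets a fast command finish normally...
timeout 5 sleep 1 && echo "finished within limit"
# ...and kills a slow one, exiting with status 124.
timeout 1 sleep 2 || echo "killed, exit status $?"
```

Note that variant 1 only kills the `docker run` client process; depending on how the container was started, the container itself may keep running, which is exactly the stuck-container problem described above.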
The proposed solution from the initial post could also work, if the containers were all tagged with e.g. `gameci-runner`, so that no other Docker containers on the host system get cleaned up.
Another possibility is adding a cleanup step to the builder actions that runs after the job is cancelled or failed and kills the started container (if it's still alive). I think that would be the cleanest solution, as it leaves the timeout setting to GitHub Actions.
So the unity-builder step would add the started container id to its outputs. Then a post-action step with `post-if: failure() || cancelled()` could run something like `docker rm -f ${{ steps.runner-id.outputs.container-id }} &>/dev/null && echo 'Cleaned up container'`
(Something like this; I'm not sure of the Actions syntax for cleanup steps, but I've seen other actions with cleanup steps that run after all other steps are done.)
Edit: It seems adding a post step and a post condition is as easy as this: https://github.com/MasterworksIO/action-local-cache/blob/main/action.yml#L16-L17. Documentation: post & post-if
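For illustration, a minimal `action.yml` sketch following that pattern (the file paths, the node version, and the exact condition are assumptions on my side, not unity-builder's actual layout):

```yaml
# action.yml (hypothetical excerpt)
runs:
  using: 'node16'
  main: 'dist/index.js'
  # Runs after all other steps in the job, but only if it failed or was cancelled:
  post: 'dist/cleanup.js'
  post-if: failure() || cancelled()
```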
Perhaps this is also something we could force using the CLI once it's implemented.
@webbertakken if you don't want to add more outputs (as currently the build step has none), the Docker container could simply receive the current job id as its name (e.g. `--name gameci_builder_23452525`; note that `docker run -t` allocates a TTY rather than setting a tag, so `--name` is the flag to use here).
And then the post step could run `docker kill gameci_builder_23452525`.
Since the container already runs with `docker run --rm`, it will remove itself automatically.
(I don't know what you mean by CLI, unfortunately. With the post step it would be done automatically, which would be great. I discovered the hanging containers two days later, after wondering why all the other builds were suddenly so slow 😓)
To find out what we mean by CLI, see the description of our (new) Roadmap for v3.0.0.