Actions-Runner-Controller support for Gitea Actions
Feature Description
The Gitea Actions release was a great first step. But it's currently missing many features of a more mature solution based on K8s runners rather than single nodes. While it's possible to run runners on K8s, this currently requires DinD, which comes with its own whole set of problems, security issues (privileged exec required as of today) and feature limitations (you can't use DinD to start another container to build a container image, i.e. DinDinD). I know workarounds exist with buildx, but those are just that: workarounds.
I think the next step could be something like what actions-runner-controller is doing for GitHub Actions: basically an operator that is deployed on K8s and registers as a runner. Every job it receives is then started in its own pod rather than in the runner itself. The runner coordinates the pods.
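The idea can be sketched as a simple control loop (all helper names here are hypothetical; a real operator would talk to the Kubernetes API and speak the runner protocol):

```python
# Hypothetical sketch of the operator idea: the runner registers once,
# then launches every received job in its own pod and reports the result.
def run_controller(poll_job, create_pod, wait_for_pod, report_result):
    while True:
        job = poll_job()            # next queued job, or None when draining
        if job is None:
            break
        pod = create_pod(job)       # one dedicated pod per job
        status = wait_for_pod(pod)  # block until the job pod finishes
        report_result(job, status)
```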
Related docs:
k8s hooks are technically already usable with Gitea Actions (meaning there is no documentation; the docker compose examples use dind + docker hooks), see this third-party runner adapter: https://gitea.com/gitea/awesome-gitea/pulls/149
Actions-Runner-Controller would require emulation of a larger set of the internal GitHub Actions API.
I actually find it interesting to reverse engineer that product too, but I have never dealt with k8s myself.
act_runner with its act backend doesn't support container hooks or k8s for the time being.
Interesting. I wasn't aware you could change the runner implementation just like that. I will definitely look into it. However, given what you said about DinD still being a requirement, I don't think it will change much (we already have our runners on K8s with DinD using an adapted version of gitea/act-runner for k8s, but as mentioned, this comes with many headaches).
The goal IMHO would be to be able to start workflows on k8s directly. Possible implementations:
- Every job is its own pod. Challenge: data sharing between jobs would require PVs and complicated mount/unmount logic to support the more common RWO PVs. I'm aware that currently GitHub's approach to data sharing between jobs is "yo dawg, just upload it to our artifact store", but in on-prem scenarios that's not what you normally want, so some sort of common local cache between jobs is a relevant feature; at least I would be very interested in it.
- Every workflow is a pod. Jobs start as containers. Benefit: all containers can easily have access to the same data, e.g. using an emptyDir volume. Challenge: pods are immutable, so:
- either all jobs (== containers) need to be present when the pod starts, requiring some kind of wait logic when we need job dependencies, which probably comes with its own set of problems.
- or ephemeral containers could possibly be used to add containers (== jobs) to a pod at runtime when a dependent job is ready. However, ephemeral containers come with a set of limitations and are meant for a different use case, so I'm not sure if that would be a good fit.
Option one (every job is its own pod) seems like the most promising option in my opinion.
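To make the storage challenge of option one concrete, a per-workflow claim that job pods would share might look like this (a minimal sketch; the claim name is hypothetical). With the common RWO access mode, only one node can attach the volume at a time, which is exactly what forces the mount/unmount choreography described above:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: workflow-1234-shared   # hypothetical per-workflow claim
spec:
  accessModes:
    - ReadWriteOnce            # typical on-prem default: one node at a time
  resources:
    requests:
      storage: 5Gi
```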
However given what you said about DinD still being a requirement I don't think it will change much
I meant, I didn't create any k8s mode examples / actually tried it yet. Sorry for confusion here.
The docker container hooks only allow dind for k8s. While the k8s hooks should use kubernetes api for container management, I still need to look into creating a test setup running.
I can imagine
- (controller) actions_runner is started with maxparallel 100 (yes, any value >= 1 is possible)
- (job controller) a worker script (spawned when a job request is received) forwards stdin and the network to the adapter to spawn actions/runner
- (actual job) k8s hooks spawn a job container using k8s apis
Well, not using act_runner has limitations when you try to use Gitea Actions extensions (features not present in GitHub Actions).
I think option 1 is more likely to happen than option 2. Job scheduling is based on jobs, not on workflows.
k8s hooks work for me using these files on minikube (arm64):
actions-runner-k8s-gitea-sample-files.zip
- Missing usage of secrets, need to learn kubernetes
- No autoscaling
- No persistence of runner credentials
With clever sharing of the runner credentials volume, you could start a lot of replicas for more parallel runners
This works without dind
Test workflow
```yaml
on: push
jobs:
  _:
    runs-on: k8s # <-- used runner label
    container: ubuntu:latest # <-- required; maybe the Gitea Actions adapter could insert a default
    steps:
      # Git is needed for actions/checkout to work with Gitea; the REST API is not compatible
      - run: apt update && apt install -y git
      - uses: https://github.com/actions/checkout@v3 # <-- almost the only Gitea extension supported
      - run: ls -la
      - run: ls -la .github/workflows
```
The runner-pod-workflow pod is the job container pod, running directly via k8s.
Looks promising. I'll give it a shot and share my findings.
Okay, so... there seem to be some issues with the current setup. Let me share my findings:
- You've been asking how to provide secrets in K8s; it's as simple as this:

```yaml
- name: GITEA_RUNNER_REGISTRATION_TOKEN
  valueFrom:
    secretKeyRef:
      name: secret-name
      key: secret_key
```

and creating your secret with (take care: K8s is case sensitive, and resource names must not contain underscores):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: secret-name
type: Opaque
stringData:
  secret_key: "s3cr3t"
```
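A side note on the manifest above: `stringData` is a write-only convenience field; Kubernetes stores the value base64-encoded under `data`. A quick sketch of what ends up in the stored Secret:

```python
import base64

# stringData values are persisted base64-encoded under .data in the Secret
plaintext = "s3cr3t"
encoded = base64.b64encode(plaintext.encode()).decode()
print(encoded)  # this is what kubectl shows under .data.secret_key

# decoding round-trips back to the original value
assert base64.b64decode(encoded).decode() == plaintext
```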
You shouldn't start pods in K8s directly but rather wrap them in a higher-level resource such as a Deployment, which makes them benefit from the (deployment) controller logic when updating or self-healing the pod. I did that, so the result looks something like this:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: runner
  name: runner
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: runner
  template:
    metadata:
      labels:
        app: runner
    spec:
      restartPolicy: Always
      serviceAccountName: ci-builder
      #securityContext:
      #  runAsNonRoot: true
      #  runAsUser: 1000
      #  runAsGroup: 1000
      #  seccompProfile:
      #    type: RuntimeDefault
      volumes:
        - name: workspace
          emptyDir:
            sizeLimit: 5Gi
      containers:
        - name: runner
          image: ghcr.io/christopherhx/gitea-actions-runner:v0.0.11
          #securityContext:
          #  readOnlyRootFilesystem: true
          #  allowPrivilegeEscalation: false
          #  capabilities:
          #    drop:
          #      - ALL
          volumeMounts:
            - mountPath: /home/runner/_work
              name: workspace
          env:
            - name: ACTIONS_RUNNER_POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
              value: "true"
            - name: ACTIONS_RUNNER_CONTAINER_HOOKS
              value: /home/runner/k8s/index.js
            - name: GITEA_INSTANCE_URL
              value: https://foo.bar
            - name: GITEA_RUNNER_REGISTRATION_TOKEN
              valueFrom:
                secretKeyRef:
                  name: gitea
                  key: token
            - name: GITEA_RUNNER_LABELS
              value: k8s
          resources:
            requests:
              cpu: 500m
              memory: 2Gi
            limits:
              cpu: 1000m
              memory: 8Gi
```
A few changes I made here:
- For the volume: if no persistence across new pods started by the runner is needed, a volume of type `emptyDir` can act as a temporary volume to share data between containers of a pod and write data to a well-known location.
- I added a `resources` section to follow best practice. The numbers probably need to be adapted to something that makes more sense.
- I added a `securityContext` but needed to disable it for now for troubleshooting, since it currently can't work as needed because of some issues with the current runner setup:
  - The Dockerfile switches to the `runner` user by name using `USER runner`. K8s doesn't like that if `runAsNonRoot` is specified but no `runAsUser` is given in the security context and the image uses a "non-numeric" user. I'd opt for using `USER 1000` in the Dockerfile instead, which should make this easier in the future.
  - `allowPrivilegeEscalation: false` can't currently be used because start.sh uses sudo to create the folder layout: `sudo chown -R runner:docker /home/runner/_work` and `sudo chown -R runner:docker /data`. I think a better approach would be to just create those folders within the mounted emptyDir volume. The running user should already have all permissions there to create the folders, so no sudo would be needed, but I'm not sure what those folders are currently used for and how hardcoded those paths are.
  - `readOnlyRootFilesystem` will probably also cause issues in the future when paths other than the mounted volume are used. Again, I think the easiest way to allow for maximum container security in k8s would be to simply not use the root fs at all and do everything on the mounted volume.
So, those are simply improvement suggestions for the future. For now, as you can see, I've been trying to keep it as simple as possible, but I still run into an issue. The runner starts and registers, but when using the job you provided I run into the following error returned by the job:
[WORKER 2024-03-12 15:59:08Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin'
[WORKER 2024-03-12 15:59:08Z INFO HostContext] Well known directory 'Root': '/home/runner'
[WORKER 2024-03-12 15:59:08Z INFO HostContext] Well known config file 'Credentials': '/home/runner/.credentials'
[WORKER 2024-03-12 15:59:08Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin'
[WORKER 2024-03-12 15:59:08Z INFO HostContext] Well known directory 'Root': '/home/runner'
[WORKER 2024-03-12 15:59:08Z INFO HostContext] Well known config file 'Runner': '/home/runner/.runner'
[WORKER 2024-03-12 15:59:08Z INFO Worker] Version: 2.314.0
[WORKER 2024-03-12 15:59:08Z INFO Worker] Commit: bc79e859d7b66e8018716bc94160656f6c6948fc
[WORKER 2024-03-12 15:59:08Z INFO Worker] Culture:
[WORKER 2024-03-12 15:59:08Z INFO Worker] UI Culture:
[WORKER 2024-03-12 15:59:08Z INFO Worker] Waiting to receive the job message from the channel.
[WORKER 2024-03-12 15:59:08Z INFO ProcessChannel] Receiving message of length 6322, with hash '30564f1b4d3e28c3d9cc39d17eca1132cc026a2abeb6ab1be6736d80cf019ea9'
[WORKER 2024-03-12 15:59:08Z INFO Worker] Message received.
Newtonsoft.Json.JsonReaderException: Invalid character after parsing property name. Expected ':' but got: . Path 'ContextData.github.d[20].v.d[5].v.d[14].v.d[11].v', line 1, position 6322.
at Newtonsoft.Json.JsonTextReader.ParseProperty()
at Newtonsoft.Json.JsonTextReader.ParseObject()
at Newtonsoft.Json.Linq.JContainer.ReadContentFrom(JsonReader r, JsonLoadSettings settings)
at Newtonsoft.Json.Linq.JContainer.ReadTokenFrom(JsonReader reader, JsonLoadSettings options)
at Newtonsoft.Json.Linq.JObject.Load(JsonReader reader, JsonLoadSettings settings)
at Newtonsoft.Json.Linq.JObject.Load(JsonReader reader)
at GitHub.DistributedTask.Pipelines.ContextData.PipelineContextDataJsonConverter.ReadJson(JsonReader reader, Type objectType, Object existingValue, JsonSerializer serializer)
at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.DeserializeConvertable(JsonConverter converter, JsonReader reader, Type objectType, Object existingValue)
at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.PopulateDictionary(IDictionary dictionary, JsonReader reader, JsonDictionaryContract contract, JsonProperty containerProperty, String id)
at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.CreateObject(JsonReader reader, Type objectType, JsonContract contract, JsonProperty member, JsonContainerContract containerContract, JsonProperty containerMember, Object existingValue)
at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.CreateValueInternal(JsonReader reader, Type objectType, JsonContract contract, JsonProperty member, JsonContainerContract containerContract, JsonProperty containerMember, Object existingValue)
at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.SetPropertyValue(JsonProperty property, JsonConverter propertyConverter, JsonContainerContract containerContract, JsonProperty containerProperty, JsonReader reader, Object target)
at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.PopulateObject(Object newObject, JsonReader reader, JsonObjectContract contract, JsonProperty member, String id)
at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.CreateObject(JsonReader reader, Type objectType, JsonContract contract, JsonProperty member, JsonContainerContract containerContract, JsonProperty containerMember, Object existingValue)
at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.CreateValueInternal(JsonReader reader, Type objectType, JsonContract contract, JsonProperty member, JsonContainerContract containerContract, JsonProperty containerMember, Object existingValue)
at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.Deserialize(JsonReader reader, Type objectType, Boolean checkAdditionalContent)
at Newtonsoft.Json.JsonSerializer.DeserializeInternal(JsonReader reader, Type objectType)
at Newtonsoft.Json.JsonSerializer.Deserialize(JsonReader reader, Type objectType)
at Newtonsoft.Json.JsonConvert.DeserializeObject(String value, Type type, JsonSerializerSettings settings)
at Newtonsoft.Json.JsonConvert.DeserializeObject[T](String value, JsonSerializerSettings settings)
at GitHub.Runner.Sdk.StringUtil.ConvertFromJson[T](String value)
at GitHub.Runner.Worker.Worker.RunAsync(String pipeIn, String pipeOut)
at GitHub.Runner.Worker.Program.MainAsync(IHostContext context, String[] args)
[WORKER 2024-03-12 15:59:09Z ERR Worker] Newtonsoft.Json.JsonReaderException: Invalid character after parsing property name. Expected ':' but got: . Path 'ContextData.github.d[20].v.d[5].v.d[14].v.d[11].v', line 1, position 6322.
[WORKER 2024-03-12 15:59:09Z ERR Worker] at Newtonsoft.Json.JsonTextReader.ParseProperty()
[WORKER 2024-03-12 15:59:09Z ERR Worker] at Newtonsoft.Json.JsonTextReader.ParseObject()
[WORKER 2024-03-12 15:59:09Z ERR Worker] at Newtonsoft.Json.Linq.JContainer.ReadContentFrom(JsonReader r, JsonLoadSettings settings)
[WORKER 2024-03-12 15:59:09Z ERR Worker] at Newtonsoft.Json.Linq.JContainer.ReadTokenFrom(JsonReader reader, JsonLoadSettings options)
[WORKER 2024-03-12 15:59:09Z ERR Worker] at Newtonsoft.Json.Linq.JObject.Load(JsonReader reader, JsonLoadSettings settings)
[WORKER 2024-03-12 15:59:09Z ERR Worker] at Newtonsoft.Json.Linq.JObject.Load(JsonReader reader)
[WORKER 2024-03-12 15:59:09Z ERR Worker] at GitHub.DistributedTask.Pipelines.ContextData.PipelineContextDataJsonConverter.ReadJson(JsonReader reader, Type objectType, Object existingValue, JsonSerializer serializer)
[WORKER 2024-03-12 15:59:09Z ERR Worker] at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.DeserializeConvertable(JsonConverter converter, JsonReader reader, Type objectType, Object existingValue)
[WORKER 2024-03-12 15:59:09Z ERR Worker] at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.PopulateDictionary(IDictionary dictionary, JsonReader reader, JsonDictionaryContract contract, JsonProperty containerProperty, String id)
[WORKER 2024-03-12 15:59:09Z ERR Worker] at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.CreateObject(JsonReader reader, Type objectType, JsonContract contract, JsonProperty member, JsonContainerContract containerContract, JsonProperty containerMember, Object existingValue)
[WORKER 2024-03-12 15:59:09Z ERR Worker] at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.CreateValueInternal(JsonReader reader, Type objectType, JsonContract contract, JsonProperty member, JsonContainerContract containerContract, JsonProperty containerMember, Object existingValue)
[WORKER 2024-03-12 15:59:09Z ERR Worker] at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.SetPropertyValue(JsonProperty property, JsonConverter propertyConverter, JsonContainerContract containerContract, JsonProperty containerProperty, JsonReader reader, Object target)
[WORKER 2024-03-12 15:59:09Z ERR Worker] at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.PopulateObject(Object newObject, JsonReader reader, JsonObjectContract contract, JsonProperty member, String id)
[WORKER 2024-03-12 15:59:09Z ERR Worker] at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.CreateObject(JsonReader reader, Type objectType, JsonContract contract, JsonProperty member, JsonContainerContract containerContract, JsonProperty containerMember, Object existingValue)
[WORKER 2024-03-12 15:59:09Z ERR Worker] at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.CreateValueInternal(JsonReader reader, Type objectType, JsonContract contract, JsonProperty member, JsonContainerContract containerContract, JsonProperty containerMember, Object existingValue)
[WORKER 2024-03-12 15:59:09Z ERR Worker] at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.Deserialize(JsonReader reader, Type objectType, Boolean checkAdditionalContent)
[WORKER 2024-03-12 15:59:09Z ERR Worker] at Newtonsoft.Json.JsonSerializer.DeserializeInternal(JsonReader reader, Type objectType)
[WORKER 2024-03-12 15:59:09Z ERR Worker] at Newtonsoft.Json.JsonSerializer.Deserialize(JsonReader reader, Type objectType)
[WORKER 2024-03-12 15:59:09Z ERR Worker] at Newtonsoft.Json.JsonConvert.DeserializeObject(String value, Type type, JsonSerializerSettings settings)
[WORKER 2024-03-12 15:59:09Z ERR Worker] at Newtonsoft.Json.JsonConvert.DeserializeObject[T](String value, JsonSerializerSettings settings)
[WORKER 2024-03-12 15:59:09Z ERR Worker] at GitHub.Runner.Sdk.StringUtil.ConvertFromJson[T](String value)
[WORKER 2024-03-12 15:59:09Z ERR Worker] at GitHub.Runner.Worker.Worker.RunAsync(String pipeIn, String pipeOut)
[WORKER 2024-03-12 15:59:09Z ERR Worker] at GitHub.Runner.Worker.Program.MainAsync(IHostContext context, String[] args)
##[Error]failed to execute worker exitcode: 1
Edit: the root cause seems to be somewhere around here: https://github.com/actions/runner/blob/v2.314.0/src/Runner.Worker/Program.cs#L20
In addition, I found that providing a runner config by mounting one and setting the CONFIG_FILE env var doesn't seem to work; you'll get `Error: unknown flag: --config` if you try. The root cause seems to be this.
I haven't gotten this kind of error before (at least not for a year).
Receiving message of length 6322, with hash '30564f1b4d3e28c3d9cc39d17eca1132cc026a2abeb6ab1be6736d80cf019ea9' [WORKER 2024-03-12 15:59:08Z INFO Worker] Message received. Newtonsoft.Json.JsonReaderException: Invalid character after parsing property name. Expected ':' but got: . Path 'ContextData.github.d[20].v.d[5].v.d[14].v.d[11].v', line 1, position 6322.
Sounds like the message inside the container got trimmed before it reached the actions/runner.
Based on the error the begin was sent to the actions/runner successfully
Maybe some data specific to your test setup causes this (even parts not in the repo are stored in the message).
I would need to add more debug logging to diagnose this
If you add the logging, I can reproduce the issue if you like. My guess is that it's maybe proxy-related, but I can't tell from the error logs.
@omniproc you made changes via the deployment file that are not compatible with actions/runner k8s container hooks, and I have no idea if using a deployment is possible. Actions-Runner-Controller might use helm charts + the kubernetes api; not sure how they do that.
Unable to attach or mount volumes: unmounted volumes=[work], unattached volumes=[], failed to process volumes=[work]: error processing PVC default/runner-785778b969-v88f8-work: failed to fetch PVC from API server: persistentvolumeclaims "runner-785778b969-v88f8-work" not found
the workspace cannot be an emptyDir volume; as in my example files, it is required to be a persistentvolumeclaim
You can technically change the name of the PVC via the ACTIONS_RUNNER_CLAIM_NAME env, but I don't know how to get a dynamically generated name of a volume. See https://github.com/actions/runner-container-hooks/blob/main/packages/k8s/README.md; if the name doesn't match, it will error out.
`allowPrivilegeEscalation: false` can't currently be used because start.sh uses sudo to create the folder layout: `sudo chown -R runner:docker /home/runner/_work` and `sudo chown -R runner:docker /data`. I think a better approach would be to just create those folders within the mounted emptyDir volume. The running user should already have all permissions there to create the folders, so no sudo would be needed, but I'm not sure what those folders are currently used for and how hardcoded those paths are.
This makes `mkdir /data` fail and you get an error about a `.runner` file.
This would require an emptyDir mount:

```yaml
- mountPath: /data
  name: data
```
Maybe if I create that dir in the Dockerfile it would work without that, as long as your fs is read-write.
The nightly doesn't have sudo in the start.sh file anymore, but it can still certainly break existing non-k8s setups as of now.
If you add the logging, I can reproduce the issue if you like. My guess is that it's maybe proxy-related, but I can't tell from the error logs.
I found a mistake in the python wrapper file: probably due to RAM resource constraints, os.read read less than expected and shortened the message.
I also added some asserts on the return values of the pipe communication; additionally, the env var ACTIONS_RUNNER_WORKER_DEBUG prints the job message from the python side.
Please try that nightly image: https://github.com/ChristopherHX/gitea-actions-runner/pkgs/container/gitea-actions-runner/190660665?tag=nightly. Important: change to the os/arch tab and copy the full tag + sha variant; I had problems with old cached nightly images.
It should get you to the point that you omitted the persistentvolumeclaims of my example and kubernetes cannot start the job pod (also make sure to create an emptyDir mount at /data/).
I'm now able to start the runner in k8s namespace with DinD mode. How can I scale up the runners by setting replica=2 or 3?
@ChristopherHX Hi, an interesting project there!
Just a little advice here:
You can technically change the name of the pvc via ACTIONS_RUNNER_CLAIM_NAME env, but I don't know how to get a dynamically generated name of a volume.
I used a StatefulSet and its volumeClaimTemplates functionality to dynamically provision PVCs and get the PVC name into the container as an env var. You can use https://kubernetes.io/docs/tasks/inject-data-application/define-interdependent-environment-variables/ to define an env var that depends on another.
Like the following:
```yaml
volumeClaimTemplates:
  - metadata:
      name: work
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
```
and refer to it as:

```yaml
env:
  - name: ACTIONS_RUNNER_POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: ACTIONS_RUNNER_CLAIM_NAME
    value: work-$(ACTIONS_RUNNER_POD_NAME)
```
A full working example that I tested is also available from https://github.com/traPtitech/manifest/blob/3ff7e8e6dfa3e0e4fed9a9e8ca1ad09f9b132ff1/gitea/act-runner/gitea-act-runner.yaml.
Thanks for your example, it makes manual scaling pretty straightforward and works in minikube for testing purposes even with 4 replicas.
The first time I read your response, I thought work-$(ACTIONS_RUNNER_POD_NAME) looked like dark magic (as a kubernetes newbie), since it looks like an indirect resource-naming assumption.
@ChristopherHX
is it possible that the runner doesn't yet support no_proxy? With the http_proxy / https_proxy and no_proxy env vars set, I see the runner using the proxy:
[WORKER 2024-07-23 12:09:08Z INFO HostContext] Configuring anonymous proxy http://my.proxy/ for all HTTP requests.
[WORKER 2024-07-23 12:09:08Z INFO HostContext] Configuring anonymous proxy http://my.proxy/ for all HTTPS requests.
but it doesn't mention the no_proxy setting and later errors when trying to connect to itself using its pod IP (which is in the no_proxy list):
[WORKER 2024-07-23 12:09:09Z ERR GitHubActionsService] GET request to http://172.27.1.66:42791/_apis/connectionData?connectOptions=1&lastChangeId=-1&lastChangeId64=-1 failed. HTTP Status: Forbidden
I haven't gone through the limitations of actions/runner proxy support myself:
https://github.com/actions/runner/blob/41bc0da6fe09466d23fd37d691feeb68dd3b4338/docs/adrs/0263-proxy-support.md?plain=1#L51
They seem to ignore IP exclusions:

We will not support IP addresses for `no_proxy`, only hostnames.
https://github.com/actions/runner/blob/41bc0da6fe09466d23fd37d691feeb68dd3b4338/src/Runner.Sdk/RunnerWebProxy.cs#L171
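A sketch of the matching behavior that ADR describes (a hypothetical helper, not the actual runner code): entries are treated as hostname suffixes, and IP literals are never matched:

```python
import ipaddress

def bypasses_proxy(host: str, no_proxy: str) -> bool:
    """Hypothetical sketch of hostname-only no_proxy matching per the ADR."""
    try:
        ipaddress.ip_address(host)
        return False  # IP literals are ignored: they always go through the proxy
    except ValueError:
        pass  # not an IP literal, fall through to hostname matching
    host = host.lower()
    for entry in no_proxy.split(","):
        entry = entry.strip().lstrip(".").lower()
        if entry and (host == entry or host.endswith("." + entry)):
            return True
    return False
```

This is why adding the pod IP to no_proxy (as tried below) has no effect.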
Not sure how my gitea runner can switch to hostnames; maybe try to reverse-DNS the IP and automatically add it to NO_PROXY?
You can simply use something like this to add it to no-proxy:
```yaml
- name: POD_IP
  valueFrom:
    fieldRef:
      fieldPath: status.podIP
- name: no_proxy
  value: localhost,127.0.0.1,.local,$(POD_IP)
```
But that won't work if they ignore IP addresses for no_proxy (and in fact, I tested it and it doesn't work). So, why does the runner try to contact itself via its external interface anyway? Why not use localhost?
Besides, you could always use the DNS service built into k8s, but that would only work if that DNS name is used by the runner instead of the IP; see https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pods
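Per the linked docs, a pod's A record is derived from its IP, so the name can be computed without any lookup (a sketch, assuming the default cluster domain):

```python
def pod_dns_name(pod_ip: str, namespace: str,
                 cluster_domain: str = "cluster.local") -> str:
    # Per the Kubernetes DNS docs, a pod gets an A record of the form
    # <ip-with-dashes>.<namespace>.pod.<cluster-domain>
    return f"{pod_ip.replace('.', '-')}.{namespace}.pod.{cluster_domain}"

print(pod_dns_name("172.27.1.66", "default"))
# 172-27-1-66.default.pod.cluster.local
```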
Why not use `localhost`?
I can send a single hostname/ip + port to the actions/runner
If I send localhost:
- artifacts are broken
- cache is broken

Those are requests from the job container that use the same endpoint.
If my gitea runner adapter were part of the Gitea backend, we would have a real hostname that forwards into nested containers, like ARC has.
Well, then let's use the k8s DNS service. That would work.
I can send a single hostname/ip + port to the actions/runner
Can you point me to where this is done? Can this be configured?
Can you point me to where this is done? Can this be configured?
Here I set the connection address, but there are some url entries; maybe one of them is still pointing to the gitea instance without redirection: https://github.com/ChristopherHX/gitea-actions-runner/blob/main/runtime/task.go#L734-L745
So, if I understand that correctly, nektos artifactcache allows you to set outboundIP as the endpoint in StartHandler. Although the name implies an IP address, it is actually just a string and there seems to be no validation of it, besides a fallback to the interface IP if none was provided. But it seems like gitea-actions-runner always provides the interface IP to the handler.
Because gitea-actions-runner always uses an IP and actions/runner doesn't support no_proxy for IPs, as you pointed out, there is currently no functional proxy support.
So, what do you think about making the IP configurable via an environment variable? GITEA_CACHE_HOST or something similar?
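The suggestion boils down to a small lookup with a fallback (a sketch; the env name GITEA_CACHE_HOST is the proposal above, not an existing variable):

```python
import os
import socket

def cache_host() -> str:
    """Sketch of the suggestion: prefer an explicitly configured hostname
    (hypothetical env var GITEA_CACHE_HOST) over the detected interface IP."""
    configured = os.environ.get("GITEA_CACHE_HOST")
    if configured:
        return configured
    # fallback: roughly the current behavior, use the interface IP
    return socket.gethostbyname(socket.gethostname())
```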
Is it currently viable to run gitea actions on k8s or is this still very much a work in progress?
So, what do you think about making the IP configurable via an environment variable? `GITEA_CACHE_HOST` or something similar?
Yes, I mostly agree with this. However, I wouldn't put CACHE into the env name; more something like GITEA_ACTIONS_RUNNER_HOST, because a fake actions runtime is also implemented in the runner (more than one port is used for tcp listeners).
Eventually, if unset, prefer hostnames over IPs, but that needs testing on my side (probably behind a feature env var).
I would queue this into my todo list tomorrow; I'm working on multiple projects...
@omniproc Proxy should now work in tag v0.0.13
Use the following env
```yaml
- name: GITEA_ACTIONS_RUNNER_RUNTIME_USE_DNS_NAME
  value: '1'
- name: GITEA_ACTIONS_RUNNER_RUNTIME_APPEND_NO_PROXY # appends the dns name of the pod to no_proxy
  value: '1'
- name: http_proxy
  value: http://localhost:2939 # some random proxy address for testing, use the real one
- name: no_proxy
  value: .fritz.box,10.96.0.1 # first exclusion for gitea, second for kubernetes; adjust as needed
```
@TomTucka
Is it currently viable to run gitea actions on k8s or is this still very much a work in progress?
This pretty much depends on your requirements. The more people try to use it, the more issues can be found & fixed.
- Actions-Runner-Controller itself won't work (yet)
- only the k8s mode from the runner provided by GitHub (not Gitea) is working
- proxy support is work in progress (not confirmed to work in real world k8s proxy environment)
- updating the runner fleet is experimental, seems like it kills jobs on changes as of now
- review behavior differences carefully; this is not supported by the Gitea maintainers / company etc.
- currently you can find some examples for stateful replica sets in this issue
- Missing autoscaling, only static scaling as of now
- if you heavily use Gitea Actions features or want Gitea support, then only the dind option remains as of now; this is discussed elsewhere, e.g. on
- https://github.com/go-gitea/gitea/issues/
- https://gitea.com/gitea/helm-chart/issues
- https://gitea.com/gitea/helm-chart/issues/459
- https://gitea.com/gitea/helm-chart/pulls/666
- https://gitea.com/gitea/act_runner/issues
- https://gitea.com/gitea/act_runner/issues/31
@ChristopherHX testing v0.0.13... getting closer... now TLS errors. I'm not sure from the log output if this happens due to the mentioned RFC 6066 issue (I was under the impression that DNS names are now used, so I'm not sure why this is logged anyway) or because the CA of the proxy is missing. I'll try to mount the CA into the runner and see what happens. First I have to find out what location it checks for trusted CAs.
Current runner version: '2.317.0'
Secret source: Actions
Runner is running behind proxy server 'http://myproxy:8080/' for all HTTP requests.
Runner is running behind proxy server 'http://myproxy:8080/' for all HTTPS requests.
Prepare workflow directory
Prepare all required actions
Getting action download info
Download action repository 'https~//github.com/actions/checkout@v4' (SHA:N/A)
Complete job name: test
##[group]Run '/home/runner/k8s/index.js'
shell: /home/runner/externals/node16/bin/node {0}
##[endgroup]
(node:50) [DEP0123] DeprecationWarning: Setting the TLS ServerName to an IP address is not permitted by RFC 6066. This will be ignored in a future version.
(Use `node --trace-deprecation ...` to show where the warning was created)
##[error]Error: unable to get local issuer certificate
##[error]Process completed with exit code 1.
##[error]Executing the custom container implementation failed. Please contact your self hosted runner administrator.
##[group]Run '/home/runner/k8s/index.js'
shell: /home/runner/externals/node16/bin/node {0}
##[endgroup]
(node:61) [DEP0123] DeprecationWarning: Setting the TLS ServerName to an IP address is not permitted by RFC 6066. This will be ignored in a future version.
##[error]Error: unable to get local issuer certificate
(Use `node --trace-deprecation ...` to show where the warning was created)
##[error]Process completed with exit code 1.
##[error]Executing the custom container implementation failed. Please contact your self hosted runner administrator.
Cleaning up orphan processes
Finished
@omniproc nodejs ignores ca certs from common locations on linux and does its own thing; point the env NODE_EXTRA_CA_CERTS to your cert bundle file, including your kubernetes api cert chain.
That cert bundle needs to be mounted into the runner container.
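One way to mount such a bundle (a sketch; the ConfigMap name, key, and mount path are assumptions, not part of any existing setup) is a ConfigMap volume plus the NODE_EXTRA_CA_CERTS env var:

```yaml
# hypothetical ConfigMap "ca-bundle" holding the key ca-bundle.crt
volumes:
  - name: ca-bundle
    configMap:
      name: ca-bundle
containers:
  - name: runner
    volumeMounts:
      - mountPath: /etc/ca-bundle
        name: ca-bundle
        readOnly: true
    env:
      - name: NODE_EXTRA_CA_CERTS
        value: /etc/ca-bundle/ca-bundle.crt
```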
I assume this undescriptive, very short error comes from kubernetes api access via https from node. I got something similarly short when I didn't add it to no_proxy and my proxy didn't even exist.
- your kubernetes api controller is accessed by an ip addr, like mine
- your kubernetes api controller uses https, unlike mine
- its tls cert has its name set to an ip, or it would never be valid; I didn't know node deprecated that
For the dind backend I wrote a provisioning script for my self-signed certs for all containers run by actions/runner; I could look into creating containers with modified k8s hooks for cert provisioning.
By default, every container you use is assumed to have the env NODE_EXTRA_CA_CERTS set and the ca folders populated if you use self-signed certs; not really practicable...
EDIT: Is your kubernetes api accessed via your proxy?
@ChristopherHX I can confirm it's working. It was two issues (as you expected):
- I did not configure no_proxy for the K8s API.
- The cert of the git server was not trusted; setting NODE_EXTRA_CA_CERTS worked as you mentioned.
- I tested the checkout plugin and basic container commands; DinD build is the next thing I have to test.
A few UX improvement suggestions from my side. As a user, when I configure no_proxy I usually only have in mind the URLs that I know should not be proxied but have to be reached by the runner. I know them because I usually configure them explicitly in my pipeline (e.g. the Git repo). What I don't know is what other endpoints the runner has to reach. Of course, on second thought, it's obvious why Node tries to reach the K8s API. But since it's the runner that wants to reach it, I think it should be the runner's responsibility to set up everything it can to make this happen. So my suggestions would be:
- If http_proxy is set, automatically add the K8s API, exposed via the env vars used by Node (e.g. KUBERNETES_SERVICE_HOST), to the no_proxy list. An optional env var flag to disable this default behavior would probably be nice for edge cases where the K8s API you try to reach is not the cluster the runner is executed on and has to be reached via a proxy (but I'm not sure if that's a use case for the runner ATM).
- If http_proxy is set, automatically add the in-cluster K8s API certificate (/var/run/secrets/kubernetes.io/serviceaccount/ca.crt) to NODE_EXTRA_CA_CERTS by default.
- You can read everything about the exposed env vars and mounted certificates at https://kubernetes.io/docs/tasks/run-application/access-api-from-pod/
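To make the idea concrete, here is a sketch of what such an entrypoint shim could do before starting the runner. This is not current act_runner behavior; the demo values below stand in for what Kubernetes would inject into a real pod:

```shell
#!/bin/sh
# Sketch: auto-extend no_proxy with the in-cluster API host when a proxy is set.
# Demo values simulating the real pod environment:
http_proxy="http://proxy.example:3128"   # pretend the user configured a proxy
KUBERNETES_SERVICE_HOST="10.96.0.1"      # injected by Kubernetes into every pod
no_proxy="git.example.com"               # what the user already configured

if [ -n "${http_proxy:-}" ]; then
  # Never send in-cluster API traffic through the proxy.
  case ",${no_proxy:-}," in
    *",${KUBERNETES_SERVICE_HOST},"*) ;;  # already listed, nothing to do
    *) no_proxy="${no_proxy:+${no_proxy},}${KUBERNETES_SERVICE_HOST}" ;;
  esac
  export no_proxy
fi

echo "no_proxy=$no_proxy"
```

The same shim could then append /var/run/secrets/kubernetes.io/serviceaccount/ca.crt to a NODE_EXTRA_CA_CERTS bundle before exec'ing the real runner binary.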
So now that the runner starts a new pod for the workflow I was trying to get DinD to work in it using catthehacker/ubuntu:act-22.04 as the job container image, which doesn't work since the docker socket is not available. I know that in theory it's possible because gitea/act_runner:nightly-dind-rootless can run DinD but that image is of course missing all the Act components.
So before I start fiddling around building a hybrid of catthehacker and dind-rootless: how did you get DinD to run @ChristopherHX ?
@omniproc I might have caused confusion here; I have not set up dind in the job container yet.
I did this only for the runner (which is by default a docker cli client) outside of Kubernetes.
I would expect that a custom fork of https://github.com/actions/runner-container-hooks could configure a dind installation placed in the external tools folder (the one that has node20 etc. for the runner) on any job container.

> in theory it's possible because gitea/act_runner:nightly-dind-rootless can run DinD

This is similar to how I did it, e.g. in docker compose (docker hook mode): https://github.com/ChristopherHX/gitea-actions-runner/blob/main/examples/docker-compose-dind-rootless/docker-compose.yml
This only works if you don't use the k8s container hooks, and I'm not sure if the docker.sock bind mount works in that setup, as I didn't make use of it.
This approach has flaws; if you try to run the following you get strange bugs:
- docker run -v $PWD:$PWD ....
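If I understand the failure mode right, those bugs stem from bind-mount source paths being resolved against the Docker daemon's filesystem rather than the job container's. A toy illustration of the mismatch (assumes nothing beyond POSIX sh; the paths are made up):

```shell
#!/bin/sh
# The docker CLI sends "-v src:dst" to the daemon as a plain string; the daemon
# resolves "src" against ITS OWN filesystem. With an external (dind) daemon,
# the job container's $PWD usually doesn't exist there, or holds other content.
JOB_PWD="/workspace/myrepo"       # working directory inside the job container
MOUNT_SPEC="$JOB_PWD:$JOB_PWD"    # what "docker run -v $PWD:$PWD" expands to
HOST_SIDE="${MOUNT_SPEC%%:*}"     # the path the daemon will actually try to mount
echo "daemon resolves host path: $HOST_SIDE"
```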
@ChristopherHX so I got a working prototype of this. Instead of using DinD, which arguably is a security nightmare more often than not (or comes with many limitations as of today when running unprivileged), I switched to BuildKit, which doesn't require any privileges and can be executed in a non-root pod. The process currently looks like this:
- Spawn a StatefulSet (gitea-actions-runner), as given by the example by @motoki317. This could probably be a Deployment, but for now I'm rolling with this.
- The gitea-actions-runner registers with Gitea and can now accept workflows. For every workflow a new pod is started by the gitea-actions-runner. I'm using an image based on catthehacker/ubuntu:act, extended by an installation of BuildKit (buildctl actually), to maximize compatibility with many GitHub actions.
- The workflow pod uses buildctl to perform the builds against a BuildKit container running moby/buildkit:master-rootless. Currently I'm running the BuildKit container as a sidecar container of the gitea-actions-runner StatefulSet, so it scales along with the StatefulSet. Other scenarios would work as long as the BuildKit container is reachable from the workflow pod.
- The result is a container build without special privileges, with the workflow being executed in a dedicated, deterministic pod environment without DinD and its flaws.
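A workflow build step along those lines might look like the following. This is a sketch, not the prototype's actual code: the buildkitd address and image name are placeholders, and the command is only assembled and echoed here rather than executed:

```shell
#!/bin/sh
# Sketch of a buildctl invocation against a rootless buildkitd sidecar.
BUILDKIT_ADDR="tcp://gitea-actions-runner-buildkitd:1234"  # hypothetical sidecar address
IMAGE="registry.example.com/myapp:latest"                  # hypothetical target image

# Assemble the build command; dockerfile.v0 is BuildKit's Dockerfile frontend.
set -- buildctl --addr "$BUILDKIT_ADDR" build \
  --frontend dockerfile.v0 \
  --local context=. \
  --local dockerfile=. \
  --output "type=image,name=${IMAGE},push=true"

echo "$@"   # in the real step you would run "$@" instead of echoing it
```

Because buildctl is only a gRPC client to buildkitd, this runs without any privileges in the workflow pod, which is exactly what makes the sidecar approach attractive.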
@ChristopherHX is it possible that we currently cannot pass environment variables using the env parameter? E.g.:

```yaml
on:
  push
jobs:
  test:
    runs-on: myrunner
    container:
      image: ghcr.io/catthehacker/ubuntu:act-22.04
      env:
        FOO: BAR
```

In this case FOO will not be set.
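Until that's resolved, a workaround worth trying (untested on my side; it assumes job-level env and GITHUB_ENV are honored by the runner, which the thread doesn't confirm) is to move the variable up a level or export it from a step:

```yaml
jobs:
  test:
    runs-on: myrunner
    container:
      image: ghcr.io/catthehacker/ubuntu:act-22.04
    env:                # job-level instead of container-level
      FOO: BAR
    steps:
      # Alternatively, make it available to all subsequent steps:
      - run: echo "FOO=BAR" >> "$GITHUB_ENV"
```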