Ben Hearsum (he/him)

200 comments by Ben Hearsum (he/him)

For lack of a better place to track it - I just finished uploading all of the artifacts from [these task groups](https://github.com/mozilla/firefox-translations-training/issues/312#issuecomment-1946802157) with a set of hacky scripts. I'm planning...

We're already running docker tasks on generic-worker machines for CPU tasks. I did a quick test to see if it's trivial to do this, but alas, https://github.com/mozilla/firefox-translations-training/pull/619 ran into some...

Taskcluster added support for d2g in the insecure engine: https://github.com/taskcluster/taskcluster/pull/7031. It looks like they may have a follow-up to deal with there, but once it's stabilized we'll need to roll...

We appear to have been running an up-to-date enough worker-runner:

>     $ start-worker --version
>     2024/01/26 20:15:43 Error disabling OOM killer for the start-worker process: write /proc/1166/oom_adj: permission denied
>     start-worker 59.1.3...

Curiously, https://firefox-ci-tc.services.mozilla.com/tasks/GYYVwr5RS-61EM8otXtMNg/runs/0 reports `CLAIM_EXPIRED`, but the generic-worker code at https://github.com/taskcluster/taskcluster/blob/892c07ad0d6a8a5eecbfa704fabef4a17cc11581/workers/generic-worker/main.go#L520-L522 seems to suggest this should be `WORKER_SHUTDOWN`.

I've opened https://github.com/taskcluster/taskcluster/issues/6802 on the Taskcluster side for this. It's not clear to me whether my expectations are wrong here or there's a bug somewhere.
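To check which resolution reason a run actually got, one option is to inspect the task-status payload from the Taskcluster queue, where each resolved run carries a `reasonResolved` field whose possible values include `claim-expired` and `worker-shutdown`. This is a minimal sketch: `run_resolutions` and `claim_expired_runs` are hypothetical helpers, and the sample payload is made up rather than taken from the task linked above. Assuming the official `taskcluster` Python client, a real payload could be fetched with something like `taskcluster.Queue({"rootUrl": "https://firefox-ci-tc.services.mozilla.com"}).status(task_id)`.

```python
from typing import Any


def run_resolutions(status: dict[str, Any]) -> list[tuple[int, str, str]]:
    """Return (runId, state, reasonResolved) for every run in a queue
    task-status payload. Runs that have not resolved yet get an empty reason."""
    runs = status.get("status", {}).get("runs", [])
    return [
        (run["runId"], run["state"], run.get("reasonResolved", ""))
        for run in runs
    ]


def claim_expired_runs(status: dict[str, Any]) -> list[int]:
    """Return the runIds that the queue resolved as `claim-expired`."""
    return [
        run_id
        for run_id, _state, reason in run_resolutions(status)
        if reason == "claim-expired"
    ]


if __name__ == "__main__":
    # Made-up payload in the shape of a queue task-status response.
    sample = {
        "status": {
            "runs": [
                {"runId": 0, "state": "exception", "reasonResolved": "claim-expired"},
                {"runId": 1, "state": "running"},
            ]
        }
    }
    print(run_resolutions(sample))
    print(claim_expired_runs(sample))
```

Comparing the reported reason against `worker-shutdown` this way makes it easy to spot runs where the worker went away without the queue recording a shutdown.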

Another interesting thing from the logs is this case, where we're polling every 30 seconds, and then 8 seconds after a poll the system starts shutting down:
```
Jan 30 10:32:00Z...
```
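To put numbers on that excerpt, here is a back-of-the-envelope model, not generic-worker code. It assumes (hypothetically, since the comment is truncated) that polls land exactly on multiples of the interval and that the worker only observes a shutdown at its next poll. Under those assumptions, a shutdown that starts 8 seconds after a poll goes unobserved for the remaining 22 seconds of the 30-second interval. `detection_delay` is a hypothetical helper.

```python
import math


def detection_delay(poll_interval: float, shutdown_at: float) -> float:
    """Seconds a shutdown goes unobserved, assuming (hypothetically) that polls
    happen at t = 0, poll_interval, 2 * poll_interval, ... and that the worker
    only notices a shutdown at its next poll."""
    if poll_interval <= 0 or shutdown_at < 0:
        raise ValueError("poll_interval must be > 0 and shutdown_at >= 0")
    # First poll at or after the moment the shutdown begins.
    next_poll = math.ceil(shutdown_at / poll_interval) * poll_interval
    return next_poll - shutdown_at


if __name__ == "__main__":
    # Shutdown begins 8 s after the poll at t = 30 s.
    print(detection_delay(30.0, 38.0))  # 22.0
```

The worst case under this model is a shutdown that starts just after a poll, which goes unobserved for almost a full interval.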

We've made a number of improvements in the past few months on the worker side. We now notice and respond to spot termination notices immediately, and we upload all artifacts...
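One common way to react to a termination notice immediately, rather than at the next poll, is to wait on an event with a timeout instead of sleeping for the whole interval. This Python sketch is a generic illustration of that pattern, not how generic-worker (which is written in Go) actually does it; `wait_for_shutdown` and `simulate` are hypothetical names, and the `threading.Timer` stands in for whatever actually delivers the spot termination notice.

```python
import threading
import time


def wait_for_shutdown(shutdown: threading.Event, poll_interval: float) -> float:
    """Run a hypothetical poll loop until `shutdown` is set.

    `Event.wait(timeout)` returns True as soon as the event is set, so the loop
    reacts to a notice immediately instead of at the next poll boundary.
    Returns the seconds elapsed before the shutdown was seen.
    """
    start = time.monotonic()
    while not shutdown.wait(timeout=poll_interval):
        # Hypothetical: a real worker would claim and run tasks here.
        pass
    return time.monotonic() - start


def simulate(notice_after: float, poll_interval: float) -> float:
    """Fire a simulated termination notice after `notice_after` seconds."""
    shutdown = threading.Event()
    # Stand-in for whatever delivers the spot termination notice.
    threading.Timer(notice_after, shutdown.set).start()
    return wait_for_shutdown(shutdown, poll_interval)


if __name__ == "__main__":
    elapsed = simulate(notice_after=0.2, poll_interval=30.0)
    print(f"shutdown noticed after {elapsed:.2f}s")
```

Even with a 30-second poll interval, the loop returns shortly after the simulated notice, because `Event.wait` wakes up as soon as the event is set.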

Apologies for the slow reply - I didn't see this issue until now. It is technically _possible_ to run your own Taskcluster instance and run training on it, although I'm...

The work that @gabrielBusta is doing in #226 will be a good basis for this. The difference with spot terminations is that the tasks will automatically rerun, and we won't...