domjudge Modern Judgehost

This is only somewhat a feature request and mostly a discussion to explore some options. In particular, I would like to explore what your thoughts are / what you would like to have. Then, I would probably offer this as a student project (or multiple projects) and try to eventually contribute this.

Currently, the judgehost is a bit awkward to use because it relies on cgroup v1, while cgroup v2 is increasingly becoming the standard. I think a migration is inevitable in the long term?

Moreover, I think it is a bit tricky to use the judgehost environment to test Kattis problems: ideally, one could use the exact same container, resource restrictions, etc. when testing locally as when running on the actual judging machine. Currently, problemtools AFAIK only uses the locally installed tools etc.

So, here is my idea / proposal:

1. Have a Python script that manages API interaction with the judge, downloads submissions that need to be judged, uploads the results, etc. (I would argue that Python is easier to maintain than PHP, but that might be personal preference.) This is somewhat independent of the other points and basically amounts to replacing judgedaemon.main.php with a Python script. However, it would be a good first step, as the other points profit from a Python-based solution.
2. Have a separate utility that handles execution of submissions in a cgroup-based container (which could then equally be used in problemtools!). Having the Python script above would simplify this and allow unifying how things are executed between problemtools and the judgehost.
3. Use https://github.com/sosy-lab/benchexec to properly take care of setting up cgroups, isolation, etc. This would need some exploration, but I think that benchexec supports everything that judging needs out of the box, works with both cgroups v1 and v2, and has a convenient Python API. One downside is that an additional dependency is introduced, but I think it would be worth it here, as getting cgroups and isolation correct is hard (citation needed). Another downside is that in order to run this on your own machine, you would need support for benchexec, which however mostly boils down to cgroup support (unless I am missing something). But it should be possible to fall back to wall-clock-based measurement for problemtools. Again, having a Python-based approach is nice here, as the interaction can then be done via an API and not via command-line / script-based fiddling around. Nevertheless, I'm sure there are some pitfalls here.
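To make the wall-clock fallback mentioned in the last point concrete, here is a minimal sketch using only the Python standard library. It is not a sandbox: it enforces no memory limit, measures no CPU time, and provides no isolation, which is exactly what cgroups (or benchexec) would add. The function name, the `RunResult` type, and the verdict labels are assumptions made up for this illustration; they are not taken from DOMjudge, problemtools, or benchexec.

```python
import subprocess
import sys
import time
from dataclasses import dataclass


@dataclass
class RunResult:
    verdict: str      # "ok", "timelimit" or "run-error" (labels invented for this sketch)
    wall_time: float  # elapsed wall-clock seconds, including process startup
    stdout: bytes


def run_with_wall_limit(cmd, stdin_data=b"", wall_limit=1.0):
    """Run cmd with stdin_data on stdin; kill it after wall_limit seconds."""
    start = time.monotonic()
    try:
        proc = subprocess.run(
            cmd, input=stdin_data, capture_output=True, timeout=wall_limit
        )
    except subprocess.TimeoutExpired as exc:
        # subprocess.run has already killed the child at this point.
        elapsed = time.monotonic() - start
        return RunResult("timelimit", elapsed, exc.stdout or b"")
    elapsed = time.monotonic() - start
    verdict = "ok" if proc.returncode == 0 else "run-error"
    return RunResult(verdict, elapsed, proc.stdout)


if __name__ == "__main__":
    doubler = [sys.executable, "-c", "print(int(input()) * 2)"]
    print(run_with_wall_limit(doubler, b"21\n", wall_limit=10.0))
```

Wall-clock time includes interpreter startup and scheduling noise, so results from such a fallback would only be roughly comparable to CPU-time limits enforced on a real judgehost.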

Thoughts?

Mar 11 '25 11:03 incaseoftrouble

We already added cgroupv2 support as part of https://github.com/DOMjudge/domjudge/issues/1072

I do think it would be a good idea though to make it possible to execute submissions separately (e.g. as part of problemtools).

I would like to first land all parts of https://docs.google.com/document/d/1WZRwdvJUamsczYC7CpP3ZIBU8xG6wNqYqrNJf7osxYs/edit?tab=t.0#heading=h.i7kgdnmw8qd7 before though

Mar 11 '25 12:03 meisterT

Ah! I was looking at https://www.domjudge.org/docs/manual/8.3/install-judgehost.html ; didn't see the snapshot changes.

Indeed, thinking through which parts of interactive problems run where (in which cgroup etc.) is also very relevant here. I only thought of non-interactive problems.

Mar 11 '25 12:03 incaseoftrouble

I agree with what @meisterT already said. I just want to add that we already have a separation of the API handling code in judgedaemon.main.php versus the backend sandboxing in runguard.cc. And then there's the shell script glue in between, mostly in testcase_run.sh.

Clearly there's some buildup of cruft in all of this over the 20 years that this code has been in the project, including using PHP for legacy reasons. However, there are also a lot of tweaked details in this code, so I think it is non-trivial to overhaul. Also, over the years I've looked into various other sandboxing solutions, and never found one that had the quality and specific feature requirements that I think would make the replacement of what we have worthwhile.

Mar 11 '25 13:03 eldering

Are the tweaked details in judgedaemon.main.php? I agree that the sandboxing part is really hard and not something I would want to touch unless certain. But if the "frontend" would be replaced by something modern, this could help with, for example, having a local judgehost running on your system to which problemtools could be sending judging requests (again, to have things running on the same platform). To me, it seems that "obtain a package to judge" and "do the judging" are quite well separated and adding some flexibility to part 1 could be worthwhile?

(Just to reiterate, I'm just throwing ideas out there, if none of these sound sensible and you are happy with status quo that's of course fine, too!)

Mar 11 '25 15:03 incaseoftrouble

> Are the tweaked details in judgedaemon.main.php? I agree that the sandboxing part is really hard and not something I would want to touch unless certain. But if the "frontend" would be replaced by something modern, this could help with, for example, having a local judgehost running on your system to which problemtools could be sending judging requests (again, to have things running on the same platform). To me, it seems that "obtain a package to judge" and "do the judging" are quite well separated and adding some flexibility to part 1 could be worthwhile?
>
> (Just to reiterate, I'm just throwing ideas out there, if none of these sound sensible and you are happy with status quo that's of course fine, too!)

The tweaks are mostly in runguard.cc I think, but for example we have a separate program evict that we call from judgedaemon.main.php to clear kernel caches, as this impacts consistent performance.

In general, I think it will also be difficult to create a clean separation between the frontend and backend parts, as a lot of data is passed between them and a lot of custom setup is done. See for example the runguard call at https://github.com/DOMjudge/domjudge/blob/main/judge/testcase_run.sh#L238-L242, with lots of options that come partially from system and problem configuration on the server, partially from the local machine's OS setup, etc. Then there's a separate script chroot-startstop.sh to set up the chroot... I think it is not easy to create a simple interface to the backend that can easily be ported for use in other places while supporting all/most of the features we have.

OTOH, if you're willing to drop the more complex integration stuff, then I think runguard as it is already does a good job of providing that: from the beginning it was deliberately meant not to be tightly integrated into the system (the only bit of config it pulls in is https://github.com/DOMjudge/domjudge/blob/main/etc/runguard-config.h.in), and it can do most of the sandboxing on its own, except for providing a full chroot environment.

Coming back to your original proposal. I do see that PHP is not the most modern language anymore, and we could probably do some cleanup of our code to separate frontend/backend concerns a bit better. But I don't think that will make things a lot easier. At the end of the day, there's a lot of OS level stuff that needs to happen and plenty of it depends on the details of the context of the system and type of problem you run with.

Mar 11 '25 23:03 eldering

I see. So with all that, what I envision is to be able to have a judgedaemon running (directly on the system or in Docker - in my case always Docker) and start it in one of two "modes". One is "client mode", pulling from a running DOMjudge instance, and the other is "server mode", where, say, a local problemtools (or variants of it) could push submissions to be executed. Gathering all relevant data etc. should be possible in this scenario (after all, I could automatically spin up a DOMjudge + judgehost Docker setup, put the submissions into its database, wait for the judgehost to grade them, and then fetch the results from the database).
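The two modes could, for illustration, share one judging loop and differ only in where the work comes from. The sketch below shows that shape under stated assumptions: `WorkSource`, `PullSource`, `PushSource`, and `drain` are hypothetical names, tasks are plain dicts, and the `fetch` callable stands in for whatever request a real daemon would send to a DOMjudge server. None of this reflects the actual judgedaemon code.

```python
import queue
from typing import Callable, Optional, Protocol


class WorkSource(Protocol):
    """Anything the judging loop can ask for its next task (hypothetical interface)."""

    def next_task(self) -> Optional[dict]: ...


class PullSource:
    """'Client mode': ask a server for work through an injected fetch function."""

    def __init__(self, fetch: Callable[[], Optional[dict]]):
        self._fetch = fetch

    def next_task(self) -> Optional[dict]:
        return self._fetch()


class PushSource:
    """'Server mode': local tools submit tasks, the daemon takes them from a queue."""

    def __init__(self) -> None:
        self._queue: "queue.Queue[dict]" = queue.Queue()

    def submit(self, task: dict) -> None:
        self._queue.put(task)

    def next_task(self) -> Optional[dict]:
        try:
            return self._queue.get_nowait()
        except queue.Empty:
            return None


def drain(source: WorkSource, judge: Callable[[dict], str]) -> list:
    """Judge tasks until the source reports that none are left."""
    results = []
    while (task := source.next_task()) is not None:
        results.append(judge(task))
    return results


if __name__ == "__main__":
    local = PushSource()
    local.submit({"id": 1})
    print(drain(local, lambda task: f"judged {task['id']}"))
```

With this split, the judging code would never need to know whether a task was pulled from a contest server or pushed by a local tool.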

In other words: If I / a student manage to get something running, you would not be opposed to having a 1:1 Python replacement of judgedaemon? (That would be my first step anyway; adaptations like a separate way of sending it submissions would come later.)

Mar 12 '25 07:03 incaseoftrouble

I assume you are mostly talking about judgedaemon.main.php? We were talking about rewriting it, and I would rather have picked Golang as its concurrency model makes it easy to do stuff in the background (e.g. upload results) while judging is ongoing. Not sure how easy that is to do in Python nowadays.

Another way to accomplish what you want today would be to implement your own version of the JudgehostController API. Then problemtools (or whatever) would spin up a local server, start a judgehost, connect it to the local server, and would just need to define work packets and process the results. That's probably doable in a few hundred lines of Python.

Mar 12 '25 07:03 meisterT

asyncio etc. make this kinda easy in Python now, too. Not to the extreme scale of Go, but I would assume that one judgehost hardly has more than 100 things going on concurrently :) Python would have the advantage that (AFAIK) it is far more popular (so easier to maintain / get students to do things).
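As a rough illustration of the "upload results in the background while judging is ongoing" pattern with asyncio, the sketch below hands finished results to a background uploader task through an `asyncio.Queue`. Both `judge_testcase` and the upload step are stand-ins that merely sleep, and all function names are invented for this example; a real daemon would presumably start the sandboxed run via something like `asyncio.create_subprocess_exec` and send an HTTP request in the uploader.

```python
import asyncio


async def judge_testcase(name):
    """Stand-in for running one test case (a real daemon would call the sandbox)."""
    await asyncio.sleep(0.01)
    return {"testcase": name, "verdict": "correct"}


async def uploader(results, uploaded):
    """Upload results in the background until a None sentinel arrives."""
    while True:
        result = await results.get()
        if result is None:
            break
        await asyncio.sleep(0.02)  # stand-in for posting the result to the server
        uploaded.append(result)


async def judge_all(testcases):
    results = asyncio.Queue()
    uploaded = []
    upload_task = asyncio.create_task(uploader(results, uploaded))
    for name in testcases:
        result = await judge_testcase(name)
        results.put_nowait(result)  # hand off; judging continues without waiting
    results.put_nowait(None)        # signal that no more results will follow
    await upload_task               # wait for the remaining uploads to finish
    return uploaded


if __name__ == "__main__":
    print(asyncio.run(judge_all(["1", "2", "3"])))
```

Because a single uploader consumes the queue in order, results arrive at the server in the order they were produced, even though judging never waits for an upload.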

Implementing the judgehost API: Yes, that would be an option, indeed. But to me this feels less clean. I think having an isolated package that does one job ("give me source code + test cases and I will run it") is neat. But maybe I want to do too much here.

Mar 12 '25 08:03 incaseoftrouble

> Implementing the judgehost API: Yes, that would be an option, indeed. But to me this feels less clean. I think having an isolated package that does one job ("give me source code + test cases and I will run it") is neat. But maybe I want to do too much here.

Throwing in one more alternative here - it sounds like you want to be able to invoke testcase_run.sh easily?

Mar 12 '25 08:03 meisterT

Yes, that is a pragmatic solution to the problem. Something like a Python script that is able to shove a submission into a Docker container that runs the judgedaemon. That should then be reasonably easy to integrate with Kattis checking.

Mar 12 '25 09:03 incaseoftrouble

I think it would make sense to call something like testcase_run.sh as backend directly. The problem right now is that there's not a very clear interface defined for calling it. To be able to define this interface (more) clearly would require going through the current code flow (judgedaemon -> testcase_run -> runguard) and making an inventory of what information is passed and what system setup is done, and maybe try to refactor things to make that more logically separated between frontend and backend stuff. However, there's for example the chroot-startstop script called from judgedaemon for performance reasons (so we don't (un)mount chroot bind mounts for every individual test case). Such things make a clean separation difficult.
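One way to start such an inventory would be to write the information that crosses the frontend/backend boundary down as explicit, serialisable types. The sketch below is purely illustrative: every field name is hypothetical and the grouping into server-provided and machine-local values is only a guess at the kind of split described above, not the actual set of options passed to testcase_run.sh or runguard.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RunRequest:
    # Values that would come from problem/contest configuration on the server
    # (hypothetical field names).
    time_limit_s: float
    memory_limit_kib: int
    output_limit_kib: int
    # Values that would come from the local machine setup (hypothetical).
    chroot_dir: str
    run_user: str
    # The work itself.
    command: list = field(default_factory=list)
    input_path: str = ""


@dataclass
class RunOutcome:
    exit_code: int
    cpu_time_s: float
    wall_time_s: float
    timed_out: bool


def to_wire(obj) -> str:
    """Serialise a request or outcome as JSON for a frontend/backend hand-off."""
    return json.dumps(asdict(obj), sort_keys=True)


def request_from_wire(text: str) -> RunRequest:
    """Rebuild a RunRequest from its JSON form."""
    return RunRequest(**json.loads(text))


if __name__ == "__main__":
    req = RunRequest(2.0, 2097152, 8192, "/chroot/example", "runuser",
                     ["./program"], "testcase.in")
    print(to_wire(req))
```

Per-judging setup such as mounting the chroot once for many test cases does not fit a per-run request like this, which illustrates why the separation is hard to make clean.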

In the context of the problem package format (a joint effort between DOMjudge, Kattis, PC^2 and some other people, e.g. problem creators for a teaching setting), we've discussed similar things, but found it hard to come up with a clean interface for running programs. In that case it concerned compare/compile scripts etc.; see e.g. https://github.com/Kattis/problem-package-format/issues/330

Mar 12 '25 11:03 eldering

There is no concrete action to take here. Feel free to reopen with a concrete proposal.

Sep 04 '25 11:09 meisterT
