cms
cms copied to clipboard
Grader-less communication task type
Currently, communication tasks always require a grader header file. Batch tasks, on the other hand, can be configured to instead use stdin/stdout. Would you be interested such an option for communication tasks?
Unfortunately, this is not entirely trivial: When the manager terminates (because of an illegal query, for example) and the solution keeps reading (ignoring EOF errors), strange things might happen. We might not receive the expected "wrong answer" verdict, but instead get segmentation fault or TLE. So, when the manager finishes, we want to kill the solution without its cooperation.
Since we still prefer stdin/stdout in our selection camps, we've already implemented this a while ago. It required changes to Communication.py and isolate (small change), and you always need to include a specific header (managerframework.h) in your manager. The idea is as follows:
The manager and the code in Communication.py communicate via an additional set of two fifos. When the manager is done (wrong answer/correct), it tells Communication.py. Then, Communication.py sends SIGINT to isolate, which will kill the submission. Communication.py waits for the sandbox to terminate, then tells the manager, which ultimately exits.
(Even if you use a grader header file, it's also important to remember to open the manager<->submission pipes in the correct order to avoid deadlocks and to ignore SIGPIPE signals in the manager in case the submission terminates or closes stdin early.)
As I said, we've already implemented it, but I wanted to ask if this is a desired feature and you like our approach, before cleaning up the code for a pull request.
https://github.com/ioi-germany/cms/blob/new_merge/cms/grading/tasktypes/Communication.py https://github.com/ioi-germany/cms/blob/new_merge/cmscontrib/gerpythonformat/lib/include/managerframework.h https://github.com/ioi-germany/isolate/blob/master/isolate.c
We are also interested in stdin/stdout for interactive problems :)
I think that ideally the grader and the user program should be able to just read/write stdin/stdout without any helper code. This has multiple advantages: you only use standard interfaces, you don't need to implement helper code for every single programming language and you can easily grade solutions locally:
mkfifo fifo
grader < fifo | solution > fifo
It would be nice to see if this approach can be implemented, I think it would be useful.
Your suggestion manager < fifo | solution > fifo
would still work, except for the fact that cms currently passes the communication pipes to the manager as command line arguments instead of redirecting stdin/stdout. (Of course, this simplified command line wouldn't automatically kill the submission when the manager is done, though.)
I see your point in avoiding helper code.
Just to clarify: The only part needing translation would be managerframework.h, in case you want to write managers in languages other than C++. (And you only have to do it once per language.) Most of the lines in managerframework.h need to be written, anyways: open the test case input file and the communication pipes and ultimately write the score and verdict. The "interesting" stuff is in lines 45-49, 60-64, 72-74, 80-85. Everything except 72-74 should be straightforward to implement in most languages.
Maybe someone with a better background in operating system programming than me knows how to externalize the interesting parts: signal handling and the Communication.py <-> manager interaction
Given that there definitely is interest in the feature, I believe it's worth including it in upstream CMS.
The grader < fifo | solution > fifo
approach has the issue that the grader cannot report the evaluation result using the standard manager output format, since that requires writing to both stderr and stdout (which is redirected). In other terms, the grader needs to read/write to 4 streams in total, hence this can't be done with standard stdin/stdout only. One approach would be:
mkfifo down
mkfifo up
solution < down > up & grader up down
Meaning the solution can read/write using stdin/out but the grader, like now, needs to open the named pipes it receives as arguments. (File descriptors for unnamed pipes could be used as well I guess.)
Plus, the grader might also need to read an input file. If you don't like hardcoding filenames in your programs, you could have it read from stdin and then the above becomes solution < down > up & grader down up < input.txt
.
@fagu I don't understand why you need two extra pipes for the grader to communicate with the TaskType. Take the approach I just presented and have the TaskType kill both sandboxes as soon as either the grader terminates (gracefully, giving a result, or ungracefully) or the time limit is hit. What would be wrong with that?
Just for the record, testlib works similarly, but the stream assignments are reversed:
* Interactor, using testlib running format:
* interactor.exe <Input_File> <Output_File> [<Answer_File> [<Result_File> [-appes]]],
* Reads test from inf (mapped to args[1]), writes result to tout (mapped to argv[2],
* can be judged by checker later), reads program output from ouf (mapped to stdin),
* writes output to program via stdout (use cout, printf, etc).
I think the assignment I proposed above is better, because contestants who are testing locally don't have to save their input to a file and then look at another file for the output, they can do so right from the shell (coming up with the input on the fly, writing it one line at a time to see where their code fails, ...). Also, it is exactly the one we require from graders now, which would allow to use the previous ones unchanged.
I think the original proposal has a bit too many working pieces for the official repo, to be honest.
Also, I'm not sure I understand it fully, Fabian - as I see it the visible differences are that one can have tasks with self-sufficient submissions (without stub), and admins have a provided main for the manager. So maybe I'm missing the point of "using stdin/stdout" (I assumed you meant "instead of files", but maybe you meant "let the user code manage the I/O instead of relying on a library?).
Regarding the other suggestion from Andrey and Luca, I think that's quite different, is it? (Also that wouldn't work well with multiple user programs).
@fagu I don't understand why you need two extra pipes for the grader to communicate with the TaskType. Take the approach I just presented and have the TaskType kill both sandboxes as soon as either the grader terminates (gracefully, giving a result, or ungracefully) or the time limit is hit. What would be wrong with that?
The solution has to be killed before the manager closes stdout (or terminates). Otherwise, the solution (ignoring EOF) might read a nonsensical value from stdin and segfault before we get a chance to kill it. That's why I want the manager to communicate to the TaskType that it's ready, before it actually terminates.
I think you're right that it's unnecessary for the TaskType to tell the manager once it may terminate (as soon as the solution has terminated or been killed). The TaskType could instead simply kill the sleeping manager.
You also make a good point with regard to killing the solution even if the manager terminates ungracefully (right afterwards). There's no reason to wait for the solution, which might time out. I'll change that.
Alternative: Instead of modifying the manager to announce when it is ready, we could have an intermediate program that forwards the messages between manager and solution. Once the manager closes its stdout fifo, the forwarder kills the solution.
I don't think any points should be awarded to a solution that segfaults after it exhausted its input and produced the correct output. The reason is consistency with Batch: if a Batch solution produces the full correct output but then exits ungracefully (segfaults, nonzero return code, ...) it gets a score of zero regardless of what output it produced. Why would you want to be more lenient with Communication solutions?
On Mon, Aug 20, 2018, 18:37 fagu [email protected] wrote:
@fagu https://github.com/fagu I don't understand why you need two extra pipes for the grader to communicate with the TaskType. Take the approach I just presented and have the TaskType kill both sandboxes as soon as either the grader terminates (gracefully, giving a result, or ungracefully) or the time limit is hit. What would be wrong with that?
The solution has to be killed before the manager closes stdout (or terminates). Otherwise, the solution (ignoring EOF) might read a nonsensical value from stdin and segfault before we get a chance to kill it. That's why I want the manager to communicate to the TaskType that it's ready, before it actually terminates.
I think you're right that it's unnecessary for the TaskType to tell the manager once it may terminate (as soon as the solution has terminated or been killed). The TaskType could instead simply kill the sleeping manager.
You also make a good point with regard to killing the solution even if the manager terminates ungracefully (right afterwards). There's no reason to wait for the solution, which might time out. I'll change that.
Alternative: Instead of modifying the manager to announce when it is ready, we could have an intermediate program that forwards the messages between manager and solution. Once the manager closes its stdout fifo, the forwarder kills the solution.
— You are receiving this because you commented.
Reply to this email directly, view it on GitHub https://github.com/cms-dev/cms/issues/993#issuecomment-414257480, or mute the thread https://github.com/notifications/unsubscribe-auth/AAHX6mSCtZIhHPMdA7b0lTrzK75Li2gqks5uSoN1gaJpZM4WCkmM .
@stefano-maggiolo Yes, I'm trying to support self-sufficient submissions that use stdin/stdout directly, not through a grader header file. (And the reason I'm providing the main function in the manager is mainly to parse the command line arguments telling us where to find the fifos for communicating with the TaskType. It would perhaps be annoying and error-prone to reimplement the stuff in managerframework.h
every time you write a manager.)
And to be more explicit about why I want to kill the submission, here's what could happen otherwise:
- The submission prints
Q -2
. - The manager receives
Q -2
, which is not a valid query. It outputs the verdict "Wrong answer => 0 points" and terminates. - The submission now tries to read the response with
scanf("%d", &response);
. Since the manager didn't print anything before terminating, this hits the end of the stdin file. Now, theresponse
variable might have an arbitrary value. Assuming the submission didn't detect EOF (by looking at the return value ofscanf
, for example), it might now run into all sorts of troubles. It might segfault, becauseresponse
is out of bounds. Or it might time out becauseresponse
is large. Or it might use too much memory. Or it might keep asking the manager questions and keep getting nonsensical responses (until it times out, perhaps). Etc.
I think the correct verdict would always be "wrong answer", since that's the first thing that went wrong. But the TaskType has no way of knowing whether "manager says WA, submission crashed" means "the manager said WA, then the solution crashed" or "the solution crashed, then the manager (receiving no input) said WA".
@lerks I see. The idea wasn't to give more points to a solution that segfaults after the correct answer, but to give a meaningful verdict to a solution that makes an illegal query (always WA, even if as a consequence the solution later also segfaults). That's only a side-effect which I was willing to accept. :)
@lerks It feels a bit awkward to make that distinction, but the manager could instead treat the following two cases differently: Illegal queries: Immediately kill the solution. Correct(/wrong?) answers: Don't kill the solution.
Update: Also see below.
In my opinion there should be a simpler way to handle all cases using only standard Unix behavior.
What do you think about this approximate scheme (it can probably be improved):
- Start the manager and the solution, with pipes connected to solution's stdin and stdout
- The manager does not try to handle
SIGPIPE
manually
- The manager does not try to handle
- Wait for the manager to finish
- If the manager reported OK/WA:
- Wait for the solution to finish
- If the solution finished successfully, report OK/WA, else report solution's status (non-zero exit code/crashed)
- If the manager was killed by
SIGPIPE
:- Communication was lost; assume WA
- Wait for the solution to finish
- If the solution finished successfully, report WA, else report solution's status (non-zero exit code/crashed)
- If the manager died from a different cause:
- Assume internal error
The penultimate case is when the manager tries to write to a disconnected pipe. When it tries to read from a disconnected pipe, it will read EOF, so this condition can be handled normally and the manager can report WA.
If CMS doesn't have a separate Internal error status, then the last two cases can be joined.
You could also require the manager to handle SIGPIPE
and immediately report WA, this could simplify the above scheme but make writing managers more complex.
EDIT: I didn't include TLE in the above scheme, it needs to be considered.
@fagu Sorry, I started writing my answer before your latest posts. Now I see your point. In this case it may be fair to consider the manager's WA verdict authoritative and ignore the actual solution status. After all, what matters is that the solution made an incorrect query, not what it did afterwards.
@andreyv Assuming SIGPIPE in the manager (i.e., solution didn't read everything) means "wrong answer" seems a bit problematic to me. In principle, shouldn't it be allowed to guess the correct solution without reading everything?
What is the standard way of solving the SIGPIPE problem in communication tasks that use a grader stub? I'm thinking that the same method would work in both the standalone and the grader stub mode.
@lerks I think a manager should in any case only report "correct answer" once it's reached the end of the solution's output: Extra output by the solution should also be "wrong answer". So, in my proposal, a "correct answer" solution would get killed as soon as it closes stdout. (It seems slightly strange for a solution to segfault after closing stdout, although it could happen of course, maybe even by accident. :) )
@fagu This is an interesting case.
If the communication is synchronous (each side reads the other's answer before printing their own), there should be no problem. When the solution writes an answer, the manager must be in read mode to get the answer. Afterwards, the manager checks the answer and exits without further writes.
If the communication is asynchronous, as in your example (the manager is busy writing a big block of data, while the solution writes the answer and exits), this does need extraordinary SIGPIPE
handling in the manager. In C just ignoring SIGPIPE
and write errors might be sufficient, in other languages more work may be needed, as write operations will raise exceptions.
Normally, though, the communication is synchronous. The communication is usually of form
solution> QUERY X Y Z
grader> 1 2 3
solution> QUERY A B C
grader> 4 5 6
solution> ANSWER 42 <EOF>
The "bad" case would be when the solution stops reading mid-query, which is unusual. Now I'm not sure if this should be supported or not; in theory it should be, but omitting this case allows to simplify graders, without practical impact, to the point that no setup code is needed.
I only think telling the students "wrong answer" just because they didn't read all input is a strange trap. We'd have to warn them in the problem statements, making those longer and more convoluted. (Problem statement length is part of the reason why we like to avoid grader stub headers in the first place... :) )
And I believe it's entirely possible that a solution might want to make a query even if it's no longer interested in any responses. The solution might already know what the responses are going to be, but still need the query's side-effect. (Imagine an interactive graph-theory problem: A query makes you walk along an edge and the response tells you whether you've visited the node before. At some point, you might know that you've already visited the entire graph, so reading results becomes pointless. But maybe the problem asks you to finish at a certain node, so you have to continue making queries.)
PS: Of course, if you don't handle SIGPIPE at all, the manager crashes, CMS produces error messages, and the contest organizers have some head-scratching to do. ;)
PPS: In principle, couldn't there even be a SIGPIPE just because the solution didn't read all trailing whitespace? (Say, the manager flushed just before the whitespace.)
In #1000 I implemented the proposal I made in https://github.com/cms-dev/cms/issues/993#issuecomment-414179869, because that is an unobtrusive change that fits well with the current design of the task type, and it seems it's a prerequisite to any further changes. If you have feedback on that please comment over there.
Then I must admit that I got lost in your discussions. I'm leaning against any solution that involves SIGPIPE (or similar techniques) because it would make it harder to write managers (I don't know if we can assume that all contest admins are fluent in UNIX). In general, as @stefano-maggiolo pointed out, fewer moving parts and simpler approaches are better. Also, adding a new task type to the upstream repo that is only slightly different from the current Communication would create confusion, hence I think that all proposal must be formulated as changes to the current implementation and thus must be backwards compatible.
I must also say that I am quite satisfied by the situation we'll reach after #1000 will be merged. If you still feel that this solution doesn't meet your needs and want to keep your version and make it available to the a wider audience, you could package it as a separate project (thanks to our revamped plugin system it would then be easy to have CMS detect it) and we could reference it in the documentation.
The SIGPIPE discussion is unrelated to my original proposal. It exists whether you use grader stubs or not. It also doesn't solve the problem of grader-stub-less communication tasks explained here.
I only mentioned the separate SIGPIPE issues as a side remark. In any case, I think #1001 would be a better place for discussing them. :)
@lerks I'll have a look at #1000, thanks! But it won't solve https://github.com/cms-dev/cms/issues/993#issuecomment-414264600 . That issue can be solved with a properly written grader stub, but should be addressed in a different way for grader-sub-less tasks, I believe.
And if we introduced TaskType<->manager fifos just in the case of grader-stub-less problems, that would be a backwards-compatible change, wouldn't it? (Since grader-stub-less communication problems didn't use to be an option.)
I think an implicit assumption in Communication is that the manager, being provided by the admins, should be a good-quality robust program, that should never fail no matter what the contestant program does (this relates to #1001 and to the fact that it needs to detect EOF and SIGPIPE). Bonus points if it prints out a message explaining what went wrong.
If a stub is provided, then that should also be written with care, properly handling the EOF situation you detailed in https://github.com/cms-dev/cms/issues/993#issuecomment-414264600 and others. Unfortunately, while the manager must be written only once, a different stub must be provided for each language, which means more work.
If one decides they don't want stubs, then they're basically saying that the contestant's program should pick up those duties and take care of proper I/O itself. If the contestant program doesn't do that (because, say, the contestant doesn't know how) then, well, no one is doing that and the issues we discussed here emerge.
Your proposal in #1001 of having an intermediate program to take care of SIGPIPE&co. basically amounts to having a stub that is not built into the contestant program but runs parallel to it.
In the end, both the issue outlined here and the one in #1001 are not about figuring out the proper outcome (correct/wrong) or the right score, but are about giving the best error message possible. The manager, when robustly written, should be able to understand what went wrong and give a meaningful error message. The problem is that, if the contestant program crashes, we discard the manager's message and just return a generic message about the contestant's program. What we could do instead is merge the two messages, so that the manager's message can help the contestant understand the causes (and consequences) of the crash.
In general, it would be a good thing to have a repo of high-quality tasks, with good managers and stubs, in order for admins to simply copy from there and adapt to their needs rather than rewrite from scratch. This should help avoid common mistakes (e.g., not flushing, not checking for EOF errors, ...) and create a more uniform landscape in task quality.