cms
cms copied to clipboard
PostgreSQL raises an exception on strings containing the NULL character
New CMS (cf4cdf0039ba45d21058790cf19cf8516a55adb3 commit) failed to handle the wrong java submission which led to the EvaluationService crash due to the null insertion into the DB. Skladisce.txt This was the .java file (I renamed it to upload it here). We haven't made a further investigation whether a contestant submited this file or it was corrupted by the CMS.
Could you provide a log of the crash of EvaluationService?
They were overridden as soon as the wrong submission was deleted. But you can reproduce it using the same code. But it said that the insertion violated the non-null constraint when storing the source as a string.
for what it's worth, it looks like a compiled java file, not a java source code (or maybe I'm missing something?)
So, just to clarify, you are saying that someone submitted a Java bytecode file as a Java scource file and that crashed EvaluationService? I will try to reproduce this and let you know.
I managed to reproduce it and here are the logs:
2018-03-12 22:06:58,708 - INFO [Evaluation,0] Writing result to db for compile on 115 against dataset 26
Traceback (most recent call last):
File "/home/luca/Development/cms/venv/lib/python3.6/site-packages/gevent/greenlet.py", line 536, in run
result = self._run(*self.args, **self.kwargs)
File "/home/luca/Development/cms/cms/service/flushingdict.py", line 108, in _check_flush
self.flush()
File "/home/luca/Development/cms/cms/service/flushingdict.py", line 91, in flush
self.callback(list(iteritems(self.fd)))
File "/home/luca/Development/cms/cms/service/EvaluationService.py", line 194, in wrapped
return func(self, *args, **kwargs)
File "/home/luca/Development/cms/cms/service/EvaluationService.py", line 527, in write_results
session, object_result, operation_results)
File "/home/luca/Development/cms/cms/service/EvaluationService.py", line 583, in write_results_one_object_and_type
session, object_result, operation, result)
File "/home/luca/Development/cms/venv/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 573, in __exit__
self.rollback()
File "/home/luca/Development/cms/venv/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 66, in __exit__
compat.reraise(exc_type, exc_value, exc_tb)
File "/home/luca/Development/cms/venv/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 187, in reraise
raise value
File "/home/luca/Development/cms/venv/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 570, in __exit__
self.commit()
File "/home/luca/Development/cms/venv/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 467, in commit
self._prepare_impl()
File "/home/luca/Development/cms/venv/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 447, in _prepare_impl
self.session.flush()
File "/home/luca/Development/cms/venv/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 2198, in flush
self._flush(objects)
File "/home/luca/Development/cms/venv/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 2318, in _flush
transaction.rollback(_capture_exception=True)
File "/home/luca/Development/cms/venv/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 66, in __exit__
compat.reraise(exc_type, exc_value, exc_tb)
File "/home/luca/Development/cms/venv/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 187, in reraise
raise value
File "/home/luca/Development/cms/venv/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 2282, in _flush
flush_context.execute()
File "/home/luca/Development/cms/venv/lib/python3.6/site-packages/sqlalchemy/orm/unitofwork.py", line 389, in execute
rec.execute(self)
File "/home/luca/Development/cms/venv/lib/python3.6/site-packages/sqlalchemy/orm/unitofwork.py", line 548, in execute
uow
File "/home/luca/Development/cms/venv/lib/python3.6/site-packages/sqlalchemy/orm/persistence.py", line 177, in save_obj
mapper, table, update)
File "/home/luca/Development/cms/venv/lib/python3.6/site-packages/sqlalchemy/orm/persistence.py", line 737, in _emit_update_statements
execute(statement, multiparams)
File "/home/luca/Development/cms/venv/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 945, in execute
return meth(self, multiparams, params)
File "/home/luca/Development/cms/venv/lib/python3.6/site-packages/sqlalchemy/sql/elements.py", line 263, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/home/luca/Development/cms/venv/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1053, in _execute_clauseelement
compiled_sql, distilled_params
File "/home/luca/Development/cms/venv/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1189, in _execute_context
context)
File "/home/luca/Development/cms/venv/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1405, in _handle_dbapi_exception
util.reraise(*exc_info)
File "/home/luca/Development/cms/venv/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 187, in reraise
raise value
File "/home/luca/Development/cms/venv/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
context)
File "/home/luca/Development/cms/venv/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 470, in do_execute
cursor.execute(statement, parameters)
ValueError: A string literal cannot contain NUL (0x00) characters.
Mon Mar 12 22:06:58 2018 <Greenlet at 0x7fae52bcaa60: <bound method FlushingDict._check_flush of <cms.service.flushingdict.FlushingDict object at 0x7fae528d3978>>> failed with ValueError
I kept the sandbox to look at what was happening and found out that the compiler stderr consists of many times the line batchstdio.java:1: error: unmappable character for encoding UTF8
followed by a line copied verbatim from the submitted "source" which contains some zeros. Hence when we store that in the DB it fails.
I was going to say that we should treat the stdout and stderr as binary and decode them to UTF-8 in Python, taking care of errors, but we already do that: https://github.com/cms-dev/cms/blob/beb5f754560e00cc444156117db242c294c5fcde/cms/grading/init.py#L289-L296 Though a zero-byte isn't considered an error, apparently.
Shall we start by patching ES not to drop all results when one raises an exception while writing?
Yes, that would be a good thing to do regardless.
It's hard to list all places where a contestant can control (in some way) the data that is written to the DB. In CWS it's sometimes easy (asking a question), sometimes seemingly impossible (playing a token), sometimes unclear (submitting, as this issue shows). I think it will be hard to find a generic solution. SQLAlchemy may provide a way to pass every string to be stored in the DB through a function we provide (which could strip unwanted characters), but that would have a tremendous performance impact.
Are there updates on this? I just encountered this bug as well :sweat_smile:
For now the approach is to fix case-by-case the situations where strings like these are the most likely to be injected (e.g., output of a compilation, in this case). How were you affected?
In general, I think we could consider a few approaches. One is for example to preprocess all strings we're saving to the DB and replace the NULL character by, say, the Unicode REPLACEMENT CHARACTER (often rendered by a question mark in a box). A less destructive transformation, but one that makes data unreadable to humans, is to increase all codepoints by one when we store strings and decrease them by one when we load strings (e.g., "ab1" is stored as "bc2").
I was affected in the same way as OP, with a wrong Java submission.
I don't really understand why postgres is so picky with the characters we use. We just want to store a file, after all. Is it really that weird to have 0x0 characters somewhere in the file? I would expect some very exotic languages to even have that character in the source code, at times.
For me, the only solutions that make sense are:
- convince postgres that it's not such a bad thing to store 0x0, maybe changing the data type?
- catch this ValueError in ES so that (only when the exception happens, not every single time) it will replace the character, or even better, it will replace the whole source code with "invalid characters, cannot store to DB"
The output of the compilation command is stored as a string in a column of a PostgreSQL table. Submission files themselves are stored as large objects. Hence the latter ones can contain what they want, whereas the former ones suffer from this issue. I don't think changing the column type to be binary is a good solution. What's wrong with replacing NULLs with the replacement character? Or even something else?
Oh, so the output of the compilation is what we are talking about. I was thinking about the source code itself (being a Java byte code, it makes sense that it contains 0x0)
In that case I agree with you, even though I can't help but blame Java for not doing this replacement themselves!
Anyway it still makes sense to do the replacement only when this exception is raised, doesn't it? To improve performance..
On Sun, 7 Oct 2018, 09:51 Luca Wehrstedt, [email protected] wrote:
The output of the compilation command is stored as a string in a column of a PostgreSQL table. Submission files themselves are stored as large objects. Hence the latter ones can contain what they want, whereas the former ones suffer from this issue. I don't think changing the column type to be binary is a good solution. What's wrong with replacing NULLs with the replacement character? Or even something else?
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/cms-dev/cms/issues/888#issuecomment-427633492, or mute the thread https://github.com/notifications/unsubscribe-auth/ABOc8d-kyj6bjCy1IwtFMonfYlV82Eqeks5uibJagaJpZM4SnQBN .
We also encountered the same bug in the same way, someone submitted a compiled Java file. The fun part was EvaluationService trying to recompile the same submission every two minutes :)
I agree that replacing bad characters is an acceptable solution. After all, the final goal is displaying readable text, not binary data. We can also draw parallels with Python's universal_newlines
/text
mode for files — with it you can also use an errors=replace
action to filter out undecodable sequences.
Actually, how does the worker read compiler output? If it uses text mode, then maybe the replace action can be specified already at decoding time. EDIT: Looks like there's nothing like that in the default codecs library. UTF-8 accepts null characters just fine.
EDIT2: Relevant code is here: https://github.com/cms-dev/cms/blob/54f7a5c182c49a23a4524fc03d2a32dafbb5a09c/cms/grading/steps/stats.py#L53-L57
Holas yo tenia el mismo problema y lo solucione haciendo esto:
contenido = contenido.replace('\x00', '')
contenido = contenido.replace('\00', '')
contenido = contenido.replace('\0', '')
y asi le quite los valores null que no podia insertar a mi base postgres
I've also recently encountered this issue, but in my case it was the checker output that was causing problems. An easy fix, similar to what @andreyv did, is to add the following:
text = text.replace("\x00", "\uFFFD")
to line 101 of cms/grading/steps/trusted.py
.
Just for the record, there is also a possibility of removing the character (matches terminal behavior) or replacing it with U+2400 "SYMBOL FOR NULL" (␀). Of course, it would be nice if there was a better solution (such as the utf-8
codec handling it transparently, but it doesn't).