ADD simple cheat detection on question responses
from difflib import SequenceMatcher
import json


def similar(a, b):
    # Similarity ratio in [0, 1] between two strings
    return SequenceMatcher(None, a, b).ratio()


with open('./assignment-final-question-assignments.json') as f:
    a = json.load(f)

netids = list(a.keys())
first = next(iter(a.values()))
# Assumption: each entry nests its pool under ['question']['pool'];
# adjust this path (here and in the loops below) if the JSON differs.
pools = list(map(lambda x: x['question']['pool'], first['questions']))
results = {_pool: [] for _pool in pools}

for netid in netids:
    for question in a[netid]['questions']:
        pool = question['question']['pool']
        text = question['response']['text']
        row = []
        for _netid in netids:
            # Find the other student's response to the question from the same pool
            _text = ''
            for _q in a[_netid]['questions']:
                if _q['question']['pool'] == pool:
                    _text = _q['response']['text']
                    break
            # Self-comparisons and empty responses score 0
            if len(text) == 0 or len(_text) == 0 or netid == _netid:
                ratio = 0
            else:
                ratio = similar(text, _text)
            row.append(ratio)
        results[pool].append(row)
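A possible follow-up step, sketched here rather than taken from the script: pull the most similar pairs out of one pool's matrix so they can be reviewed by hand. The helper name `top_pairs` and the `threshold` default of 0.8 are hypothetical, and the sketch assumes each student has exactly one question per pool, so that row `i` of `results[pool]` corresponds to `netids[i]`.

```python
def top_pairs(matrix, netids, threshold=0.8):
    """Return (netid_a, netid_b, ratio) for pairs at or above threshold, highest first."""
    pairs = []
    for i in range(len(netids)):
        # Only look at j > i so each unordered pair is reported once
        for j in range(i + 1, len(netids)):
            if matrix[i][j] >= threshold:
                pairs.append((netids[i], netids[j], matrix[i][j]))
    return sorted(pairs, key=lambda p: p[2], reverse=True)
```

With the script's variables this would be called as `top_pairs(results[pool], netids)` for each pool.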
Another resource to check out: https://www.ics.uci.edu/~kay/checker.html
Yeah, MOSS is the classic one. A few people have used it, but I have heard the interface is hard to use. Another thing is that you are basically submitting all the students' code to some server ...
Well, the scope of this card is specifically the question responses, not the GitHub submissions. I think we can take whatever parts of the MOSS system make sense.
Teo and I have been talking a bit about what to do with our cheat detection. One idea is to take the existing similarity script above and fit some simple statistical distributions to the results. With the distributions, we can calculate confidence scores. I don't think we should have Anubis say "this person cheated", but rather "we are 71% confident." That leaves the nasty business of deciding what to do next to the course professor's discretion.
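To make the idea concrete, here is a minimal sketch of one way to turn similarity ratios into scores. It assumes the ratios for a pool are roughly normally distributed, which has not been checked against real data; the function name `confidence_scores` is hypothetical. Each score is the fitted normal CDF at that ratio, so it says how unusual a ratio is among its peers, not the probability that someone cheated.

```python
from statistics import NormalDist, mean, stdev


def confidence_scores(ratios):
    """Map each similarity ratio to the CDF of a normal fitted to all ratios."""
    # With fewer than two values, or no spread, nothing stands out
    if len(ratios) < 2:
        return [0.0 for _ in ratios]
    mu, sigma = mean(ratios), stdev(ratios)
    if sigma == 0:
        return [0.0 for _ in ratios]
    dist = NormalDist(mu, sigma)
    return [dist.cdf(r) for r in ratios]
```

A ratio far above the rest of its pool gets a score close to 1, which could then be reported as the confidence figure.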
Right now the distribution and confidence scores are one level above what we already have.
I found a paper discussing the algorithm used by MOSS. I have only read the introduction so far, but I don't think the algorithm is complicated. I guess we could implement it for cheat detection on question responses. https://dl.acm.org/doi/pdf/10.1145/872757.872770
Ironically, this algorithm does not behave well for detecting plagiarism in code even though most of the time MOSS is used for programming assignments.
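For reference, a rough sketch of the fingerprinting idea from the linked paper (winnowing) applied to text responses: hash every k-gram, keep the minimum hash in each sliding window, and compare the resulting fingerprint sets. This is a simplification, not the paper's exact procedure. The function names, the defaults `k=5` and `window=4`, the use of `zlib.crc32` as the hash, and the Jaccard comparison at the end are all assumptions made for illustration.

```python
import zlib


def kgram_hashes(text, k=5):
    """Hash every k-character substring of the normalized text."""
    # Lowercase and drop whitespace so formatting changes do not matter
    cleaned = ''.join(text.lower().split())
    return [zlib.crc32(cleaned[i:i + k].encode())
            for i in range(len(cleaned) - k + 1)]


def winnow(hashes, window=4):
    """Keep the minimum hash of each sliding window as the fingerprint set."""
    if not hashes:
        return set()
    if len(hashes) < window:
        return {min(hashes)}
    return {min(hashes[i:i + window])
            for i in range(len(hashes) - window + 1)}


def fingerprint_similarity(a, b, k=5, window=4):
    """Jaccard overlap of the two fingerprint sets, in [0, 1]."""
    fa = winnow(kgram_hashes(a, k), window)
    fb = winnow(kgram_hashes(b, k), window)
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)
```

`fingerprint_similarity` could stand in for `similar` in the script above, with `k` and `window` tuned to the typical response length.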