Support plagiarism detection

Open lw opened this issue 8 years ago • 2 comments

We may want to give contest administrators a way to determine whether some submissions are too similar to public code or to each other. This is probably more relevant to online contests or classroom use than to onsite contests, which puts it somewhat outside the main scope of CMS.

I don't think implementing this ourselves is the best way to go. I believe such tools already exist, with sophisticated algorithms and large corpora of sources. We should provide a way for CMS to interface with them.

I wouldn't be surprised if this issue has come up before, and I would love to hear from administrators who faced it how they addressed it.

lw avatar May 18 '17 11:05 lw

I just implemented cmsExportSubmission for this :)

After exporting everything, I run JPlag on the submissions folder.


wil93 avatar May 18 '17 11:05 wil93

I have prepared a repo with the scripts we use at the University of Trento for that: cms_check-plagiarism.

The script check_plagiarism.sh works as follows:

  • we extract all the source files submitted for a contest;
  • we use sherlock to compare each pair of source files submitted to the contest by different users;
  • for each user, we cluster the source files and choose a representative for each cluster. Then we run JPlag over the selected files.

You can read more in the section "How it works" of the README.md.
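
As a rough illustration of the steps listed above (this is not the actual check_plagiarism.sh), here is a minimal Python sketch. It assumes the submissions have already been loaded as (user, filename, source) tuples, and it uses the standard library's `difflib.SequenceMatcher` as a crude stand-in for sherlock and JPlag. The function names and the thresholds are hypothetical, chosen only for this example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for a real detector."""
    return SequenceMatcher(None, a, b).ratio()


def cross_user_pairs(submissions, threshold=0.8):
    """Compare every pair of files submitted by *different* users.

    `submissions` is a list of (user, filename, source) tuples.
    Returns (score, file1, file2) tuples at or above the (hypothetical)
    threshold, most similar first.
    """
    flagged = []
    for (u1, f1, s1), (u2, f2, s2) in combinations(submissions, 2):
        if u1 == u2:
            continue  # a user resubmitting their own code is not interesting
        score = similarity(s1, s2)
        if score >= threshold:
            flagged.append((score, f1, f2))
    return sorted(flagged, reverse=True)


def representatives(sources, threshold=0.9):
    """Greedy clustering of one user's files.

    `sources` is a list of (filename, source) tuples. A file is kept as a
    representative only if it is not near-identical to one already kept.
    """
    kept = []
    for name, src in sources:
        if all(similarity(src, kept_src) < threshold for _, kept_src in kept):
            kept.append((name, src))
    return [name for name, _ in kept]


if __name__ == "__main__":
    c_code = 'int main() { int n; scanf("%d", &n); printf("%d\\n", n * 2); return 0; }'
    py_code = "print(sum(map(int, input().split())))"
    demo = [
        ("alice", "alice_1.c", c_code),
        ("bob", "bob_1.c", c_code),
        ("carol", "carol_1.py", py_code),
    ]
    for score, f1, f2 in cross_user_pairs(demo):
        print(f"{f1} vs {f2}: {score:.2f}")
```

Picking representatives per user first keeps the number of files handed to the more expensive tool small, which seems to be the point of the clustering step; a real setup would replace `similarity` with calls to the actual tools.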

CristianCantoro avatar Dec 20 '17 21:12 CristianCantoro