Public repositories and cheating
Having TA'ed at Berkeley for a couple of semesters now, the most serious problem we have with GitHub is the amount of cheating it enables. Every semester, some students take their work from the semester and dump it into public GitHub repositories to show it off. With hundreds of students in each class, it is inevitable that some will look online for past solutions to projects and homework. For almost every project at Berkeley that has existed for at least one semester, you can find some solution code on GitHub.
For our class, we currently provide a private repository via an educational organization to our students through the entire term, but it doesn't stop them from taking their repository and hosting it publicly after the term. Does anyone (especially GitHub staff) have experience in the best way to solve this problem? Some possibilities I'd imagine include:
- licenses for DMCA takedown - not too familiar with the best licenses to do this, also seems extremely hostile to students
- projects that can be hosted publicly (unique among students/groups so that no cheating can occur, although this takes extreme ingenuity in assignment design)
- larger free micro accounts (10 repos?) which may help encourage students to keep more private
I believe in another thread, someone mentioned their thoughts on cheating and how rather than do our best to try and discourage cheating, we should focus on working on ways to encourage students to share information. You can find his comment here. Whether this is the right solution for your scenario, I am not sure. I definitely think this mentality is a step in the right direction though.
Here's a good problem for an assignment.
Write a program that computes the difference between n different solutions to this assignment, and applies an algorithm or heuristic to determine the extent to which each solution is derivative from others. Set a threshold value of this heuristic to make a determination (along with any other relevant data) as to whether, in your opinion, a solution should be considered plagiarized.
To receive a passing grade, your program:
- Must not be flagged as plagiarism when invoked on itself.
- Must not be flagged as plagiarism by > 50% of other passing solutions.
Hint: man git-diff. You may wish to review the GitHub API documentation. Submit your project by forking this repository and committing your solution to your fork.
I think the kids in EECS should be able to handle it.
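To make the assignment above concrete, here is a rough sketch of the pairwise comparison step. It is only an illustration under stated assumptions: it uses Python's standard difflib rather than git-diff, the function names `normalize`, `similarity`, and `flag_pairs` are made up for this example, and the 0.8 threshold is an arbitrary placeholder, not a recommended value.

```python
import difflib
from itertools import combinations


def normalize(source: str) -> list[str]:
    """Strip indentation and drop blank lines so pure whitespace edits do not hide a copy."""
    return [line.strip() for line in source.splitlines() if line.strip()]


def similarity(a: str, b: str) -> float:
    """Line-based similarity in [0, 1]; 1.0 means identical after normalization."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def flag_pairs(solutions: dict[str, str], threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """Return every pair of distinct submissions whose similarity meets the threshold.

    `solutions` maps a (hypothetical) student name to that student's source text.
    The threshold is an assumption chosen for illustration only.
    """
    flagged = []
    for (name_a, src_a), (name_b, src_b) in combinations(solutions.items(), 2):
        score = similarity(src_a, src_b)
        if score >= threshold:
            flagged.append((name_a, name_b, score))
    return flagged


if __name__ == "__main__":
    demo = {
        "alice": "def add(a, b):\n    return a + b\n",
        "bob": "def add(a, b):\n\treturn a + b\n\n",
        "carol": "import operator\nadd = operator.add\n",
    }
    for name_a, name_b, score in flag_pairs(demo):
        print(f"{name_a} vs {name_b}: {score:.2f}")
```

A line-level comparison like this is easy to defeat by renaming variables or reordering functions, so a flagged pair is a prompt for a human to look closer, not a verdict.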
Many Berkeley classes already use Moss as plagiarism detection, which is much more advanced than any one person is going to come up with as a solution to an assignment or a just-for-fun side project (and far beyond a basic git-diff). This is perhaps the best we can do so far, though there are still many cases/styles of cheating that go uncaught by this text-based processing. There is a bit of work going on in one of our lower division classes to analyze patterns in cheating and to perhaps build a framework for detecting it, but this is just getting started and any useful results would be a long way off.
I think you might have missed my point there. The idea was to use various combinations of very advanced git-diff features (or anything else; it's just somewhere to start) to build a tool like the one mentioned over time, since comparing and evaluating against previous solutions is a required part of the assignment itself. Anyway.
Sorry, I didn't mean to come off as completely disregarding your suggestions. Using versioning to detect cheating is a fantastic idea, and a full-fledged system would likely even involve analyzing individual commits and perhaps using git-diff in unusual ways. Thanks for your ideas!
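As one guess at what analyzing individual commits could look like, the sketch below measures how much of a repository's added code arrived in its single largest commit. The signal itself is an assumption made for illustration (nobody in this thread proposed it), and the helper names `parse_numstat` and `largest_commit_share` are hypothetical. It expects text produced by `git log --numstat --format="commit %H"`.

```python
def parse_numstat(log_text: str) -> list[tuple[str, int]]:
    """Parse `git log --numstat --format="commit %H"` output.

    Returns (commit_hash, lines_added) pairs in the order they appear in the log.
    """
    commits: list[list] = []
    for line in log_text.splitlines():
        if line.startswith("commit "):
            # Start of a new commit; begin counting its added lines from zero.
            commits.append([line.split()[1], 0])
        elif "\t" in line and commits:
            # numstat rows look like "<added>\t<deleted>\t<path>".
            added = line.split("\t")[0]
            if added.isdigit():  # binary files report "-" instead of a count
                commits[-1][1] += int(added)
    return [(commit_hash, added) for commit_hash, added in commits]


def largest_commit_share(log_text: str) -> float:
    """Fraction of all added lines that came from the single largest commit."""
    sizes = [added for _, added in parse_numstat(log_text)]
    total = sum(sizes)
    return max(sizes) / total if total else 0.0


if __name__ == "__main__":
    sample = (
        "commit aaa\n\n5\t0\ta.py\n"
        "commit bbb\n\n95\t2\tb.py\n-\t-\timg.png\n"
    )
    print(f"largest commit share: {largest_commit_share(sample):.2f}")
```

A high share is weak evidence at best, since plenty of honest students commit everything at the end, so this would only make sense alongside other signals.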
@georgeyiu I'm curious to hear if you have any updates, or changes to the way you are doing things that have addressed this issue.
We have a similar issue at another university.
Without this, people would probably share code internally, and then systems like Moss would fail to work. Most of the time, coursework is the only thing a student has to offer potential employers upon graduation.
As a TA at Drexel University, I don't have much control over what happens. However, I've given it a bit of thought. My suggestion: for higher-level classes, I'd give students an overview of an idea and ask them to write the specs for it, essentially designing the project, and submit that as their first homework. For their second homework, they build out this spec. This would teach them how to write specifications and would make them better CS students by graduation.