classroom
classroom copied to clipboard
Teachers using private repo belonging to their user account
A lot of teachers have been using private repos created on their user account (rather than on the org) as their starter code repo.
This isn't an issue when they create the assignment because we're querying that repo on their behalf. The repo is accessible.
However, when a student accepts that assignment, we pick a random org owner's token to create a repo on the classroom org. If that random token is not the teacher who owns the starter repo, the repo is inaccessible.
I'm guessing we never noticed this because most orgs/classrooms will only have one owner (aka the teacher that owns the private repo). In reality, many classrooms contain multiple owners (TA's for example).
This is the actual reason templates were failing for the past few weeks and most likely has been the biggest cause of failure using porter as well.
100/100 investigative work, nice work 😄
@EricPickup @shaunakpp I'd love for you to try to solve this in your last week - if possible. If you don't have time to implement anything, outlining a proposed fix for handoff to @education/classroom-reviewers to work on would be :100:
@shaunakpp let's set up some time today to work on this!
A couple solutions I've thought about..
-
When a new assignment is created or a new teacher/TA is added as a classroom owner, add them as collaborators to any starter code repositories. We'd definitely add some message to indicate this. We could check who created each assignment, grab their token and add all other classroom owners. This would help prevent reaching rate limits in our largest classrooms.
-
When creating a repo for the student, iterate through (or choose at random) the list of classroom owners and send a GET for the starter code repo on their behalf. If we get a 404, skip to the next owner. Once we get a successful response use their token. This is the safest way to do it and requires no unexpected changes to the teacher's repo. However, this does leave a higher chance of hitting the rate limits on large classrooms.
@d12 @shaunakpp let me know what you guys think.
When a new assignment is created or a new teacher/TA is added as a classroom owner, add them as collaborators to any starter code repositories.
This may not work if the starter code repository is a public repository not belonging to the teacher or the organization.
When creating a repo for the student, iterate through (or choose at random) the list of classroom owners and send a GET for the starter code repo on their behalf. If we get a 404, skip to the next owner. Once we get a successful response use their token.
This makes an already long-running job longer, is it possible to short-circuit this step while assignment creation? Like, keep a list of user_ids that are valid owners/collaborators for the starter code repository and use one of them while creating the assignment repo for a student?
I'm also in favor of the second solution as it is not mutating anything on dotcom.
This may not work if the starter code repository is a public repository not belonging to the teacher or the organization.
This step wouldn't be necessary if it's a public repo!
keep a list of user_ids that are valid owners/collaborators for the starter code repository and use one of them while creating the assignment repo for a student?
The only issue I see with this is that even though it's unlikely collaborators for the repo can change at any time. It also still leaves the rate limit issue. 😬
I think we should wait to see what @d12 thinks before we get started since both methods have their ups/downs.
Awesome job investigating this 👌 This was totally an oversight!
Issue with Proposed Solutions
Both suggested solutions 1. and 2. in https://github.com/education/classroom/issues/2222#issuecomment-520513631 rely on users maintaining access to their classroom organization. This would prevent classroom organizations form using GitHub organization access as they might normally.
If a ta/teacher is removed from their classroom organization, classroom would loose access to their privately owned starter code repos.
Users of classroom would be inconvenienced by this constraint (which could be perceived as unintuitive). Organization access should be able to be revoked for teachers/ta without causing disruption to classroom.
Potential Solution
A more stable solution would to copy the assignment creator's privately owned starter code repo to the classroom organization. This copy would serve as a base repo for the assignment and classroom would always maintain access to it (given that the classroom admins have the appropriate levels of access).
This approach has some bonus benefits:
- Assignment repos could always be template repos
- Assignment repos would not disappear if the source repo went from public to private
- Paid organizations wouldn't have to keep certain users in their organization for the purpose of maintaining access, which would cost them seat(s)
This step wouldn't be necessary if it's a public repo!
Yes, but we currently store just the starter_code_repo_id
so we wouldn't know whether the repo is public or not until we start the assignment creation process for a student. So, probably we'll need to add a flag like starter_code_repo_private
to circumvent this added check if the repo is public.
The only issue I see with this is that even though it's unlikely collaborators for the repo can change at any time. It also still leaves the rate limit issue. 😬
We can keep user_ids as cache, and bust the cache if we get a 404, and retry the process. (This is a rare scenario, but worth handling).
@BenEmdon I really like this solution and think it's the most stable.
We could fork the repo over to the organization instead of copying using templates. That removes the extra step of making that repo into a template and also cover the plan that @d12 proposed related to this: https://github.com/education/classroom/issues/2032 where if the user doesn't have permission to make a repo into a template, we fork it and make the fork a template.
@BenEmdon creating a fork looks like a good solution 👍.
We'll need to add clear messages that we are creating a fork in the organization, as currently picking a starter code repository creates no perceived side effects to the organization. But, with this implementation, we'll be adding a new repository to the org and at the same time make it a template.
This makes sense to me :+1:
As far as copying goes, we should be forking, not importing or template copying. The repo may not necessarily be a template repo, and we don't want to add further dependencies on the importer.
Keep in mind that GitHub forks are absurdly complex, there's a LOT of edge cases to worry about. @BenEmdon 's point about not having to keep a seat in a paid org is depending on a bug that we've been trying to fix for 3 years now: https://github.com/github/github/issues/62000, so lets not depend on that always being the case.
That being said, make sure to test any assumed fork functionality before starting work, forks sometimes don't behave how you'd expect them to behave.
Let's also make sure we're looping in design here so that this is communicated properly to the end user, Teachers should understand why this repo is being copied into their org.
The forking can solve a lot of our issues such as this one, #2234 and #2032 but I've just started working on it and am noticing some concerns:
- We can only fork a repository once per user/org. What if a teacher wants to use the same starter code repo in multiple assignments? We could check if the fork already exists and use that one
- How will we handle a situation where the teacher wants to update their starter code repo? We'll need to make sure that they know to modify the fork rather than the original one we're taking.
cc @femmebot on all of this stuff!
-
Yep, we should check if the fork exists. Make sure we don't make naming assumptions here since forks aren't guaranteed to be named something specific.
-
@femmebot May have ideas for how we can design this. Super savvy users may be able to figure this out themselves, but let's guide the users from the Classroom UI.
Although it solves a variety of problems, relying on forking makes me generally uneasy and I tend to agree with @d12 that it could introduce more problems, including possibly:
- Forks are slow
- Forks can become detached
- You can't fork something more than once
- Multiple people can't fork the same thing to the same org
- It introduces new ownership problems
- Forks of private repos can be limited in specific cases
- When there are multiple repositories owned by a single user which one is used is confusing
I think a lot of similar problems exist with the template scenarios as well.
If we build a matrix of the cases we'll see which ones don't work (assuming that the repo in question is marked as a template the following should be true):
Owner | Public/Private | Students can see/use the template |
---|---|---|
Teacher | Public | Yes |
Teacher | Private | No |
Classroom Org | Private | Yes |
Classroom Org | Public | Yes |
Third party Org[^1] | Private | No |
Third party Org | Public | Yes |
Org with 3rd Party Access Restrictions[^2] | Public | No |
Org with 3rd Party Access Restrictions[^2] | Private | No |
[^1]: assuming the student is not a member of the third party org [^2]: assuming that the org has not explicitly enabled GitHub Classroom
If this table isn't true then I think it represents a bug in GitHub we should fix.
If this table is true then I think we should look at a couple of smaller interventions:
- When the Teacher creates the assignment:
- Check public/private
- If private, check the owner (it must be the classroom org)
- If the org is an owner check for third party access restrictions (this would require a new API)
- Make it possible to use any repo as a template via the API (this is potentially controversial)
If those conditions aren't met, the teacher could still fork (manually) to make it work.
Cons:
- If someone changes the visibility of the repository after the assignment is created, students could still encounter bad states
- Same as previous, but about third party access restrictions
- This doesn't take advantage of Third Party Org permissions (shared between students and teachers) but simplifies the model, and in that case it would be ideal if that third party org was the classroom org
FYI @jeffrafter we should add this row to the table due to #2234:
Owner | Public/Private | Students can see/use the template |
---|---|---|
Third party org[^2] | Public | No (can see, cannot use) |
I agree with @jeffrafter, I don't think this forking solution is the way to go anymore in light of the barriers we keep finding with forks.
I think splitting the starter code repository from the teacher inputting it (their own repo) to a different repo adds a lot of confusion. If the repo is already on the org we couldn't fork it and would reference the original repo, which adds inconsistency to this process.
- If private, check the owner (it must be the classroom org)
If the owner is a user (and private) we could add a button to transfer the repo to the org through classroom. They'd still have to manually accept the transfer on the org side through the web UI but we could provide a link to their org to accept it easily.
If someone changes the visibility of the repository after the assignment is created, students could still encounter bad states
We could look for 404s when creating the student repo and give a detailed error message:
We could not find the starter code repository. This is most likely because the repository was made private and not all of the organization owners have access to it. If you are the teacher, you should make the repository public or transfer the ownership to the classroom organization.
Keep in mind for this issue to occur, the classrooms needs to:
- Have multiple organization admins
- Use a private personal repository as starter code
I think adding the forking idea would add a lot of confusion to all assignments, just to fix a specific scenario. I prefer the other method of enforcing private repos belonging to orgs and flashing a detailed message if they change the visibility after. This adds a little extra work (but not unexpected) for teachers in this specific case rather than a lot of confusion to all.
If the owner is a user (and private) we could add a button to transfer the repo to the org through classroom. They'd still have to manually accept the transfer on the org side through the web UI but we could provide a link to their org to accept it easily.
Oh, I really like that idea. We could even make this button just open GitHub's transfer page.
We could not find the starter code repository. This is most likely because the repository was made private and not all of the organization owners have access to it. If you are the teacher, you should make the repository public or transfer the ownership to the classroom organization.
💯 Better error messages are so helpful
I'm :-1: for one major reason: If I'm understanding correctly, this solution doesn't let us share starter code repos among classrooms if they aren't in the same org. Teachers very often share starter code repositories among classrooms in different organizations, often having a shared faculty organization with these shared teaching resources. or, teachers are free to use starter code repositories shared from other teachers. Requiring transferring ownership to use a starter code repo goes against the idea of "shared teaching resources" that I'd love to move towards in the future.
@d12 that makes sense:
Teachers very often share starter code repositories among classrooms in different organizations, often having a shared faculty organization with these shared teaching resources. or, teachers are free to use starter code repositories shared from other teachers.
In those cases are the students typically members of the third party org or are the shared repositories public?
Not based off of what I've seen, teachers often have "teacher only" orgs for sharing internal resources.
Make it possible to use any repo as a template via the API (this is potentially controversial)
If this could happen, it'd solve all our problems 🤞
@d12 that's a good point. What do you think about just providing detailed failure/error reports to the user like I mentioned in https://github.com/education/classroom/issues/2222#issuecomment-521460371? I personally think it's still worth providing a failure reason for this specific case rather than confusing the situation for all cases with forks.
@EricPickup The failure messages would be good, but if they're surfaced when students accept a repo and not when teachers create the assignment, it's going to cause a lot of trouble for teachers.
Update: I've put up a PR for a back-end solution to this in #2286 and have proposed a front-end method of preventing this in #2285.
Possible solutions:
- We could work around rate limiting via a GitHub App.
- Or, warn teacher that it's a private repo
Maybe we ask teachers to fork or move their repo into the organization
We can tell which repos which repos will run into this
- Is the repo private
- Does the organization have more than one admin
Precursor:
- Measure how often teachers are using private repos where the teacher is the owner and it's not in the org? Let's create a dump of historical data for this.
- Design: Do we want to fork it for them, or do we want to direct them to GitHub to complete the action.
Some preliminary data gathered from the GitHub Classroom production console and octokit (limited it to just the past month for now because this query takes a long time to run):
- 4180 total assignments created in the past month (individual and group)
- 3075 with a repo starter code
- 2014 assignments whose starter repo is private
- 1604 assignments whose private starter repo belongs to an org
- 410 assignments whose private starter repo belongs to a user (so it won't be accessible to any other org member)
- 26 assignments whose private starter repo wasn't accessible at all (most likely because it was deleted or the creator's token expired)
(script to generate this data: https://gist.github.com/spinecone/9f6517129e06c0d2e40a16301d433878)
Followup steps:
- Adding a datadog metric tracking when an assignment is created that belongs to a user rather than an organization
- When a private repo belonging to a user is selected as the starter code repo for an assignment, show a message to the user that says:
We will soon only be allowing starter code repositories that
belong to your classroom organization to be used for assignments.
If you have any feedback about this change, let us know at [an issue url]