Support GitHub users in `orgs` command

Open marnovo opened this issue 6 years ago • 7 comments

Right now the `orgs` command seems to only support, well, GitHub orgs. But a common use case (given most people don't own orgs) and an interesting one (given you might want to check your GitHub "profile" in depth) is to try it on your own GitHub user. Is it easy enough to extend `orgs` to also cover individual users?

marnovo avatar Jun 26 '19 13:06 marnovo

As for whether it's possible, I'd say yes. I'd maybe replace "org" with "owner", which could be either a "user" or an "org"; doing so would also avoid problems if the user becomes an org at some point.

But: with an "org", we fetch metadata from its members; with a "user", we won't fetch that metadata.
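The "owner" idea can be sketched roughly as follows. This is a hypothetical Python illustration, not sourced-ce code: `Owner`, `OwnerKind`, `plan_import`, and the step names are all invented for the example. The only behaviour taken from the discussion is that member metadata is fetched for orgs and skipped for users.

```python
from dataclasses import dataclass
from enum import Enum


class OwnerKind(Enum):
    USER = "user"
    ORG = "org"


@dataclass(frozen=True)
class Owner:
    """A repository owner, which may be a user or an org (hypothetical model)."""
    login: str
    kind: OwnerKind


def plan_import(owner: Owner) -> list[str]:
    """Return the (made-up) list of things to import for an owner."""
    steps = ["repositories", "issues", "pull_requests"]
    if owner.kind is OwnerKind.ORG:
        # Only orgs have members; for a user owner this step is skipped.
        steps.append("members")
    return steps
```

Because the kind is a field rather than part of the command name, an owner that changes from user to org would only change which steps run, not how it is addressed.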

But I'm not sure what the purpose of getting the org members is. If the purpose is to assign the activity in the repos to its members, then some activity won't be assigned (because it belongs to gh users who aren't members of that org, so they won't be imported; for example, an issue opened in bblfsh by a non-bblfsh member won't be assigned to any user in our DB).

If we need to get the info about all the users contributing to a repo (like the example above), we should also:

  1. fetch all gh users with activity in that repo who are not members of the repo's org,
  2. try to find gh users from repo commits (to be able to assign commits to users, not only gh activity).
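The two steps above could look something like this on already-fetched data. It is only a sketch under assumptions: the function names are hypothetical, the inputs are plain in-memory collections rather than real API responses, and matching commits to users by lower-cased author email is one possible heuristic, not something the thread specifies.

```python
def find_unassigned_authors(org_members, activity_authors):
    """Step (1): logins with activity in the repo that are not org members."""
    return sorted(set(activity_authors) - set(org_members))


def match_commit_authors(commit_emails, email_to_login):
    """Step (2): map commit author emails to known gh logins.

    Returns (matched, unmatched): a dict of email -> login, and a list of
    emails for which no gh user is known (assumed email-based matching).
    """
    matched = {}
    unmatched = []
    for email in commit_emails:
        key = email.strip().lower()
        login = email_to_login.get(key)
        if login is None:
            if key not in unmatched:
                unmatched.append(key)
        else:
            matched[key] = login
    return matched, unmatched
```

Any login returned by `find_unassigned_authors` is a user whose issues or pull requests would otherwise not be assigned to anyone in the DB.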

If we also import repos from users, as suggested by this issue, the activity in their repos won't be assigned to any user other than the imported user, unless we also do (1) and (2).

dpordomingo avatar Jun 27 '19 09:06 dpordomingo

@marnovo technically it's not that different from an org, but the results might be very unexpected for users, and we should do something about that. Problems I see:

  • Half (or more) of my repos, and of any other dev's at src-d, are forks. The same is true for external devs. The problem with forks: nobody updates master. Most of our charts rely on HEAD, so forked repos would only produce results up to the moment they were forked.
  • There are no issues or pull requests in forks, so all metadata charts would become useless.

As a solution for the user command, I would propose resolving forks and downloading the code/metadata of the original repo. In some cases (example) it would make more sense to download the fork, but such cases are exceptions.

smacker avatar Jun 27 '19 10:06 smacker

I wouldn't do it automatically, but maybe with options: `--use-parent` to use the parent repo instead, or `--add-parent` to fetch both the fork and its parent; or even fully ignore forks with `--no-forks`, as requested by @warenlg at https://github.com/src-d/sourced-ce/issues/109. There could also be an `--exclude` option taking a list of repos to ignore (for repos causing known failures, or for whatever other reason). This way everything would be more explicit, which I think would be better, and more flexible.
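To make the proposed options concrete, here is a minimal Python sketch of how they might select repositories. None of this exists in sourced-ce: `Repo`, `select_repos`, and the `"keep"` default mode are hypothetical, and the mode names simply mirror the flags suggested in this comment.

```python
from dataclasses import dataclass
from typing import Optional

FORK_MODES = ("keep", "use-parent", "add-parent", "no-forks")


@dataclass(frozen=True)
class Repo:
    full_name: str
    parent: Optional[str] = None  # full name of the parent repo if this is a fork


def select_repos(repos, fork_mode="keep", exclude=()):
    """Return the repo names to import, in order and without duplicates."""
    if fork_mode not in FORK_MODES:
        raise ValueError(f"unknown fork mode: {fork_mode!r}")
    selected = []
    for repo in repos:
        if repo.full_name in exclude:
            continue
        if repo.parent is None or fork_mode == "keep":
            names = [repo.full_name]
        elif fork_mode == "no-forks":
            names = []
        elif fork_mode == "use-parent":
            names = [repo.parent]
        else:  # "add-parent": import both the fork and its parent
            names = [repo.full_name, repo.parent]
        for name in names:
            if name not in selected and name not in exclude:
                selected.append(name)
    return selected
```

Keeping the fork policy as an explicit argument means the default can stay conservative, with parent resolution happening only when asked for.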

dpordomingo avatar Jun 27 '19 11:06 dpordomingo

I'd love to have this feature, and I also think it would greatly increase the chance of people trying it.

BTW, regarding forks, I agree that there could be different needs depending on the user. But in general I think it's either `--ignore-forks` or not. If the user is interested in resolving forks to the original repo, then maybe it's more straightforward to just initialize sourced-ce with the owner (whether an org or a user) of that original repo, and maybe provide some filtering capabilities such as `init orgs apache --repositories=incubator-superset`.

Also, the repositories most likely to be forked are popular ones, and I think including popular repos together with mine would just hide a lot of insights, as it would add a lot of noise.

se7entyse7en avatar Jun 27 '19 16:06 se7entyse7en

Agree with Marvin on most of the points. Though I'd like to point out that many GitHub users (I don't have numbers, but most probably a majority) don't have real repositories that aren't forks or dumps of some code (for a school, a workshop, or something like that). So analyzing only the profile doesn't make sense for them at all. Exploring information about the repositories they contributed to, on the other hand, could be interesting.

smacker avatar Jun 27 '19 16:06 smacker

> Though I'd like to point out that many GitHub users (I don't have numbers, but most probably a majority) don't have real repositories that aren't forks or dumps of some code (for a school, a workshop, or something like that).

I don't know whether it's the majority of users, but you're absolutely right about this type of user; I hadn't thought about it. I'm just wondering how likely this type of user is to use a tool like this for their forked repos, but that's a different point.

se7entyse7en avatar Jun 27 '19 16:06 se7entyse7en

All very good points. In effect, the underlying use case and technical questions for personal users may end up being quite different from those for orgs, rather than just a matter of conforming to the API…

marnovo avatar Jun 27 '19 16:06 marnovo