sourced-ce
Allow easy repo management
Most of the ideas can be found in this Slack thread, but basically, I would like to be able to manage, and especially exclude, repositories with ease. I know this can also be done by adding filters in Superset, but as a user I want something easier.
Feature proposals
- Add an `--exclude` flag on the command line. This would be especially useful when importing repos from organizations, but also when repos are stored locally, as one may have centralized all repos but only want to analyze some of them, and doesn't want to or is not allowed to move them. This could take in a txt file, or just repo names.
  - Have this option available on the UI as well. The reasoning is that one may have already launched sourced-ce, it took a while to compute metrics, and suddenly they see they forgot to remove some repo.
- Add an `--exclude-forks` boolean flag that would exclude all forks by default. There is already another issue for this.
  - Have a setting that would enable analysis of a forked repository's data only from the moment it was forked.
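To make the proposal above more concrete, here is a minimal Python sketch of how the values passed to an `--exclude` flag could be resolved, together with an `--exclude-forks` switch. Neither flag exists in sourced-ce today; the `Repo` type and the `load_exclusions` and `select_repos` helpers are hypothetical names used only for illustration, and the file format (one repository name per line, `#` for comments) is an assumption.

```python
from pathlib import Path
from typing import Iterable, List, NamedTuple, Set


class Repo(NamedTuple):
    """Hypothetical repository record: a name plus a fork marker."""
    name: str
    is_fork: bool = False


def load_exclusions(values: Iterable[str]) -> Set[str]:
    """Expand the values given to a hypothetical --exclude flag.

    A value that is the path of an existing file is read as one repository
    name per line (blank lines and lines starting with '#' are skipped).
    Any other value is treated as a repository name.
    """
    excluded: Set[str] = set()
    for value in values:
        path = Path(value)
        if path.is_file():
            for line in path.read_text(encoding="utf-8").splitlines():
                line = line.strip()
                if line and not line.startswith("#"):
                    excluded.add(line)
        elif value.strip():
            excluded.add(value.strip())
    return excluded


def select_repos(
    repos: Iterable[Repo],
    excluded: Set[str],
    exclude_forks: bool = False,
) -> List[Repo]:
    """Keep repositories that are not excluded by name or by fork status."""
    return [
        repo
        for repo in repos
        if repo.name not in excluded and not (exclude_forks and repo.is_fork)
    ]
```

With this shape, the same exclusion set could be filled from the CLI or from the UI, since both would only need to produce a list of names. A repository that is a fork in practice but not marked as one would still have to be excluded by name.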
Brainstorming the entry points where the exclusion list could (in theory) be set:
1. Docker compose
2. CLI flag:
   - Repo name(s) as args
   - File(s) with repo name list as arg(s)
3. Web UI
Any other?
I assume this would have to take place before/during the `init`, right? So probably 1 and 2 above are more likely?
1. Docker compose: then we simply do not mount the repo(s) concerned on the volume.
2. / 3. Gitbase will have to do the work after being informed, either by dropping the data from its database if it is already launched, or by adding this exclusion functionality if it is not.
I don't really see any other entry points, but I think this should be doable at any point, not only before or during the `init`, as the functionality could prove useful during data exploration.
I'd say that my other answer fits here.
We can start with a flag for the CLI, but in my experience it would be much more useful to filter out repositories from the UI.
I ran srcd-ce without forks on the src-d organization. After it downloaded all the data I saw some strange data in the charts. I quickly identified that the `go-vitess` repository was the reason. It's not marked as a fork on GitHub, but it is a fork. The point is: a user, just like me, would often identify what should be excluded only AFTER init.
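For exclusion after init, one possible direction is to filter at query time instead of dropping data. The Python sketch below builds a parameterized SQL filter from an exclusion list. It assumes the queried table exposes a `repository_id` column and that the client uses `%s` placeholders; the `exclusion_clause` helper is a hypothetical name, not part of sourced-ce or gitbase.

```python
from typing import Iterable, List, Tuple


def exclusion_clause(
    excluded: Iterable[str],
    column: str = "repository_id",  # assumed column name
) -> Tuple[str, List[str]]:
    """Return a WHERE clause and its parameters for the excluded repositories.

    An empty exclusion list yields an empty clause, so the query is unchanged.
    """
    names = sorted(set(excluded))
    if not names:
        return "", []
    placeholders = ", ".join(["%s"] * len(names))
    return f"WHERE {column} NOT IN ({placeholders})", names


# Example: leave out the repository identified as a hidden fork.
clause, params = exclusion_clause(["go-vitess"])
query = f"SELECT COUNT(*) FROM commits {clause}"
```

Passing the names as parameters instead of formatting them into the query keeps user-provided repository names out of the SQL text.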