grimoirelab
Support for projects hierarchy
Wouldn't it be nice having some kind of projects hierarchy in GrimoireLab?
Currently, GrimoireLab only supports a one-level hierarchy, according to the projects.json example and the existing documentation. So we have Project > Repositories.
Bestiary is introducing the Ecosystem level, so 2 levels are gonna be supported (sort of): Ecosystem > Project > Repositories.
Is it gonna support Ecosystem > Project > ... > Project > Repositories? Do we need that depth? This might also be related to #71.
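A hierarchy of arbitrary depth (Ecosystem > Project > ... > Project > Repositories) can be modeled as a simple tree. What follows is a minimal sketch, not how GrimoireLab or Bestiary actually store projects. `Node` and `iter_repositories` are hypothetical names, and the example ecosystem and repository URLs are made up.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Node:
    """Hypothetical hierarchy node: an ecosystem, a project or a subproject."""
    name: str
    repositories: list[str] = field(default_factory=list)
    children: list[Node] = field(default_factory=list)


def iter_repositories(node: Node, path: tuple[str, ...] = ()) -> Iterator[tuple[tuple[str, ...], str]]:
    """Walk the tree depth-first, yielding (path, repository) for every repository.

    `path` holds every level from the root down to the node that owns the
    repository, so the depth of the hierarchy is not limited.
    """
    path = path + (node.name,)
    for repo in node.repositories:
        yield path, repo
    for child in node.children:
        yield from iter_repositories(child, path)


# Hypothetical example: Ecosystem > Project > Subproject > Repositories
ecosystem = Node("my-ecosystem", children=[
    Node("project", repositories=["https://example.com/project.git"], children=[
        Node("subproject", repositories=["https://example.com/subproject.git"]),
    ]),
])

for path, repo in iter_repositories(ecosystem):
    print(" > ".join(path), "|", repo)
```

With a tree like this, the one-level case (Project > Repositories) is just a root node with no children.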
I would also like to know whether the community would agree on the following assumptions:
- Do we expect that the same repositories might be in different projects or ecosystems? I would say "no, by design".
- Might some projects be under no specific ecosystem? I would say "no, by design".
- Might some repositories be under no specific project? I would say "no, by design".
- Project names might not be unique. For example, several ecosystems might have a "Documentation" project. This should not be an issue.
Thanks for the comments.
Is it gonna support Ecosystem > Project > ... > Project > Repositories? Do we need that depth?
We need it. Projects like Eclipse have a complex hierarchy of subprojects.
Do we expect that the same repositories might be in different projects or ecosystems?
Yes. Some users might be interested in having a dashboard about a set of projects and another dashboard that includes some projects of that set. For instance, I'm thinking of "topic dashboards". You can have one dashboard analyzing "Apache" activity and another one that analyzes "Web Servers". The latter will include some of the repositories you already have in the "Apache" dashboard. In this context, it doesn't make sense to retrieve the same repository several times.
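One way to avoid retrieving a shared repository several times is to invert the project-to-repositories mapping, so that each repository is fetched once and then assigned to every project that lists it. This is a minimal sketch under that assumption. `repositories_to_projects` is a hypothetical helper, and the input is a simplified stand-in for projects.json, not its real format.

```python
from collections import defaultdict


def repositories_to_projects(projects: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert a {project: [repository, ...]} mapping into {repository: [project, ...]}.

    Each repository appears once as a key, so it only needs to be retrieved
    once even when several projects (or dashboards) include it.
    """
    index: dict[str, set[str]] = defaultdict(set)
    for project, repos in projects.items():
        for repo in repos:
            index[repo].add(project)
    return {repo: sorted(names) for repo, names in index.items()}


# Hypothetical "Apache" and "Web Servers" groupings sharing one repository.
projects = {
    "Apache": ["https://example.com/httpd.git", "https://example.com/kafka.git"],
    "Web Servers": ["https://example.com/httpd.git", "https://example.com/nginx.git"],
}
print(repositories_to_projects(projects))
```

Here `httpd.git` shows up once as a key, listed under both "Apache" and "Web Servers".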
Might some projects be under no specific ecosystem?
Why not? Ecosystem is only a way for grouping information at a high level. You wouldn't need it for analyzing a bunch of projects.
Might some repositories be under no specific project?
Same as above.
Project names might not be unique. For example, several ecosystems might have a "Documentation" project. This should not be an issue.
It shouldn't be.
Do we expect that the same repositories might be in different projects or ecosystems?
Yes. Some users might be interested in having a dashboard about a set of projects and another dashboard that includes some projects of that set. For instance, I'm thinking of "topic dashboards". You can have one dashboard analyzing "Apache" activity and another one that analyzes "Web Servers". The latter will include some of the repositories you already have in the "Apache" dashboard. In this context, it doesn't make sense to retrieve the same repository several times.
Although I see the value of this, it breaks the (current) assumption that a repository is in a single place in the hierarchy. This currently allows us to tag all its items (e.g., commits) with a single project name.
If we go the way you propose, we would need to tag items with lists of projects (at any level in the hierarchy). I'm not sure how this would affect visualizations (e.g., tables with projects as rows). Has anyone tested something similar?
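Tagging items with lists of projects could look roughly like this during enrichment. The sketch assumes each item has an `origin` field holding its repository URL (as Perceval items do). `tag_item`, `repo_paths` and the output field name `projects` are hypothetical. Every level of every path that contains the repository is added, so a filter on any ancestor matches.

```python
def tag_item(item: dict, repo_paths: dict) -> dict:
    """Return a copy of `item` with a list-valued 'projects' field.

    `repo_paths` maps a repository to every hierarchy path that contains it.
    Every level of every path is added (without duplicates, in order of
    appearance), so a filter on any ancestor, at any depth, matches the item.
    """
    names = []
    for path in repo_paths.get(item["origin"], []):
        for name in path:
            if name not in names:
                names.append(name)
    return {**item, "projects": names}


# Hypothetical repository that belongs to two branches of the hierarchy.
repo_paths = {
    "https://example.com/httpd.git": [("Apache", "HTTP Server"), ("Web Servers",)],
}
commit = {"origin": "https://example.com/httpd.git", "uuid": "1"}
print(tag_item(commit, repo_paths)["projects"])
```

The original item is left untouched, and a repository that no project lists gets an empty list.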
Do we expect that the same repositories might be in different projects or ecosystems?
Yes. Some users might be interested in having a dashboard about a set of projects and another dashboard that includes some projects of that set. For instance, I'm thinking of "topic dashboards". You can have one dashboard analyzing "Apache" activity and another one that analyzes "Web Servers". The latter will include some of the repositories you already have in the "Apache" dashboard. In this context, it doesn't make sense to retrieve the same repository several times.
I understand your point, @sduenas. Currently, GrimoireLab only shows one dashboard per projects.json file, and I was thinking about a similar use case for an ecosystem.json file or a similar approach. Perhaps it's just my lack of knowledge about how things are counted when repository or project definitions are duplicated.
Might some projects be under no specific ecosystem?
Why not? Ecosystem is only a way for grouping information at a high level. You wouldn't need it for analyzing a bunch of projects.
Well, I understand. I was thinking about it as a way to define a scope of analysis. At the very least it could be My bunch of projects > Projects, so it's some kind of ecosystem.
I can't help thinking that this is potentially solvable as part of a more general solution to issue #37 i.e. "ecosystem" may just be a property in the indexes.
To put it another way, if #37 was possible today, this issue would already be solvable via a custom metadata property called "ecosystem", that users of Grimoire would be responsible for populating.
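If custom metadata were supported, users could supply something like an "ecosystem" value per project and have it copied into every document of that project. This is a minimal sketch of that idea only. `add_custom_metadata`, the `project_metadata` mapping and the `meta_` prefix are all hypothetical choices, not an existing GrimoireLab feature.

```python
def add_custom_metadata(doc: dict, project_metadata: dict) -> dict:
    """Copy user-defined fields for the document's project into the document.

    `project_metadata` maps a project name to arbitrary extra fields supplied
    by the user (for example {"ecosystem": "..."}). Keys get a 'meta_' prefix
    so they cannot overwrite standard fields.
    """
    extra = project_metadata.get(doc.get("project"), {})
    return {**doc, **{f"meta_{key}": value for key, value in extra.items()}}


# Hypothetical user-provided metadata and enriched document.
project_metadata = {"httpd": {"ecosystem": "Apache"}}
doc = {"project": "httpd", "author_name": "Jane Doe"}
print(add_custom_metadata(doc, project_metadata))
```

Because the values are opaque to the code, the same mechanism would work for any other property a user wants to attach.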
If we go the way you propose, we would need to tag items with lists of projects (at any level in the hierarchy). I'm not sure how this would affect visualizations (e.g., tables with projects as rows). Has anyone tested something similar?
Having a list-valued field in an enriched index causes no issues when managing it from panels (or, in general, from Elasticsearch). For example, some items already carry a list of tags, and filtering on that list works without problems.
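Elasticsearch has no dedicated array type: any field can hold several values, and a `term` query matches a document when any of those values equals the searched term. A sketch of a filter on a list of projects follows, assuming a hypothetical `projects` field mapped as `keyword`. It also includes a plain-Python check of the same matching rule for illustration.

```python
def project_filter_query(project: str) -> dict:
    """Elasticsearch query body keeping documents whose 'projects' list contains `project`.

    A `term` query on a multi-valued field matches when any element equals
    the value (assumes 'projects' is mapped as `keyword`).
    """
    return {"query": {"bool": {"filter": [{"term": {"projects": project}}]}}}


def matches_locally(doc: dict, project: str) -> bool:
    """Plain-Python equivalent of the term filter above, for a quick check."""
    value = doc.get("projects", [])
    values = value if isinstance(value, list) else [value]
    return project in values


print(project_filter_query("Web Servers"))
print(matches_locally({"projects": ["Apache", "Web Servers"]}, "Web Servers"))
```

A single-valued document (`"projects": "Apache"`) and a multi-valued one are queried the same way, which is why switching from one project name to a list should not break existing filters.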
To put it another way, if #37 was possible today, this issue would already be solvable via a custom metadata property called "ecosystem", that users of Grimoire would be responsible for populating.
But we need to take this meta information and use it during data processing. If a field is part of a clearly defined data schema (like project grouping in Bestiary), it should not be a dynamic metadata field, but a concrete field that always exists. For data that is more "schema-less", I find the metamodel more interesting because of its dynamic nature.
But we need to take this meta information and use it during data processing.
Why? Why can't it just be a "dumb" data value that gets added to every document associated to that project in the ES index?
If a field is part of a clearly defined data schema (like project grouping in Bestiary), it should not be a dynamic metadata field, but a concrete field that always exists.
I get that, but why does it need to be a schema-defined field? Why can't it just be one (of potentially many) dynamic fields? Does this restrict how the field is used / visualised in Kibana, for example?
Why? Why can't it just be a "dumb" data value that gets added to every document associated to that project in the ES index?
This is what I call data processing in this case, so we are aligned.
Regarding the schema-defined field, I think of metadata fields as basic dynamic extensions of the data model. But the data model (Bestiary defines one with the Django ORM) is fixed and always there.
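The distinction between a fixed data model and dynamic extensions can be sketched with a plain dataclass standing in for the ORM model. This is not Bestiary's actual Django model. `Project`, `to_document` and the field names are hypothetical. Schema fields are always emitted (possibly as `None`), while `metadata` carries optional extras.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Project:
    """Stand-in for a fixed data model (Bestiary defines its own with the Django ORM).

    `ecosystem` and `parent` are part of the schema, so they are always present
    in the output, possibly as None. `metadata` holds optional dynamic extensions.
    """
    name: str
    ecosystem: Optional[str] = None
    parent: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)

    def to_document(self) -> Dict[str, Any]:
        """Fields that every enriched document would carry for this project."""
        return {
            "project": self.name,
            "ecosystem": self.ecosystem,
            "parent_project": self.parent,
            "metadata": dict(self.metadata),
        }


print(Project("docs").to_document())
print(Project("httpd", ecosystem="Apache", metadata={"topic": "Web Servers"}).to_document())
```

With this split, panels can always rely on `ecosystem` existing, while anything under `metadata` may or may not be there.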