grimoirelab-perceval
Classified fields in Perceval docs should correspond to the fields actually removed
When executing the github backend with the option --filter-classified, the list of all classified fields is reported in each JSON document (pointer). So if a document never contained one of the classified fields, it isn't possible to tell which fields were actually removed from it. It would be useful to adapt the code so that classified_fields_filtered includes only the fields that have been removed.
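A simplified Python model of the behaviour described in this report may make the ambiguity concrete. It is not Perceval's code: the helper names (`remove_nested_key`, `filter_reporting_all`) are hypothetical, and the two field names are illustrative stand-ins for the real list in github.py. A pull request and an issue come out of the filter with the same `classified_fields_filtered`, even though only the pull request ever had `merged_by_data`.

```python
# Simplified model of the reported behaviour, NOT Perceval's implementation.
# Field names are illustrative; the real list lives in github.py (L102).
CLASSIFIED_FIELDS = (("user_data",), ("merged_by_data",))


def remove_nested_key(data, path):
    """Delete the value at `path` (a tuple of keys); KeyError if it is missing."""
    for key in path[:-1]:
        data = data[key]
    del data[path[-1]]


def filter_reporting_all(item):
    """Remove every classified field present, but report the full list."""
    for path in CLASSIFIED_FIELDS:
        try:
            remove_nested_key(item, path)
        except KeyError:
            pass  # the field was never there; nothing distinguishes this case
    return {
        "data": item,
        "classified_fields_filtered": [".".join(p) for p in CLASSIFIED_FIELDS],
    }


pull_request = {"number": 1, "user_data": {"login": "a"}, "merged_by_data": {"login": "b"}}
issue = {"number": 2, "user_data": {"login": "c"}}

pr_doc = filter_reporting_all(pull_request)
issue_doc = filter_reporting_all(issue)

# Both documents claim merged_by_data was removed.
print(pr_doc["classified_fields_filtered"])     # ['user_data', 'merged_by_data']
print(issue_doc["classified_fields_filtered"])  # ['user_data', 'merged_by_data']
```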
The idea behind these fields is all or nothing: you don't decide which fields you want to remove and which you don't.
Is there anything I'm missing?
Based on the classified fields declared at https://github.com/chaoss/grimoirelab-perceval/blob/master/perceval/backends/core/github.py#L102, only the classified fields actually removed from a document should appear in its classified_fields_filtered attribute.
Looking at the code, what I'm suggesting is that, in case of a KeyError, the classified field shouldn't be added to the classified_fields_filtered attribute. The reason is that, from the resulting document alone, it isn't possible to know whether that classified field was removed or never existed.
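The proposal in this comment can be sketched in a few lines of Python, assuming classified fields are stored as tuples of nested keys. The names `filter_classified` and `remove_nested_key` and the field list are hypothetical, chosen for illustration rather than taken from Perceval. The only change from an "ignore and report everything" loop is that a field is recorded only after its deletion succeeds, so a KeyError means it is left out of `classified_fields_filtered`.

```python
# Sketch of the proposal, NOT Perceval's implementation: a classified field
# is reported only when it was really deleted. Field names are illustrative.
CLASSIFIED_FIELDS = (("user_data",), ("merged_by_data",), ("assignee_data",))


def remove_nested_key(data, path):
    """Delete the value at `path` (a tuple of keys); KeyError if it is missing."""
    for key in path[:-1]:
        data = data[key]
    del data[path[-1]]


def filter_classified(item):
    """Remove classified fields in place and report only those removed."""
    removed = []
    for path in CLASSIFIED_FIELDS:
        try:
            remove_nested_key(item, path)
        except KeyError:
            continue  # not present in this item, so it is not reported
        removed.append(".".join(path))
    return {"data": item, "classified_fields_filtered": removed}


issue = {"number": 2, "user_data": {"login": "c"}}
print(filter_classified(issue)["classified_fields_filtered"])  # ['user_data']
```

With this change, a reader of the output can tell that the issue never had `merged_by_data` or `assignee_data`, rather than assuming they were stripped.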
Is that really necessary? What would be the difference? In the end, the data won't be there, which is what we really want with that option.
We can live with it, so feel free to close this issue.
The point is that the classified fields at https://github.com/chaoss/grimoirelab-perceval/blob/master/perceval/backends/core/github.py#L102 mix attributes present in issues with attributes present in pull requests. It would be better to have classified fields per category; then we could decide whether to include in the classified_fields_filtered of each document only the fields actually removed.
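A per-category layout could look roughly like the Python sketch below. It is a hypothetical structure, not what the backend currently declares (a single flat list), and the category names `"issue"` and `"pull_request"` as well as the field names are assumptions for illustration. It keeps the all-or-nothing reporting defended earlier in the thread, but scopes it to each item's category, so issue documents no longer list pull-request-only fields.

```python
import copy

# Hypothetical per-category layout, for illustration only; category and
# field names are assumptions, not the backend's actual declarations.
CLASSIFIED_FIELDS_BY_CATEGORY = {
    "issue": (("user_data",), ("assignee_data",)),
    "pull_request": (("user_data",), ("merged_by_data",)),
}


def remove_nested_key(data, path):
    """Delete the value at `path` (a tuple of keys); KeyError if it is missing."""
    for key in path[:-1]:
        data = data[key]
    del data[path[-1]]


def filter_by_category(item, category):
    """Remove the category's classified fields and report that category's list."""
    fields = CLASSIFIED_FIELDS_BY_CATEGORY.get(category, ())
    data = copy.deepcopy(item)  # leave the caller's item untouched
    for path in fields:
        try:
            remove_nested_key(data, path)
        except KeyError:
            pass  # absent here; still covered by the category-wide list
    return {
        "category": category,
        "data": data,
        "classified_fields_filtered": [".".join(p) for p in fields],
    }


issue = {"number": 2, "user_data": {"login": "c"}}
doc = filter_by_category(issue, "issue")
print(doc["classified_fields_filtered"])  # ['user_data', 'assignee_data']
```

Reporting the category's list rather than the per-item removals is one possible choice here; combining this layout with the KeyError rule proposed above would give exact per-document lists instead.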