
Classified fields in Perceval docs should correspond to the fields actually removed

Open valeriocos opened this issue 5 years ago • 4 comments

When the github backend is executed with the --filter-classified option, every JSON document reports the full list of classified fields in its classified_fields_filtered attribute, whether or not those fields were present in it. As a result, if a document didn't contain one of the classified fields, it isn't possible to tell which fields were actually removed from it. It would be useful to change the code so that classified_fields_filtered includes only the fields that were actually removed.
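To make the reported ambiguity concrete, here is a minimal standalone sketch. It is not Perceval's code: `CLASSIFIED_FIELDS`, `remove_path` and `filter_current` are hypothetical names, the field list is illustrative rather than copied from github.py, and items are assumed to be nested dicts. Absent fields are skipped when removing, but every declared field is still reported.

```python
import copy

# Hypothetical classified fields, as key paths into nested dicts.
# Illustrative only; not the list declared in github.py.
CLASSIFIED_FIELDS = [
    ['user_data'],
    ['merged_by_data'],
]


def remove_path(data, path):
    """Delete the key at `path`; raises KeyError if any key is missing."""
    for key in path[:-1]:
        data = data[key]
    del data[path[-1]]


def filter_current(item):
    """Mimic the reported behavior: every classified field is listed,
    whether or not it was present in the item."""
    item = copy.deepcopy(item)
    for path in CLASSIFIED_FIELDS:
        try:
            remove_path(item, path)
        except KeyError:
            pass  # a missing field is silently ignored...
    # ...but it is still reported as filtered
    filtered = ['.'.join(path) for path in CLASSIFIED_FIELDS]
    return {'data': item, 'classified_fields_filtered': filtered}


issue = {'title': 'Bug', 'user_data': {'login': 'alice'}}
pull = {'title': 'Fix', 'user_data': {'login': 'bob'},
        'merged_by_data': {'login': 'carol'}}

print(filter_current(issue)['classified_fields_filtered'])
print(filter_current(pull)['classified_fields_filtered'])
```

Both calls print `['user_data', 'merged_by_data']`, even though the issue never had a `merged_by_data` field.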

valeriocos, Feb 06 '20 11:02

The idea behind these fields is all or nothing: you don't choose which fields to remove and which to keep.

Is there anything I'm missing?

sduenas, Feb 06 '20 11:02

Given the classified fields declared at https://github.com/chaoss/grimoirelab-perceval/blob/master/perceval/backends/core/github.py#L102, the classified_fields_filtered attribute of each document should list only the classified fields actually removed from that document.

Looking at the code, what I'm suggesting is that when a KeyError is raised, the corresponding classified field shouldn't be added to the classified_fields_filtered attribute. Otherwise, from the document alone, it isn't possible to know whether that classified field was removed or never existed.
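The KeyError-based proposal could look roughly like the sketch below. It is a hypothetical standalone illustration, not a patch against Perceval: `CLASSIFIED_FIELDS`, `remove_path` and `filter_classified` are assumed names, the field paths are made up, and items are assumed to be nested dicts. A field is reported only if its removal succeeded.

```python
import copy

# Hypothetical classified fields (illustrative, not copied from github.py).
CLASSIFIED_FIELDS = [
    ['user_data'],
    ['merged_by_data'],
    ['repo', 'owner_data'],
]


def remove_path(data, path):
    """Delete the key at `path` (keys into nested dicts).
    Raises KeyError if any key along the path is missing."""
    for key in path[:-1]:
        data = data[key]
    del data[path[-1]]


def filter_classified(item):
    """Remove classified fields and report only those actually removed."""
    item = copy.deepcopy(item)
    removed = []
    for path in CLASSIFIED_FIELDS:
        try:
            remove_path(item, path)
        except KeyError:
            # Field not present: nothing was removed, so don't report it
            continue
        removed.append('.'.join(path))
    return {'data': item, 'classified_fields_filtered': removed}


issue = {'title': 'Bug', 'user_data': {'login': 'alice'},
         'repo': {'name': 'perceval'}}
print(filter_classified(issue)['classified_fields_filtered'])  # ['user_data']
```

Here `repo.owner_data` is not reported because `owner_data` was absent, so a reader of the document can tell exactly what was stripped.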

valeriocos, Feb 06 '20 12:02

Is that really necessary? What would be the difference? In the end, data is not going to be there which is what we really want with that option.

sduenas, Feb 06 '20 20:02

We can live with it, so feel free to close this issue.

The point is that the classified fields at https://github.com/chaoss/grimoirelab-perceval/blob/master/perceval/backends/core/github.py#L102 mix attributes present in issues with attributes present in pull requests. It would be better to declare classified fields per category; then we could decide whether the classified_fields_filtered attribute of each document should include only the fields actually removed.
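One way to picture the per-category idea is a dict keyed by category, as in this hypothetical sketch. The category keys `issue` and `pull_request` and all field names are assumptions for illustration, not the declaration in github.py; items are assumed to be nested dicts. The `only_removed` flag (also hypothetical) shows the two options still open: report the category's full list, or only the fields actually deleted.

```python
import copy

# Hypothetical per-category classified fields. Names are illustrative
# and NOT copied from the declaration in github.py.
CLASSIFIED_FIELDS = {
    'issue': [['user_data'], ['assignee_data']],
    'pull_request': [['user_data'], ['merged_by_data'],
                     ['requested_reviewers_data']],
}


def remove_path(data, path):
    """Delete the key at `path` (keys into nested dicts).
    Raises KeyError if any key along the path is missing."""
    for key in path[:-1]:
        data = data[key]
    del data[path[-1]]


def filter_classified(item, category, only_removed=False):
    """Strip the classified fields declared for `category`.

    only_removed=False reports the whole per-category list;
    only_removed=True reports just the fields actually deleted.
    """
    fields = CLASSIFIED_FIELDS[category]
    item = copy.deepcopy(item)
    removed = []
    for path in fields:
        try:
            remove_path(item, path)
        except KeyError:
            continue  # field absent in this item
        removed.append('.'.join(path))
    reported = removed if only_removed else ['.'.join(p) for p in fields]
    return {'data': item, 'classified_fields_filtered': reported}


issue = {'title': 'Bug', 'user_data': {'login': 'alice'}}
print(filter_classified(issue, 'issue')['classified_fields_filtered'])
# ['user_data', 'assignee_data']
print(filter_classified(issue, 'issue', only_removed=True)['classified_fields_filtered'])
# ['user_data']
```

Even with `only_removed=False`, an issue document no longer lists pull-request-only fields such as `merged_by_data`, which addresses the mixing described above.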

valeriocos, Feb 08 '20 11:02