Search problem appeared in the last 5 days or so
This query returns no records for the last 5 days - https://apps.fedoraproject.org/datagrepper/v2/search?user=bretth&delta=1000000&topic=org.fedoraproject.prod.kerneltest.upload.new
The same query without the "user=bretth" returns a message from bretth uploaded in the last 2 hours - https://apps.fedoraproject.org/datagrepper/v2/id?id=2022-effc9ab8-5f44-4373-b5a3-c716181d270e&is_raw=true&size=extra-large
The "user=" query has worked fine for ages but now appears to not find recent messages.
I am seeing the same query behaviour on the "org.fedoraproject.prod.fedbadges.badge.award" topic as well
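For anyone wanting to reproduce the comparison, here is a minimal sketch that builds the two search URLs side by side. It only uses the base URL and the `user`, `delta` and `topic` parameters that appear in the links above. The helper name `search_url` is made up for illustration, and nothing is fetched over the network.

```python
from urllib.parse import urlencode

# Base URL taken from the links in this report.
SEARCH_URL = "https://apps.fedoraproject.org/datagrepper/v2/search"


def search_url(topic, delta, user=None):
    """Build a datagrepper search URL (hypothetical helper, not part of datagrepper)."""
    params = {}
    if user is not None:
        params["user"] = user
    params["delta"] = delta
    params["topic"] = topic
    return f"{SEARCH_URL}?{urlencode(params)}"


TOPIC = "org.fedoraproject.prod.kerneltest.upload.new"

# Same topic and time window, with and without the user filter.
with_user = search_url(TOPIC, 1000000, user="bretth")
without_user = search_url(TOPIC, 1000000)

print(with_user)
print(without_user)
```

Opening both URLs and comparing the results is enough to see the symptom: the second finds the recent upload, the first does not.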
Happy to try to help out, any thoughts on figuring this out? It would be helpful to see the records in the datanommer db - is there another way to do that?
By a quirk of fate, I am a Java coder but am Python literate so I can work at source code level if that helps.
Thanks
I just noticed that userid searches on the "org.fedoraproject.prod.bodhi.update.comment" topic appear normal, i.e. they find recent records. The userid (bretth) appears in the title line of the records found, whereas the title line of kerneltest records does not contain the userid. I am unsure whether that is relevant.
Making some progress. I've got a datagrepper VM running, shifted the data source from staging to production, captured some messages and done some snooping in the datanommer database with psql.
It appears the success of the "user=" query is related to there being an entry in the "users" table for that user.
Current working hypothesis: Bodhi messages seem to create an entry in the "users" table and kerneltest messages do not.
Still digging...
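To make the hypothesis concrete, here is a toy model in SQLite. It is deliberately simplified and is an assumption, not the real datanommer schema: the table and column names (`messages`, `users`, `users_messages`) and the `store`/`search` helpers are illustrative only. It shows why a "user=" filter implemented as a join finds nothing when a message was stored without any user rows.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory database with a simplified, hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE messages (id INTEGER PRIMARY KEY, topic TEXT);
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE users_messages (user_id INTEGER, msg_id INTEGER);
        """
    )
    return conn


def store(conn, topic, usernames):
    """Store a message and link it to whatever usernames were extracted."""
    msg_id = conn.execute(
        "INSERT INTO messages (topic) VALUES (?)", (topic,)
    ).lastrowid
    for name in usernames:
        conn.execute("INSERT OR IGNORE INTO users (name) VALUES (?)", (name,))
        user_id = conn.execute(
            "SELECT id FROM users WHERE name = ?", (name,)
        ).fetchone()[0]
        conn.execute("INSERT INTO users_messages VALUES (?, ?)", (user_id, msg_id))
    return msg_id


def search(conn, topic, user=None):
    """Return message ids for a topic, optionally filtered by user via a join."""
    sql = "SELECT m.id FROM messages m"
    params = []
    if user is not None:
        sql += (
            " JOIN users_messages um ON um.msg_id = m.id"
            " JOIN users u ON u.id = um.user_id"
        )
    sql += " WHERE m.topic = ?"
    params.append(topic)
    if user is not None:
        sql += " AND u.name = ?"
        params.append(user)
    return [row[0] for row in conn.execute(sql, params)]


conn = build_toy_db()
BODHI = "org.fedoraproject.prod.bodhi.update.comment"
KERNELTEST = "org.fedoraproject.prod.kerneltest.upload.new"

# Bodhi message: a username was extracted, so the user link exists.
bodhi_id = store(conn, BODHI, ["bretth"])
# kerneltest message: no username extracted, so no user link is written.
kerneltest_id = store(conn, KERNELTEST, [])

print(search(conn, KERNELTEST))                 # finds the kerneltest message
print(search(conn, KERNELTEST, user="bretth"))  # empty: nothing to join on
print(search(conn, BODHI, user="bretth"))       # finds the Bodhi message
```

In this model the message itself is stored either way. Only the user link is missing, which matches what the two datagrepper queries show.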
Indeed, I suspect that the kerneltest messages don't have a schema in fedora-messaging, and as a result datanommer does not know how to extract the username from the messages. It would explain why the user table is not populated.
Writing a schema is not difficult for a Python programmer, and we can add one for kerneltest, but adding the missing schemas has not been very high on our list of priorities so far. Hopefully this bug report will bump it.
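As a rough idea of what such a schema could look like, here is a sketch. So that it runs without fedora-messaging installed, it uses a small stand-in `Message` base class. In real code the class would derive from `fedora_messaging.message.Message` and be registered through the `fedora.messages` entry point. The class name `KernelTestUploadV1` is hypothetical, and the only body field assumed is `agent`, which is the key mentioned later in this thread.

```python
class Message:
    """Stand-in for fedora_messaging.message.Message (only the parts used here)."""

    topic = ""
    body_schema = {"type": "object"}

    def __init__(self, body=None, topic=None):
        self.body = body or {}
        if topic is not None:
            self.topic = topic

    @property
    def usernames(self):
        # The base message knows nothing about the body layout.
        return []


class KernelTestUploadV1(Message):
    """Hypothetical schema class for kerneltest upload messages."""

    topic = "kerneltest.upload.new"
    # JSON Schema for the body; only "agent" is assumed to be present.
    body_schema = {
        "type": "object",
        "properties": {"agent": {"type": "string"}},
        "required": ["agent"],
    }

    @property
    def usernames(self):
        # Tell consumers which user this message relates to.
        return [self.body["agent"]]


body = {"agent": "bretth"}
print(Message(body=body).usernames)             # base class: no usernames
print(KernelTestUploadV1(body=body).usernames)  # schema class: the agent
```

With only the base class, consumers get no usernames for the message. The schema class is what exposes them.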
Thanks for your investigation.
Thanks for the reply.
I started looking into adding schemas and realised that testing the solution would require changes to the schema in the kerneltest.upload header (and also to the schema in the badge.award header, since badge award messages have the same problem with the user= query).
I've written a quick fix by adding these lines to the usernames() method in fedora_messaging/message.py:

    if "agent" in self.body:
        return [self.body["agent"]]
    if "user" in self.body and isinstance(self.body["user"], dict):
        return [self.body["user"]["username"]]

With that fix, user= queries now work for kerneltest.upload and badge.award messages.
Just thinking, would changing the base message to cover some simple cases save the effort of writing multiple trivial schemas and changing message headers in multiple messages? Or is that too much of a hack?
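For discussion, the fallback can also be written as a standalone function, which makes it easy to test outside fedora-messaging. This is only a sketch: the function name `usernames_from_body` is made up, and compared with the quick fix it adds a guard for a "user" dict that lacks a "username" key.

```python
def usernames_from_body(body):
    """Best-effort username extraction from a raw message body (hypothetical helper)."""
    # Case 1: a top-level "agent" key, as in kerneltest.upload messages.
    if "agent" in body:
        return [body["agent"]]
    # Case 2: a nested "user" dict, as in badge.award messages.
    user = body.get("user")
    if isinstance(user, dict) and "username" in user:
        return [user["username"]]
    # Nothing recognised: behave as if no usernames were found.
    return []


print(usernames_from_body({"agent": "bretth"}))
print(usernames_from_body({"user": {"username": "bretth"}}))
print(usernames_from_body({"user": "bretth"}))
```

The trade-off is that the base message then guesses at body layouts it has no schema for, so a message that uses "agent" or "user" with a different meaning would be attributed to the wrong user.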