web-tools
emoji queries don't filter results correctly
Dennis reports some people are using "🍑" in place of the term "impeachment" (ha!). So we tried to do a comparison search, but the 🍑 query is just returning all stories. Let's figure out where in the stack this is breaking.
Sample query:
https://explorer.mediacloud.org/#/queries/search?qs=%5B%7B%22label%22%3A%22impeach*%22%2C%22q%22%3A%22impeach*%22%2C%22color%22%3A%22%231f77b4%22%2C%22startDate%22%3A%222019-09-22%22%2C%22endDate%22%3A%222019-10-22%22%2C%22sources%22%3A%5B%5D%2C%22collections%22%3A%5B186572516%5D%2C%22searches%22%3A%5B%5D%7D%2C%7B%22label%22%3A%22🍑%22%2C%22q%22%3A%22🍑%22%2C%22color%22%3A%22%23B80000%22%2C%22startDate%22%3A%222019-09-22%22%2C%22endDate%22%3A%222019-10-22%22%2C%22sources%22%3A%5B%5D%2C%22collections%22%3A%5B186572516%5D%2C%22searches%22%3A%5B%5D%7D%5D
At this link it says the correct encoding is U%2b1F351, but the filter doesn't work.
Querying the API for total stories returns 296,628 stories:
https://api.mediacloud.org/api/v2/stories_public/count?q=((%20tags_id_media:(186572516)))&fq=publish_day:[2019-09-22T00:00:00Z%20TO%202019-10-22T00:00:00Z]&key=a1fc017a2e58c2d1a0dd93ee6962ab097574ba4ae711c7722e863a615f0b0a60
Querying the API with the unencoded emoji produces the same result (296,628 stories) :-(
https://api.mediacloud.org/api/v2/stories_public/count?q=(🍑)%20AND%20((%20tags_id_media:(186572516)))&fq=publish_day:[2019-09-22T00:00:00Z%20TO%202019-10-22T00:00:00Z]&key=a1fc017a2e58c2d1a0dd93ee6962ab097574ba4ae711c7722e863a615f0b0a60
Querying the API with the encoded emoji produces more promising results (2,105 stories):
https://api.mediacloud.org/api/v2/stories_public/count?q=(U%2b1F351)%20AND%20((%20tags_id_media:(186572516)))&fq=publish_day:[2019-09-22T00:00:00Z%20TO%202019-10-22T00:00:00Z]&key=a1fc017a2e58c2d1a0dd93ee6962ab097574ba4ae711c7722e863a615f0b0a60
So perhaps the fix here is for our Python API client library to encode the parameters better before sending them off to the back-end.
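A rough sketch of what "encoding the parameters better" could look like, using only the Python standard library. The helper name `encode_query_params` is hypothetical and not part of our client library. It assumes the back-end expects standard UTF-8 percent-encoding, which we haven't confirmed. Note that `U%2b1F351` decodes to the literal text `U+1F351`, not to the emoji itself, so the 2,105 count above may be matching something other than 🍑.

```python
from urllib.parse import quote, unquote, urlencode

# The peach emoji, written as an escape so the source file stays ASCII.
EMOJI = "\U0001F351"


def encode_query_params(params):
    """Percent-encode every parameter value as UTF-8.

    Hypothetical helper for illustration only. quote_via=quote encodes
    spaces as %20 (not "+"), and safe="" leaves no reserved characters
    unescaped.
    """
    return urlencode(params, quote_via=quote, safe="")


# UTF-8 percent-encoding of the emoji itself (four bytes).
print(quote(EMOJI))

# What the "encoded" query in this issue actually decodes to.
print(unquote("U%2b1F351"))

# A full query string built the same way the count calls above are.
print(encode_query_params({"q": f"({EMOJI}) AND tags_id_media:186572516"}))
```

If the count still comes back as 296,628 with the UTF-8 form, the problem is probably further down the stack (for example in how the search index tokenizes emoji) rather than in the client.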
Marked as "blocked" because we can't do anything until we fix our api-client library.