emoji queries don't filter results correctly

Open rahulbot opened this issue 4 years ago • 3 comments

Dennis reports some people are using "🍑" in place of the term "impeachment" (ha!). So we tried to do a comparison search, but the 🍑 query is just returning all stories. Let's figure out where in the stack this is breaking.

Sample query:

https://explorer.mediacloud.org/#/queries/search?qs=%5B%7B%22label%22%3A%22impeach*%22%2C%22q%22%3A%22impeach*%22%2C%22color%22%3A%22%231f77b4%22%2C%22startDate%22%3A%222019-09-22%22%2C%22endDate%22%3A%222019-10-22%22%2C%22sources%22%3A%5B%5D%2C%22collections%22%3A%5B186572516%5D%2C%22searches%22%3A%5B%5D%7D%2C%7B%22label%22%3A%22🍑%22%2C%22q%22%3A%22🍑%22%2C%22color%22%3A%22%23B80000%22%2C%22startDate%22%3A%222019-09-22%22%2C%22endDate%22%3A%222019-10-22%22%2C%22sources%22%3A%5B%5D%2C%22collections%22%3A%5B186572516%5D%2C%22searches%22%3A%5B%5D%7D%5D

rahulbot avatar Oct 23 '19 18:10 rahulbot

At this link it says the correct encoding is U%2b1F351, but the filter doesn't work.

cindyloo avatar Jan 06 '20 16:01 cindyloo

Query API for total stories returns 296,628 stories:

https://api.mediacloud.org/api/v2/stories_public/count?q=((%20tags_id_media:(186572516)))&fq=publish_day:[2019-09-22T00:00:00Z%20TO%202019-10-22T00:00:00Z]&key=a1fc017a2e58c2d1a0dd93ee6962ab097574ba4ae711c7722e863a615f0b0a60

Querying the API with the unencoded emoji produces the same result (296,628 stories) :-(

https://api.mediacloud.org/api/v2/stories_public/count?q=(🍑)%20AND%20((%20tags_id_media:(186572516)))&fq=publish_day:[2019-09-22T00:00:00Z%20TO%202019-10-22T00:00:00Z]&key=a1fc017a2e58c2d1a0dd93ee6962ab097574ba4ae711c7722e863a615f0b0a60

Querying the API with the encoded emoji produces more promising results (2,105 stories):

https://api.mediacloud.org/api/v2/stories_public/count?q=(U%2b1F351)%20AND%20((%20tags_id_media:(186572516)))&fq=publish_day:[2019-09-22T00:00:00Z%20TO%202019-10-22T00:00:00Z]&key=a1fc017a2e58c2d1a0dd93ee6962ab097574ba4ae711c7722e863a615f0b0a60

So perhaps the fix here is for our Python API client library to encode the parameters better before sending them off to the back-end.
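To make the encoding question concrete, here is a minimal sketch using only Python's standard `urllib.parse`. It is not the client library's actual code; `build_count_query` is a hypothetical helper written for illustration. It shows two things that are worth checking before changing the client: `U%2b1F351` decodes to the literal text `U+1F351` (code-point notation, not the emoji character), while percent-encoding the emoji's UTF-8 bytes gives `%F0%9F%8D%91`. If that holds, the 2,105 hits above may be matching the text rather than the emoji, though that has not been verified against the back-end.

```python
from urllib.parse import quote, unquote, urlencode

PEACH = "\U0001F351"  # the 🍑 emoji, code point U+1F351

# Percent-encode the emoji's UTF-8 bytes, the usual way a URL carries it.
encoded_emoji = quote(PEACH)

# Decode the string used in the sample query to see what text it represents.
decoded_notation = unquote("U%2b1F351")


def build_count_query(term: str, collection_id: int) -> str:
    """Hypothetical helper: build an encoded query string for a count request.

    The query shape mirrors the sample URLs in this thread; it is an
    assumption, not the api-client library's real implementation.
    """
    params = {"q": f"({term}) AND (( tags_id_media:({collection_id})))"}
    # quote_via=quote encodes spaces as %20 instead of "+".
    return urlencode(params, quote_via=quote)


print(encoded_emoji)     # %F0%9F%8D%91
print(decoded_notation)  # U+1F351
print(build_count_query(PEACH, 186572516))
```

Whether the back-end then tokenizes and matches the decoded emoji is a separate question that this sketch does not answer.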

rahulbot avatar Jun 24 '20 20:06 rahulbot

Marked as "blocked" because we can't do anything until we fix our api-client library.

rahulbot avatar Jun 26 '20 00:06 rahulbot