hubot-frinkiac

Random frame selection uses entire API response and causes low quality results

Open justinxreese opened this issue 7 years ago • 0 comments

I've noticed I'm not getting great results using this script, and I did a little research and experimentation to figure out what is happening, using the example search string "pi is exactly."

When asking my bot to frinkiac me "pi is exactly", I was given the following result:

https://frinkiac.com/meme/S10E09/240856.jpg?lines=A%3A%20THAT%20IS%20NOT%0AWATER.%20%0AIT%20IS%20DIET%20MR.%20PIBB%0AAND%20B%3A...%20OH.%20#.jpg

This comes from the partial query "pi" matching the "pi" in "Mr. Pibb". So while it is technically a valid match, it's not really what I want. I want Professor Frink yelling to a crowd of scientists that "Pi is exactly three!"

I went through the code and rebuilt the search URL as https://frinkiac.com/api/search?q=pi%20is%20exactly, which returns:

[{"Id":1672472,"Episode":"S12E16","Timestamp":1115448},{"Id":1672468,"Episode":"S12E16","Timestamp":1114405},{"Id":1672469,"Episode":"S12E16","Timestamp":1114822},{"Id":1672470,"Episode":"S12E16","Timestamp":1114613},{"Id":1672475,"Episode":"S12E16","Timestamp":1115031},{"Id":1672474,"Episode":"S12E16","Timestamp":1115239},{"Id":96455,"Episode":"S02E01","Timestamp":1062088},{"Id":84872,"Episode":"S01E13","Timestamp":141483},{"Id":751984,"Episode":"S06E14","Timestamp":1277959},{"Id":1446869,"Episode":"S10E23","Timestamp":1129544},{"Id":444213,"Episode":"S04E09","Timestamp":1055820},{"Id":114526,"Episode":"S02E04","Timestamp":557210},{"Id":939686,"Episode":"S07E18","Timestamp":188121},{"Id":1274809,"Episode":"S09E20","Timestamp":1248146},{"Id":94503,"Episode":"S02E01","Timestamp":708819},{"Id":1357141,"Episode":"S10E09","Timestamp":242741},{"Id":2208729,"Episode":"S17E01","Timestamp":174257},{"Id":241034,"Episode":"S03E01","Timestamp":193270},{"Id":1357135,"Episode":"S10E09","Timestamp":240856},{"Id":1213913,"Episode":"S09E11","Timestamp":150850},{"Id":1162503,"Episode":"S09E02","Timestamp":801433},{"Id":1914123,"Episode":"S14E15","Timestamp":855396},{"Id":848856,"Episode":"S07E05","Timestamp":84884},{"Id":1712646,"Episode":"S13E02","Timestamp":809934},{"Id":1446868,"Episode":"S10E23","Timestamp":1129127},{"Id":2085428,"Episode":"S04E06","Timestamp":448931},{"Id":2208735,"Episode":"S17E01","Timestamp":175717},{"Id":2054629,"Episode":"S15E19","Timestamp":1173422},{"Id":2054634,"Episode":"S15E19","Timestamp":1174256},{"Id":1446853,"Episode":"S10E23","Timestamp":1127041},{"Id":1712650,"Episode":"S13E02","Timestamp":810768},{"Id":848847,"Episode":"S07E05","Timestamp":83749},{"Id":94513,"Episode":"S02E01","Timestamp":710988},{"Id":1446871,"Episode":"S10E23","Timestamp":1130161},{"Id":1869686,"Episode":"S14E07","Timestamp":428303},{"Id":18420,"Episode":"S01E03","Timestamp":1165732}]

The first six results are all from the moment I want, and the second result is the perfect https://frinkiac.com/meme/S12E16/1114405.jpg. The frame I received back is also in the response, but only as the 19th entry. This suggests that the Frinkiac API response is sorted by relevance, so choosing at random from all 36 results in the response is likely to produce a lower-quality result.


By changing frame = Math.floor(Math.random() * response.data.length) to frame = Math.floor(Math.random() * 4), which limits the choice to the first four results, I was able to get much more consistent results (S12E16 is the right episode):

supplybot> https://frinkiac.com/meme/S12E16/1114822.jpg?lines=and%20the%20paying%0Aof%20attention.%20%0APi%20is%20exactly%20three.%0A%0A%28%20gasping%29%20#.jpg

supplybot> @supplybot frinkiac pi is exactly
supplybot> https://frinkiac.com/meme/S12E16/1114613.jpg?lines=and%20the%20paying%0Aof%20attention.%20%0APi%20is%20exactly%20three.%0A%0A%28%20gasping%29%20#.jpg

supplybot> @supplybot frinkiac pi is exactly
supplybot> https://frinkiac.com/meme/S12E16/1115448.jpg?lines=and%20the%20paying%0Aof%20attention.%20%0APi%20is%20exactly%20three.%0A%0A%28%20gasping%29%20%0AVery%20sorry%20that%20it%20had%0Ato%20come%20to%20that.%20#.jpg

supplybot> @supplybot frinkiac pi is exactly
supplybot> https://frinkiac.com/meme/S12E16/1114613.jpg?lines=and%20the%20paying%0Aof%20attention.%20%0APi%20is%20exactly%20three.%0A%0A%28%20gasping%29%20#.jpg

supplybot> @supplybot frinkiac pi is exactly
supplybot> https://frinkiac.com/meme/S12E16/1114405.jpg?lines=and%20the%20paying%0Aof%20attention.%20%0APi%20is%20exactly%20three.%0A%0A%28%20gasping%29%20#.jpg

I chose 4 arbitrarily, and a search can return fewer than four results, in which case the random index can point past the end of the results and cause a crash. But conceptually, using a smaller number seems better. I could put in a PR that uses a lower number and covers potential errors, but I wanted to get an idea from the maintainers of the level of randomness that is needed/acceptable.
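For discussion, here is a minimal sketch of what that could look like. It is not the script's current code: `pickFrame`, its `limit` parameter, and the injectable `random` argument are hypothetical names, and it assumes the parsed response is an array of objects with `Episode` and `Timestamp` fields, as in the output above. The pool size is clamped to the number of results so a short response cannot produce an out-of-range index.

```javascript
// Hypothetical helper, not part of hubot-frinkiac: picks a random entry
// from the first `limit` search results, clamped to the result count.
function pickFrame(results, limit = 4, random = Math.random) {
  if (!Array.isArray(results) || results.length === 0) {
    return null; // nothing to choose from; the caller decides how to reply
  }
  // Never index past the end when fewer than `limit` results come back.
  const poolSize = Math.min(limit, results.length);
  return results[Math.floor(random() * poolSize)];
}

// Example using the first two entries of the response shown above.
const sampleResults = [
  { Id: 1672472, Episode: "S12E16", Timestamp: 1115448 },
  { Id: 1672468, Episode: "S12E16", Timestamp: 1114405 },
];
const frame = pickFrame(sampleResults);
if (frame) {
  console.log(`https://frinkiac.com/meme/${frame.Episode}/${frame.Timestamp}.jpg`);
}
```

The `limit` default of 4 is as arbitrary here as it was in my experiment; it could just as easily be read from configuration if the maintainers want the randomness to be tunable.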

justinxreese, Nov 27 '17 20:11