hubot-frinkiac
Random frame selection uses entire API response and causes low quality results
I've noticed I'm not getting great results from this script, so I did a little research and experimentation to figure out what is happening, using the example search string "pi is exactly".
When asking my bot to frinkiac me "pi is exactly", I was given the following result:
https://frinkiac.com/meme/S10E09/240856.jpg?lines=A%3A%20THAT%20IS%20NOT%0AWATER.%20%0AIT%20IS%20DIET%20MR.%20PIBB%0AAND%20B%3A...%20OH.%20#.jpg
This comes from the partial query "pi" matching the "Pi" in "Mr. Pibb". So while it is technically a correct result, it's not really what I want. I want Professor Frink yelling to a crowd of scientists that "Pi is exactly three!"
I went through the code and rebuilt the search URL as https://frinkiac.com/api/search?q=pi%20is%20exactly
which returns
[{"Id":1672472,"Episode":"S12E16","Timestamp":1115448},{"Id":1672468,"Episode":"S12E16","Timestamp":1114405},{"Id":1672469,"Episode":"S12E16","Timestamp":1114822},{"Id":1672470,"Episode":"S12E16","Timestamp":1114613},{"Id":1672475,"Episode":"S12E16","Timestamp":1115031},{"Id":1672474,"Episode":"S12E16","Timestamp":1115239},{"Id":96455,"Episode":"S02E01","Timestamp":1062088},{"Id":84872,"Episode":"S01E13","Timestamp":141483},{"Id":751984,"Episode":"S06E14","Timestamp":1277959},{"Id":1446869,"Episode":"S10E23","Timestamp":1129544},{"Id":444213,"Episode":"S04E09","Timestamp":1055820},{"Id":114526,"Episode":"S02E04","Timestamp":557210},{"Id":939686,"Episode":"S07E18","Timestamp":188121},{"Id":1274809,"Episode":"S09E20","Timestamp":1248146},{"Id":94503,"Episode":"S02E01","Timestamp":708819},{"Id":1357141,"Episode":"S10E09","Timestamp":242741},{"Id":2208729,"Episode":"S17E01","Timestamp":174257},{"Id":241034,"Episode":"S03E01","Timestamp":193270},{"Id":1357135,"Episode":"S10E09","Timestamp":240856},{"Id":1213913,"Episode":"S09E11","Timestamp":150850},{"Id":1162503,"Episode":"S09E02","Timestamp":801433},{"Id":1914123,"Episode":"S14E15","Timestamp":855396},{"Id":848856,"Episode":"S07E05","Timestamp":84884},{"Id":1712646,"Episode":"S13E02","Timestamp":809934},{"Id":1446868,"Episode":"S10E23","Timestamp":1129127},{"Id":2085428,"Episode":"S04E06","Timestamp":448931},{"Id":2208735,"Episode":"S17E01","Timestamp":175717},{"Id":2054629,"Episode":"S15E19","Timestamp":1173422},{"Id":2054634,"Episode":"S15E19","Timestamp":1174256},{"Id":1446853,"Episode":"S10E23","Timestamp":1127041},{"Id":1712650,"Episode":"S13E02","Timestamp":810768},{"Id":848847,"Episode":"S07E05","Timestamp":83749},{"Id":94513,"Episode":"S02E01","Timestamp":710988},{"Id":1446871,"Episode":"S10E23","Timestamp":1130161},{"Id":1869686,"Episode":"S14E07","Timestamp":428303},{"Id":18420,"Episode":"S01E03","Timestamp":1165732}]
The first six results are all from the moment I want, and the second one is the perfect frame: https://frinkiac.com/meme/S12E16/1114405.jpg
The frame I actually received (S10E09, timestamp 240856) is also in the results, but only in 19th place. This suggests that the Frinkiac API response is sorted by relevance, so picking uniformly from all 36 results in the response is likely to return a low-quality match.
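To put a rough number on this, here is a small sketch of the odds of landing on a relevant frame for this one query. The counts are taken from the response above (36 entries, the first 6 from S12E16); the function name chanceOfRelevant is made up for illustration and is not part of the script.

```javascript
// Counts read off the example response above; they apply to this query only.
const totalResults = 36;   // entries in the search response
const relevantResults = 6; // leading entries from S12E16

// Chance that a uniform random pick from the first `poolSize` results is a
// relevant frame, assuming the relevant frames sit at the front of the list.
// `chanceOfRelevant` is a hypothetical helper, not something in hubot-frinkiac.
function chanceOfRelevant(poolSize) {
  const pool = Math.min(poolSize, totalResults);
  return Math.min(relevantResults, pool) / pool;
}

console.log("whole response:", (chanceOfRelevant(totalResults) * 100).toFixed(1) + "%");
console.log("first 4 only:  ", (chanceOfRelevant(4) * 100).toFixed(1) + "%");
```

For this query, that works out to roughly a one-in-six chance when picking from the whole response, versus a guaranteed hit when picking from the first four.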
By changing
frame = Math.floor(Math.random() * response.data.length)
to
frame = Math.floor(Math.random() * 4)
I was able to get much more consistent results (S12E16 is the right episode):
supplybot> https://frinkiac.com/meme/S12E16/1114822.jpg?lines=and%20the%20paying%0Aof%20attention.%20%0APi%20is%20exactly%20three.%0A%0A%28%20gasping%29%20#.jpg
supplybot> @supplybot frinkiac pi is exactly
supplybot> https://frinkiac.com/meme/S12E16/1114613.jpg?lines=and%20the%20paying%0Aof%20attention.%20%0APi%20is%20exactly%20three.%0A%0A%28%20gasping%29%20#.jpg
supplybot> @supplybot frinkiac pi is exactly
supplybot> https://frinkiac.com/meme/S12E16/1115448.jpg?lines=and%20the%20paying%0Aof%20attention.%20%0APi%20is%20exactly%20three.%0A%0A%28%20gasping%29%20%0AVery%20sorry%20that%20it%20had%0Ato%20come%20to%20that.%20#.jpg
supplybot> @supplybot frinkiac pi is exactly
supplybot> https://frinkiac.com/meme/S12E16/1114613.jpg?lines=and%20the%20paying%0Aof%20attention.%20%0APi%20is%20exactly%20three.%0A%0A%28%20gasping%29%20#.jpg
supplybot> @supplybot frinkiac pi is exactly
supplybot> https://frinkiac.com/meme/S12E16/1114405.jpg?lines=and%20the%20paying%0Aof%20attention.%20%0APi%20is%20exactly%20three.%0A%0A%28%20gasping%29%20#.jpg
I chose 4 arbitrarily, and a query that returns fewer than four results could produce an out-of-range index and crash the script. Conceptually, though, picking from a smaller pool seems better. I could put in a PR that uses a lower number and covers the potential errors, but I wanted to get an idea from the maintainers first of how much randomness is needed or acceptable.
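As a starting point for that PR, here is a minimal sketch of a bounded pick. The helper name pickFrameIndex, the default limit of 4, and the -1 return value for an empty response are all assumptions for illustration; none of them exist in the script today.

```javascript
// Hypothetical helper (not in hubot-frinkiac): choose a random index among the
// first `limit` results, never going past the number of results returned.
// `random` is injectable only so the behaviour can be checked deterministically.
function pickFrameIndex(resultCount, limit = 4, random = Math.random) {
  if (resultCount <= 0) {
    return -1; // no results: the caller has to handle this case itself
  }
  const poolSize = Math.min(limit, resultCount);
  return Math.floor(random() * poolSize);
}
```

In the script this could replace the existing line with something like frame = pickFrameIndex(response.data.length), followed by a check for -1 before indexing into response.data. The limit could also be made configurable if the maintainers want more variety.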