openverse-api
Random image endpoint
Problem
Many of the images in Openverse, particularly from Flickr, have never actually been viewed by someone before.
Description
It could be compelling to help people find images that have never been viewed before. A /random endpoint could be a cool way to do this. We could use the random_score function in Elasticsearch to facilitate this.
Are we psychically connected? I was thinking about this last night!
It'd be nice to add some optional parameters to narrow it down. For example, it'd be cool to be able to do /images/random?providers=... or, even better, /images/random?provider_group=glam. Maybe both!
Whatever we use could also power a random/daily endpoint that caches the response for 24 hours. This would enable a "Picture of the day" kind of feature (it could be used to make Openverse a provider for KDE's POTD desktop plugin or those various "new tab" plugins that show a nice new picture every day).
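To make the 24-hour caching idea concrete, here is a minimal in-memory sketch. The DailyCache name, the fetch callable, and the injectable clock are all assumptions made up for illustration. A real deployment would more likely lean on whatever cache backend the API already uses.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Callable, Optional

DAY = timedelta(hours=24)


def _utc_now() -> datetime:
    return datetime.now(timezone.utc)


class DailyCache:
    """Hypothetical helper: holds one value and refreshes it after 24 hours."""

    def __init__(
        self,
        fetch: Callable[[], Any],
        now: Callable[[], datetime] = _utc_now,
    ) -> None:
        self._fetch = fetch  # e.g. a function that runs the random image query
        self._now = now  # injectable clock, which keeps the sketch testable
        self._value: Any = None
        self._fetched_at: Optional[datetime] = None

    def get(self) -> Any:
        current = self._now()
        # Refresh on first use, or once the stored value is at least 24 hours old.
        if self._fetched_at is None or current - self._fetched_at >= DAY:
            self._value = self._fetch()
            self._fetched_at = current
        return self._value
```

With this shape, every request to random/daily within the same 24-hour window would get the same picture, and the first request after the window closes triggers a new random pick.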
It might be possible to use the same set of filters as search for this random image endpoint. Filters like resolution and maybe even license, considering how attribution is not viable in wallpapers, would be very useful.
In https://github.com/WordPress/openverse-api/pull/554#issuecomment-1065142995 @zackkrida noted that license and license_type parameters can conflict in some sense. If you use the commercial license type filter but then add a non-commercial license filter, what should it do?
My gut tells me that the license_type is essentially a mask of licenses and it should just do something like this:
licenses_to_search = get_licenses_for_group(license_type) + licenses_from_param
And call it a day. This eliminates the potential for conflict and the need to resolve anything, and it allows user-refined searches that still benefit from the concept of filter groups.
For example, I believe the following should be a valid search (and I can easily imagine a use case for it):
/v1/images/?provider_group=glam&provider=non-glam-provider
You might want the GLAM-designated providers and also want to include a specific provider that is not in that group for whatever reason.
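A rough sketch of the "group as a mask, then add the explicit values" idea, covering both the license and provider cases. The group tables and the expand_filter name are hypothetical placeholders. The real group definitions would come from wherever the API keeps them.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical group tables, for illustration only.
LICENSE_GROUPS: Dict[str, List[str]] = {
    "commercial": ["by", "by-sa", "by-nd", "cc0", "pdm"],
}
PROVIDER_GROUPS: Dict[str, List[str]] = {
    "glam": ["museum-a", "museum-b"],
}


def expand_filter(
    group: Optional[str],
    explicit: Iterable[str],
    groups: Dict[str, List[str]],
) -> List[str]:
    """Union of the group's members and the explicitly requested values."""
    combined = list(groups.get(group, [])) if group else []
    combined.extend(explicit)
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(combined))


# license_type=commercial plus license=by-nc simply widens the set.
licenses_to_search = expand_filter("commercial", ["by-nc"], LICENSE_GROUPS)
# provider_group=glam plus provider=non-glam-provider works the same way.
providers_to_search = expand_filter("glam", ["non-glam-provider"], PROVIDER_GROUPS)
```

Because the two parameters are only ever added together, there is no combination that needs conflict resolution. The trade-off is that a group can never be narrowed by the explicit parameter, only widened.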
Did some research on this and built a local demo to test the functionality. The key is to use a function_score query, which wraps the original search query and then multiplies the document's own scores by random scores, effectively shuffling them.
s.query = Q("function_score", query=s.query, random_score={})
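For anyone who wants to see the shape of the request, here is a sketch of what I understand the equivalent raw query body to look like, written as plain dicts so it runs without a cluster. The function names, the size of 1, and the optional seed are assumptions added for illustration. The seed and field pair follows the Elasticsearch documentation for reproducible random scores.

```python
from typing import Any, Dict, Optional


def randomize(query: Dict[str, Any], seed: Optional[int] = None) -> Dict[str, Any]:
    """Wrap an existing query in function_score with random_score."""
    random_score: Dict[str, Any] = {}
    if seed is not None:
        # A seed makes the ordering reproducible; it is paired with a field.
        random_score = {"seed": seed, "field": "_seq_no"}
    return {
        "function_score": {
            "query": query,
            "random_score": random_score,
            # "multiply" is the default boost_mode; spelled out here for clarity.
            "boost_mode": "multiply",
        }
    }


def random_image_body(query: Dict[str, Any], seed: Optional[int] = None) -> Dict[str, Any]:
    """Hypothetical request body for a /random endpoint returning one image."""
    return {"size": 1, "query": randomize(query, seed)}


body = random_image_body({"match_all": {}})
```

Since the random score is multiplied with the relevance score, a query with real search terms still favors better matches somewhat. Wrapping a match_all or filter-only query, where every document scores the same, should give a plain random pick.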
@dhruvkb PR time! 😆 Hah, kidding mostly, but I really would love to look into how much work this would take to expand on and launch. The documentation for Unsplash's similar endpoint might be helpful there.
I have it working pretty well locally using the code snippet above but I'm waiting for #696 and #699 to be merged first.