Media Library - Image URLs
**Is your feature request related to a problem? Please describe.**
As of right now, when an asset is uploaded to Beacon, it uses the filename from the user's computer with the following URL structure:
/__beacon_media__/user-filename.webp
As currently structured, this can cause an issue if someone accidentally uploads two different files with the same name (e.g. an image of Java the programming language vs. Java the island). Both will try to resolve to the same file and we will get an internal server error.
**Describe the solution you'd like**
We should use the following URL structure instead:
/__beacon_media__/images/:asset_id
/__beacon_media__/images/:asset_id/:alias
In this case the alias is actually ignored by the backend, so you could technically write anything there, but we would just default to the uploaded file's filename. Right now there's no way to update the filename, but ideally that value doesn't matter; it is just for SEO. I would also advise against having the file extension directly in the URL: you could imagine that we would want the backend to transform images or render different file formats based on request parameters.
See https://developers.cloudflare.com/images/transform-images/transform-via-url/ for inspiration.
(Cloudflare does this within the URL, but it makes more sense to do it in query params, IMO.)
Something like `myblog.com/__beacon_media__/images/:uuid/beacon?format=webp&width=200&height=200` seems pretty future-proof to me, but I would be open to hearing any objections.
I've also prefixed the route with /images in case we want the media library to serve other assets like video, PDFs, Word docs, Excel files, etc.
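Under this scheme, resolving a request would only ever key off :asset_id; the alias and any transform options just ride along. A minimal sketch in Python (Beacon itself is Elixir, so this is purely illustrative, and `parse_media_url` is a hypothetical helper, not an existing API):

```python
from urllib.parse import parse_qs, urlsplit

def parse_media_url(url):
    """Extract the asset id and transform params; the alias segment is ignored."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    # Expect: __beacon_media__ / images / :asset_id [/ :alias]
    if len(segments) < 3 or segments[0] != "__beacon_media__" or segments[1] != "images":
        return None
    asset_id = segments[2]  # the only path segment that matters for lookup
    params = {k: v[0] for k, v in parse_qs(parts.query).items()}
    return asset_id, params

# The alias ("beacon" here) exists only for SEO; any alias resolves identically.
parse_media_url("/__beacon_media__/images/123e4567/beacon?format=webp&width=200")
# → ("123e4567", {"format": "webp", "width": "200"})
```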
We should probably also keep the original user upload around somewhere if we want to support transforms other than converting to WebP.
**Additional context**
Hi @tomciopp, sorry for the very late reply. Those are valid concerns and good suggestions, and runtime transformation via a service/worker is an interesting idea. I do have some concerns/questions though:
- It seems like those generated files should be stored somewhere, otherwise the same transformation would run for every client that doesn't have a cached version on their end yet, i.e. the backend should cache the transformed file.
- I'm not sure how that could work well with img srcset, which is recommended for responsive images. Let's say someone uploads an image, then an asset processor creates all the variants to be used in an img srcset. Eventually someone wants to blur that original image like `myblog.com/__beacon_media__/images/:uuid/beacon?format=webp&blur=50`. Should it apply the same transformation to all variants? Or maybe the user wants to blur only one of the variants?
- How do we avoid abuse and attacks? Let's say your site has a 1 GB video; now, with transformations via URL, anyone can trigger some kind of processing on that file, which opens the door to denial-of-service attacks very easily, for example. We can only expose such a transformation endpoint along with protective measures, which could mean using queues, identifying abusers, whitelisting IPs, whitelisting which options are valid for each media type, etc.
1.) Typically this setup is run behind a CDN, so the files are only generated once: if an image is not yet publicly available, it is generated by the backend and then stored in the CDN.
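For the backend's side of that flow, the cache only works if equivalent URLs map to one stored transform. A sketch of a deterministic cache key, with hypothetical names (Python for illustration; Beacon itself is Elixir):

```python
import hashlib

def cache_key(asset_id: str, params: dict) -> str:
    """Derive one storage key per (asset, transform) pair, independent of
    the order in which the query params appeared in the URL."""
    canonical = "&".join(f"{k}={params[k]}" for k in sorted(params))
    digest = hashlib.sha256(f"{asset_id}?{canonical}".encode()).hexdigest()
    return f"transforms/{asset_id}/{digest}"
```

With this, `?format=webp&width=200` and `?width=200&format=webp` resolve to the same cached file, so each transform is computed at most once.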
2.) A lot of these transformations (blur, filter, etc.) should be handled client-side with CSS. However, each distinct set of parameters should be treated like a separate image. In a srcset the values could be `?format=webp&blur=50` and `?format=webp&blur=50&size=thumb`. The image manipulation is handled dynamically as needed, based on the requests made by the user. The user does not lose any flexibility with this setup and can choose to transform the images any way they wish.
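To illustrate: under this scheme an img srcset is just the same asset id repeated with different query strings, so blurring all variants or only one is the caller's choice. A sketch with a hypothetical helper (Python for illustration):

```python
def srcset(asset_id: str, widths: list, extra: str = "") -> str:
    """Each width becomes a distinct transform URL; 'extra' carries shared
    params like '&blur=50' so a blur can target every variant, or the
    caller can append it to just one URL instead."""
    base = f"/__beacon_media__/images/{asset_id}/beacon"
    return ", ".join(f"{base}?format=webp&width={w}{extra} {w}w" for w in widths)

print(srcset("123e4567", [200, 400, 800], extra="&blur=50"))
```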
3.) This is a legitimate concern, and looking at prior art, the easiest way to solve it is with an additional path segment which cryptographically signs the URL. You can take a look at the docs for imgproxy to see how they've solved the problem. Essentially, we know the secret and salt, so we can generate a signature within the URL and check against it before doing any complex tasks. If the signature does not match, we can return a 404 without doing any heavy lifting.