cimage
Save image based on path to image
Hello, is it possible to save the images based on the requested path to them? My cache folder started to grow really big.
Example: https://example.site/img.php?src=uploads/avatars/234234234/avatar22.png
Cache path should be: /images/cache/uploads/avatars/234234234/avatar22-cached-picture-something-234234.png
Is it related to the cache dir growing?
A crontab script monitoring and removing old files is one way to deal with it.
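For example, a small cleanup script run daily from cron could do it. Here is a sketch in Python; the cache path and the age limit are assumptions, not cimage defaults:

```python
# Sketch of a cleanup job: delete cache files older than a given age.
# CACHE_DIR and MAX_AGE_DAYS are hypothetical values for illustration.
import os
import time

CACHE_DIR = "/images/cache"
MAX_AGE_DAYS = 365

def prune_cache(cache_dir, max_age_days, now=None):
    """Delete regular files under cache_dir older than max_age_days,
    returning the list of removed paths."""
    now = now or time.time()
    cutoff = now - max_age_days * 86400
    removed = []
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getmtime(path) < cutoff:
                os.remove(path)
                removed.append(path)
    return removed
```

This only trims old files; it does not change how many files live in one directory, which is the real complaint below.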
Option. One can (pre)process and save images as an "alias" image and use it instead of direct access to img.php. https://cimage.se/doc/config-file#alias
Could you elaborate a bit more on your perceived problem?
Yes, so I don't have a problem with disk space, but the folder itself holds a really big amount of files and I can't even "ls" in it. I don't need to delete old files, but I would like to be able to put the cached image in the cache folder with the same structure as in the url. I will try to explain again.
Cache folder: /images/resized_cache/
Request uri: https://example.site/img.php?src=uploads/avatars/234234234/avatar22.png
Now it is saved like this: /images/resized_cache/avatar22-something-2342343.png
I would like it to be like this: /images/resized_cache/uploads/avatars/234234234/avatar22-something-2342343.png
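To illustrate, a minimal sketch of the requested mapping; the "-something-2342343" suffix is a placeholder here, not cimage's actual naming scheme:

```python
# Sketch: map a request's src parameter to a cache path that mirrors
# the source directory structure. The suffix format is a placeholder.
import os

def mirrored_cache_path(cache_root, src, suffix):
    """Keep src's directory structure under cache_root and append
    the cache suffix before the extension."""
    dirname, filename = os.path.split(src)
    stem, ext = os.path.splitext(filename)
    return os.path.join(cache_root, dirname, stem + "-" + suffix + ext)
```

With cache_root "/images/resized_cache" and src "uploads/avatars/234234234/avatar22.png", this yields the nested path shown above instead of a flat file.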
Ok, so you would like to configure how the structure is created in the cache dir? Instead of saving all files under cache/ you would prefer it used a subdirectory structure, the same way your original files are stored? And this could/should be an option one could configure in the config file?
That could be done. I had some thoughts on this earlier but decided to stick with a flat file structure (easier to implement).
Out of interest, what would you gain from having this structure? Nice to have, or real benefit?
Big +1!
We also have a huge cache directory (in terms of the amount of files). On some servers, a huge amount of files in one single directory can slow things down and cause problems.
Will it be enough to use a directory structure in the long term, or should one look into a solution where many images go into one file (plus one index file)? Like this: https://code.facebook.com/posts/685565858139515/needle-in-a-haystack-efficient-storage-of-billions-of-photos/ Perhaps both: start with a directory structure and see how long it will be enough.
One could also consider using SQLite for smaller images: https://www.sqlite.org/fasterthanfs.html
Still, I would like a straightforward solution, without too much hassle.
A directory structure would be the most pragmatic solution in my opinion. For my needs the directory structure would solve it!
I think that the directory structure will work best.
Any progress so far regarding this issue? :)
My cache directory is holding 900k files so far. I have a cronjob deleting cache files older than 365 days, but the directory is still growing. Splitting the cache directory up, like mentioned above, would be super helpful!
Not much progress, no. I checked my own cache dir and the largest only contains 5k files, not much in comparison to 900k. Nice to know we have some heavy users/usage out there.
I refreshed my memory and gave it some new thought, though. I have no definitive answer for now. I need to sleep on it.
@mosbth We are also very much in need of such a feature as well since our cache directory contains more than a million files by now.
So, huge +1 from us as well.
Do you have any plans on integrating the suggested feature? :)
Plans exist, yes. Then there's that thing with time... and being able to prioritize among all the other stuff one has on one's magic agenda to conquer the world.
The plan is to build an alternative cache structure that mirrors the directory structure of the img/ folder. This alternative structure can be turned on in the config file; it is off by default (to start with).
The directory structure will look something like this:
Source image:
img/image.png
Cache structure:
cache/image.png/h700-w300-cf.png
cache/image.png/h700-w300-cf-a=0,0,50,0.png
So, each image will have its own cache directory where all its cached versions are saved.
I am not sure how well this scales with a million images, but it might make it a tad easier to keep track of files in the cache directory and to get a visual overview of its content.
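As a sketch, the proposed naming could be derived like this; the option encoding ("h700", "w300", "cf") is guessed from the examples above, not a specification:

```python
# Sketch of the proposed per-image cache layout: each source image
# gets its own directory under cache/, named after the image path,
# and the processing options form the cached file's name.
import os

def per_image_cache_path(cache_root, src, options, ext=".png"):
    """E.g. ("cache", "image.png", ["h700", "w300", "cf"]) maps to
    cache/image.png/h700-w300-cf.png"""
    return os.path.join(cache_root, src, "-".join(options) + ext)
```

One design consequence: removing a source image's whole cache is then a single directory removal, which also makes stray-image detection straightforward.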
This should also make it possible to find stray images in the cache, that is, images removed/moved in the img/ folder but still remaining in the cache/ folder.
There is also the existing cache/fasttrack, where a hit goes straight to the cached image; that should, most likely, also go into the new directory structure.
For those having a cache with a million files, one could guess that some kind of transfer process is needed, from old cache to new cache.
So that is the current plan, waiting to be implemented.
For stats, I looked in my largest website using cimage.
$ du -sk htdocs/img
442296 htdocs/img
$ du -sk cache/cimage/
586516 cache/cimage/
$ ls -R1 htdocs/img/ | wc -l
3059
$ ls -1 cache/cimage/ | wc -l
6237
$ ls -1 cache/cimage/fasttrack/ | wc -l
7050
As you see, I have a pretty small cache when comparing to a million cached files.
Anyway, I guess I should be really happy to see that some of you are using cimage on sites with a million cached files - that is really nice to know. Real nice. :-)
@mosbth ah great, nice to see this issue getting picked up!
Right now we are having almost 2 (!) million files in the cache directory. It is starting to get nasty :) I don't think the solution you pointed out is the greatest for cimage applications with a lot of single files (as in our case), since it would also create a LOT of subfolders in the /cache/ directory. I would rather prefer a "time based" directory structure in the cache directory:
cache/2018/11/
cache/2019/01/
cache/2019/02/
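Such a time based layout could be derived like this; a purely illustrative sketch using the time the cache file is created:

```python
# Sketch of a time based cache layout: cache/YYYY/MM/.
import os
from datetime import datetime

def time_based_cache_dir(cache_root, when=None):
    """Return the cache subdirectory for the given creation time."""
    when = when or datetime.now()
    return os.path.join(cache_root, f"{when.year:04d}", f"{when.month:02d}")
```

This caps the number of entries per directory roughly by the number of images created per month, and makes age-based cleanup a matter of deleting whole month directories.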
Again: we are super happy with cimage serving so many image files to our users! Thank you for your great work here @mosbth !
@flobox @Surf-N-Code A quick question, when you say you have millions of files in the cache, how many files do you have in the source img/?
@flobox Yes, a time based structure could be an alternative. I would prefer avoiding the need of creating too many sub directories.
This is how the cache currently works, a bit simplified and excluding the usage of HTTP cache settings which further decreases the need to actually run cimage to process the request.
- An image url comes in: /img/image.png?w=700&h=300&cf.
- Create an MD5 key of the string img/image.png?w=700&h=300&cf.
- Check cache/fasttrack/${key}.json.
- If a hit, load the json file, get the path to the cache file and serve it. Done.
- No hit in the fasttrack: process the request through cimage.
- Create a new cache file for the request.
- Save a new entry to the fasttrack.
- Serve the cached image.
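The lookup steps above could be sketched like this; the file layout matches the description, but the json field name is an assumption for illustration:

```python
# Sketch of the fasttrack lookup: the full request string is hashed
# with MD5, and a small json file under cache/fasttrack/ points to
# the already-processed cache image. The "cacheFile" field name is
# hypothetical.
import hashlib
import json
import os

def fasttrack_lookup(fasttrack_dir, request):
    """Return the cached image path for request, or None on a miss."""
    key = hashlib.md5(request.encode("utf-8")).hexdigest()
    path = os.path.join(fasttrack_dir, key + ".json")
    if not os.path.exists(path):
        return None  # miss: process through cimage, then save an entry
    with open(path) as fh:
        return json.load(fh)["cacheFile"]

def fasttrack_save(fasttrack_dir, request, cache_file):
    """Record which cache file serves this request string."""
    key = hashlib.md5(request.encode("utf-8")).hexdigest()
    with open(os.path.join(fasttrack_dir, key + ".json"), "w") as fh:
        json.dump({"cacheFile": cache_file}, fh)
```

Because the key is per request string, two requests differing only in parameter order produce two fasttrack files for the same cached image, which is part of why the directory grows so fast.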
The obvious improvement I see is to limit the number of files in the fasttrack directory. This can be reduced to 1 json file per source image, instead of 1 file per "request string" as it is now. These files are small.
The fasttrack could be implemented as a SQLite database; this limits the fasttrack to 1 file but likely adds some time for lookup.
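As a sketch, such a SQLite fasttrack could look like this; the schema and function names are hypothetical:

```python
# Sketch of the SQLite alternative: one database file replaces the
# per-request json files in cache/fasttrack/. Schema is hypothetical.
import sqlite3

def open_fasttrack(db_path):
    """Open (and create if needed) the fasttrack database."""
    con = sqlite3.connect(db_path)
    con.execute("""CREATE TABLE IF NOT EXISTS fasttrack
                   (request TEXT PRIMARY KEY, cache_file TEXT)""")
    return con

def lookup(con, request):
    """Return the cache file for the request string, or None on a miss."""
    row = con.execute("SELECT cache_file FROM fasttrack WHERE request = ?",
                      (request,)).fetchone()
    return row[0] if row else None

def save(con, request, cache_file):
    """Record which cache file serves this request string."""
    con.execute("INSERT OR REPLACE INTO fasttrack VALUES (?, ?)",
                (request, cache_file))
    con.commit()
```

The primary key on the request string gives an indexed lookup, so the added time per request should be small; the trade-off is a single file that all requests contend on for writes.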
The amount of cached image files could perhaps be limited through rules governing how cimage may be accessed.
Maybe there is some limited opportunity to reduce the number of files, when one image is the exact copy of another image, but the request url is different. This implies some more processing in cimage, or perhaps some improvement to the code. Anyway, the improvement is most likely not much.
For general reference, I do assume that there is no actual hard limit, that we are close to reaching, even with millions of cache files, on how many files we can have in a single directory (using ext4) (source).
Another conclusion from the same source is that there is no notable difference in performance between a directory with millions of files and one with 10 files. At least, not in the way cimage is using the files in the cache dir.
I'm trying to wrap my head around "why do we (really) want this", sort of asking "5 Whys" to get to the root cause of it. That feels like a good exercise before coding away...
So, this far I have:
- Physical limit on the filesystem (NO).
- Performance improvement related to the number of files/directories or their structure (NO).
- General improvement in how many files exist in the cache (YES, mainly related to cache/fasttrack, which can be reduced to 1 json file per source image, pointing out the actual cache images).
- Replace cache/fasttrack with a SQLite database (NO, reduces the files but most likely adds lookup time).
- A more user friendly cache for visual inspection (YES, through a directory structure mirroring img/).
- A time based directory structure (YES, for visual inspection, ease of cleanup and perhaps backup).
- Reduce the number of files, as a way to reduce the amount of data stored (NO, a good intention but not really an issue).
- General cleanup and monitoring issues for the cache, keeping track of old or unused files (NO, would be nice but not yet pointed out as an issue).
- Working with the cache through ls, find (NO, not pointed out as an issue).
Anything to add to the list?
@flobox @Surf-N-Code A quick question, when you say you have millions of files in the cache, how many files do you have in the source img/?
@mosbth in the img/ we also have more than 1 million files, BUT they are structured in subfolders!
For general reference, I do assume that there is no actual hard limit, that we are close to reaching, even with millions of cache files, on how many files we can have in a single directory (using ext4) (source).
You are absolutely right on this. It is just getting a little bit unwieldy with that many cache files in one directory.
Anything to add to the list?
Great list! Nope. Nothing to add from my side.