Deterministic photo links (e.g. hash / filename)

Open twatzl opened this issue 6 years ago • 10 comments

Hello,

I want to use Lychee as a photo library for my photo blog. I am going to host the blog in a dockerized environment at a service provider. However, I would like to avoid having to back up the database, so my idea was to keep the photos in a folder on my PC and simply upload them again if something happens to the server.

However, I have noticed that the links do not seem to be deterministic. If I start Lychee, upload some photos, delete everything, start a new Lychee instance, and upload the same photos, the links will be different from the first time.

Of course this is a problem if I embed the photos and the links change afterwards.

Is there any way to configure Lychee so that the links given to the photos are deterministic? For example, is it possible to include the original picture name in the link (the names are mostly unique in my photo collection) or to have 'human readable' links?

twatzl avatar Jul 24 '19 22:07 twatzl

At the moment the photo URLs are based on the photo ID, which in turn is based on upload time. It would be possible to change the ID to e.g. a file hash, but this would have an impact all over the codebase. A static link is more likely. As such I'll alter the title and move this to the Lychee-Laravel repository, where development is ongoing. Soon we will be migrating to that version for the v4 release.

d7415 avatar Jul 24 '19 22:07 d7415

Thank you. I might look into it if I have time, but I don't want to promise anything.

twatzl avatar Jul 25 '19 07:07 twatzl

What you can do is:

  1. take the hash of the file.
  2. truncate it to its first 16 characters (64 bits). Because this is a plain truncation, we do not mess with the randomness of each bit.
  3. convert the hexadecimal string to an integer and use it as the ID.

Beware: after uploading 2^32 pictures (~4 000 000 000) you have a high risk of collisions (two images having the same ID).

I would suggest adding a setting (disabled by default) which decides whether to use the time or the hash to generate the ID.

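// get the hash (a random value stands in for the file contents here)
$hash = sha1((string) rand());

// truncate to the first 16 hex characters (64 bits)
$v = substr($hash, 0, 16);

// convert to a decimal string
// note: base_convert() can lose precision on numbers this large
$va = base_convert($v, 16, 10);

// print hash (substr)
echo $v;
echo "||";
// print int
echo $va;
echo "||";
// check it fits in 64 bits.
echo log((float) $va, 2);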
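
As a rough sketch of how the three steps could look for a real file rather than a random value: the function name `deterministicPhotoId` is made up for this example and is not part of Lychee, and a 64-bit PHP build is assumed. It keeps 15 hex characters (60 bits) instead of 16, because a full 64-bit value can exceed `PHP_INT_MAX`, in which case `hexdec()` returns a float. With 60 bits the collision threshold moves from about 2^32 down to about 2^30 pictures.

```php
<?php
declare(strict_types=1);

// Hypothetical helper (not part of Lychee): derive a repeatable numeric ID
// from the contents of a file. Assumes a 64-bit PHP build.
function deterministicPhotoId(string $path): int
{
    // sha1_file() returns 40 hex characters, or false if the file cannot be read.
    $hash = sha1_file($path);
    if ($hash === false) {
        throw new RuntimeException("Cannot hash file: $path");
    }

    // 15 hex characters = 60 bits, which always fits in a signed 64-bit integer.
    return (int) hexdec(substr($hash, 0, 15));
}

// Usage: hashing the same bytes twice gives the same ID.
$tmp = tempnam(sys_get_temp_dir(), 'photo');
file_put_contents($tmp, 'example image bytes');
$first  = deterministicPhotoId($tmp);
$second = deterministicPhotoId($tmp);
unlink($tmp);

var_dump($first === $second); // bool(true)
```

Because the ID depends only on the file contents, re-uploading the same file to a fresh instance would give the same value, while any re-encoding or metadata edit of the file would change it.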

ildyria avatar Jul 26 '19 08:07 ildyria

Slowly but surely I think I understand the solution. The basic idea is simple to understand, but I am trying to make sure that I also understand the details. However what I don't understand is why you would think that after 4 mio. pictures there would be a high risk of collisions? Is this a rule of thumb that after 50 percent of the keys are used the risk of collision gets higher?

On the other hand how likely is it for someone to really have 4 mio pictures? Or that Lychee would still be able to handle that much.

twatzl avatar Jul 28 '19 22:07 twatzl

> after 4 mio. pictures there would be a high risk of collisions?

4 billion

> Is this a rule of thumb that after 50 percent of the keys are used the risk of collision gets higher?

This might be a good start. According to that, at 4 billion photos the risk of a collision is about 50%.

> On the other hand how likely is it for someone to really have 4 ~mio~ billion pictures?

Not very, but it is worth considering.

> Or that Lychee would still be able to handle that much.

Depending on the resources available and how they were distributed, I could see this working. I'm a little tempted to work out some stress test once the CLI import is ready and I'm familiar with it...

d7415 avatar Jul 29 '19 09:07 d7415

> Is this a rule of thumb that after 50 percent of the keys are used the risk of collision gets higher?

And to complete what @d7415 said, it is not 50%. The number of possible keys is 2^64, 4 billion is 2^32, so it is half the exponent. Half the key space would be 2^63. ;)
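
For reference, the usual birthday approximation makes the "half the exponent" point concrete. With $n$ photos and $N$ possible IDs (symbols introduced here for illustration, and assuming the IDs behave like uniformly random values), the chance of at least one collision is roughly:

$$
p(n) \approx 1 - \exp\!\left(-\frac{n^2}{2N}\right)
$$

For $N = 2^{64}$ and $n = 2^{32}$ the exponent is $-2^{64}/2^{65} = -1/2$, so $p \approx 1 - e^{-1/2} \approx 0.39$. The 50% point sits at $n \approx \sqrt{2 \ln 2 \cdot N} \approx 1.18 \cdot 2^{32}$, which is about 5.1 billion photos.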

ildyria avatar Jul 29 '19 09:07 ildyria

> Depending on the resources available and how they were distributed, I could see this working. I'm a little tempted to work out some stress test once the CLI import is ready and I'm familiar with it...

Well, we do get complaints from people with 1000+ photos in an album but my guess is that it's the front end that's the bottleneck...

kamil4 avatar Jul 29 '19 14:07 kamil4

Oh yeah sorry. Miscounted a few of the zeros yesterday. So if it is 4 billion then it is even less likely. I think I am taking photos now since 2013. I am taking many but i have not yet reached the 1 mio. mark.

However I think regardless of the count there would have to be some mechanism to detect and avoid identical hashes. What do you think?

twatzl avatar Jul 29 '19 17:07 twatzl

> However I think regardless of the count there would have to be some mechanism to detect and avoid identical hashes. What do you think?

You mean like this one? :laughing: https://github.com/LycheeOrg/Lychee-Laravel/blob/master/app/ModelFunctions/PhotoFunctions.php#L491

ildyria avatar Jul 29 '19 18:07 ildyria

> Oh yeah sorry. Miscounted a few of the zeros yesterday. So if it is 4 billion then it is even less likely. I think I am taking photos now since 2013. I am taking many but i have not yet reached the 1 mio. mark.

You may assume a multi-user set-up. You can rack up a lot of pictures pretty quickly (with a lot of users but still. :) )

Also note that we are safe... because we are using a 64-bit index. If we used a normal 32-bit one, the threshold would drop to 65 536 pictures (2^16)...
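
To put numbers on the two index sizes, here is a small sketch using the birthday approximation p ≈ 1 - exp(-n² / (2N)), where n is the number of photos and N the number of possible IDs. The function name `collisionProbability` is made up for this example and is not part of Lychee, and the IDs are assumed to behave like uniformly random values.

```php
<?php
declare(strict_types=1);

// Hypothetical helper (not from Lychee): approximate chance that at least two
// of $photos uniformly random IDs collide in a space of $keySpace possible IDs.
function collisionProbability(float $photos, float $keySpace): float
{
    // -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -expm1(-($photos * $photos) / (2.0 * $keySpace));
}

// 2 ** 64 overflows PHP's integer type and becomes a float, which is fine here.
printf("%.3f\n", collisionProbability(2 ** 16, 2 ** 32)); // 32-bit IDs, 65 536 photos: prints 0.393
printf("%.3f\n", collisionProbability(2 ** 32, 2 ** 64)); // 64-bit IDs, ~4.3 billion photos: prints 0.393
printf("%.1e\n", collisionProbability(1e6, 2 ** 64));     // 64-bit IDs, one million photos
```

Both thresholds land at the same probability because each is the square root of its key space. At one million photos with a 64-bit ID the result is on the order of 10^-8.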

ildyria avatar Jul 29 '19 18:07 ildyria