
Move sha256sum field off _images to the board table

Open · antonizoon opened this issue 4 years ago · 2 comments

In a discussion with other archive admins, we concluded that, to protect against hash collisions, the sha256sum field for the full image and the sha256 field for the thumbnail should not live in the _images table.

Instead, both sha256 fields should be in the <board> table. Once they are there, the _images table is not needed at all, since the md5sum and the original filename as seen on 4chan are already stored in the <board> table.

This is what protects against hash collisions: otherwise md5sum is the only key linking a post to the _images table, so two different files with colliding MD5 hashes would resolve to the same _images row.
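The collision argument can be illustrated with a small Python sketch. A real MD5 collision pair is impractical to embed here, so the sketch uses a deliberately weakened stand-in key (`weak_hash`, a hypothetical helper that keeps only the first byte of MD5) and brute-forces a second file that collides with the first. When posts reach their image metadata only through that key, both posts resolve to the first file's sha256; when each post row carries its own sha256, they stay distinct.

```python
import hashlib
from itertools import count

def weak_hash(data: bytes) -> bytes:
    """Hypothetical stand-in for a collidable link key: first byte of MD5 only."""
    return hashlib.md5(data).digest()[:1]

def find_collision(base: bytes) -> bytes:
    """Brute-force a different input whose weak_hash matches that of base."""
    target = weak_hash(base)
    for i in count():
        candidate = base + str(i).encode()  # always longer, so always != base
        if weak_hash(candidate) == target:
            return candidate

file_a = b"image A"            # hypothetical file contents
file_b = find_collision(file_a)

# Layout 1: posts hold only the link key; the sha256 lives in a shared images table.
images_table = {}
for data in (file_a, file_b):
    images_table.setdefault(weak_hash(data), hashlib.sha256(data).hexdigest())
posts_linked = [images_table[weak_hash(d)] for d in (file_a, file_b)]

# Layout 2: each post row carries its own sha256.
posts_inline = [hashlib.sha256(d).hexdigest() for d in (file_a, file_b)]

print(posts_linked[0] == posts_linked[1])  # True: post B resolves to file A's hash
print(posts_inline[0] == posts_inline[1])  # False: each post keeps its own hash
```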

Since there have not been any large-scale deployments of torako yet, hopefully moving a field between tables is not too big a schema change.

Please join the group chat to discuss this with us further.

antonizoon · Feb 08 '21 02:02

We are also considering a different hash for the thumbnail: collision protection matters much less for thumbnails, and sha256sum is rather long, which is a big problem for indexes.

Probably sha1sum, since it is hardware-accelerated by the Intel and AMD SHA instruction-set extensions, and its hex digest is only 8 characters longer than md5's.
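A quick check of the digest lengths with Python's standard `hashlib` module, assuming the hashes are compared in their usual hex-encoded form:

```python
import hashlib

def hex_lengths(names=("md5", "sha1", "sha256")):
    """Return the length in hex characters of each algorithm's digest."""
    return {name: len(hashlib.new(name).hexdigest()) for name in names}

lengths = hex_lengths()
for name, n in lengths.items():
    print(f"{name:>6}: {n} hex characters")

# sha1 vs md5: 40 - 32 = 8 extra hex characters (4 extra bytes in raw binary form)
print(lengths["sha1"] - lengths["md5"])  # 8
```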

Let us know what you think.

These are the sizes of various hash algorithms when stored in a MySQL BINARY field:

https://stackoverflow.com/a/16680423

antonizoon · Feb 08 '21 03:02

This can be done in a backwards-compatible manner:

Step 1

Add the new columns (media_sha256 and preview_sha256) to the board table.

Step 2

Rewrite the frontend software to read image data from the board table instead of the _images table.

Step 3

Torako stops writing to the _images table.
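A minimal sketch of the three steps, using Python's built-in `sqlite3` module as a stand-in for the MySQL database discussed in this thread. The board table `a`, its companion table `a_images`, the `media_hash` md5 column, and all sample values are hypothetical; only `media_sha256` and `preview_sha256` come from the plan above. The new columns are nullable, so writers that do not know about them keep working during the transition.

```python
import hashlib
import sqlite3

# In-memory SQLite stands in for the real archive database (assumption).
# "a" is a hypothetical board table; "a_images" is its companion images table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (num INTEGER PRIMARY KEY, media_hash TEXT, media_orig TEXT);
    CREATE TABLE a_images (media_hash TEXT PRIMARY KEY,
                           media_sha256 BLOB, preview_sha256 BLOB);
""")

# Seed one existing post and its image row, as the current schema stores them.
media, preview = b"full image bytes", b"thumbnail bytes"  # hypothetical file contents
md5 = hashlib.md5(media).hexdigest()
conn.execute("INSERT INTO a VALUES (1, ?, 'example.png')", (md5,))
conn.execute("INSERT INTO a_images VALUES (?, ?, ?)",
             (md5, hashlib.sha256(media).digest(), hashlib.sha256(preview).digest()))

# Step 1: add the new nullable columns to the board table.
conn.execute("ALTER TABLE a ADD COLUMN media_sha256 BLOB")
conn.execute("ALTER TABLE a ADD COLUMN preview_sha256 BLOB")

# Backfill old rows from a_images. This join still goes through md5,
# so it can only copy what the old schema already linked.
conn.execute("""
    UPDATE a SET
      media_sha256   = (SELECT i.media_sha256   FROM a_images AS i WHERE i.media_hash = a.media_hash),
      preview_sha256 = (SELECT i.preview_sha256 FROM a_images AS i WHERE i.media_hash = a.media_hash)
""")

# Step 2: frontends read the hashes straight from the board table.
row = conn.execute("SELECT media_sha256, preview_sha256 FROM a WHERE num = 1").fetchone()
print(row[0].hex())

# Step 3: the writer stores hashes inline on new posts and no longer touches a_images.
new_media, new_preview = b"another image", b"another thumbnail"  # hypothetical
conn.execute(
    "INSERT INTO a (num, media_hash, media_orig, media_sha256, preview_sha256) "
    "VALUES (?, ?, ?, ?, ?)",
    (2, hashlib.md5(new_media).hexdigest(), "example2.jpg",
     hashlib.sha256(new_media).digest(), hashlib.sha256(new_preview).digest()),
)
conn.commit()
```

Once every reader has moved to step 2 and the writer to step 3, `a_images` receives no new rows and could eventually be dropped.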

With respect to hash collisions, I think sha256 everywhere is fine; stored as a binary type it is only 32 bytes (256 bits), the same length as an md5 stored as a hex string.
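The size comparison can be checked with Python's `hashlib`, assuming md5 is stored as a 32-character hex string and sha256 as raw bytes (for example in a BINARY(32) column); the input bytes are a hypothetical placeholder.

```python
import hashlib

data = b"example file contents"  # hypothetical file bytes

sha256_raw = hashlib.sha256(data).digest()  # raw bytes, as a binary column would hold them
md5_hex = hashlib.md5(data).hexdigest()     # hex text, as a CHAR(32) column would hold it

print(len(sha256_raw))      # 32 (bytes)
print(len(md5_hex))         # 32 (characters)
print(len(sha256_raw) * 8)  # 256 (bits)
```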

miyachan · Feb 10 '21 06:02