core icon indicating copy to clipboard operation
core copied to clipboard

feat: Store blobs in subdirs with random names (#4309)

Open iequidoo opened this issue 1 year ago • 5 comments

Close #4309 Related discussions: #4555, #4583. TODO: If this is merged, consider reverting #5778.

iequidoo avatar Apr 25 '24 02:04 iequidoo

imu, the PR just always adds a suffix to the stem, not just sometimes

Exactly

deltachat.h still says "File name may be mangled" - but that should be "File name will be mangled", or?

Yes, forgot to move the documentation changes to this PR from the follow-up one. EDIT: Done.

wondering what is left in #4583 - i am a little confused by the the two PR :)

It removes the original file stem at all, we don't need it as we have dc_msg_get_filename() and the app should use it when exporting attachments (maybe not only, but at least). And also the next PR fixes #5338.

It's still a question what to do if we need to display a file (with random name) in some external program. This could be solved by preserving file names, but storing files in dirs having random names to avoid copying. But afaik there are plans to encrypt the whole blobstorage (especially on Desktop because there's no sandboxing), so copying can't be avoided anyway.

ideally, before this gets merged, imu UIs should check their usage of dc_msg_get_file() and change "affected" code to BLOBDATA+dc_msg_get_filename() or dc_msg_save_file() - again imu, UIs can test that already now by forcing a suffix by sending the same file twice @iequidoo can you please check/refine and file issues for that in all UIs? this is useful in any case

I think this one can be merged even before switching all UIs to the new API (though it can break sending multipart archives as @adbenitez noted), but definitely not the next PR. I can check Desktop and Android on my own, maybe also iOS, but i don't use it and never looked into its sources :)

iequidoo avatar Apr 29 '24 04:04 iequidoo

So far i see the following tasks for the UIs:

  • Android: switch to using dc_msg_save_file() for saving attachments.
  • Desktop: the same (jsonrpc's save_msg_file()).

(I'm going to file tasks, but need to study the apps code a bit more. DONE: https://github.com/deltachat/deltachat-desktop/issues/3851, https://github.com/deltachat/deltachat-android/issues/3091.) But they look unrelated to the PR so it can be merged. The more important question is

... what to do if we need to display a file (with random name) in some external program. This could be solved by preserving file names, but storing files in dirs having random names to avoid copying. But afaik there are plans to encrypt the whole blobstorage (especially on Desktop because there's no sandboxing), so copying can't be avoided anyway.

iequidoo avatar May 23 '24 19:05 iequidoo

This is likely will never be merged because it breaks opening attachments in external programs (they will always have weird names). The previous approach (copying to a tmp file) failed as well, we don't want to do extra work in general and also it's difficult to maintain that UIs don't cause blob copying when it's not needed, e.g. the UI may want to get the blob path to display an image, this mustn't cause copying as the path isn't passed to an external program, so we need two separate API functions for this.

So my suggestion is to save blobs to separate dirs having random names. These dirs should be garbage-collected in housekeeping (as far as i can tell this is already done in remove_unused_files()). Note that for Desktop, if we plan to encrypt the whole blobstorage there, copying/exporting blobs still will be needed at least in case of passing them to external programs. @adbenitez also mentioned that there's a case of multipart archives, but i think it should be fine to export blobs to a separate dir before passing them to an unarchiver? And we already have an API for this -- CommandApi::save_msg_file().

iequidoo avatar Aug 30 '24 17:08 iequidoo

I think if we change blob storage, then we should already go for a solution that has support for deduplication and encryption.

I think it could make sense to have a second sqlite db (maybe even with a different page size more optimised for storing files, or compressed like in https://sqlite.org/sqlar.html)

Then that sqlite db could have two tables, one with filenames and hash and one with hash, size and content or sth like that. Or just an SQLite Archive (as described in https://sqlite.org/sqlar.html) archive with hashes as filenames and reusing the blobs table in the existing database.

-> https://sqlite.org/fasterthanfs.html

Simon-Laux avatar Sep 01 '24 02:09 Simon-Laux

I think if we change blob storage, then we should already go for a solution that has support for deduplication and encryption.

I still think this may be excessive on platforms with apps sandboxing like Android. And this makes useless the fs-level dedup if any (my Android just uses ext4, but maybe there are devices using Btrfs or so, though i don't see it on https://source.android.com/docs/core/architecture/android-kernel-file-system-support). Fs-level dedup is useful if files are exported outside of Delta Chat.

If we want to use SQLite Archive, then maybe just switch to it? Reading the "Disadvantages" section on https://sqlite.org/sqlar.html, i don't see any critical ones. Still measurements make sense. And on some platforms we still may want to store blobs directly on fs, moreover there may be already a block device encryption or fs-level encryption.

For now i suggest to just store blobs in separate dirs, it's not that huge change and should resolve all issues with file names intersection. EDIT: Done.

iequidoo avatar Sep 27 '24 20:09 iequidoo

regarding #6265 it is no longer clear that it makes sense to create lots of subdirectories inside the blobdirectory, tbh, i have the gut feeling that it is better to leave things flat.

beside of outgoing attachments, things seems to be solved mostly. and for outgoing attachments, as of https://github.com/deltachat/deltachat-android/issues/3091#issuecomment-2500838453 , this PR is not strictly necessary as UI anyway has to create a directory

@Hocuri @iequidoo i therefore suggest to close this PR and cherry-pick things or re-opening as needed when targeting #6265 , subsequent discussion is also maybe better done there, so things stay focused

r10s avatar Nov 29 '24 13:11 r10s

and for outgoing attachments, as of deltachat/deltachat-android#3091 (comment) , this PR is not strictly necessary as UI anyway has to create a directory

This PR allows to create a directory right inside the blobdir which allows in turn:

  • To preserve the original filename so that no new API is needed to pass it to Core.
  • To avoid in-core copying to the blobdir when sending.

Another thing solved here is passing attachments to external programs w/o copying but with original filename. Probably this isn't useful on all platforms, but on Linux it is. #6265 isn't about this.

EDIT: Citing a part of https://github.com/deltachat/deltachat-core-rust/issues/6265#issuecomment-2507824615 to make it more clear why the PR is closed:

#5495 would also only prevent copying if at the same time, things are set read-only, which is not that easy as it sounds iirc.

also, the copying is not that much of an issue as it affects exporting files only - not showing images or playing audio/videos inside delta chat. exporting is not done that often and only on direct user action, taking anyways a moment

EDIT: Just in case if we reopen this later, we can use file hashes instead of random subdir names, maybe that will help to implement #6265.

iequidoo avatar Nov 29 '24 15:11 iequidoo

as said, let's see if this PR gets revived or cherry picked.

To preserve the original filename so that no new API is needed to pass it to Core

this is also possible without the help of core, UI could also use temp directory for that, also with paths if needed

To avoid in-core copying to the blobdir when sending [...] passing attachments to external programs w/o copying

that might be an advantages, yes. but it is not clear if that justifies the additional complexity of subdirectories in core.

as said, maybe we should go that way, i closed this PR mainly as that part is not decided yet - and should not be regarded "as settled" when thinking about the new issue https://github.com/deltachat/deltachat-core-rust/issues/6265

r10s avatar Nov 29 '24 16:11 r10s