feat: Store blobs in subdirs with random names (#4309)
Close #4309 Related discussions: #4555, #4583. TODO: If this is merged, consider reverting #5778.
imu, the PR just always adds a suffix to the stem, not just sometimes
Exactly
deltachat.h still says "File name may be mangled" - but that should be "File name will be mangled", or?
Yes, forgot to move the documentation changes to this PR from the follow-up one. EDIT: Done.
wondering what is left in #4583 - i am a little confused by the the two PR :)
It removes the original file stem at all, we don't need it as we have dc_msg_get_filename() and the app should use it when exporting attachments (maybe not only, but at least). And also the next PR fixes #5338.
It's still a question what to do if we need to display a file (with random name) in some external program. This could be solved by preserving file names, but storing files in dirs having random names to avoid copying. But afaik there are plans to encrypt the whole blobstorage (especially on Desktop because there's no sandboxing), so copying can't be avoided anyway.
ideally, before this gets merged, imu UIs should check their usage of
dc_msg_get_file()and change "affected" code to BLOBDATA+dc_msg_get_filename()ordc_msg_save_file()- again imu, UIs can test that already now by forcing a suffix by sending the same file twice @iequidoo can you please check/refine and file issues for that in all UIs? this is useful in any case
I think this one can be merged even before switching all UIs to the new API (though it can break sending multipart archives as @adbenitez noted), but definitely not the next PR. I can check Desktop and Android on my own, maybe also iOS, but i don't use it and never looked into its sources :)
So far i see the following tasks for the UIs:
- Android: switch to using
dc_msg_save_file()for saving attachments. - Desktop: the same (jsonrpc's
save_msg_file()).
(I'm going to file tasks, but need to study the apps code a bit more. DONE: https://github.com/deltachat/deltachat-desktop/issues/3851, https://github.com/deltachat/deltachat-android/issues/3091.) But they look unrelated to the PR so it can be merged. The more important question is
... what to do if we need to display a file (with random name) in some external program. This could be solved by preserving file names, but storing files in dirs having random names to avoid copying. But afaik there are plans to encrypt the whole blobstorage (especially on Desktop because there's no sandboxing), so copying can't be avoided anyway.
This is likely will never be merged because it breaks opening attachments in external programs (they will always have weird names). The previous approach (copying to a tmp file) failed as well, we don't want to do extra work in general and also it's difficult to maintain that UIs don't cause blob copying when it's not needed, e.g. the UI may want to get the blob path to display an image, this mustn't cause copying as the path isn't passed to an external program, so we need two separate API functions for this.
So my suggestion is to save blobs to separate dirs having random names. These dirs should be garbage-collected in housekeeping (as far as i can tell this is already done in remove_unused_files()). Note that for Desktop, if we plan to encrypt the whole blobstorage there, copying/exporting blobs still will be needed at least in case of passing them to external programs. @adbenitez also mentioned that there's a case of multipart archives, but i think it should be fine to export blobs to a separate dir before passing them to an unarchiver? And we already have an API for this -- CommandApi::save_msg_file().
I think if we change blob storage, then we should already go for a solution that has support for deduplication and encryption.
I think it could make sense to have a second sqlite db (maybe even with a different page size more optimised for storing files, or compressed like in https://sqlite.org/sqlar.html)
Then that sqlite db could have two tables, one with filenames and hash and one with hash, size and content or sth like that. Or just an SQLite Archive (as described in https://sqlite.org/sqlar.html) archive with hashes as filenames and reusing the blobs table in the existing database.
-> https://sqlite.org/fasterthanfs.html
I think if we change blob storage, then we should already go for a solution that has support for deduplication and encryption.
I still think this may be excessive on platforms with apps sandboxing like Android. And this makes useless the fs-level dedup if any (my Android just uses ext4, but maybe there are devices using Btrfs or so, though i don't see it on https://source.android.com/docs/core/architecture/android-kernel-file-system-support). Fs-level dedup is useful if files are exported outside of Delta Chat.
If we want to use SQLite Archive, then maybe just switch to it? Reading the "Disadvantages" section on https://sqlite.org/sqlar.html, i don't see any critical ones. Still measurements make sense. And on some platforms we still may want to store blobs directly on fs, moreover there may be already a block device encryption or fs-level encryption.
For now i suggest to just store blobs in separate dirs, it's not that huge change and should resolve all issues with file names intersection. EDIT: Done.
regarding #6265 it is no longer clear that it makes sense to create lots of subdirectories inside the blobdirectory, tbh, i have the gut feeling that it is better to leave things flat.
beside of outgoing attachments, things seems to be solved mostly. and for outgoing attachments, as of https://github.com/deltachat/deltachat-android/issues/3091#issuecomment-2500838453 , this PR is not strictly necessary as UI anyway has to create a directory
@Hocuri @iequidoo i therefore suggest to close this PR and cherry-pick things or re-opening as needed when targeting #6265 , subsequent discussion is also maybe better done there, so things stay focused
and for outgoing attachments, as of deltachat/deltachat-android#3091 (comment) , this PR is not strictly necessary as UI anyway has to create a directory
This PR allows to create a directory right inside the blobdir which allows in turn:
- To preserve the original filename so that no new API is needed to pass it to Core.
- To avoid in-core copying to the blobdir when sending.
Another thing solved here is passing attachments to external programs w/o copying but with original filename. Probably this isn't useful on all platforms, but on Linux it is. #6265 isn't about this.
EDIT: Citing a part of https://github.com/deltachat/deltachat-core-rust/issues/6265#issuecomment-2507824615 to make it more clear why the PR is closed:
#5495 would also only prevent copying if at the same time, things are set read-only, which is not that easy as it sounds iirc.
also, the copying is not that much of an issue as it affects exporting files only - not showing images or playing audio/videos inside delta chat. exporting is not done that often and only on direct user action, taking anyways a moment
EDIT: Just in case if we reopen this later, we can use file hashes instead of random subdir names, maybe that will help to implement #6265.
as said, let's see if this PR gets revived or cherry picked.
To preserve the original filename so that no new API is needed to pass it to Core
this is also possible without the help of core, UI could also use temp directory for that, also with paths if needed
To avoid in-core copying to the blobdir when sending [...] passing attachments to external programs w/o copying
that might be an advantages, yes. but it is not clear if that justifies the additional complexity of subdirectories in core.
as said, maybe we should go that way, i closed this PR mainly as that part is not decided yet - and should not be regarded "as settled" when thinking about the new issue https://github.com/deltachat/deltachat-core-rust/issues/6265