
Extensive memory usage on backup import

Open csb0730 opened this issue 6 years ago • 46 comments

This is the continuation of the last comments on "Limit retained history" (#244). This issue appeared while discussing that one.

csb0730 avatar Sep 05 '18 23:09 csb0730

Last test with a new binary that uses the following code in dc_imex.c

dc_sqlite3_execute(context->sql, "DROP TABLE backup_blobs;");
dc_sqlite3_execute(context->sql, "END TRANSACTION;");
dc_sqlite3_execute(context->sql, "VACUUM;");

shows the following result in logcat:

09-06 00:22:53.971 E/DeltaChat(17171): Cannot excecute "END TRANSACTION;". SQLite says: cannot commit - no transaction is active
09-06 00:22:54.062 E/DeltaChat(17171): Cannot excecute "VACUUM;". SQLite says: disk I/O error
09-06 00:22:54.062 I/DeltaChat(17171): Rewriting paths from '/data/data/com.b44t.messenger.beta/files/messenger.db-blobs' to '/data/data/com.b44t.messenger.beta/files/messenger.db-blobs' ...
09-06 00:23:34.773 I/DeltaChat(17171): -------------------- on receive timer --------------------
09-06 00:24:34.813 I/DeltaChat(17171): -------------------- on receive timer --------------------
09-06 00:24:54.782 I/DeltaChat(17171): Import/export completed.

Some background information:

  • I imported a backup database of 1 GB with 2.6 GB of free space before the import started.
  • After importing the db, I see a storage consumption by DC of around 1.93 GB!
  • Pulling the db back to the PC via adb directly after the import shows a db of exactly the same size as the backup (1 GB). ==> This means the vacuum didn't work after importing the backup and the space isn't freed (as can be seen in logcat).

csb0730 avatar Sep 05 '18 23:09 csb0730
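The observation above, that the database keeps its full size after the blobs are dropped until a successful VACUUM, can be reproduced with any SQLite binding. A minimal sketch using Python's built-in sqlite3 module (the table and file names are illustrative stand-ins, not Delta Chat's actual schema):

```python
import os
import sqlite3
import tempfile

# Hypothetical demo database; names are illustrative only.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE backup_blobs (file_name TEXT, file_content BLOB)")
con.execute("INSERT INTO backup_blobs VALUES (?, ?)", ("a.jpg", b"\0" * 1_000_000))
con.commit()
size_full = os.path.getsize(path)

# Dropping the table marks pages as free but does NOT shrink the file.
con.execute("DROP TABLE backup_blobs")
con.commit()
size_after_drop = os.path.getsize(path)

# Only VACUUM rewrites the file; it needs scratch space on the order of
# the database size while it runs, which is exactly the problem here.
con.execute("VACUUM")
size_after_vacuum = os.path.getsize(path)
con.close()

print(size_after_drop >= size_full)         # file did not shrink on DROP
print(size_after_vacuum < size_after_drop)  # file shrank only after VACUUM
```

This matches the logcat above: when VACUUM fails (here with a disk I/O error), the file simply stays at its pre-drop size.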

Here is my estimate of the storage consumption while importing a backup:

  • starting point: backup db size 1 GB
  • while importing, the db is copied, so we reach a total consumption of ==> 2 GB (messenger.db + backup.db)
  • now the files are copied out of the db into the blob dir (~95% of the db space is file data) ==> 2 GB + 0.95 GB = 2.95 GB in total (messenger.db + backup.db + exported file attachments)
  • now the db wants to execute VACUUM: ==> 2.95 GB already committed, and we need another ~1 GB for the vacuum (!) ===> 3.95 GB reached!
  • If everything ran well, we should end up with a messenger.db of around 100 MB and a total storage consumption of ~1 GB or a little more.

On my device, after importing the backup db, I see a storage consumption of 1.93 GB and a messenger.db of 1 GB!

Conclusion: despite 2.6 GB of free space on the device, it's not possible to import a backup db of 1 GB!

Can anyone confirm this behaviour and storage consumption? What's the best approach to handle this process better?

csb0730 avatar Sep 05 '18 23:09 csb0730
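The peak-usage estimate above can be written down as a tiny model. All numbers are illustrative, taken from the comment (1 GB backup, ~95% of it attachments):

```python
# Rough peak-disk-usage model for the import path described above.
N = 1.0                       # backup db size in GB
blob_fraction = 0.95          # share of the db taken by attachments

backup = N                    # the backup file itself stays on disk
copied_db = N                 # the backup is copied to messenger.db
extracted = blob_fraction * N # blobs written out as separate files
vacuum_scratch = N            # VACUUM can need up to the db size again

peak = backup + copied_db + extracted + vacuum_scratch
print(round(peak, 2))  # 3.95 (GB) at the worst moment
```

So a device needs roughly four times the backup size free at the peak, which is why the import fails here even with 2.6 GB available.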

On the PC, the pulled messenger.db (via adb) can be vacuum'd (but not with the plain command directly!). First I need to run "end transaction", then "vacuum"! That works and shrinks the db from 1 GB to 138 MB!

csb0730 avatar Sep 05 '18 23:09 csb0730

09-06 00:22:54.062 E/DeltaChat(17171): Cannot excecute "VACUUM;". SQLite says: disk I/O error

i think this is the important line. VACUUM needs at least as much free space as the size of the database, obviously even a bit more.

i think the solution is quite clear and already described here, https://github.com/deltachat/deltachat-core/issues/244#issuecomment-417848121:

this could be optimized by importing to a new database using ATTACH and INSERT INTO so that a final VACUUM is not needed and temporarily less space is needed, however, before we optimize on this, imo things like #149 should be solved.

r10s avatar Sep 06 '18 11:09 r10s
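The ATTACH + INSERT INTO idea quoted above can be sketched with Python's sqlite3 module. The schema here is made up for illustration; the point is that only the wanted rows are copied into a fresh file, so no VACUUM is needed afterwards:

```python
import os
import sqlite3
import tempfile

tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "backup.db")     # illustrative file names
dst = os.path.join(tmp, "messenger.db")

# Build a tiny stand-in "backup" database.
con = sqlite3.connect(src)
con.execute("CREATE TABLE msgs (id INTEGER PRIMARY KEY, txt TEXT)")
con.executemany("INSERT INTO msgs (txt) VALUES (?)", [("hi",), ("there",)])
con.commit()
con.close()

# Import into a fresh db via ATTACH + INSERT INTO ... SELECT:
# the destination file only ever contains the data we actually keep.
con = sqlite3.connect(dst)
con.execute("CREATE TABLE msgs (id INTEGER PRIMARY KEY, txt TEXT)")
con.execute("ATTACH DATABASE ? AS bak", (src,))
con.execute("INSERT INTO msgs SELECT id, txt FROM bak.msgs")
con.commit()
n = con.execute("SELECT COUNT(*) FROM msgs").fetchone()[0]
con.execute("DETACH DATABASE bak")
con.close()
print(n)  # 2
```

The drawback r10s mentions applies: the real import code would have to recreate every table this way, and handle schema differences between versions.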

As a containment action I pushed the externally vacuum'd db back to the device. I hope this really helps, because I'm not able to inspect the db on the device, and the device still shows the old storage consumption of 1.93 GB (this could be stale now, or not?). I don't know :-(

Regarding a solution: another way could be to use a separate database file for the attachments. Maybe easier.

UPDATE: I now managed to inspect the original db on the device with adb :-) The bad news is that even with adb push ... and no error message, the smaller vacuum'd db did not replace the big existing 1 GB one on the device! This means the workaround "backup -> external vacuum -> push db back again" does not work as expected :-(

Will add a small howto for these steps to the wiki.

csb0730 avatar Sep 06 '18 14:09 csb0730

Regarding solution: It could be a way too, to use a separate database file for dealing with the attachments. Maybe easier.

easier to program, maybe, but more difficult for the user who has to copy two files. i think putting both into the same file is fine.

another idea that just came to mind is to put the original sqlite database as a blob into the database that already contains all the images as blobs. might be very simple and straightforward (but incompatible with the existing backups, so one needs to keep the old import code at least for a while)

r10s avatar Sep 06 '18 14:09 r10s
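The db-as-a-blob idea above round-trips cleanly in SQLite. A minimal Python sketch (the backup_blobs table name appears in the code snippet earlier in this thread; its columns and the file names here are illustrative assumptions):

```python
import os
import sqlite3
import tempfile

tmp = tempfile.mkdtemp()
orig = os.path.join(tmp, "messenger.db")
bak = os.path.join(tmp, "backup.db")

# A stand-in "original" message database.
con = sqlite3.connect(orig)
con.execute("CREATE TABLE msgs (txt TEXT)")
con.execute("INSERT INTO msgs VALUES ('hello')")
con.commit()
con.close()

# Export: store the whole db file as one more blob, next to the images.
con = sqlite3.connect(bak)
con.execute("CREATE TABLE backup_blobs (file_name TEXT, file_content BLOB)")
with open(orig, "rb") as f:
    con.execute("INSERT INTO backup_blobs VALUES (?, ?)",
                ("messenger.db", f.read()))
con.commit()
con.close()

# Import: write the blob back out to disk and open it normally.
con = sqlite3.connect(bak)
data, = con.execute(
    "SELECT file_content FROM backup_blobs WHERE file_name='messenger.db'"
).fetchone()
con.close()
restored = os.path.join(tmp, "restored.db")
with open(restored, "wb") as f:
    f.write(data)
con = sqlite3.connect(restored)
txt = con.execute("SELECT txt FROM msgs").fetchone()[0]
con.close()
print(txt)  # hello
```

Because the restored file is a byte-for-byte copy, no table reconstruction and no VACUUM is needed on import.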

First, one word on the following line:

09-06 00:22:53.971 E/DeltaChat(17171): Cannot excecute "END TRANSACTION;". SQLite says: cannot commit - no transaction is active

You see the word commit here. I described in detail that this was not even possible at that moment with the PC client! I had to run "end transaction" explicitly, not "commit"! Seeing this here is strange, because I do run "end transaction" explicitly in the code!

csb0730 avatar Sep 07 '18 23:09 csb0730

Regarding the problem: There are two possible solutions to reduce or prevent the issue:

  1. prevent extensive memory consumption on backup import - and
  2. move or use external storage for backup file(s)

In any case it could be a good idea to vacuum the database before the backup, because then only the text messages have to be vacuum'd and not all the blob data. This would prevent excessive storage consumption during the vacuum.

Besides that, it would be a good idea to store (or at least import) the backup on the external SD card. This would save internal storage in any case.

At the moment, all the space - for the backup file and everything else - has to be provided by the device's internal storage! And that (in my case, with an old phone) is what leads to the limits in the end!

We can attack 1) and 2) separately :-)

csb0730 avatar Sep 08 '18 00:09 csb0730

I needed to do "end transaction" explicitly and not "commit"!

END TRANSACTION is the same as COMMIT.

also, a transaction can only be committed from the same process. it is not possible to start a transaction on device A and commit it on device B - if the transaction fails on device A, it won't be present on device B.

r10s avatar Sep 08 '18 00:09 r10s
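That END TRANSACTION and COMMIT behave identically, including raising the same error when no transaction is open, can be checked with a few lines of Python (isolation_level=None puts the connection in autocommit mode so BEGIN/END/COMMIT can be issued by hand):

```python
import sqlite3

con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("CREATE TABLE t (x INTEGER)")

con.execute("BEGIN")
con.execute("INSERT INTO t VALUES (1)")
con.execute("END TRANSACTION")  # alias for COMMIT

con.execute("BEGIN")
con.execute("INSERT INTO t VALUES (2)")
con.execute("COMMIT")

count = con.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(count)  # 2 -> both forms committed their row

# With no transaction open, END TRANSACTION fails the same way COMMIT does:
err = ""
try:
    con.execute("END TRANSACTION")
except sqlite3.OperationalError as e:
    err = str(e)
print(err)  # mentions: no transaction is active
con.close()
```

So the difference csb0730 observed in sqlitebrowser most likely comes from how that tool manages its own implicit transaction, not from SQLite itself.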

Oh, we are online ;-) I had this experience with sqlitebrowser and an offline file. From the documentation's point of view it is the same, but in reality there seems to be a difference! I was surprised, but I tested it! :-/ On the PC: "end transaction" worked; "commit" raised an error!

csb0730 avatar Sep 08 '18 00:09 csb0730

Additional information regarding the file and software used for the trials:

  • File: the backup file from DC in the device's download directory (copied to the PC)
  • Software: sqlitebrowser on Debian Linux

csb0730 avatar Sep 08 '18 00:09 csb0730

Ok, you proposed this for the future backup file:

[blob of original database file][attachments, ...]

At first glance this should be easy to implement, too.

But in the end?

Update: this would have at least the disadvantage that it would then not be possible to directly examine the messages in the database backup with an sqlitebrowser (but this is the only disadvantage I see right now).

csb0730 avatar Sep 08 '18 00:09 csb0730

At PC: "end transaction" worked. "commit" raised an error!

hm. the sqlite doc says the one is an alias for the other: https://www.sqlite.org/lang_transaction.html

r10s avatar Sep 08 '18 08:09 r10s

yes, I know. I was surprised too. but if you look at the log excerpt you can see that "end transaction" doesn't seem to work on the device in the same situation.


csb0730 avatar Sep 08 '18 11:09 csb0730

@r10s The more I think about your idea the better I find it :) But see the updated comment above!

But also: if we now copy the messenger.db as a blob into the database, then we are in a situation where we simply save files into an archive file. Which leads directly to the next thought: why not use zip or something similar for that?

  • Advantage: the file can be encrypted by that algorithm (zip) (future; would help #149).
  • Advantage: the original db can easily be extracted from the archive with a file manager
  • Disadvantage: I suspect we would need an additional library in the project for that

Ok, enough brainstorming for now. What's the lowest-hanging fruit? :-)

One last sentence: a plain vacuum should be offered manually, independent of the backup function.

csb0730 avatar Sep 08 '18 15:09 csb0730

Another comment to the log a few days ago:

09-06 00:22:53.971 E/DeltaChat(17171): Cannot excecute "END TRANSACTION;". SQLite says: cannot commit - no transaction is active
09-06 00:22:54.062 E/DeltaChat(17171): Cannot excecute "VACUUM;". SQLite says: disk I/O error

I wonder why the time between the first and second log line is so short, while on the PC I saw such a message only around half a minute later, while SQLite's rollback file was growing (until the disk was full). Ok, the PC has a slow hard disk, but I would expect at least several seconds, because there are 600 MB to commit... Strange!

csb0730 avatar Sep 11 '18 22:09 csb0730

In any case I find this problem really important and will focus on it now. On phones with big internal storage it is (maybe) not important, but on phones with moderate storage it becomes, sooner or later, a knockout criterion.

Three proposals on the table right now:

  1. Using two files for backup. One for message database, second for attachment files.
  2. Copy original message db as a blob to backup database.
  3. Copy original message db and attachments to a non database file (zip?)

Every approach has its own advantages/disadvantages. To keep the ability to look directly into the database, I vote (a little bit more) for 1. or 3.

csb0730 avatar Sep 11 '18 22:09 csb0730

For everyone who needs it, and until a technical solution is done, here is an intermediate containment solution for getting messenger.db vacuum'd on an external PC by means of adb:

Compress messenger.db on an external PC

First of all, stop Delta Chat via the app manager and go offline.

  1. On the device, activate developer options and USB debugging
  2. Connect the device to the PC by USB cable
  3. Run "adb pull /data/data/com.b44t.messenger.beta/files/messenger.db". This downloads the db to the current dir on the local PC.
  4. Open the db on the PC with sqlitebrowser, run the SQL "end transaction;", then "vacuum;", close the db and save all changes.

A hint: "adb push ..." directly to /data/data ... is not possible. Therefore several additional steps are needed.

Run the following commands in a shell:

  1. adb push messenger.db /sdcard/Download/messenger.db
  2. adb shell
  3. run-as com.b44t.messenger.beta
  4. cd files
  5. mv messenger.db messenger1.db (or delete the file with rm if space is short!)
  6. mv /sdcard/Download/messenger.db ./messenger.db
  7. remove the intermediate files via rm /sdcard/Download/messenger.db ./messenger1.db (if not deleted before)

That's it, you can now start DC again.

Now the vacuumed, shrunken db is in use and the storage is available again.

:-)

:-)

csb0730 avatar Sep 15 '18 22:09 csb0730

it seems to be the following:

  • N mb: bak-file to import in the downloads directory. as this is never deleted by delta, this additional space is always required. the bak-file includes all blobs.
  • plus N mb: copy the bak-file to the internal directory
  • plus <N mb: write the blobs to separate files

after that, the import is done and can be used.

but the copy of the bak-file is larger than needed - the blobs are deleted from the db, however, a sqlite VACUUM is needed to free up the space. this vacuum requires again at least +N mb, possibly up to 2*N mb.

if this succeeds, only N mb is used internally by delta.

it's especially the vacuum that needs a lot of additional space.

possible fixes:

  1. if we decide to go for saving blobs directly in the database, vacuum is no longer needed.

  2. as an alternative, the import could be done by insert-into commands instead of copying the bak-file, however, this requires some more work as the database has to be reconstructed.

r10s avatar Jan 14 '19 12:01 r10s

Another alternative: using two backup files -> copying the message database as it is and saving the blobs in a separate backup database file!

This would avoid vacuuming a huge message database that includes blobs (the current situation) while importing data. If we are very strict we can say: no vacuum is needed any more for backup import.

Result: we need 2*N mb in total for the backup and live files together while importing a backup.

I don't want to hold back the disadvantage: this method is not as smart as having everything in one backup file, but it is easy to implement and saves a lot of storage space!

A last point: during the import it's even possible to delete the first processed file before continuing with the second one; that saves more space, and then we need even << 2*N mb in total for the import :-)

What's the best?

csb0730 avatar Jan 14 '19 23:01 csb0730

i think having two backup files is not a smart solution ux-wise.

however, also for 2. the memory usage is as small as it can be: for a 100 MB backup-file, the import will need 100 MB (for copying all data of the backup - first all blobs and then creating the tables - a vacuum would not be needed). it would be fine, it's just a bit more tricky as one has to recreate the tables, and source and destination may have different columns due to different versions. many things to test and to take care of.

but a 2.b might be easier: copy all blobs to a new database with only a blob-table and add the original database also as a blob. importing would be easier then, vacuum not needed. however, the drawback would be that it is not compatible with the current backup (we have to support two import options at least for a while).

everything's possible, there are no real technical problems here :)

r10s avatar Jan 15 '19 00:01 r10s

2.b: I intended to keep the possibility of directly examining the backup file with an external sqlite browser. Maybe this is possible with 2.b too?

Will do some tests and come back :-)

btw: a blob-only db file was on my mind too :)

csb0730 avatar Jan 15 '19 15:01 csb0730

I intended to keep the possibility of directly examining the backup file with an external sqlite browser. Maybe this is possible with 2.b too?

you could export the blob and examine it from there. however, once the backup gets encrypted, the easy way of examination will disappear anyway.

r10s avatar Jan 15 '19 20:01 r10s

by the way: what does ux-wise mean?

csb0730 avatar Jan 15 '19 23:01 csb0730

ux = user experience

r10s avatar Jan 15 '19 23:01 r10s

thanks ... agree with you - test just in progress ;-) need more time (disk full :-( )

csb0730 avatar Jan 16 '19 00:01 csb0730

Ok, the tests with a 2.b scenario are now finished successfully :-)

Test scenario:

Software: sqlitebrowser to work with the backup file.

  • I exported a backup with the current function (1.2 GB) and added a messenger.db (150 MB, 10000 msgs) to the backup_blobs table. This takes some time, but it works.
  • Then I exported the messenger.db blob back to a file. Takes time, and works too.

So I would say proposal 2.b is the perfect solution for reworking the backup function:

  1. create an empty db with only one table to hold all files (and a config table that keeps the backup version?)
  2. insert messenger.db as a blob into this table
  3. insert all files from the blob dir into this table

This smart approach has many advantages:

  • Implementation requires only little effort to adapt the current code
  • No vacuum needed during import. Storage consumption drops to around 2*N mb in total :-)
  • Only one backup file (which can be encrypted as a whole).

For a transitional period a switch can be used to determine the backup file version and call the old/new import function. To examine messenger.db easily, I could imagine an additional expert function to export/import the messenger.db file without blobs! Blob limits should not be a problem, as the limits are non-existent or very high (see sqlite.org). Could we use the SQLite archive format? That would ease handling again :-)

csb0730 avatar Jan 17 '19 00:01 csb0730
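The three steps of the proposal above can be sketched with Python's sqlite3 module. All table and column names here (config, backup_blobs, file_name, file_content) are illustrative assumptions, not Delta Chat's actual schema, and the blob contents are placeholders:

```python
import os
import sqlite3
import tempfile

bak = os.path.join(tempfile.mkdtemp(), "backup.bak")
con = sqlite3.connect(bak)

# Step 1: an empty db with one blob table plus a small config table
# holding the backup format version.
con.execute("CREATE TABLE config (keyname TEXT, value TEXT)")
con.execute("CREATE TABLE backup_blobs (file_name TEXT, file_content BLOB)")
con.execute("INSERT INTO config VALUES ('backup_version', '2')")

# Step 2: the message database goes in as a single blob ...
con.execute("INSERT INTO backup_blobs VALUES (?, ?)",
            ("messenger.db", b"<db file bytes>"))

# Step 3: ... followed by every file from the blob dir.
for name in ("photo1.jpg", "voice2.ogg"):
    con.execute("INSERT INTO backup_blobs VALUES (?, ?)", (name, b"<data>"))
con.commit()

names = [r[0] for r in con.execute("SELECT file_name FROM backup_blobs")]
version = con.execute(
    "SELECT value FROM config WHERE keyname='backup_version'").fetchone()[0]
con.close()
print(version, names)
```

On import, each blob is simply written back to disk, so no tables need to be reconstructed and no vacuum is required.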

Hi @r10s, I can do the necessary updates, but first I want to specify in detail what to change and how. I'll make a proposal ...

csb0730 avatar Jan 19 '19 09:01 csb0730

@csb0730 sounds good :) you're talking about 2b then? we should keep in mind that we ...

  1. have to maintain the old import-format also for a while to have a smart upgrade path.

  2. maintaining the old export-format otoh would be needed for downgrades.

while 1. is a must imho, maybe 2. is also easy to have by just exporting the whole db as it currently is and adding the whole db-blob additionally. the import routine can then pick the one it prefers. this would result in having the non-blob data twice; however, it is small - all messages and metadata together may be only as large as a single video file.

r10s avatar Jan 19 '19 10:01 r10s

@r10s yes 2b.

But I intend to write a new sqlite_open() function to create a proper new db file. It can be derived from the current one by stripping out all the table-creation code. As far as I can see, all the necessary information is there in the current import and export functions, so I see no difficulty in implementing it.

To handle the old/new backup format, do we need a switch in the code, or do we let the user decide manually? I think for import, an automatic test of the file and dispatching to the correct code is possible. For export, the user should choose manually, because DC doesn't know whether the user needs to go back or not. The default should be the new format.

But that's only the im/ex process.

The details of the file format lead me to these questions:

  • how to distinguish backup versions?
  • is it necessary to have backup-version information (like messenger.db has)? Then a config table would be needed.
  • is it necessary to have the backup timestamp in the file? (I suspect yes => config table)
  • is the SQLite3 archive format a possible solution for the file format?

csb0730 avatar Jan 19 '19 21:01 csb0730
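The version-detection question above could be answered by probing the file itself: old backups carry the full message tables, while a new-format backup would carry a version entry in its config table. A hedged Python sketch (the config/keyname/value names are assumptions mirroring the proposal, not an agreed format):

```python
import os
import sqlite3
import tempfile

def backup_format(path):
    # Returns a guessed backup format version: new backups carry a
    # 'backup_version' entry in a config table; anything else is
    # treated as the old plain-db-copy format (version 1).
    con = sqlite3.connect(path)
    tables = {r[0] for r in con.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")}
    version = 1  # default: old format, no marker present
    if "config" in tables:
        row = con.execute(
            "SELECT value FROM config WHERE keyname='backup_version'"
        ).fetchone()
        if row:
            version = int(row[0])
    con.close()
    return version

tmp = tempfile.mkdtemp()
old = os.path.join(tmp, "old.bak")
new = os.path.join(tmp, "new.bak")

con = sqlite3.connect(old)                 # old style: plain db copy
con.execute("CREATE TABLE msgs (txt TEXT)")
con.commit(); con.close()

con = sqlite3.connect(new)                 # new style: version marker
con.execute("CREATE TABLE config (keyname TEXT, value TEXT)")
con.execute("INSERT INTO config VALUES ('backup_version', '2')")
con.commit(); con.close()

print(backup_format(old), backup_format(new))  # 1 2
```

With such a probe, the import could dispatch automatically to the old or new code path, as suggested above, while export stays a manual choice.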