Successive syncs create duplicate chat objects

Open netvor opened this issue 12 years ago • 19 comments

I have an issue whereby successive (full) syncs cause the same chat object to be downloaded multiple times. The compressed objects are all different, yet the plaintext contents are the same:

$ cd gmvault-db/db/chats/

$ md5sum */1437017132845624821.eml.gz
93db2cdf56f65c3393f48a7ac3822a89  subchats-2/1437017132845624821.eml.gz
7d891df7672a2346741de39773dc9810  subchats-3/1437017132845624821.eml.gz
cc2dac40e35bdfbcd06db0e6785d1f77  subchats-4/1437017132845624821.eml.gz

$ cp subchats-2/1437017132845624821.eml.gz /tmp/sc2.gz
$ cp subchats-3/1437017132845624821.eml.gz /tmp/sc3.gz
$ cp subchats-4/1437017132845624821.eml.gz /tmp/sc4.gz
$ gunzip /tmp/sc2.gz
$ gunzip /tmp/sc3.gz
$ gunzip /tmp/sc4.gz

$ md5sum /tmp/sc*
8f96d8ec223ea64c13a028cc9038a694  /tmp/sc2
8f96d8ec223ea64c13a028cc9038a694  /tmp/sc3
8f96d8ec223ea64c13a028cc9038a694  /tmp/sc4

These duplicates are not created every time. Generally when there is nothing to update (no new emails or chats) it does not happen, but when there is a new chat recorded, I usually get a duplicate. As far as I know this only affects chat objects, not mail objects.

To localize the problem better, I disabled compression and did a series of --chats-only syncs.

  1. Initial sync: gmvault sync --no-compression --chats-only. 1267 chats stored in subchats-1
  2. Force an update, i.e. send a chat message (I use a 3rd-party Jabber client, not the native Google app)
  3. gmvault sync --no-compression --chats-only. This time 1268 chats stored in both subchats-1 and subchats-2
  4. gmvault sync --no-compression --chats-only. This time no change.
  5. Force an update
  6. gmvault sync --no-compression --chats-only. This time 1268 chats stored in subchats-1 and subchats-2, 1269 chats stored in subchats-3
  7. gmvault sync --no-compression --chats-only (so no update). This time 1268 in subchats-1 and -2, 1269 in -3 and 538 (huh?) in subchats-4
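To see which chats ended up in more than one folder after a series of syncs like the one above, a small script can group the stored files by ID. This is only a sketch: it assumes the layout shown in this report (subchats-N folders holding ID.eml or ID.eml.gz files), and the function name find_duplicate_chats and the demo IDs are made up for illustration, not part of gmvault.

```python
import os
import tempfile
from collections import defaultdict


def find_duplicate_chats(chats_root):
    """Map each chat ID to the subchats-* folders that hold a copy of it."""
    locations = defaultdict(list)
    for folder in sorted(os.listdir(chats_root)):
        folder_path = os.path.join(chats_root, folder)
        if not (folder.startswith("subchats-") and os.path.isdir(folder_path)):
            continue
        for name in os.listdir(folder_path):
            # Count both compressed (.eml.gz) and uncompressed (.eml) bodies.
            if name.endswith(".eml") or name.endswith(".eml.gz"):
                chat_id = name.split(".", 1)[0]
                locations[chat_id].append(folder)
    # Keep only the IDs that are stored in more than one folder.
    return {cid: dirs for cid, dirs in locations.items() if len(dirs) > 1}


# Demo on a throwaway directory tree (hypothetical IDs).
demo_root = tempfile.mkdtemp()
layout = {
    "subchats-1": ["100.eml", "101.eml"],
    "subchats-2": ["100.eml", "102.eml"],
}
for folder, names in layout.items():
    os.makedirs(os.path.join(demo_root, folder))
    for name in names:
        open(os.path.join(demo_root, folder, name), "w").close()

dupes = find_duplicate_chats(demo_root)
print(dupes)
```

Pointing find_duplicate_chats at gmvault-db/db/chats instead of the demo directory should list every chat that was downloaded more than once.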

So you see the behavior is not very predictable. Another observation is that the different md5sum of the .gz duplicates is only a side-effect of gzip storing the timestamp of the .eml in the .gz file.
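The gzip timestamp effect is easy to reproduce outside gmvault. The sketch below uses only Python's standard gzip module; the payload and the two timestamps are arbitrary values chosen for illustration. It compresses the same bytes twice with different mtime values and shows that the archives differ while the decompressed contents do not.

```python
import gzip
import hashlib
import io


def gzip_bytes(payload, mtime):
    """Compress payload in memory, writing mtime into the gzip header."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=mtime) as gz:
        gz.write(payload)
    return buf.getvalue()


payload = b"From: a@example.com\r\n\r\nhello\r\n"
first = gzip_bytes(payload, mtime=1370000000)
second = gzip_bytes(payload, mtime=1370000600)

# The archives hash differently because header bytes 4..7 hold the mtime.
print(hashlib.md5(first).hexdigest() == hashlib.md5(second).hexdigest())
# The decompressed contents are identical.
print(gzip.decompress(first) == gzip.decompress(second))
```

So comparing the md5sum of the .gz files says nothing about whether the chats differ; the comparison has to be done on the decompressed data, as in the transcript above.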

As to the duplicates, after accumulating these four subchats- folders, I discovered they are not always identical: if they are Content-Type: multipart/alternative, then the "boundary" string differs between duplicates. The .meta files are always identical.
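Because the boundary string is only a delimiter, two copies can be compared while ignoring it by parsing both and comparing the leaf MIME parts. The following is a minimal sketch with Python 3's standard email package; the helper names and the sample message are invented for illustration and are not taken from gmvault.

```python
import email


def leaf_parts(raw_bytes):
    """Return (content type, decoded payload) for every non-multipart part."""
    msg = email.message_from_bytes(raw_bytes)
    return [
        (part.get_content_type(), part.get_payload(decode=True))
        for part in msg.walk()
        if not part.is_multipart()
    ]


def same_chat_content(raw_a, raw_b):
    """True if both messages carry the same parts, whatever their boundary."""
    return leaf_parts(raw_a) == leaf_parts(raw_b)


# Hypothetical chat message; {b} is filled in with the boundary string.
TEMPLATE = (
    "From: a@example.com\r\n"
    "MIME-Version: 1.0\r\n"
    'Content-Type: multipart/alternative; boundary="{b}"\r\n'
    "\r\n"
    "--{b}\r\n"
    "Content-Type: text/plain\r\n"
    "\r\n"
    "hello\r\n"
    "--{b}\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<p>hello</p>\r\n"
    "--{b}--\r\n"
)

copy_a = TEMPLATE.format(b="boundary-one").encode("ascii")
copy_b = TEMPLATE.format(b="boundary-two").encode("ascii")
print(copy_a == copy_b, same_chat_content(copy_a, copy_b))
```

This deliberately ignores the headers, so it only answers whether the message bodies match.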

I suppose my main question is: what is the logic behind creating new subchats- directories?

netvor avatar Jun 05 '13 17:06 netvor

This is happening to me too. Each time I run a full sync, gmvault downloads all chats into new folders.

Nyr avatar Jun 06 '13 11:06 Nyr

@netvor @Nyr Just got back from a long vacation. This seems to be a bug. I will check it asap and let you know. Thks

gaubert avatar Jun 17 '13 09:06 gaubert

Any news on this? I'm getting the same sort of behavior.

neogenix avatar Jul 21 '14 23:07 neogenix

I think that this should be fixed by #224

adept avatar Sep 11 '15 23:09 adept

I can confirm that duplication of chats is still occurring in 1.9.1/master on Windows (running from github source), when doing a full sync. I assume that it can't find the existing item in the subchats-x folders, and creates a new file. Emails might work fine due to them having folder names based on metadata?

I have a lot of chats, so I can test setting the limit per directory to something larger than my number of chats if that's interesting.

ls -R .\chats\ -name 1454569519569329389.meta
subchats-1\1454569519569329389.meta
subchats-6\1454569519569329389.meta

EDIT: I set the upper limit per directory to something much larger than my actual number of chats, moved all chat files to subchats-1, and deleted the remaining subchats-XX folders.

The problem seems to be with the _common_sync function only looking at the current subchat folder, not all subchat folders. It only sends the current folder to check_email_on_disk. I could probably make a fix, but a real Python dev will surely do it faster and better.
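The idea of checking every subchat folder instead of only the current one could look roughly like the sketch below. It is not gmvault's code: find_chat_on_disk is a hypothetical helper, and it assumes only the layout visible in this thread (subchats-N folders containing ID.meta files).

```python
import glob
import os
import tempfile


def find_chat_on_disk(chats_root, chat_id):
    """Return the subchats-* folder that already holds chat_id, or None."""
    pattern = os.path.join(chats_root, "subchats-*", "%s.meta" % chat_id)
    matches = sorted(glob.glob(pattern))
    return os.path.dirname(matches[0]) if matches else None


# Demo: the chat lives in subchats-1 while subchats-6 is the "current" folder.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "subchats-1"))
os.makedirs(os.path.join(root, "subchats-6"))
open(os.path.join(root, "subchats-1", "1454569519569329389.meta"), "w").close()

found = find_chat_on_disk(root, "1454569519569329389")
missing = find_chat_on_disk(root, "999")
print(found, missing)
```

A sync loop that did a lookup like this before downloading would skip chats already stored in an older folder and only write new ones into the current folder.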

maneatingduck avatar Dec 27 '15 18:12 maneatingduck

Yes, I can confirm both parts of @maneatingduck's comment: this is still an issue with 1.9.1/master, and the analysis is correct.

atppp avatar Jan 16 '17 22:01 atppp

ok I need to have a look at that one. It is on my todo list for 1.9.2

gaubert avatar Jan 17 '17 10:01 gaubert

ok found the bug. Will fix it in the next coming days

gaubert avatar Jan 18 '17 13:01 gaubert

@atppp @maneatingduck @netvor @Nyr @neogenix @adept I don't know why I missed that bug for so long :-1: Anyway, I believe I fixed it and the fix will be in 1.9.2. Is one of you ready to test the beta version of 1.9.2? Please let me know which platform you are on and I will ship you the beta version next week.

Many thanks !!!

gaubert avatar Jan 20 '17 08:01 gaubert

I am on Mac. I could test gmv-1.9.2 branch this weekend if that's equivalent? Thank YOU for fixing it.

atppp avatar Jan 20 '17 15:01 atppp

Yes, you can do that or take the following functional gmv-1.9.2 version. There are still a few things to fix but it should work. It is available from here: https://www.dropbox.com/s/dkya69kkef9xwa9/gmvault-v1.9.2-beta-macosx-intel.tar.gz?dl=0

Let me know if there are some issues.

gaubert avatar Jan 21 '17 14:01 gaubert

Thanks @gaubert . It didn't quite work because the chat functions now return abs path while some other code paths expect relative path. I verified a fix (noted above) but it's pretty ugly so I don't even want to request a pull.. :)

atppp avatar Jan 22 '17 04:01 atppp

@atppp the build process on mac seems to be broken but it was working when I tested it. If you have the logs of your issue that would be helpful. I will check. If you pull the branch and run it as a python prog it will work. Will let you know when the exe is ready

gaubert avatar Jan 22 '17 11:01 gaubert

Thanks @gaubert . The other commit I mixed in was a completely different issue which I separated to https://github.com/gaubert/gmvault/issues/289 . Sorry for mixing these up. I pushed a clean commit to demo my verified fix for this issue, although you might be able to come up with a better way if you want.

atppp avatar Jan 22 '17 14:01 atppp

@atppp OK, I reviewed my fix, since I originally wrote it without refreshing my memory of the code logic. Now it should work. Please test with branch gmv-1.9.2. I still have a problem creating a good Mac OS X exe but I will solve it. I still need to fix the setup.py but I will do it tomorrow. Let me know if you can test the chat sync. Many thanks.

gaubert avatar Jan 23 '17 20:01 gaubert

yup tested.. gmv-1.9.2 looks great now with chat syncing. thanks for fixing it.

(except a small syntax bug in setup.py :)

atppp avatar Jan 24 '17 03:01 atppp

Yep I need to test the setup.py I haven't done it. Will do today or tomorrow and will also work on the packaging.

gaubert avatar Jan 24 '17 07:01 gaubert

@maneatingduck @adept @netvor @Nyr you can test the latest beta version here (beta windows and mac os x package available): https://www.dropbox.com/sh/d5ceo77juacr03y/AACUGcTt6Um-6j6JmBizGPA2a?dl=0

or from the branch gmv-1.9.2

Help required for testing to see if I missed something. Many thanks for the testing.

gaubert avatar Feb 05 '17 14:02 gaubert

I am also experiencing this issue with v1.9.1 running on Linux (installed in a virtualenv with pip).
@gaubert, are you still looking for v1.9.2 testers?

hansdg1 avatar Apr 27 '17 20:04 hansdg1