Sync file blew up with thousands of rows and many duplicates
I noticed some odd behaviors, so I opened my _sync file and took at look at it. It has blown up in size. My main sheet, Sheet1, seems normal, and has 4168 contacts, which is the right number. (Also in both my personal and my work account, in contacts.google.com, I have that same number. There's no apparent damage to the data.)
On the second sheet, for my work account, there's nothing - it's empty.
On the third sheet, for my personal gmail account, there are over 375,000 rows. Many duplicates of the same contact.
I'm not sure what's going on here. Any help on where to start investigating/troubleshooting?
I "think" that:
- The third sheet represents perceived changes that need to be made in one account or the other, but not sure which direction.
- I suspect most of those changes are "non-changes" after all; it's caused by Google's weird behavior where it says my contact was updated when nothing really was updated at all. So most of those "changes" are OK to just delete.
I'm not sure how to do this, procedurally. Can I just delete all the data from that third tab and carry on?
Both sheets should be empty unless there are contacts queued to sync. Once they are sync'd the account specific sheets should be empty again. It sounds like there is a permission error with your personal account.
You can delete all of that info, but any differences will not be sync'd unless you make a change to the contact again at which point it would sync, but changes could be lost. from what is in there now.
Thanks @fleapower So, to confirm: if there are rows in the third sheet (the one that's named for my personal account), this means there are changes from my work account that need to be written to my personal account. Correct?
The script has never thrown errors saying there was a problem accessing my account. I don't think that's it.
The script has thrown lots of errors saying the sync token has expired. If that happens, does the script error out and stop processing those pending updates? And so those pending updates stay in the sheet and pile up until next run? Then at some point the number of updates can get large enough that the script (even when it doesn't hit the sync token error) can run out of time, and then more updates would stay queued up.... and then it snowballs. Is that a valid theory?
Thanks @fleapower So, to confirm: if there are rows in the third sheet (the one that's named for my personal account), this means there are changes from my work account that need to be written to my personal account. Correct?
No. Those are changes you made in your personal contacts which have yet to be sync'd with your work account (or any other accounts.
The script has never thrown errors saying there was a problem accessing my account. I don't think that's it.
When you look at the log, how many contacts are being sync'd each time?
The script has thrown lots of errors saying the sync token has expired. If that happens, does the script error out and stop processing those pending updates? And so those pending updates stay in the sheet and pile up until next run? Then at some point the number of updates can get large enough that the script (even when it doesn't hit the sync token error) can run out of time, and then more updates would stay queued up.... and then it snowballs. Is that a valid theory?
Are there timeout errors in the log?
It blew up again last night, and I have more insights this time. I added a bit more logging to the script, so now I can see a bit more about what happened in each execution.
Yesterday 6:51pm, syncContacts ran without issue. There were zero new rows in the source account. In this run, RefreshSyncToken reported "syncTokenExpired is false", so nothing to do there either.
Then yesterday at 7:21pm, syncContacts ran again. This time, getUpdatedContacts found 4172 rows -- i.e. every contact I have. WHY? (No idea, seems pretty clear this is Google weirdness about what is "updated" in a contact that isn't actually updated at all.) So at this point, my source contact tab has 4172 rows in it. UpdatesMerge runs for 300 seconds and quits. SpreadsheetToContacts runs for 302 seconds and quits. RefreshSyncToken says "syncTokenExpired is false". Then it logs that it's "Iterating connections.forEach." a few times -- and then the whole script times out and errors. So, apparently the script never gets to the end of RefreshSyncToken where it actually saves the new token.
Then at 7:51, syncContacts runs again. This is where things go really bad on us. Once again, GetUpdatedContacts finds 4172 "updated" contacts, and it presumably adds them (again) to the source account tab. Now I have 7000+ rows in that tab. The rest of this run looks like the previous run at 7:21. Times out again without completing RefreshSyncToken.
So now I'm in an endless repeat loop, where the source account tab will grow endlessly (this morning it has over 100k rows).
So.... I think the root problem is google's stupidity about what constitutes an "updated" contact, and the fact that sometimes it will consider every contact I have to be "updated". But I don't think we'll get Google to change that. So, how can we idiot-proof the script to work around this?
Google automatically terminates the script after five minutes (300 seconds). The script is designed to work within that constraint and will pick up where it left off when it runs again. Honestly, it sounds like Google is doing something funky with your contacts where it is identifying them as new contacts with IDs vice existing contacts. There is no reason it would identify them all as new contacts. How long had the contacts been sync'ing correctly (minutes, hours, weeks)?
So... Forgive me, I think I figured out what happened, and it was (sorta) my fault. I think the issue was that I opened the sync token file in the google drive viewer, and it made a copy of the file in google docs format. So then the script was finding the "wrong" file with the wrong token, and that was screwing up the continuity of the query.
(I deleted that file, and I zeroed out the real sync token file, and I reset the spreadsheet, and deleted all contacts in my secondary account, and started all over. Now things are working ok again.)
I think this might suggest a change to be made: Maybe the sync token should be stored in the User properties of the script, rather than in google drive. I think that would be safer. I've done that with my own scripts, and it's not hard to do.
Glad it is working and thanks for the suggestion. I'll have to do that!
I have code for setting/retrieving user properties if you want it.
I'm not adept at branching and submitting in github tho.
I'm closing this b/c it was related to having duplicate synctoken files. I'm opening a new ticket about "too many updates" and have better evidence of the problem there.