sms-ie
Importing contacts results in duplicates
The jq command returned 279 contacts, as expected. I tried importing them in an Android emulator, and 836 were imported. All the contacts are imported twice: the first copy is correct (with image and all fields), while the second is completely empty, with only the name/title correct.
Originally posted by @thanasistrisp in https://github.com/tmo1/sms-ie/issues/50#issuecomment-1233276607
It's going to be difficult to figure this out without being able to reproduce the problem. Android is supposed to "aggregate" "matching" contacts, but I don't see a definition of "matching" or a precise specification for the "aggregation" procedure.
How many contacts are actually present after import? Assuming you import your 279 exported contacts into an empty contacts list (e.g., in a fresh emulator image) and then turn around and export them, how many are reported as exported?
Exporting again from your app reports that 567 were exported.
567 is more than twice 279, so it's not just a neat case of each contact appearing twice.
On the initial export, your app reported 279 exported. On import, however, the app reported 836, and a second export reported 567. 279 is the correct number that should have been imported...
567 is more than twice 279, so it's not just a neat case of each contact appearing twice.
From what I saw, contacts generally exist twice, but as far as I can tell some may be shown three times.
I may have a solution for this, and I've started implementing it in code, but I can't really test or debug it without a contacts collection that displays the problem. Are you willing to post a redacted version of yours? You can do the following:
- Create a smaller collection that still has the problem, using the max-records / max_messages preference setting. (The latest commit changed its name from the latter to the former, and enabled it in non-debug builds.)
- Redact any information you consider private / personal / sensitive. The following command (where contacts-nnnn-nn-nn.json is the original file exported by the app, and contacts-redacted.json will be the redacted version) will remove much / most of such information:
jq 'walk(if type=="object" then with_entries(if ((.key | startswith("display_name")) or (.key | startswith("sort_key")) or (.key | startswith("data")) or (.key == "account_name")) then .value |= "REDACTED" else . end) else . end)' contacts-nnnn-nn-nn.json > contacts-redacted.json
You should still go through the redacted version to make sure there's nothing you don't want there, and I can take no responsibility for any sensitive information leaking through.
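For anyone without a jq build that provides walk, here is a rough Python sketch of the same redaction. It only mirrors the key rules from the jq filter above (keys starting with display_name, sort_key, or data, plus account_name); the function names redact and redact_file are made up for this example, and it makes no assumption about how the export is nested.

```python
import json

# Key rules copied from the jq filter above.
REDACT_PREFIXES = ("display_name", "sort_key", "data")
REDACT_EXACT = {"account_name"}


def redact(node):
    """Recursively replace the values of sensitive keys with "REDACTED"."""
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            if key.startswith(REDACT_PREFIXES) or key in REDACT_EXACT:
                out[key] = "REDACTED"
            else:
                out[key] = redact(value)
        return out
    if isinstance(node, list):
        return [redact(item) for item in node]
    return node  # strings, numbers, booleans, and null pass through unchanged


def redact_file(src, dst):
    """Hypothetical helper: read src, redact it, write the result to dst."""
    with open(src, encoding="utf-8") as f:
        data = json.load(f)
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(redact(data), f, indent=2)
```

Calling redact_file("contacts-nnnn-nn-nn.json", "contacts-redacted.json") should give a result comparable to the jq command, and the same caveat applies: check the output by hand before posting it.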
Hey, I've noticed that I get several duplicates from this, and I think this should be a good enough sample. Oftentimes I get as many as four duplicates, and I think this is why:
grep 'account_type' ./contacts-redacted.json | sort | uniq
"account_type": "com.google",
"account_type": "com.google.android.apps.tachyon",
"account_type": "com.whatsapp",
"account_type": "org.thoughtcrime.securesms",
(removed)
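To see how many entries each account type contributes (roughly what grep | sort | uniq -c would show), a small Python sketch like the one below could be used. It walks the JSON without assuming any particular nesting; count_values is a name invented for this example.

```python
import json
from collections import Counter


def count_values(node, key, counts=None):
    """Count every string value stored under `key`, at any depth."""
    counts = Counter() if counts is None else counts
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key and isinstance(v, str):
                counts[v] += 1
            count_values(v, key, counts)
    elif isinstance(node, list):
        for item in node:
            count_values(item, key, counts)
    return counts


def count_in_file(path, key="account_type"):
    """Hypothetical helper: load a JSON export and count values of `key`."""
    with open(path, encoding="utf-8") as f:
        return count_values(json.load(f), key)
```

If the totals per account type are each close to the number of real contacts, that would be consistent with one copy of a contact per account.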
My thought is it might be more useful to sha256sum+truncate each field instead of redacting it, but I think I'd need to write some actual code for that as I don't believe jq can do that.
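A minimal Python sketch of the hash-and-truncate idea, assuming the same key rules as the jq filter above. The names pseudonymize and hash_fields and the 8-character length are arbitrary choices for this example.

```python
import hashlib

SENSITIVE_PREFIXES = ("display_name", "sort_key", "data")
SENSITIVE_EXACT = {"account_name"}


def pseudonymize(value, length=8):
    """Return the first `length` hex digits of the SHA-256 of the value."""
    digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
    return digest[:length]


def is_sensitive(key):
    return key.startswith(SENSITIVE_PREFIXES) or key in SENSITIVE_EXACT


def hash_fields(node):
    """Recursively replace sensitive values with truncated hashes."""
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            if is_sensitive(key) and value is not None:
                out[key] = pseudonymize(value)
            else:
                out[key] = hash_fields(value)
        return out
    if isinstance(node, list):
        return [hash_fields(item) for item in node]
    return node
```

Unlike a flat "REDACTED", equal inputs map to equal tokens, so related contacts stay recognizable. Note that short, guessable values such as phone numbers can be recovered from an unsalted hash by brute force.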
edit: In fact I'll do something better...
edit 2: Something like this works? contacts-2024-05-23-chirodacted.json
Some useful stuff:
key: account_name value: Meet -> "dunno_0807"
key: account_name value: Signal -> "dunno_0335"
key: account_name value: WhatsApp -> "dunno_0540"
Script: https://github.com/1Dragoon/chirodactor/
Basically, it finds interesting fields and attempts to normalize them, stores them in an ordered and deduped array, then inserts the order offset in place of each value along with a guess of what type of data it is. Not perfect, but it should be good enough to easily determine which contacts are related to each other.
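A rough Python sketch of that idea, for illustration only. This is not the chirodactor code: the type heuristics, the normalization rules, and the first-seen ordering are all assumptions made for this example, and only the kind_NNNN token shape is taken from the sample output above.

```python
import re


def guess_kind(value):
    """Hypothetical heuristics for labelling a value."""
    if re.fullmatch(r"[^@\s]+@[^@\s]+", value):
        return "email"
    if re.fullmatch(r"\+?[\d\s().-]{5,}", value):
        return "phone"
    return "dunno"


def normalize(value, kind):
    """Collapse formatting differences so equal values compare equal."""
    if kind == "phone":
        return re.sub(r"\D", "", value)  # keep digits only
    return value.strip().lower()


class Pseudonymizer:
    def __init__(self):
        # Maps (kind, normalized value) to its offset in first-seen order.
        self.seen = {}

    def token(self, value):
        kind = guess_kind(value.strip())
        key = (kind, normalize(value, kind))
        index = self.seen.setdefault(key, len(self.seen))
        return f"{kind}_{index:04d}"
```

For example, with p = Pseudonymizer(), the calls p.token("+1 (555) 010-0000") and p.token("15550100000") return the same token, which is what lets duplicates of one contact be matched up without revealing the number.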