WhatsApp-Chat-Exporter

Fix vcard decoding errors

Open tomballgithub opened this issue 4 months ago • 3 comments

Related Issue

Fixes issue #173

Description of Changes

When using the "Enriching Contact from vCard" feature, the contacts.vcf that I exported from my Android v16 contacts crashed the exporter, as reported in issue #173. After that report, the following change was committed to vcards_contacts.py. It improved things, but as noted in #173 there was still a Traceback:

reader = vobject.readComponents(f, ignoreUnreadable=True)

The Traceback was caused by this vCard 'FN' entry, which spans two lines:

FN;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=52=75=73=73=65=6C=6C=20=43=68=61=70=6D=61=6E=20=E2=9A=BD=E2=9A=BD=E2=
=9A=BD
Traceback (most recent call last):
  File "C:\Python\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Python\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Python\Scripts\wtsexporter.exe\__main__.py", line 6, in <module>
  File "C:\Python\lib\site-packages\Whatsapp_Chat_Exporter\__main__.py", line 769, in main
    contact_store = setup_contact_store(args)
  File "C:\Python\lib\site-packages\Whatsapp_Chat_Exporter\__main__.py", line 443, in setup_contact_store
    contact_store.load_vcf_file(
  File "C:\Python\lib\site-packages\Whatsapp_Chat_Exporter\vcards_contacts.py", line 20, in load_vcf_file
    self.contact_mapping = read_vcards_file(vcf_file_path, default_country_code)
  File "C:\Python\lib\site-packages\Whatsapp_Chat_Exporter\vcards_contacts.py", line 40, in read_vcards_file
    for row in reader:
  File "C:\Python\lib\site-packages\vobject\base.py", line 1096, in readComponents
    vline = textLineToContentLine(line, n)
  File "C:\Python\lib\site-packages\vobject\base.py", line 929, in textLineToContentLine
    return ContentLine(*parseLine(text, n), **{'encoded': True,
  File "C:\Python\lib\site-packages\vobject\base.py", line 344, in __init__
    self.value = codecs.decode(self.value.encode("utf-8"), "quoted-printable").decode(self.params['CHARSET'][0])
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 22: unexpected end of data
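
The failure can be reproduced without vobject. In quoted-printable, a trailing '=' is a soft line break, so the value continues on the next physical line. Decoding the first line alone cuts the third ⚽ (bytes E2 9A BD) after its first byte, which matches the "byte 0xe2 in position 22" in the Traceback. This is only a minimal sketch using the standard library quopri module, not the code in this PR; the helper name decode_qp is made up for illustration.

```python
import quopri

# The two physical lines of the FN value from the entry above.
first = "=52=75=73=73=65=6C=6C=20=43=68=61=70=6D=61=6E=20=E2=9A=BD=E2=9A=BD=E2="
second = "=9A=BD"


def decode_qp(value: str, charset: str = "utf-8") -> str:
    """Decode a quoted-printable value into text (illustrative helper)."""
    return quopri.decodestring(value.encode("ascii")).decode(charset)


# Decoding the first line alone stops in the middle of a multi-byte character.
try:
    decode_qp(first)
    failed_alone = False
except UnicodeDecodeError:
    failed_alone = True

# Dropping the soft line break ("=" at end of line) and appending the
# continuation line gives the complete byte sequence.
full_name = decode_qp(first[:-1] + second)

# ascii() keeps the output printable on consoles that cannot display the emoji.
print(failed_alone, ascii(full_name))
```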

The 'ignoreUnreadable' fix also generated many non-fatal errors, such as:

[ERROR] Skipped line 2707, message: At line 2707: Failed to parse line: =33=33=2D=37=33=30=30=0A=43=6F=6E=74=69=6E=65=6E=74=61=6C=20=38=30=30=
vobject.base ERROR Skipped line 2709, message: At line 2709: Failed to parse line: =2D=35=32=35=2D=30=32=38=30=20=20=20=0A=44=65=6C=74=61=20=41=69=72=6C=
[ERROR] Skipped line 2709, message: At line 2709: Failed to parse line: =2D=35=32=35=2D=30=32=38=30=20=20=20=0A=44=65=6C=74=61=20=41=69=72=6C=
vobject.base ERROR Skipped line 2711, message: At line 2711: Failed to parse line: =69=6E=65=73=20=38=30=30=2D=32=32=31=2D=31=32=31=32=0A=4A=65=74=42=6C=
[ERROR] Skipped line 2711, message: At line 2711: Failed to parse line: =69=6E=65=73=20=38=30=30=2D=32=32=31=2D=31=32=31=32=0A=4A=65=74=42=6C=
vobject.base ERROR Skipped line 2713, message: At line 2713: Failed to parse line: =75=65=20=38=30=30=2D=35=33=38=2D=32=35=38=33=0A=4E=6F=72=74=68=77=65=

These were caused by other entries that span multiple lines, such as:

NOTE;ENCODING=QUOTED-PRINTABLE:=41=6D=65=72=69=63=61=6E=20=41=69=72=6C=69=6E=65=73=20=38=30=30=2D=34=
=33=33=2D=37=33=30=30=0A=43=6F=6E=74=69=6E=65=6E=74=61=6C=20=38=30=30=
=2D=35=32=35=2D=30=32=38=30=20=20=20=0A=44=65=6C=74=61=20=41=69=72=6C=
=69=6E=65=73=20=38=30=30=2D=32=32=31=2D=31=32=31=32=0A=4A=65=74=42=6C=
=75=65=20=38=30=30=2D=35=33=38=2D=32=35=38=33=0A=4E=6F=72=74=68=77=65=
=73=74=20=41=69=72=6C=69=6E=65=73=20=38=30=30=2D=32=32=35=2D=32=35=32=
=35=0A=53=6F=75=74=68=77=65=73=74=20=41=69=72=6C=69=6E=65=73=20=38=30=
=30=2D=34=33=35=2D=39=37=39=32=0A=53=70=69=72=69=74=20=41=69=72=6C=69=
=6E=65=73=20=38=30=30=2D=37=37=32=2D=37=31=31=37=20=0A=55=6E=69=74=65=
=64=20=41=69=72=6C=69=6E=65=73=20=38=30=30=2D=32=34=31=2D=36=35=32=32=
=0A=55=53=20=41=69=72=77=61=79=73=20=38=30=30=2D=34=32=38=2D=34=33=32=
=32
X-SAMSUNGADR;ENCODING=QUOTED-PRINTABLE:;;=31=32=39=20=52=75=73=73=65=74=20=42=65=6E=64=20=50=6C=20=4E=6F=72=74=
=68;=4D=6F=6E=74=67=6F=6D=65=72=79;=54=58;=37=37=33=31=36;=55=6E=69=74=65=64=20=53=74=61=74=65=73;=31=32=39=20=52=75=73=73=65=74=20=42=65=6E=64=20=50=6C=20=4E=6F=72=74=
=68=0A=4D=6F=6E=74=67=6F=6D=65=72=79=2C=20=54=58=20=37=37=33=31=36=0A=
=55=6E=69=74=65=64=20=53=74=61=74=65=73
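
Entries like these can be rejoined before any parsing happens. The sketch below merges physical lines whenever a QUOTED-PRINTABLE property line ends in a soft break ('='). It is only an illustration of the idea, not the code in this PR: the function name unfold_quoted_printable is hypothetical, and the sample is a shortened version of the NOTE entry above (its second line is cut off after the first phone number).

```python
import quopri


def unfold_quoted_printable(lines):
    """Merge physical lines that a quoted-printable soft break ("=") split."""
    merged = []
    pending = None  # logical line still waiting for its continuation
    for line in lines:
        line = line.rstrip("\r\n")
        if pending is not None:
            line = pending + line
            pending = None
        # Only the part before the first ":" holds the property parameters.
        is_qp = "ENCODING=QUOTED-PRINTABLE" in line.split(":", 1)[0].upper()
        if is_qp and line.endswith("="):
            pending = line[:-1]  # drop the soft break and keep collecting
        else:
            merged.append(line)
    if pending is not None:
        merged.append(pending)  # input ended in the middle of a value
    return merged


# Shortened sample based on the NOTE entry above.
sample = [
    "NOTE;ENCODING=QUOTED-PRINTABLE:=41=6D=65=72=69=63=61=6E=20=41=69=72=6C=69=6E=65=73=20=38=30=30=2D=34=",
    "=33=33=2D=37=33=30=30",
    "TEL;CELL:XXXXXXXXXX",
]
logical = unfold_quoted_printable(sample)
note_value = logical[0].split(":", 1)[1]
note_text = quopri.decodestring(note_value.encode("ascii")).decode("utf-8")
print(len(logical), note_text)
```

A bare '=' at the end of a line can only be a soft break, because a literal '=' inside a quoted-printable value is written as '=3D'.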

Here is a link to the vcard spec

The fix I implemented abandons vobject, which I couldn't get to work despite many attempts to harden the code around it. Instead, the new code extracts each vCard and processes it individually.

I ran the current code base on my contacts.vcf, which I had manually edited to remove the portions causing the Tracebacks, and compared its output to that of the new code. I noticed these positive changes, and no negative ones:

  1. No errors or crashes
  2. The number of imported contacts increased from 667 to 682, because the original code didn't capture a contact for business entries, which can have only 'ORG' instead of 'N' or 'FN':
BEGIN:VCARD
VERSION:2.1
TEL;WORK;PREF:XXXXXXXXXX
ORG:Business Name
END:VCARD
  3. Some names were more 'correct', such as:
BEGIN:VCARD
VERSION:2.1
N:Lopez;Yard Lawn Guy;Jose;;
FN:Yard Lawn Guy, Jose Lopez
TEL;HOME:XXXXXXXXXX
TEL;HOME:XXXXXXXXXX
END:VCARD

new: {'full_name': 'Yard Lawn Guy, Jose Lopez', 'numbers': ['XXXXXXXXXX', 'XXXXXXXXXX']}, 
old: {'full_name': 'Yard Lawn Guy', 'numbers': ['XXXXXXXXXX', 'XXXXXXXXXX']}, 
BEGIN:VCARD
VERSION:2.1
N:O'Leary;Joseph;;;
FN:O'Leary, Joseph
X-ANDROID-CUSTOM:vnd.android.cursor.item/nickname;olearyj;1;;;;;;;;;;;;;
TEL;CELL;PREF:XXXXXXXXXX
END:VCARD

new: {'full_name': "O'Leary, Joseph", 'numbers': ['XXXXXXXXXX']}, 
old: {'full_name': "O'Leary", 'numbers': ['XXXXXXXXXX']}, 
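
Both differences are consistent with a name lookup that prefers 'FN' and falls back to 'ORG' when no personal name exists. The sketch below shows one such fallback order (FN, then N, then ORG). It is an assumption for illustration only, not the PR's actual logic: pick_display_name is a hypothetical helper, the "given family" formatting of 'N' is a guess, and values are assumed to be already decoded.

```python
def pick_display_name(card_lines):
    """Choose a contact name: FN first, then N, then ORG (assumed order)."""
    props = {}
    for line in card_lines:
        head, sep, value = line.partition(":")
        if not sep:
            continue  # not a "NAME;params:value" line
        name = head.split(";", 1)[0].upper()  # drop parameters such as CHARSET
        props.setdefault(name, value.strip())  # keep the first occurrence

    if props.get("FN"):
        return props["FN"]
    if props.get("N"):
        # N starts with "family;given;..."; pad so short values still unpack.
        family, given = (props["N"].split(";") + ["", ""])[:2]
        joined = " ".join(part for part in (given, family) if part)
        if joined:
            return joined
    return props.get("ORG") or None


business = ["VERSION:2.1", "TEL;WORK;PREF:XXXXXXXXXX", "ORG:Business Name"]
person = ["VERSION:2.1", "N:O'Leary;Joseph;;;", "FN:O'Leary, Joseph",
          "TEL;CELL;PREF:XXXXXXXXXX"]
print(pick_display_name(business))
print(pick_display_name(person))
```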

tomballgithub • Nov 30 '25 02:11

Deleted post. I need to test with the changes above, not the current release.

tomballgithub • Dec 14 '25 18:12

@KnugiHK I've messed around with this for an hour, and I simply can't figure out how to pull your dev branch and test this PR. I was able to get the dev branch and PR installed locally, but Windows keeps wanting to run the preexisting 'wtsexporter' using files from site-packages. I can send you my file directly to test against.

tomballgithub • Dec 14 '25 18:12

I was able to brute-force a test of the changes and verified that they import 682 contacts, and the results look correct.

Approved!

tomballgithub • Dec 14 '25 18:12

Everything looks good! Thanks for your contribution and for taking the initiative to remove vobject as a dependency!

KnugiHK • Dec 15 '25 15:12