Fix vcard decoding errors
Related Issue
Fixes issue #173
Description of Changes
When I used the "Enriching Contact from vCard" feature, the contacts.vcf that I exported from my Android v16 contacts caused a crash, as described in issue #173. After I reported #173, the following change was committed to vcards_contacts.py. It improved things, but as reported in #173 there was still a Traceback.
reader = vobject.readComponents(f, ignoreUnreadable=True)
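The Traceback was caused by this vCard 'FN' entry, which spans two lines. The quoted-printable soft line break falls in the middle of a multi-byte UTF-8 character (E2 9A BD), so decoding the first line on its own fails: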
FN;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=52=75=73=73=65=6C=6C=20=43=68=61=70=6D=61=6E=20=E2=9A=BD=E2=9A=BD=E2=
=9A=BD
Traceback (most recent call last):
File "C:\Python\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Python\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Python\Scripts\wtsexporter.exe\__main__.py", line 6, in <module>
File "C:\Python\lib\site-packages\Whatsapp_Chat_Exporter\__main__.py", line 769, in main
contact_store = setup_contact_store(args)
File "C:\Python\lib\site-packages\Whatsapp_Chat_Exporter\__main__.py", line 443, in setup_contact_store
contact_store.load_vcf_file(
File "C:\Python\lib\site-packages\Whatsapp_Chat_Exporter\vcards_contacts.py", line 20, in load_vcf_file
self.contact_mapping = read_vcards_file(vcf_file_path, default_country_code)
File "C:\Python\lib\site-packages\Whatsapp_Chat_Exporter\vcards_contacts.py", line 40, in read_vcards_file
for row in reader:
File "C:\Python\lib\site-packages\vobject\base.py", line 1096, in readComponents
vline = textLineToContentLine(line, n)
File "C:\Python\lib\site-packages\vobject\base.py", line 929, in textLineToContentLine
return ContentLine(*parseLine(text, n), **{'encoded': True,
File "C:\Python\lib\site-packages\vobject\base.py", line 344, in __init__
self.value = codecs.decode(self.value.encode("utf-8"), "quoted-printable").decode(self.params['CHARSET'][0])
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 22: unexpected end of data
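The failure can be reproduced without vobject. The following is a minimal sketch using only the standard library `quopri` module; the helper name `decode_qp` is hypothetical and is not taken from this PR. It decodes the first physical line by itself, which fails, and then decodes the two lines joined across the soft line break, which succeeds.

```python
import quopri

# The FN value from the vCard entry above, split exactly where the export split it.
first_line = "=52=75=73=73=65=6C=6C=20=43=68=61=70=6D=61=6E=20=E2=9A=BD=E2=9A=BD=E2="
continuation = "=9A=BD"


def decode_qp(text, charset="utf-8"):
    """Decode a quoted-printable string into text (hypothetical helper)."""
    return quopri.decodestring(text.encode("ascii")).decode(charset)


# Decoding only the first line leaves a dangling 0xE2 byte at the end,
# so the UTF-8 decode step raises UnicodeDecodeError.
try:
    decode_qp(first_line)
    first_line_alone_fails = False
except UnicodeDecodeError:
    first_line_alone_fails = True

# A trailing "=" followed by a newline is a soft line break, so joining the
# two physical lines restores the complete three-byte character.
full_name = decode_qp(first_line + "\n" + continuation)
```

The point of the sketch is that the data itself is valid; it only needs to be unfolded before it is decoded.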
The 'ignoreUnreadable' fix also produced many non-fatal errors, such as:
[ERROR] Skipped line 2707, message: At line 2707: Failed to parse line: =33=33=2D=37=33=30=30=0A=43=6F=6E=74=69=6E=65=6E=74=61=6C=20=38=30=30=vobject.base ERROR Skipped line 2709, message: At line 2709: Failed to parse line: =2D=35=32=35=2D=30=32=38=30=20=20=20=0A=44=65=6C=74=61=20=41=69=72=6C=
[ERROR] Skipped line 2709, message: At line 2709: Failed to parse line: =2D=35=32=35=2D=30=32=38=30=20=20=20=0A=44=65=6C=74=61=20=41=69=72=6C=vobject.base ERROR Skipped line 2711, message: At line 2711: Failed to parse line: =69=6E=65=73=20=38=30=30=2D=32=32=31=2D=31=32=31=32=0A=4A=65=74=42=6C=
[ERROR] Skipped line 2711, message: At line 2711: Failed to parse line: =69=6E=65=73=20=38=30=30=2D=32=32=31=2D=31=32=31=32=0A=4A=65=74=42=6C=vobject.base ERROR Skipped line 2713, message: At line 2713: Failed to parse line: =75=65=20=38=30=30=2D=35=33=38=2D=32=35=38=33=0A=4E=6F=72=74=68=77=65=
These errors were caused by other entries that span multiple lines, such as:
NOTE;ENCODING=QUOTED-PRINTABLE:=41=6D=65=72=69=63=61=6E=20=41=69=72=6C=69=6E=65=73=20=38=30=30=2D=34=
=33=33=2D=37=33=30=30=0A=43=6F=6E=74=69=6E=65=6E=74=61=6C=20=38=30=30=
=2D=35=32=35=2D=30=32=38=30=20=20=20=0A=44=65=6C=74=61=20=41=69=72=6C=
=69=6E=65=73=20=38=30=30=2D=32=32=31=2D=31=32=31=32=0A=4A=65=74=42=6C=
=75=65=20=38=30=30=2D=35=33=38=2D=32=35=38=33=0A=4E=6F=72=74=68=77=65=
=73=74=20=41=69=72=6C=69=6E=65=73=20=38=30=30=2D=32=32=35=2D=32=35=32=
=35=0A=53=6F=75=74=68=77=65=73=74=20=41=69=72=6C=69=6E=65=73=20=38=30=
=30=2D=34=33=35=2D=39=37=39=32=0A=53=70=69=72=69=74=20=41=69=72=6C=69=
=6E=65=73=20=38=30=30=2D=37=37=32=2D=37=31=31=37=20=0A=55=6E=69=74=65=
=64=20=41=69=72=6C=69=6E=65=73=20=38=30=30=2D=32=34=31=2D=36=35=32=32=
=0A=55=53=20=41=69=72=77=61=79=73=20=38=30=30=2D=34=32=38=2D=34=33=32=
=32
X-SAMSUNGADR;ENCODING=QUOTED-PRINTABLE:;;=31=32=39=20=52=75=73=73=65=74=20=42=65=6E=64=20=50=6C=20=4E=6F=72=74=
=68;=4D=6F=6E=74=67=6F=6D=65=72=79;=54=58;=37=37=33=31=36;=55=6E=69=74=65=64=20=53=74=61=74=65=73;=31=32=39=20=52=75=73=73=65=74=20=42=65=6E=64=20=50=6C=20=4E=6F=72=74=
=68=0A=4D=6F=6E=74=67=6F=6D=65=72=79=2C=20=54=58=20=37=37=33=31=36=0A=
=55=6E=69=74=65=64=20=53=74=61=74=65=73
Here is a link to the vcard spec
My fix abandons vobject, which I couldn't get to work reliably despite many attempts to harden the code. Instead, the new code extracts each vCard and processes it individually.
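As a rough illustration of the per-card approach, here is a minimal sketch. It is not the code in this PR: the function names `split_vcards`, `unfold_quoted_printable`, and `decode_property` are hypothetical, and it assumes vCard 2.1 input where only quoted-printable values continue onto following lines (other folding styles are not handled).

```python
import quopri


def unfold_quoted_printable(lines):
    """Join quoted-printable soft line breaks (a trailing '=') into one line."""
    unfolded = []
    it = iter(lines)
    for line in it:
        if "ENCODING=QUOTED-PRINTABLE" in line.upper():
            while line.endswith("="):
                nxt = next(it, None)
                if nxt is None:
                    break  # file ended in the middle of a value
                line = line[:-1] + nxt.strip()
        unfolded.append(line)
    return unfolded


def split_vcards(text):
    """Return one list of unfolded property lines per BEGIN/END:VCARD block."""
    cards, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line.upper() == "BEGIN:VCARD":
            current = []
        elif line.upper() == "END:VCARD":
            if current is not None:
                cards.append(unfold_quoted_printable(current))
            current = None
        elif current is not None and line:
            current.append(line)
    return cards


def decode_property(line):
    """Split 'NAME;PARAMS:VALUE' and decode quoted-printable values."""
    head, _, value = line.partition(":")
    name, *params = head.split(";")
    params = [p.upper() for p in params]
    if "ENCODING=QUOTED-PRINTABLE" in params:
        charset = "utf-8"  # assumed default when no CHARSET parameter is given
        for p in params:
            if p.startswith("CHARSET="):
                charset = p.split("=", 1)[1]
        raw = quopri.decodestring(value.encode("utf-8"))
        value = raw.decode(charset, errors="replace")
    return name.upper(), value
```

Because each card is handled on its own, a malformed entry affects at most that one contact rather than the whole file.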
I ran the current code base on my contacts.vcf, manually edited to remove the portions causing the Tracebacks, and compared its output with that of the new code. I noticed the following improvements and no regressions:
- No errors or crashes
- The number of imported contacts increased from 667 to 682, because the original code wasn't capturing a contact for business entries, which may have only 'ORG' and no 'N' or 'FN':
BEGIN:VCARD
VERSION:2.1
TEL;WORK;PREF:XXXXXXXXXX
ORG:Business Name
END:VCARD
- Some names were more 'correct', such as:
BEGIN:VCARD
VERSION:2.1
N:Lopez;Yard Lawn Guy;Jose;;
FN:Yard Lawn Guy, Jose Lopez
TEL;HOME:XXXXXXXXXX
TEL;HOME:XXXXXXXXXX
END:VCARD
new: {'full_name': 'Yard Lawn Guy, Jose Lopez', 'numbers': ['XXXXXXXXXX', 'XXXXXXXXXX']},
old: {'full_name': 'Yard Lawn Guy', 'numbers': ['XXXXXXXXXX', 'XXXXXXXXXX']},
BEGIN:VCARD
VERSION:2.1
N:O'Leary;Joseph;;;
FN:O'Leary, Joseph
X-ANDROID-CUSTOM:vnd.android.cursor.item/nickname;olearyj;1;;;;;;;;;;;;;
TEL;CELL;PREF:XXXXXXXXXX
END:VCARD
new: {'full_name': "O'Leary, Joseph", 'numbers': ['XXXXXXXXXX']},
old: {'full_name': "O'Leary", 'numbers': ['XXXXXXXXXX']},
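The name handling described in the list above could be sketched roughly as follows. This is an assumption about the behavior, not the PR's actual code: the function `pick_display_name` is hypothetical, and the fallback order (FN, then N, then ORG) and the way N components are joined are guesses that happen to match the examples shown.

```python
def pick_display_name(props):
    """Choose a display name from decoded vCard properties.

    `props` maps upper-case property names to decoded values.
    Assumed order of preference: FN, then N, then ORG.
    """
    fn = props.get("FN", "").strip()
    if fn:
        return fn

    # N is 'family;given;additional;prefix;suffix'; pad so unpacking is safe.
    parts = (props.get("N", "").split(";") + [""] * 3)[:3]
    family, given, additional = (p.strip() for p in parts)
    from_n = " ".join(p for p in (given, additional, family) if p)
    if from_n:
        return from_n

    # Business entries may carry only ORG.
    org = props.get("ORG", "").strip()
    return org or None
```

With this order, the business entry above resolves to 'Business Name', and a card that has both N and FN keeps the full FN text.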
Deleted post. I need to test with the changes above, not the current release.
@KnugiHK I've messed around with this for an hour, and I simply can't figure out how to pull your dev branch and test this PR. I was able to get the dev branch and PR installed locally, but Windows keeps wanting to run the preexisting 'wtsexporter' using files from site-packages. I can send you my file directly to test against.
I was able to brute-force verification of the changes and confirm that the new code imported 682 contacts, and they look correct.
Approved!
Everything looks good! Thanks for your contribution and taking the initiative to remove vobject as a dependency!