WhatsApp-Chat-Exporter
sqlite3.OperationalError: Could not decode to UTF-8 column 'data' with text
Hello,
I suspect that this is due to some of the database not being in UTF-8.
I got around this in another program that also exports WhatsApp messages by adding the following line:
    db.text_factory = lambda b: b.decode(errors = 'ignore')
after this line, which was already there:
    db = sqlite3.connect(file_path)
But when I tried to do the same in this program, I couldn't find that line in the extract Python files. Instead, I found this:
    if os.path.isfile(contact_db):
        with sqlite3.connect(contact_db) as db:
            contacts(db, data)
    if os.path.isfile(msg_db):
        with sqlite3.connect(msg_db) as db:
            messages(db, data)
            media(db, data, media_folder)
            vcard(db, data)
        create_html(data, output_folder)
Which is beyond my understanding.
You can place the line as follows:
    if os.path.isfile(contact_db):
        with sqlite3.connect(contact_db) as db:
            db.text_factory = lambda b: b.decode(errors = 'ignore')  # here
            contacts(db, data)
    if os.path.isfile(msg_db):
        with sqlite3.connect(msg_db) as db:
            db.text_factory = lambda b: b.decode(errors = 'ignore')  # and here
            messages(db, data)
            media(db, data, media_folder)
            vcard(db, data)
        create_html(data, output_folder)
Also, could you provide me a stack trace related to this error and the version of the exporter you are using?
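For readers following along, here is a minimal, self-contained sketch of what that text_factory line changes. It does not use the exporter's code: the in-memory table, the column names and the RAW sample bytes are all assumptions made for illustration. With the default text_factory, Python's sqlite3 module raises OperationalError when a TEXT value is not valid UTF-8; with the lambda, undecodable bytes are silently dropped instead.

```python
import sqlite3

# Hypothetical sample: readable text, one emoji, then three bytes
# (ed b2 9e) that a strict UTF-8 decoder rejects.
RAW = b"Keep it up\xf0\x9f\x92\x90\xed\xb2\x9e"


def make_db():
    """Build an in-memory table (assumed schema) holding RAW as a TEXT value."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE message (_id INTEGER PRIMARY KEY, data TEXT)")
    # CAST(? AS TEXT) stores the bytes as TEXT without validating them.
    db.execute("INSERT INTO message (data) VALUES (CAST(? AS TEXT))", (RAW,))
    return db


def read_default(db):
    """Read with the default text_factory (str); return the error text, if any."""
    try:
        db.execute("SELECT data FROM message").fetchone()
    except sqlite3.OperationalError as exc:
        return str(exc)
    return None


def read_lenient(db):
    """Read with a text_factory that drops bytes it cannot decode."""
    db.text_factory = lambda b: b.decode(errors="ignore")
    return db.execute("SELECT data FROM message").fetchone()[0]


db = make_db()
error = read_default(db)
text = read_lenient(db)
print(error)
print(text)
```

Note that text_factory is a per-connection setting, so it has to be set on every connection that reads the affected column, and errors="ignore" discards data rather than repairing it.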
Oh thanks, sorry for not providing it before.
    Traceback (most recent call last):
      File "/home/user/.local/bin/wtsexporter", line 8, in <module>
        sys.exit(main())
      File "/home/user/.local/lib/python3.9/site-packages/Whatsapp_Chat_Exporter/__main__.py", line 240 in main
        messages(db, data)
      File "/home/user/.local/lib/python3.9/site-packages/Whatsapp_Chat_Exporter/extract.py", line 380 in messages
        content = c.fetchone()
    sqlite3.OperationalError: Could not decode to UTF-8 column 'data' with text '{placeholder}'
No problem.
Do you know what language the message is in and what encoding it uses?
I am using the latest one. When I tried to use it before your update, it gave me that message table error, but now that you have updated it, that's gone.
This is literally the terminal output of the text:
    Keep it up�����������������������������������
I think that there are some emojis in this text, like thumbs up, y'know? That would make sense, but since I don't remember this message, I can't say for sure.
I have just run a grep command on all of the messages and after seeing the messages I think that this is most likely this emoji "💪" times 4.
Adding this line didn't help. But I only added it to the extract command, and since the error specifies the data column, that could be the reason. Any thoughts?
I tried to reproduce it but failed; the message was shown correctly in the HTML.
You can use a SQLite browser to extract that message out and convert it into HEX. Doing so may enable me to reproduce the problem.
I have tried to search for it using sqlitebrowser in the message table and data column, but I couldn't find it. Can I find out which row it's erroring out on?
Wait, I think that I have found it! I just searched for the question mark shard and found 3 messages. There is one which just has BLOB written on it. When I opened it, it showed me the "Keep it up" message and a bunch of dots. It also says that the type of data currently in the cell is Binary.
Replace
    i += 1  # around line 377 of extract.py?
    if i % 1000 == 0:
        print(f"Gathering messages...({i}/{total_row_number})", end="\r")
    content = c.fetchone()
with
    i += 1
    print(content["_id"])
    content = c.fetchone()
When you run that, the row after the last printed _id should be the one causing the problem.
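As an alternative that does not require patching the exporter, the offending rows can be located by reading the raw bytes and attempting the decode in Python. This is only a sketch: the helper find_undecodable_rows is hypothetical, and the default table and column names (message, _id, text_data) are assumptions that may not match a given database version.

```python
import sqlite3


def find_undecodable_rows(db, table="message", column="text_data", id_column="_id"):
    """Return (row id, byte offset) for every value that is not valid UTF-8."""
    bad = []
    # CAST(... AS BLOB) hands the raw bytes to Python and bypasses text decoding.
    query = f"SELECT {id_column}, CAST({column} AS BLOB) FROM {table}"
    for row_id, raw in db.execute(query):
        if raw is None:
            continue  # NULL cells have nothing to decode
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as exc:
            bad.append((row_id, exc.start))
    return bad


# Demo on an in-memory database with one deliberately broken TEXT value.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE message (_id INTEGER PRIMARY KEY, text_data TEXT)")
demo.execute("INSERT INTO message VALUES (1, 'hello')")
demo.execute(
    "INSERT INTO message VALUES (2, CAST(? AS TEXT))",
    (b"Keep it up\xed\xb2\x9e",),
)
demo.execute("INSERT INTO message VALUES (3, NULL)")
bad_rows = find_undecodable_rows(demo)
print(bad_rows)
```

Against a real file, the call would look like find_undecodable_rows(sqlite3.connect("msgstore.db")), with the table and column arguments adjusted to the actual schema.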
This is the HEX and ASCII:
    0000  4b 65 65 70 20 69 74 20 75 70 f0 9f 92 90 f0 9f  Keep it up......
    0010  92 90 f0 9f 92 90 f0 9f 8c b9 f0 9f 8c b9 f2 a3  ................
    0020  b0 bd f0 b7 a0 bd f0 b7 a0 bd ed b2 9e           .............
Also, can you figure out the actual message shown in WhatsApp? Btw, just a reminder, do not post it publicly if it is a sensitive message.
You see, I have lost the Whatsapp encryption key and was only able to get the key file and this was before I deleted all of the Whatsapp backups. Do you see anything sensitive in that message? I don't know HEX.
Hey, I am worried now, what is that HEX? 😰
Hey so I did this:
echo 4b656570206974207570f09f9290f09f9290f09f9290f09f8cb9f09f8cb9f2a3b0bdf0b7a0bdf0b7a0bdedb29e | xxd -r -p
And got this:
Keep it up💐💐💐🌹🌹
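The same bytes can be inspected in Python to see exactly where strict UTF-8 decoding stops. The sketch below uses only the hex string posted above; the variable names are illustrative. The text and the five emojis decode cleanly, the next twelve bytes decode to three code points (U+A3C3D and U+3783D twice) that do not appear to be assigned characters, and the final three bytes (ed b2 9e) encode a lone surrogate, U+DC9E, which strict UTF-8 rejects.

```python
HEX = (
    "4b656570206974207570"      # "Keep it up"
    "f09f9290f09f9290f09f9290"  # three bouquet emojis
    "f09f8cb9f09f8cb9"          # two rose emojis
    "f2a3b0bdf0b7a0bdf0b7a0bd"  # three well-formed but unusual code points
    "edb29e"                    # encoded lone surrogate, invalid in UTF-8
)
raw = bytes.fromhex(HEX)

# Find the offset of the first byte a strict decoder refuses.
try:
    raw.decode("utf-8")
    first_bad = None
except UnicodeDecodeError as exc:
    first_bad = exc.start

# Everything before that offset is valid UTF-8.
valid_part = raw[:first_bad].decode("utf-8")
# errors="replace" substitutes U+FFFD for each undecodable byte.
lenient = raw.decode("utf-8", errors="replace")

print(len(raw), first_bad)
print(raw[first_bad:].hex())
print([f"U+{ord(ch):04X}" for ch in valid_part[10:]])
```

The three replacement characters produced by errors="replace" match the three � symbols seen in the sqlite3 shell output later in the thread.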
Is there a way to ignore this error and let it run?
I didn't mean to scare you. I hadn't looked into the hex before you posted the result, tbh.
The easiest way is to use a try/except block and skip it, but that is not an ideal solution.
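One way to sketch the try/except idea, which is a variant and not the exporter's actual code, is to set text_factory to bytes so that the fetch itself never fails, then decode each value inside a try block and skip the rows that cannot be decoded. The function name, query and schema below are assumptions for illustration.

```python
import sqlite3


def read_messages_skipping_bad(db, query="SELECT _id, data FROM message"):
    """Return (decoded rows, ids of skipped rows) for an assumed schema."""
    db.text_factory = bytes  # TEXT values now arrive as raw bytes
    good, skipped = [], []
    for row_id, raw in db.execute(query):
        try:
            text = raw.decode("utf-8") if raw is not None else None
        except UnicodeDecodeError:
            skipped.append(row_id)  # remember what was dropped
            continue
        good.append((row_id, text))
    return good, skipped


# Demo with one valid and one invalid TEXT value.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE message (_id INTEGER PRIMARY KEY, data TEXT)")
demo.execute("INSERT INTO message VALUES (1, 'hello')")
demo.execute(
    "INSERT INTO message VALUES (2, CAST(? AS TEXT))",
    (b"Keep it up\xed\xb2\x9e",),
)
good, skipped = read_messages_skipping_bad(demo)
print(good, skipped)
```

Skipping loses the whole message, which is why it is not ideal; decoding with errors="replace" instead would keep the readable part of the text.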
I am still trying to reproduce it because the exporter failed after inserting a row with your binary content, but I got a different error than yours.
    Traceback (most recent call last):
      ...
      File "...\Whatsapp_Chat_Exporter\extract.py", line 371, in messages
        if "\r\n" in msg:
    TypeError: a bytes-like object is required, not 'str'
Does my screenshot below have any difference from yours?
Yes there is! The first row is missing a 9f at the end. And the 2nd row is missing an a3 at the end.
Oh. They are just being hidden😂. Here you are:
Hah, there is no difference then. Did you try changing it from binary mode to text or RTL text?
They can't be displayed in text or RTL mode. Try this and post the output:
    SELECT quote(text_data) from message WHERE _id=<the message id>;
If you are using Linux/WSL:
    $ sqlite3 msgstore.db
    > SELECT quote(text_data) from message WHERE _id=<the message id>;
    'Keep it up💐💐💐🌹🌹���'
Thanks. I got this.
Hmmm, you are supposed to get a hex string. Something like this:
I didn't get a hex. What version of sqlite3 command line are you using? I am on 3.34.1.
Why is there no id in your command?
3.40.1
Look closer and you will find that it is just redacted😂. It should be an integer anyway.
Oh. Why would you redact that?! I have just tried it with version 3.42.0 and it's still not giving me the hex.
I guess that means we have different data in that cell.
Why are you getting a hex anyway? The cell should contain the message, not the hex, right?
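One possible explanation for the different outputs, offered here as an assumption and not as a confirmed diagnosis: SQLite's quote() renders BLOB values as X'...' hex literals and TEXT values as quoted strings, so the same bytes print differently depending on the storage class of the cell. The sketch below stores identical (shortened, hypothetical) bytes once as BLOB and once as TEXT in an in-memory table and compares typeof(), quote() and hex().

```python
import sqlite3

RAW = b"Keep it up\xed\xb2\x9e"  # hypothetical shortened sample

db = sqlite3.connect(":memory:")
# Decode leniently so the TEXT row can be fetched at all.
db.text_factory = lambda b: b.decode("utf-8", errors="replace")
# No declared column type, so each value keeps the storage class it was given.
db.execute("CREATE TABLE message (_id INTEGER PRIMARY KEY, text_data)")
db.execute("INSERT INTO message VALUES (1, ?)", (RAW,))                # BLOB
db.execute("INSERT INTO message VALUES (2, CAST(? AS TEXT))", (RAW,))  # TEXT

rows = db.execute(
    "SELECT _id, typeof(text_data), quote(text_data), hex(text_data) "
    "FROM message ORDER BY _id"
).fetchall()
for row in rows:
    print(row)
```

If this is what happened, hex(text_data) would return the hex string for both kinds of cell, and typeof(text_data) would show which storage class a given row actually has.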