sms-ie icon indicating copy to clipboard operation
sms-ie copied to clipboard

CSV Conversion

Open tmo1 opened this issue 1 year ago • 15 comments

Hi. If at all possible, I am looking for help, please. I have a CSV file of old (10-15 years ago, pre-smart phone) SMS messages that I am trying to import to my current SMS app (QKSMS). Each row is as follows (all punctuation as in original): Row 1 of CSV- "type","address","body","date" For the avoidance of doubt, in my file type is 1 or 2 to show whether SMS was received or sent, address is a phone number beginning with either 0 or a +, body is the content of the SMS text, date is Unix time. So, an example from row 2 downwards is- 2,"00447779877777","No, but I am leaving soon!","1120755000000"

How do I get this in to the correct format (such as, do the quotation marks need removing? or adding to the type?) and convert to a json that the app can read and export back out to my current SMS app? All my attempts have ended in failure. Or should I be using sms-db or something else? Thanks very much!

Originally posted by @davebeep in https://github.com/tmo1/sms-ie/discussions/55#discussioncomment-5090764

tmo1 avatar Feb 24 '23 18:02 tmo1

This can be done in a single line of Python! Run this script as follows (messages.csv should be your CSV file, and messages.json will contain the SMS I/E compatible JSON):

csv-convert.py < messages.csv > messages.json

tmo1 avatar Feb 26 '23 04:02 tmo1

I will give it a go! Thank you

davebeep avatar Feb 27 '23 13:02 davebeep

The conversion works but there is obviously something wrong with the formatting or schema in my CSV because when I try to import to SMS I/E I'm getting 'error parsing JSON'.

davebeep avatar Mar 01 '23 22:03 davebeep

Please post some of the (redacted) converted JSON, as per the instructions here.

tmo1 avatar Mar 01 '23 22:03 tmo1

Here it is: [ { "\u00ef\u00bb\u00bf"""type""": "1,"07777777987","No, but I'll be leaving soon.","1120755000000"", "address""": null, "body""": null, "date"""": null }, { "\u00ef\u00bb\u00bf"""type""": "2,"00447787878781","I am not going tomorrow. Wow","1120756200000"", "address""": null, "body""": null, "date"""": null }, { "\u00ef\u00bb\u00bf"""type""": "1,"07888888888","This is jess's phone so i only have your number!","1120756500000"", "address""": null, "body""": null, "date"""": null }, { "\u00ef\u00bb\u00bf"""type""": "2,"00447777777987","Thanks 4 checking","1120756680000"", "address""": null, "body""": null, "date"""": null } ]

davebeep avatar Mar 04 '23 12:03 davebeep

Hm. Is your CSV file ASCII, or something else (e.g., UTF encoded Unicode)? If the latter, try saving it as ASCII and seeing if the script produces proper JSON. If it doesn't contain sensitive information, you can post it here and I can take a look at it.

tmo1 avatar Mar 05 '23 00:03 tmo1

OK, I tried saving in the different format and it seemed promising with no parsing errors. It looks like this: [ { ""type","address","body","date"": "1,"07777777987","No, but I'll be leaving soon.","1120755000000"" }, { ""type","address","body","date"": "2,"00447787878781","I am not going tomorrow. Wow","1120756200000"" }, { ""type","address","body","date"": "1,"07888888888","This is jess's phone so i only have your number!","1120756500000"" }, { ""type","address","body","date"": "2,"00447777777987","Thanks 4 checking","1120756680000"" } ] But my SMS app can't read the result. It just shows as 'null' with blank messages. Thanks for all your help and patience!

davebeep avatar Mar 05 '23 21:03 davebeep

At this point, I really need to see your original CSV file. Sensitive data can be redacted, if desired, but I need to see the more or less original version of the file.

tmo1 avatar Mar 06 '23 03:03 tmo1

At this point, I really need to see your original CSV file. Sensitive data can be redacted, if desired, but I need to see the more or less original version of the file.

This is the test file I used: test.csv

davebeep avatar Mar 06 '23 12:03 davebeep

Well, there's your problem. In your initial report, you said that your file had lines like these:

"type","address","body","date"
2,"00447779877777","No, but I am leaving soon!","1120755000000"

This is standard CSV, and the conversion script runs correctly on it.

The file you just posted, however, has lines like these, full of extra double quote marks:

"""type"",""address"",""body"",""date"""
"1,""07777777987"",""No, but I'll be leaving soon."",""1120755000000"""

tmo1 avatar Mar 06 '23 13:03 tmo1

Well, there's your problem. In your initial report, you said that your file had lines like these:

"type","address","body","date"
2,"00447779877777","No, but I am leaving soon!","1120755000000"

This is standard CSV, and the conversion script runs correctly on it.

The file you just posted, however, has lines like these, full of extra double quote marks:

"""type"",""address"",""body"",""date"""
"1,""07777777987"",""No, but I'll be leaving soon."",""1120755000000"""

So if the conversion process is adding those extra quotation marks, I need to remove them from the original CSV file? (Apologies for my slowness)

davebeep avatar Mar 06 '23 14:03 davebeep

So if the conversion process is adding those extra quotation marks, I need to remove them from the original CSV file?

You didn't mention how you converted the file to UTF-8 from the original before feeding it to my CSV-JSON conversion script, but a correct conversion shouldn't be adding extra quotation marks. See here for various conversion tools and methods. I use Linux, so I suppose I'd use iconv. There are also some tools for Windows mentioned in that discussion.

To be clear:

  1. Start with your original, correct CSV file.
  2. Convert it properly to UTF-8 encoding using one of the tools / methods in the thread I linked to.
  3. Feed the converted CSV file, now encoded in UTF-8, to my conversion script.

tmo1 avatar Mar 06 '23 17:03 tmo1

Hi @tmo1, I'm really grateful for all your help and wanted to let you know that, with a bit of trial and error, I think I have done it (I'm on Windows and used Notepad++). The only thing that isn't rendered correctly are the letters with diacritics, but I can probably live with that. I tried UTF-8-BOM encoding for this and, although it looks like the diacritics are preserved, once in QKSMS, the characters are corrupted. Thanks for all your hard work with your apps and for taking the time to reply to me.

davebeep avatar Mar 08 '23 22:03 davebeep

I'm glad you got it working! For the diacritics, see if running the conversion script as follows solves the problem:

PYTHONIOENCODING=utf-8 csv-convert.py < messages.csv > messages.json

tmo1 avatar Mar 09 '23 04:03 tmo1

I'm glad you got it working! For the diacritics, see if running the conversion script as follows solves the problem:

PYTHONIOENCODING=utf-8 csv-convert.py < messages.csv > messages.json

That doesn't seem to work, unfortunately. It says no such file or directory. There is definitely a file in the same directory as when I ran the previous script with the correct name, encoded in UTF-8 as before. But, honestly, I don't expect you to go on answering me forever. It's just me... I'll see if I can find some solution or live with it

davebeep avatar Mar 09 '23 21:03 davebeep