pyreadr

it shows me this error LibrdataError: Unable to convert string to the requested encoding (invalid byte sequence)

Open 69hed opened this issue 4 years ago • 12 comments

I want to open the dataset below in Python, but it keeps raising an error. The code is:

  import pyreadr
  result = pyreadr.read_r(r"~/Desktop/review2020.rda")
  print(result.keys())
  df1 = result["df1"]

The error:

  ~/opt/anaconda3/lib/python3.8/site-packages/pyreadr/pyreadr.py in read_r(path, use_objects, timezone)
       46     if not os.path.isfile(path):
       47         raise PyreadrError("File {0} does not exist!".format(path))
  ---> 48     parser.parse(path)
       49
       50     result = OrderedDict()

  ~/opt/anaconda3/lib/python3.8/site-packages/pyreadr/librdata.pyx in pyreadr.librdata.Parser.parse()

  ~/opt/anaconda3/lib/python3.8/site-packages/pyreadr/librdata.pyx in pyreadr.librdata.Parser.parse()

  LibrdataError: Unable to convert string to the requested encoding (invalid byte sequence)

How can I fix this?

69hed avatar Jan 28 '21 01:01 69hed

As suggested in the issue template, please include a file (with no sensitive data) so that I can reproduce the issue. If I cannot reproduce it, I cannot fix it.

ofajardo avatar Jan 28 '21 07:01 ofajardo

HEDIEH KARACHI has shared a OneDrive for Business file with you: https://deakin365-my.sharepoint.com/personal/hkarachi_deakin_edu_au/Documents/Attachments/tip2020.rda

Thanks for the reply. Please find the file attached.

Best, Hedieh



69hed avatar Jan 28 '21 07:01 69hed

I can't access the file; it gives me an error. Please zip it and drag and drop it here directly.

ofajardo avatar Jan 28 '21 08:01 ofajardo

HEDIEH KARACHI has shared a OneDrive for Business file with you: https://deakin365-my.sharepoint.com/personal/hkarachi_deakin_edu_au/Documents/Attachments/rdaFile.zip

I hope it works now. As the file is already compressed, zipping it doesn't make it any smaller. Let me know if you still can't open it.

Best, Hedieh



69hed avatar Jan 28 '21 22:01 69hed

After signing in, it keeps giving me a permission denied error. Please attach the file here on GitHub (you need to zip it not to reduce the size, but because GitHub accepts zip files) or look for another way to share it.

ofajardo avatar Jan 29 '21 08:01 ofajardo

Hopefully you can access the file now. I couldn't share it on GitHub, as the file is bigger than 10 MB.

https://www.dropbox.com/s/650m9kxkb8dzglw/tip2020.rda.zip?dl=0



69hed avatar Jan 31 '21 23:01 69hed

I managed to download the file and reproduce the error. Reading the first bytes of the file I got this:

b'RDX3\nX\n\x00\x00\x00\x03\x00\x03\x06\x01\x00\x03\x05\x00\x00\x00\x00\x06CP1252\x00'

I think CP1252 is the encoding, meaning Windows-1252. Right now, as indicated in the Known limitations section of the README of this repo, pyreadr does not support encodings other than UTF-8:

"Cannot read RData or rds files in encodings other than utf-8."

That means this file is not supported.
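The header check described above can be sketched roughly as follows. This is only an illustration, not part of pyreadr or librdata: the helper names `open_rdata` and `native_encoding` are hypothetical, and the sketch assumes the file is an XDR-format stream whose version-3 header stores the native encoding as a length-prefixed string, as the bytes shown above suggest.

```python
import bz2
import gzip
import lzma
import struct


def open_rdata(path):
    """Open an R data file for reading, decompressing gzip, bzip2 or xz if needed.

    Hypothetical helper: the compression is guessed from the first bytes on disk.
    """
    with open(path, "rb") as fh:
        magic = fh.read(6)
    if magic.startswith(b"\x1f\x8b"):
        return gzip.open(path, "rb")
    if magic.startswith(b"BZh"):
        return bz2.open(path, "rb")
    if magic.startswith(b"\xfd7zXZ\x00"):
        return lzma.open(path, "rb")
    return open(path, "rb")


def native_encoding(header):
    """Return the encoding name stored in a version-3 header, or None for older versions.

    Hypothetical helper: `header` is the first few dozen decompressed bytes.
    """
    # rda files start with a 5-byte magic; rds files start directly with the format marker.
    if header[:5] in (b"RDX2\n", b"RDX3\n"):
        header = header[5:]
    if not header.startswith(b"X\n"):
        raise ValueError("not an XDR-format R serialization stream")
    # Three big-endian integers: format version, writer R version, minimum reader version.
    version, _writer, _min_reader = struct.unpack(">3i", header[2:14])
    if version < 3:
        return None  # older headers do not record an encoding
    (length,) = struct.unpack(">i", header[14:18])
    return header[18:18 + length].decode("ascii")
```

With the file from this thread, something like `with open_rdata("tip2020.rda") as fh: print(native_encoding(fh.read(64)))` should print the encoding name; for the header bytes quoted above that is `CP1252`. Running such a check before calling `pyreadr.read_r` makes it clear whether a file falls under the UTF-8-only limitation.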

This limitation comes from the C backend, librdata. Looking at the C source code, I have the feeling the error message should be different, so I am going to open an issue there for them to take a look. I will also ask whether other encodings could be supported. It may come at some point in the future.

If you have control over the generation of the rda files, then try saving them with utf-8 encoding.
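To illustrate why the encoding matters, here is a small pure-Python sketch. It does not use pyreadr and makes no claim about what librdata does internally; it only shows that text stored as Windows-1252 bytes is generally not valid UTF-8, and that the same text re-encoded as UTF-8 reads back fine. The sample string is made up.

```python
text = "café"  # made-up sample containing a non-ASCII character

# The same text as it would be stored under each encoding.
raw_cp1252 = text.encode("cp1252")
raw_utf8 = text.encode("utf-8")

# Interpreting the Windows-1252 bytes as UTF-8 fails.
error = None
try:
    raw_cp1252.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc

# Decoding with the encoding that was actually used recovers the text,
# and the UTF-8 version decodes without problems.
recovered = raw_cp1252.decode("cp1252")
roundtrip = raw_utf8.decode("utf-8")

print(raw_cp1252, raw_utf8, type(error).__name__, recovered, roundtrip)
```

Pure ASCII strings are identical in both encodings, so only files containing accented or other non-ASCII characters would be affected by a mismatch like this.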

ofajardo avatar Feb 01 '21 08:02 ofajardo

Thanks so much for your help. I really appreciate it.

Best, Hedieh



69hed avatar Feb 05 '21 00:02 69hed

@69hed could you please share the file again? It has been deleted from Dropbox.

ofajardo avatar Mar 27 '21 14:03 ofajardo

@69hed I recovered the file and hosted it here: https://github.com/ofajardo/readstat_test_files/blob/master/tip2020.rda for easier sharing with the librdata people, who are looking at it.

ofajardo avatar Mar 29 '21 13:03 ofajardo