librdata

Unicode error coming from factors

Open ofajardo opened this issue 4 years ago • 2 comments

While trying to read this apparently simple rdata file, the following error arises:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb1 in position 1: invalid start byte
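For reference, the message can be reproduced in isolation. This is only a minimal sketch: the bytes below are hypothetical (a string like "1±5" encoded as Latin-1) and are not taken from the attached file, and `decode_level` is an illustrative helper, not part of librdata or any reader. It assumes the string was written in a single-byte encoding such as Latin-1, which may not be what this file actually uses.

```python
# Hypothetical bytes: "1±5" encoded as Latin-1 (NOT extracted from the attached file).
# 0xb1 is a UTF-8 continuation byte, so it cannot start a character.
raw = b"1\xb15"


def decode_level(data: bytes, fallback: str = "latin-1") -> str:
    """Try UTF-8 first; fall back to an assumed single-byte encoding."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode(fallback)


# Capture the error raised by a strict UTF-8 decode.
try:
    raw.decode("utf-8")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)
print(decode_level(raw))
```

If the strings in the file really are in a non-UTF-8 encoding, a fallback like this would avoid the exception, but the right encoding would still have to be known or guessed.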

The problem apparently comes from the fact that most of the information is stored as factor levels (all variables are factors). If those factors are converted to characters in R, the file is read fine:

i <- sapply(rlvnc2, is.factor)
rlvnc2[i] <- lapply(rlvnc2[i], as.character)

ofajardo, Dec 14 '21 14:12

Apparently the offending byte is at position 0xb1, while R seems to start reading from the next position, 0xb2 (or at least R shows the information on screen starting from 0xb2).

ofajardo, Dec 14 '21 14:12

The user reports that this file is very old. I read the file in R and saved it again with R 4.0.2 on a Mac. The file looks completely different in a hex editor, but the error is the same, at the same position. See attached: test9.RData.zip

ofajardo, Dec 14 '21 14:12