
LineReader stops reading when it hits a character like "É" or "ñ"

Open pkamb opened this issue 12 years ago • 11 comments

Suppose you have a text file such as:

```text
diner
restaurant
lunch-spot
greasy spoon
café   // "é" character
coffee shop
cafeteria
```

LineReader stops reading when it reaches the "café" line above and never gets to "coffee shop".

pkamb, Sep 14 '11 17:09

Maybe the file is not encoded as UTF-8? I use NSUTF8StringEncoding in the FileReader; see `(NSString*)readLine` at line 72. Perhaps you can find a way to detect the file's encoding before you start reading its content. You are welcome to fork the project.

johnjohndoe, Sep 15 '11 09:09
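The hypothesis above (a non-UTF-8 file read with NSUTF8StringEncoding) can be illustrated with a short Python sketch. It is an analogy, not LineReader's Objective-C code: `read_lines_utf8` is a hypothetical reader that decodes each line strictly as UTF-8 and, like a `readLine` that returns nil, simply stops at the first line it cannot decode. It assumes the sample file was saved as Latin-1, where "é" is the single byte 0xE9, which on its own is not valid UTF-8.

```python
# Hypothetical sketch: mimics a line reader that decodes every line as UTF-8
# and stops at the first line it cannot decode (analogous to getting nil back).
from typing import Iterator

SAMPLE = "diner\nrestaurant\nlunch-spot\ngreasy spoon\ncafé\ncoffee shop\ncafeteria\n"


def read_lines_utf8(data: bytes) -> Iterator[str]:
    """Yield lines decoded as UTF-8; stop silently at the first invalid line."""
    for raw in data.splitlines():
        try:
            yield raw.decode("utf-8")  # strict: invalid bytes raise an error
        except UnicodeDecodeError:
            return  # like a readLine that returns nil: reading simply ends


# Assumption: the file was saved as Latin-1, where "é" is the single byte 0xE9.
latin1_bytes = SAMPLE.encode("latin-1")
utf8_bytes = SAMPLE.encode("utf-8")

print(list(read_lines_utf8(latin1_bytes)))  # → ['diner', 'restaurant', 'lunch-spot', 'greasy spoon']
print(len(list(read_lines_utf8(utf8_bytes))))  # → 7
```

The same text saved as UTF-8 is read to the end, which matches the suggestion that the encoding, not the characters themselves, is the problem.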

Hi, I still have this problem.

ZuzooVn, Feb 13 '13 08:02

Have you verified which character encoding is used by the file you are trying to read?

johnjohndoe, Feb 13 '13 08:02

Hi, it's Unicode (UTF-8)

ZuzooVn, Feb 13 '13 08:02

Could you upload a zipped sample somewhere? Then I will find the time to take a look at it in a few days.

johnjohndoe, Feb 13 '13 16:02

I think you can create a new document with characters like í, é, ñ, ... Or I will upload some sample data.

ZuzooVn, Feb 13 '13 16:02

I think you should really upload an example file somewhere. I can write an ñ into either an extended-ASCII (single-byte, e.g. Latin-1) or a UTF-8 encoded file, so the character alone does not tell us the encoding. You can also check the file's character encoding yourself with an editor; if you are using Windows, I recommend Notepad++. On Mac OS X or Linux, run the following command in a shell: `$ file filename`.

johnjohndoe, Feb 13 '13 19:02
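For readers without Notepad++ or `file`, a rough Python check is sketched below. It only answers two narrower questions, whether the file contains any non-ASCII bytes and whether it decodes as strict UTF-8; unlike `file`, it does not guess which legacy encoding was used. The helper names `is_valid_utf8` and `has_non_ascii` are hypothetical. The demo writes "año" in UTF-8 and in Latin-1 (an assumed legacy encoding) to show that both files contain non-ASCII bytes but only one is valid UTF-8.

```python
# Rough check of whether a file's bytes are valid UTF-8. This is only a
# validity test, not full encoding detection like the `file` command does.
import os
import tempfile


def is_valid_utf8(path: str) -> bool:
    """Return True if the whole file decodes as strict UTF-8."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def has_non_ascii(path: str) -> bool:
    """Return True if any byte is outside the 7-bit ASCII range."""
    with open(path, "rb") as f:
        return any(b > 0x7F for b in f.read())


# Demo: the same word with "ñ", saved in two different encodings.
with tempfile.TemporaryDirectory() as tmp:
    for enc in ("utf-8", "latin-1"):
        path = os.path.join(tmp, f"sample-{enc}.txt")
        with open(path, "wb") as f:
            f.write("año\n".encode(enc))
        print(enc, has_non_ascii(path), is_valid_utf8(path))
        # prints "utf-8 True True", then "latin-1 True False"
```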

This is the file's info: `Non-ISO extended-ASCII English text, with very long lines, with CRLF line terminators`.

This is the file: http://www.mediafire.com/?1cwr4if28w504md

It has the "î" character.

ZuzooVn, Feb 14 '13 05:02

Agreed. As I suspected, the file is not encoded as UTF-8.

[Screenshot: Notepad++]

I converted the file to UTF-8 using Notepad++ (the options are visible in the menu), so you can try again with this file.

johnjohndoe, Feb 15 '13 16:02
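The Notepad++ conversion can also be scripted. This Python sketch assumes the source is Windows-1252 (`cp1252`), a guess based on the CRLF line endings; `file` only reported "Non-ISO extended-ASCII", which does not name a code page. `convert_to_utf8` is a hypothetical helper. Files are read and written in binary mode so the CRLF line terminators are kept unchanged.

```python
# Convert a text file from an assumed legacy encoding to UTF-8, the same kind
# of conversion done manually in Notepad++. "cp1252" is an assumption: `file`
# only reported "Non-ISO extended-ASCII", which does not name the code page.
import os
import tempfile


def convert_to_utf8(src: str, dst: str, source_encoding: str = "cp1252") -> None:
    with open(src, "rb") as f:
        text = f.read().decode(source_encoding)  # raises if a byte is undefined
    with open(dst, "wb") as f:
        f.write(text.encode("utf-8"))  # binary mode keeps CRLF as-is


# Demo: "î" is the single byte 0xEE in cp1252 and the two bytes C3 AE in UTF-8.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "legacy.txt")
    dst = os.path.join(tmp, "utf8.txt")
    with open(src, "wb") as f:
        f.write("île\r\n".encode("cp1252"))
    convert_to_utf8(src, dst)
    with open(dst, "rb") as f:
        print(f.read())  # b'\xc3\xaele\r\n'
```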

Maybe we should automatically convert every file to UTF-8 before we start reading its content.

ZuzooVn, Feb 16 '13 09:02

I suggest that you look for a way to detect the character encoding up front. Feel free to add it to LineReader.

johnjohndoe, Feb 16 '13 20:02
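One way to recognize the encoding up front is a heuristic rather than true detection: try strict UTF-8 first, since legacy 8-bit text with accented letters rarely happens to be valid UTF-8, and otherwise fall back to a single assumed code page. The Python sketch below shows the idea; `decode_with_fallback` and the `cp1252` fallback are assumptions, and a change to LineReader itself would have to be written in Objective-C.

```python
# Hypothetical "detect, then decode" helper: UTF-8 is tried first because
# legacy single-byte text rarely happens to be valid UTF-8; anything else
# falls back to an assumed legacy code page (cp1252 here).
from typing import Tuple

FALLBACK_ENCODING = "cp1252"  # assumption; pick the code page your files use


def decode_with_fallback(data: bytes) -> Tuple[str, str]:
    """Return (text, encoding_used)."""
    try:
        # "utf-8-sig" also strips a leading byte order mark if there is one
        return data.decode("utf-8-sig"), "utf-8"
    except UnicodeDecodeError:
        # errors="replace" keeps going past bytes cp1252 leaves undefined
        return data.decode(FALLBACK_ENCODING, errors="replace"), FALLBACK_ENCODING


for sample in ("café".encode("utf-8"), "café".encode("cp1252")):
    text, enc = decode_with_fallback(sample)
    print(enc, text)  # "utf-8 café", then "cp1252 café"
```

`errors="replace"` substitutes U+FFFD for the handful of bytes cp1252 leaves undefined, so reading continues instead of stopping at the first unexpected byte, which is the behaviour reported in this issue.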