eventmachine-tail

How should we deal with invalid characters on 1.9?

Open eric opened this issue 13 years ago • 8 comments

On Ruby 1.9 we ran into issues with a file that is supposed to be UTF-8 but has invalid characters in it.

A fix was suggested for remote_syslog that should clearly go directly into eventmachine-tail, but I haven't been able to figure out exactly how I would want to fix it.

Here is the discussion that we've had so far: https://github.com/papertrail/remote_syslog/pull/13

Any thoughts would be welcome for the best way to solve this.

eric avatar Sep 06 '11 08:09 eric

Since em-tail doesn't really display characters or do any calculations on them, I don't think it should care what encoding the data it reads has. So if it's breaking on some input, I think it's a bug in em-tail.

Can you publish a sample file with some bad data? Otherwise I'll try to reproduce and hack on a fix.

jordansissel avatar Sep 06 '11 14:09 jordansissel

I was just playing around with this: https://gist.github.com/1169737

eric avatar Sep 06 '11 17:09 eric

I've hit this too. I've tried the iconv workaround in the remote_syslog pull request, as well as something like:

data = data.encode!( 'UTF-8', invalid: :replace, undef: :replace )

And I'm still getting the following error:

/usr/lib/ruby/gems/1.9.1/gems/eventmachine-0.12.10/lib/em/buftok.rb:66:in `split': invalid byte sequence in UTF-8 (ArgumentError)

Any ideas?
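
One possible explanation, offered as an assumption rather than a confirmed diagnosis: on Ruby 1.9, `String#encode` does nothing when the source and destination encodings are the same, so a UTF-8 to UTF-8 call with `invalid: :replace` leaves the bad bytes in place. The sketch below forces a real transcoding step by round-tripping through UTF-16LE. `scrub_utf8` is a hypothetical helper name, not part of eventmachine-tail or EventMachine, and it would only help if it runs on the data before that data reaches the `split` in buftok.rb.

```ruby
# Hypothetical helper (not an eventmachine-tail or EventMachine API).
# Returns a valid UTF-8 copy of `data`, with invalid byte sequences replaced.
def scrub_utf8(data)
  # Work on a copy and label the bytes as UTF-8 without changing them.
  text = data.dup.force_encoding(Encoding::UTF_8)
  return text if text.valid_encoding?

  # A UTF-8 -> UTF-8 encode may be skipped as a no-op, so go through
  # UTF-16LE to make the transcoder inspect (and replace) the bad bytes.
  text.encode(Encoding::UTF_16LE, invalid: :replace, undef: :replace)
      .encode(Encoding::UTF_8)
end

cleaned = scrub_utf8("ok line\nbad \xFF byte\n")
cleaned.split("\n").each { |line| puts line }
```

On Ruby 2.1 and later, `String#scrub` does the same job directly.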

mblair avatar Sep 12 '11 04:09 mblair

Any news about this issue?

vihai avatar Mar 13 '12 09:03 vihai

Probably should just read into a buffer that is set explicitly to binary mode, and let the consumer of the data care about the encoding.

I'll get to fixing this eventually if nobody else does.
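
A minimal sketch of the binary-buffer idea, assuming a line-oriented consumer. `BinaryLineBuffer` and `extract` are hypothetical names used for illustration; this is not the eventmachine-tail implementation or EventMachine's BufferedTokenizer. Every chunk is relabeled as binary before it is buffered, so splitting never has to validate UTF-8, and the caller decides how to decode each line.

```ruby
# Hypothetical line buffer that treats all input as raw bytes.
class BinaryLineBuffer
  def initialize(delimiter = "\n")
    @delimiter = delimiter.dup.force_encoding(Encoding::BINARY)
    @buffer = String.new # String.new without arguments is ASCII-8BIT (binary)
  end

  # Appends a chunk and returns the complete lines seen so far.
  # Any trailing partial line stays in the buffer for the next call.
  def extract(chunk)
    @buffer << chunk.dup.force_encoding(Encoding::BINARY)
    lines = @buffer.split(@delimiter, -1)
    @buffer = lines.pop || String.new
    lines
  end
end

buffer = BinaryLineBuffer.new
buffer.extract("first\nbad \xFF bytes\npart").each do |line|
  # The consumer chooses the encoding; here it just checks validity as UTF-8.
  utf8 = line.dup.force_encoding(Encoding::UTF_8)
  puts "#{line.bytesize} bytes, valid UTF-8: #{utf8.valid_encoding?}"
end
```

The trade-off is that lines come back tagged as binary, so a consumer that wants text has to call `force_encoding` (and possibly clean the string) itself.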

jordansissel avatar Mar 13 '12 17:03 jordansissel

Resurrecting this after a few years :) We just ran into this as well.

For us, we launched an app with start-stop-daemon and didn't pass the LC_ALL variable set to something UTF-8'ish.

--> Ruby uses POSIX/ASCII and blows up when it has to touch any UTF-8 char
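
A small sketch of one way to avoid depending on the daemon's locale, assuming the file really is UTF-8: Ruby derives `Encoding.default_external` from the environment, so naming the encoding when opening the file sidesteps a missing LC_ALL. `read_as_utf8` is a hypothetical helper, not something eventmachine-tail provides.

```ruby
# Shows what Ruby picked up from the environment (LC_ALL / LANG).
puts "default_external: #{Encoding.default_external}"

# Hypothetical helper: read a file as UTF-8 no matter what the locale says.
def read_as_utf8(path)
  File.open(path, "r:UTF-8") { |file| file.read }
end
```

Setting `Encoding.default_external = Encoding::UTF_8` early in the program is another option, at the cost of changing the default for every file the process opens.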

rb2k avatar Sep 04 '14 14:09 rb2k

This project has been replaced by the filewatch library. Last I knew, EventMachine was abandoned as a project (its most recent release was 1.5 years ago), so I recommend not using em-tail.

Sorry for the bugs, but this project is probably not worth resurrecting.

Recommend you check out the filewatch library instead, maybe?

jordansissel avatar Sep 05 '14 02:09 jordansissel

Sure, probably a good choice :)

Although I don't see an integrated way of actually tailing a file, rather than just being notified that something changed? But maybe it's just too early in the morning ;)

rb2k avatar Sep 06 '14 14:09 rb2k