chardet2

Encoding::CompatibilityError: incompatible encoding regexp match

Open orbanbotond opened this issue 11 years ago • 7 comments

Encoding::CompatibilityError: incompatible encoding regexp match (ASCII-8BIT regexp with UTF-8 string)
 from /Users/boti/.rvm/gems/ruby-1.9.3-p327@search_server/gems/chardet2-1.0.1/lib/UniversalDetector.rb:134:in `=~'
 from /Users/boti/.rvm/gems/ruby-1.9.3-p327@search_server/gems/chardet2-1.0.1/lib/UniversalDetector.rb:134:in `feed'
 from /Users/boti/.rvm/gems/ruby-1.9.3-p327@search_server/gems/chardet2-1.0.1/lib/UniversalDetector.rb:46:in `chardet'
 from (irb):12

orbanbotond, May 24 '13 10:05

Same issue when testing with:

UniversalDetector.chardet("∀,∈,≠,Ω,∑,∏,ɔ,⍴,€,ζ,π,ป่")

which should return UTF-8 as the encoding.
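
For anyone landing here, the error itself can be reproduced without the gem. This is only a sketch: `HIGH_BIT` is a hypothetical stand-in for whatever binary regexp sits at UniversalDetector.rb:134 (I have not checked the gem's source), and `high_bit?` is a made-up helper name. Matching against a binary-tagged copy of the string is one possible way around the mismatch.

```ruby
# Hypothetical stand-in for the detector's high-bit check; not the gem's code.
# The /n flag plus non-ASCII escapes makes this an ASCII-8BIT regexp.
HIGH_BIT = /[\x80-\xFF]/n

# Made-up helper: true when the string contains any byte >= 0x80.
def high_bit?(text)
  # Match against a binary-tagged copy so regexp and string encodings agree.
  # dup leaves the caller's string and its encoding untouched.
  !(HIGH_BIT =~ text.dup.force_encoding(Encoding::BINARY)).nil?
end

utf8 = "∀,∈,≠,Ω"

begin
  HIGH_BIT =~ utf8 # UTF-8 string with non-ASCII characters vs. binary regexp
rescue Encoding::CompatibilityError => e
  puts e.message
end

puts high_bit?(utf8)
puts high_bit?("plain ascii")
```

Whether the same trick fixes the gem depends on how it builds its regexps and buffers, so treat it as a starting point rather than a patch.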

saneshark, Sep 03 '13 17:09

Did you solve your issue?

mremond, Feb 22 '14 10:02

Hi,

The project is now deprecated, but the issue is still there: the library did not return the proper encoding for me.

orbanbotond, Feb 22 '14 11:02

Thanks! I guess I'll have to find another way of detecting the encoding, then.

mremond, Feb 22 '14 11:02

Well no... I tried 3 other libs and then I decided to manually specify the encoding...
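
Specifying the encoding by hand can look roughly like the sketch below. It is an illustration only: `tag_encoding` and the candidate list are made up for this example, and the right candidates depend entirely on where your data comes from. The idea is to try each encoding in order and keep the first one under which the bytes are valid.

```ruby
# Hypothetical helper: tag raw bytes with the first candidate encoding under
# which they are valid. The default candidates are an assumption.
def tag_encoding(bytes, candidates = [Encoding::UTF_8, Encoding::Windows_1252])
  candidates.each do |enc|
    copy = bytes.dup.force_encoding(enc) # relabel only, no transcoding
    return copy if copy.valid_encoding?
  end
  # Nothing fit: hand the bytes back explicitly tagged as binary.
  bytes.dup.force_encoding(Encoding::BINARY)
end

raw = "caf\xC3\xA9".b # bytes as they might arrive from a socket or file
text = tag_encoding(raw)
puts text.encoding
```

Order matters: single-byte encodings accept almost any byte sequence, so they belong at the end of the list.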

orbanbotond, Feb 22 '14 11:02

I think it was a hard case.

orbanbotond, Feb 22 '14 11:02

I just patched rchardet, an older library. That said, I think one could also just write the string to a temp file and ask the system's `file` command:

 encoding = `file --mime-encoding string.tmp | awk '{print $2}'`.strip.upcase
 string.force_encoding(encoding)
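
A slightly fuller sketch of the same idea, with the temp file handled in Ruby. Assumptions: a `file` binary that supports `-b --mime-encoding` is on the PATH, and `encoding_for` and `detect_with_file` are made-up helper names. Labels Ruby does not know (for example `unknown-8bit`) come back as nil instead of blowing up in `force_encoding`.

```ruby
require 'tempfile'

# Map a label printed by file(1) to a Ruby Encoding, or nil if Ruby has none.
def encoding_for(label)
  Encoding.find(label.strip)
rescue ArgumentError
  nil
end

# Hypothetical helper: ask file(1) for its guess about the bytes in `string`.
def detect_with_file(string)
  Tempfile.create('chardet') do |f|
    f.binmode        # write the bytes untouched
    f.write(string)
    f.flush
    # -b drops the file name from the output, so no awk is needed.
    label = IO.popen(['file', '-b', '--mime-encoding', f.path], &:read)
    encoding_for(label)
  end
end

# Usage (not run here, since it shells out):
#   enc = detect_with_file(raw)
#   raw.force_encoding(enc) if enc
```

Passing the command as an array avoids going through a shell, so the temp file path needs no quoting.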

saneshark, Feb 27 '14 01:02