chardet2
Encoding::CompatibilityError: incompatible encoding regexp match
Encoding::CompatibilityError: incompatible encoding regexp match (ASCII-8BIT regexp with UTF-8 string)
from /Users/boti/.rvm/gems/ruby-1.9.3-p327@search_server/gems/chardet2-1.0.1/lib/UniversalDetector.rb:134:in `=~'
from /Users/boti/.rvm/gems/ruby-1.9.3-p327@search_server/gems/chardet2-1.0.1/lib/UniversalDetector.rb:134:in `feed'
from /Users/boti/.rvm/gems/ruby-1.9.3-p327@search_server/gems/chardet2-1.0.1/lib/UniversalDetector.rb:46:in `chardet'
from (irb):12
I get the same error when testing with:
UniversalDetector.chardet("∀,∈,≠,Ω,∑,∏,ɔ,⍴,€,ζ,π,ป่")
which should return UTF-8 as the detected encoding.
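The message says an ASCII-8BIT regexp was matched against a UTF-8 string. Here is a minimal sketch that reproduces that class of failure without the gem. It assumes the detector uses a byte-oriented regexp along the lines of `HIGH_BYTES` below; that name and the `high_bytes?` helper are hypothetical and not taken from chardet2.

```ruby
# A binary (ASCII-8BIT) regexp, similar in spirit to what a byte-oriented
# detector might use. HIGH_BYTES is a hypothetical name, not from chardet2.
HIGH_BYTES = /[\x80-\xFF]/n

# Returns true if the string contains any byte above 0x7F.
# Matching against a binary copy (String#b) avoids the encoding clash.
def high_bytes?(text)
  !(HIGH_BYTES =~ text.b).nil?
end

sample = "∀,∈,≠,Ω" # UTF-8 string with non-ASCII characters

begin
  HIGH_BYTES =~ sample # binary regexp against a UTF-8 string
rescue Encoding::CompatibilityError => e
  puts "raised: #{e.class}"
end

puts high_bytes?(sample)
```

Whether passing a binary copy of the string to `UniversalDetector.chardet` is enough to get past line 134 in chardet2 is untested here; the sketch only shows where this kind of error comes from.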
Did you solve your issue?
Hi,
The project is now deprecated, but the issue is still there: the library never returned the proper encoding for me.
On 22 February 2014 12:54, Mickaël Rémond wrote:
> Did you solve your issue?
Thanks! I guess I will have to find another way of detecting the encoding, then.
Well, no... I tried three other libraries and then decided to specify the encoding manually.
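For anyone taking the same route, here is a rough sketch of specifying the encoding manually: try a short list of candidate encodings in order and keep the first one under which the bytes are valid. The `guess_encoding` name and the candidate list are assumptions for illustration. ISO-8859-1 accepts any byte sequence, so it only makes sense as a last resort.

```ruby
# Returns the first candidate encoding under which the bytes are valid,
# or nil if none fits. guess_encoding is a hypothetical helper.
def guess_encoding(bytes, candidates = [Encoding::UTF_8, Encoding::ISO_8859_1])
  candidates.find do |encoding|
    # Work on a copy so the caller's string keeps its original encoding tag.
    bytes.dup.force_encoding(encoding).valid_encoding?
  end
end

raw = "∀,∈,≠,Ω".b            # bytes with no encoding information attached
encoding = guess_encoding(raw)
text = raw.force_encoding(encoding) if encoding
```

This is validation against known candidates, not real detection: it cannot tell apart two single-byte encodings that both accept the input.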
On 22 February 2014 13:08, Mickaël Rémond wrote:
> Thanks! I guess I will have to find another way of detecting the encoding, then.
I think it was a hard case.
On 22 February 2014 13:09, Botond Orbán wrote:
> Well, no... I tried three other libraries and then decided to specify the encoding manually.
I just patched rchardet, an older library. That said, I think one could simply write the string to a temp file and let the system's file command report the encoding:
encoding = `file --mime-encoding string.tmp | awk '{print $2}'`.strip.upcase
string.force_encoding(encoding)
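A slightly fuller sketch of the same idea, assuming a Unix-like system with the `file` command on the PATH. The `detect_with_file` name is hypothetical. It uses `file -b` so the file name is left out of the output and no `awk` is needed, and it returns nil when `file` is missing or reports a name Ruby does not know (for example `unknown-8bit`), since `force_encoding` would raise on those.

```ruby
require "open3"
require "tempfile"

# Writes the bytes to a temp file and asks file(1) for the MIME encoding.
# Returns an Encoding, or nil if file(1) is unavailable or its answer
# is not an encoding name Ruby recognizes.
def detect_with_file(string)
  Tempfile.create("encoding-probe") do |tmp|
    tmp.binmode            # write the bytes exactly as they are
    tmp.write(string)
    tmp.flush
    output, status = Open3.capture2("file", "-b", "--mime-encoding", tmp.path)
    return nil unless status.success?
    return Encoding.find(output.strip)
  end
rescue ArgumentError, Errno::ENOENT
  nil
end

raw = "∀,∈,≠,Ω,∑,∏".b
encoding = detect_with_file(raw)
raw.force_encoding(encoding) if encoding
```

Passing the arguments as a list to `Open3.capture2` avoids going through a shell, so the temp file path needs no quoting.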