jschardet
Character encoding auto-detection in JavaScript (a port of Python's chardet)
Detection does not work with Romanian subtitle files. OpenSubtitles detects these files as "cp1250", but `jschardet` detects the encoding as "windows-1252". Wrong characters: ã þ º. Correct Romanian special characters: ă...
The following file is detected as EUC-JP even though it is not EUC-JP. This seems to be caused by a single `ü` inside the file. File: [QuietLight.tmTheme.txt](https://github.com/aadsm/jschardet/files/879178/QuietLight.tmTheme.txt)
Does it have to read the whole buffer? See https://github.com/aadsm/jschardet/blob/master/src/init.js#L72-L85. If it has to read the whole buffer, can it return a string as a result? I want to check if...
Detecting the encoding of this RSS feed, http://news.baidu.com/n?cmd=1&class=civilnews&tn=rss, fails with `{ encoding: null, confidence: 0 }`.
It would be great if encoding detection could accept a stream as input, trying to detect the encoding block by block and returning a result as soon as a minimum confidence level...
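The block-by-block idea in this request can be approximated outside the library. The following is only a minimal sketch, not part of jschardet: `detectIncrementally`, `blockSize`, `minConfidence`, and `bytesRead` are hypothetical names, and `detectFn` is assumed to have the same shape as `jschardet.detect` (bytes in, `{ encoding, confidence }` out). It re-runs detection on growing prefixes rather than truly streaming, so it does extra work on long inputs, and a prefix may end in the middle of a multi-byte character.

```javascript
// Sketch: run detection on growing prefixes of a buffer and stop early.
// Assumption: detectFn(bytes) returns { encoding: string | null, confidence: number },
// which is the result shape shown in the issues above.
function detectIncrementally(buffer, detectFn, options = {}) {
  const blockSize = options.blockSize ?? 4096;        // hypothetical default
  const minConfidence = options.minConfidence ?? 0.9; // hypothetical default

  let result = { encoding: null, confidence: 0 };
  let bytesRead = 0;

  while (bytesRead < buffer.length) {
    // Extend the prefix by one block, without running past the end.
    bytesRead = Math.min(bytesRead + blockSize, buffer.length);
    result = detectFn(buffer.subarray(0, bytesRead));

    // Stop as soon as the detector is confident enough.
    if (result.encoding && result.confidence >= minConfidence) break;
  }

  // bytesRead reports how much of the input was actually examined.
  return { ...result, bytesRead };
}
```

Assuming `jschardet` is installed, it could be called as `detectIncrementally(fs.readFileSync(path), jschardet.detect, { minConfidence: 0.8 })`.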
Hi, what does it take to create the statistical model needed to support the win-1256 code page? Thanks
`const fs = require("fs");` `const jschardet = require("jschardet");` `const buffer = fs.readFileSync(path);` `const encodingResult = jschardet.detect(buffer);` `console.log(encodingResult.encoding);`
The string `~{` in a UTF-8/ASCII document causes detection to fail with `{ encoding: null, confidence: 0 }`.
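One possible caller-side workaround for this kind of failure, sketched here as an assumption and not something the issue or the library proposes, is to fall back to ASCII when detection returns no encoding and every byte is 7-bit. The names `isSevenBit` and `detectWithAsciiFallback` are hypothetical, `detectFn` is assumed to behave like `jschardet.detect`, and the confidence value of the fallback is arbitrary. Note that `~{` is also the escape sequence that starts GB mode in HZ-GB-2312, so a 7-bit file containing it could in principle be genuine HZ text.

```javascript
// Returns true when every byte is in the 7-bit ASCII range.
function isSevenBit(bytes) {
  for (const b of bytes) {
    if (b > 0x7f) return false;
  }
  return true;
}

// Hypothetical wrapper: only steps in when the detector gives up.
// Assumption: detectFn(bytes) returns { encoding: string | null, confidence: number }.
function detectWithAsciiFallback(bytes, detectFn) {
  const result = detectFn(bytes);
  if (result && result.encoding) return result;

  // Detector failed; pure 7-bit input is treated as ASCII.
  // The confidence of 1 is an arbitrary choice for this sketch.
  if (isSevenBit(bytes)) return { encoding: "ascii", confidence: 1 };

  return result;
}
```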
Got an undefined variable error for `denormalizedEncodings`
Is this related to the webpack configuration? How should it be configured to support this?