jschardet issues

No more add browser bundles to git

6

Related to #42. Since we don't need browser bundles for tests anymore, we should drop these bundles from git. We could leverage the `prepublishOnly` task to automate the bundle build...

jdesboeufs

SHIFT-JIS not detected in this case

1

Detect the encoding of the attached file. The result will be `windows-1252`. [shift-jis.txt](https://github.com/aadsm/jschardet/files/879328/shift-jis.txt) ![image](https://cloud.githubusercontent.com/assets/900690/24459316/1443769e-1450-11e7-9c23-76dd62aae059.png)

bpasero

UTF-8 file guessed as ISO 8859-2

1

Guess the encoding on the attached file. It contains emojis but is a fine UTF-8 file. [strip.sh.zip](https://github.com/aadsm/jschardet/files/5558481/strip.sh.zip)

bpasero

UTF-8 encoding of Degree Symbol

1

The issue I'm having is caused by the degree symbol, which is `\xc2\xb0` in UTF-8: http://www.fileformat.info/info/unicode/char/b0/index.htm Below, I include the boiled-down calls. My true testing data sample includes properly formatted XML; but through...
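To make the ambiguity concrete, the sketch below decodes the two bytes `\xc2\xb0` once as UTF-8 and once as a single-byte Latin encoding. It uses only Node.js built-ins (`Buffer`, `TextDecoder`) and does not call jschardet. `latin1` stands in for windows-1252 here, on the assumption that only these two bytes matter; both encodings map them to the same characters.

```javascript
// The degree sign U+00B0 encoded as UTF-8 is the two bytes C2 B0.
const degreeBytes = Buffer.from([0xc2, 0xb0]);

// Decoded as UTF-8, the pair is a single character.
// fatal: true makes the decoder throw on malformed input instead of
// silently substituting U+FFFD.
const asUtf8 = new TextDecoder("utf-8", { fatal: true }).decode(degreeBytes);

// Decoded one byte per character (ISO-8859-1), the same pair is two characters.
const asLatin1 = degreeBytes.toString("latin1");

console.log(JSON.stringify(asUtf8));   // "°"
console.log(JSON.stringify(asLatin1)); // "Â°"
```

Both readings are valid byte sequences in their respective encodings, so a detector has to choose between them on statistical grounds rather than on validity alone.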

glennkitchellcaci

Wrong guess encoding as Windows 1252

2

See https://github.com/Microsoft/vscode/issues/33720

Test case

```
#!/bin/sh
foo() {
echo "starting …"
}
```

Ellipsis symbol `…` makes vscode guess cp1252. UTF8 should have higher priority IMO
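One way to read the "higher priority" suggestion is a strict UTF-8 validity check that runs before any statistical guess. The sketch below is an assumption about how a caller might do that, not a description of jschardet or vscode: `detectWithUtf8First` and its `fallbackDetect` parameter are hypothetical names, and the validity check relies on Node's built-in `TextDecoder` in `fatal` mode.

```javascript
// Returns true when the bytes are well-formed UTF-8.
function isValidUtf8(bytes) {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch (err) {
    return false; // the decoder throws on any malformed sequence
  }
}

// Hypothetical wrapper: report UTF-8 when the input contains non-ASCII bytes
// and is well-formed UTF-8; otherwise defer to the supplied fallback detector.
function detectWithUtf8First(bytes, fallbackDetect) {
  const hasNonAscii = bytes.some((b) => b >= 0x80);
  if (!hasNonAscii) return "ascii";
  if (isValidUtf8(bytes)) return "UTF-8";
  return fallbackDetect(bytes);
}

// The test case from the issue; \u2026 is the ellipsis character.
const script = Buffer.from('#!/bin/sh\nfoo() {\necho "starting \u2026"\n}\n', "utf8");
console.log(detectWithUtf8First(script, () => "windows-1252")); // UTF-8

// The same ellipsis as the single cp1252 byte 0x85 is not valid UTF-8,
// so the fallback detector is consulted.
const cp1252Ellipsis = Buffer.from([0x73, 0x74, 0x85]);
console.log(detectWithUtf8First(cp1252Ellipsis, () => "windows-1252")); // windows-1252
```

The check is a heuristic, not a proof: text in a single-byte encoding can happen to form valid UTF-8, although that becomes less likely as the number of non-ASCII bytes grows.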

Yanpas

GB18030 encoded file incorrectly detected as gb2312

1

https://github.com/atom/encoding-selector/issues/65

### Steps to Reproduce

https://github.com/malice-plugins/yara/blob/17a4fc946febe8b002e285f591bcb21b92a99e9e/rules/userdb_panda.yar

- Open in Atom
- Select "Auto Detect" encoding

**Expected behavior:** Detects the encoding of the file as GB18030. `iconv -f GB18030 -t UTF-8...
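As background on why the two labels are easy to confuse: GB2312, GBK and GB18030 overlap heavily at the byte level, with GBK widening the two-byte range and GB18030 adding four-byte sequences. The sketch below is a byte-range check written purely for illustration. `narrowestGbFamily` is a hypothetical helper, not how jschardet or Atom detect encodings, and it consults no mapping tables, so unassigned code points still pass.

```javascript
// Report the narrowest of GB2312 / GBK / GB18030 whose byte-range rules the
// input satisfies, or null if it fits none of them.
function narrowestGbFamily(bytes) {
  const names = ["GB2312", "GBK", "GB18030"];
  let level = 0; // index into names; only ever increases
  let i = 0;
  while (i < bytes.length) {
    const b1 = bytes[i];
    if (b1 < 0x80) { i += 1; continue; } // ASCII byte
    // Every multi-byte sequence starts with a lead byte in 0x81..0xFE.
    if (b1 < 0x81 || b1 > 0xfe || i + 1 >= bytes.length) return null;
    const b2 = bytes[i + 1];
    if (b2 >= 0x30 && b2 <= 0x39) {
      // GB18030 four-byte form: lead, digit, lead, digit.
      if (i + 3 >= bytes.length) return null;
      const b3 = bytes[i + 2];
      const b4 = bytes[i + 3];
      if (b3 < 0x81 || b3 > 0xfe || b4 < 0x30 || b4 > 0x39) return null;
      level = 2;
      i += 4;
      continue;
    }
    // Two-byte form: trail byte in 0x40..0xFE, excluding 0x7F.
    if (b2 < 0x40 || b2 === 0x7f || b2 > 0xfe) return null;
    // GB2312 (EUC-CN) only uses lead 0xA1..0xF7 with trail 0xA1..0xFE.
    const inGb2312 = b1 >= 0xa1 && b1 <= 0xf7 && b2 >= 0xa1;
    if (!inGb2312) level = Math.max(level, 1);
    i += 2;
  }
  return names[level];
}

console.log(narrowestGbFamily(Buffer.from([0xd6, 0xd0])));             // GB2312
console.log(narrowestGbFamily(Buffer.from([0x81, 0x40])));             // GBK
console.log(narrowestGbFamily(Buffer.from([0x81, 0x30, 0x81, 0x30]))); // GB18030
```

A file whose bytes all fall inside the GB2312 ranges is valid under all three encodings, so a range check like this cannot distinguish them for such input.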

wesinator

GBK not detected in this case

1

* file: [Untitled-1.txt](https://github.com/aadsm/jschardet/files/906100/Untitled-1.txt)
* output with debug:

```
EUC-TW prober hit error at byte 0
windows-1251 confidence = 0, below negative shortcut threshhold 0.05
UTF-8 not active
SHIFT_JIS confidence =...
```

bpasero

ISO 8859 not detected in this case

4

Detect the encoding of the attached file. The result will be `windows-1252`. [iso-8859-1.txt](https://github.com/aadsm/jschardet/files/879334/iso-8859-1.txt) ![image](https://cloud.githubusercontent.com/assets/900690/24459379/432c3536-1450-11e7-9df4-e0dfe8d23d52.png)

bpasero

Unicode character problem

3

Every message that uses the character `ç` next to another Unicode character returns a strange character. **Using encoding: UTF-8** `çã` shows as `згo`; `çõ` shows as `уш`. This can only be...

SombraRO

Result of euc-kr is different from python chardet library

[MY EUC-KR DATA](https://github.com/aadsm/jschardet/files/1466005/1.txt) This file is encoded in `EUC-KR`, but it is detected as `ISO-8859-2`. However, `chardet`, the Python library, detects it correctly as `EUC-KR`.

hotohoto
