graph-pattern-learner
Bump chardet from 3.0.3 to 5.0.0
Bumps chardet from 3.0.3 to 5.0.0.
Release notes
Sourced from chardet's releases.
chardet 5.0.0
⚠️ This release is the first release of chardet that no longer supports Python < 3.6 ⚠️
In addition to that change, it features the following user-facing changes:
- Added a prober for Johab Korean (#207, @grizlupo)
- Added a prober for UTF-16/32 BE/LE (#109, #206, @jpz)
- Added test data for Croatian, Czech, Hungarian, Polish, Slovak, Slovene, Greek, and Turkish, which should help prevent future errors with those languages
- Improved XML tag filtering, which should improve accuracy for XML files (#208)
- Tweaked SingleByteCharSetProber confidence to match latest uchardet (#209)
- Made detect_all return child prober confidences (#210)
- Updated examples in docs (#223, @domdfcoding)
- Documentation fixes (#212, #224, #225, #226, #220, #221, #244 from too many to mention)
- Minor performance improvements (#252, @deedy5)
- Add support for Python 3.10 when testing (#232, @jdufresne)
- Lots of little development cycle improvements, mostly thanks to @jdufresne
chardet 4.0.0
⚠️ This will be the last release of chardet to support Python 2.7. chardet 5.0 will only support 3.6+ ⚠️
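Since this bump crosses the Python 2.7 cutoff, a quick interpreter check may be useful before merging. The following is a minimal sketch, not part of chardet; the helper name supports_chardet_5 is hypothetical, and the 3.6 floor is taken from the release notes above.

```python
import sys


def supports_chardet_5(version_info=sys.version_info):
    """Return True if the interpreter meets the 3.6+ floor from the notes.

    Hypothetical helper for illustration; not part of chardet.
    """
    major, minor = version_info[0], version_info[1]
    return (major, minor) >= (3, 6)


if __name__ == "__main__":
    print("chardet 5.x supported here:", supports_chardet_5())
```

Any environment where this returns False would need to stay on chardet 4.0.0 or earlier.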
Major Changes
This release is multiple years in the making, and provides some quality of life improvements to chardet. The primary user-facing changes are:
- Single-byte charset probers now use nested dictionaries under the hood, so they are usually a little faster than before. (See #121 for details)
- The CharsetGroupProber class now properly short-circuits when one of the probers in the group is considered a definite match. This led to a substantial speedup.
- There is now a chardet.detect_all function that returns a list of possible encodings for the input with associated confidences.
- We have dropped support for Python 2.6, 3.4, and 3.5 as they are all past end-of-life.
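To illustrate the chardet.detect_all function mentioned above, here is a minimal sketch of how a caller might rank its results. It assumes each result is a dict with "encoding" and "confidence" keys, as in the dict chardet.detect returns; the helper names rank_candidates and detect_ranked are hypothetical and not part of chardet.

```python
def rank_candidates(candidates):
    """Sort detect_all-style result dicts by confidence, highest first."""
    return sorted(candidates, key=lambda c: c["confidence"], reverse=True)


def detect_ranked(data):
    """Run chardet.detect_all on raw bytes and rank the candidates."""
    # Third-party import kept local; detect_all needs chardet >= 4.0.0.
    import chardet

    return rank_candidates(chardet.detect_all(data))
```

A caller could then try decoding with each candidate in order, for example by looping over detect_ranked(raw_bytes) and stopping at the first encoding that decodes without error.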
The changes in this release have also laid the groundwork for retraining the models to make them more accurate, and to support some more encodings/languages (see #99 for progress). This is our main focus for chardet 5.0 (beyond dropping Python 2 support).
Benchmarks
Running on a MacBook Pro (15-inch, 2018) with 2.2GHz 6-core i7 processor and 32GB RAM
old version (chardet 3.0.4)
Benchmarking chardet 3.0.4 on CPython 3.7.5 (default, Sep 8 2020, 12:19:42)
[Clang 11.0.3 (clang-1103.0.32.62)]
--------------------------------------------------------------------------------
Calls per second for each encoding:
ascii: 25559.439366240098
big5: 7.187002209518091
cp932: 4.71090956645177
cp949: 2.937256786994428
euc-jp: 4.870580412090848
euc-kr: 6.6910755971933416
euc-tw: 87.71098043480079
... (truncated)
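For anyone who wants to compare the old and new versions locally, a "calls per second" figure like the ones above could be measured roughly as follows. This is a sketch only and is not the benchmark script used upstream; the function name calls_per_second and the min_seconds parameter are assumptions made for illustration.

```python
import time


def calls_per_second(func, arg, min_seconds=0.2):
    """Call func(arg) repeatedly for at least min_seconds; return the rate."""
    calls = 0
    start = time.perf_counter()
    while True:
        func(arg)
        calls += 1
        elapsed = time.perf_counter() - start
        if elapsed >= min_seconds:
            return calls / elapsed
```

With chardet installed, calls_per_second(chardet.detect, sample_bytes) run before and after the upgrade would give a rough comparison for a given sample.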
Commits
- ff5dcb2 Merge pull request #254 from chardet/master
- 3222295 Linter fixes (#253)
- 85c96d3 Bump version to 5.0.0
- 57abbca Rebased and cleaned up version of UTF-16/32 BE/LE PR (#206)
- eca9558 Fix missing black formatting
- f1f9d42 slight increase in performance (#252)
- f9ef56c Use Python-3 super() syntax in Latin1Prober (#240)
- c5e5d5a Simple maintenance improvements (#244)
- 49b8341 Configure setuptools using the declarative syntax in setup.cfg (#239)
- 5c73bfc Run all pre-commit hooks on pull requests (#236)
- Additional commits viewable in compare view
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.
Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
- @dependabot rebase will rebase this PR
- @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
- @dependabot merge will merge this PR after your CI passes on it
- @dependabot squash and merge will squash and merge this PR after your CI passes on it
- @dependabot cancel merge will cancel a previously requested merge and block automerging
- @dependabot reopen will reopen this PR if it is closed
- @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)