FastBertTokenizer

Fast and memory-efficient library for WordPiece tokenization as used by BERT.
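
For readers unfamiliar with WordPiece, the following is a minimal Python sketch of the greedy longest-match-first splitting that BERT-style tokenizers apply to a single word. It is not FastBertTokenizer's implementation or API; the function name `wordpiece`, the toy vocabulary, and the `##` continuation prefix handling are assumptions chosen for illustration only.

```python
def wordpiece(word, vocab, unk="[UNK]", prefix="##"):
    """Greedy longest-match-first WordPiece split of a single word.

    Illustrative sketch only: `vocab` is a hypothetical set of pieces,
    where non-initial pieces carry the continuation prefix.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, shrinking from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = prefix + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits at this position: the whole word is unknown.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary, not taken from any real BERT model.
toy_vocab = {"token", "##izer", "##ize", "##r", "fast"}
print(wordpiece("tokenizer", toy_vocab))  # ['token', '##izer']
```

A real tokenizer also normalizes the text, splits on whitespace and punctuation before this step, and maps each piece to an integer ID through the vocabulary file.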

Results 6 FastBertTokenizer issues

Bumps [SharpToken](https://github.com/dmitry-brazhenko/SharpToken) from 2.0.2 to 2.0.3. Release notes Sourced from SharpToken's releases. Release 2.0.3 Release of version 2.0.3 Commits 27eef74 [duplicate] Support for o200k_base and gpt-4o (omni) model (#43) See...

dependencies
.NET

Bumps [Verify.Xunit](https://github.com/VerifyTests/Verify) from 24.1.0 to 24.2.0. Commits See full diff in compare view Dependabot will resolve any conflicts with this PR as long as you don't alter...

dependencies
.NET

Bumps [xunit](https://github.com/xunit/xunit) from 2.7.1 to 2.8.0. Commits be260b3 v2.8.0 a8ceb66 #783: Add -useansicolor flag to console runner (v2) 7b0ff93 Don't show /aggressive with unlimited threads 46cdf06 Support parallel algorithm in...

dependencies
.NET

Bumps [xunit.runner.visualstudio](https://github.com/xunit/visualstudio.xunit) from 2.5.8 to 2.8.0. Commits 6438bb8 v2.8.0 2afd4cd Pick up latest dependencies b8be108 Add multiplier format support to RunSettings 3c2e493 Update to 2.7.2-pre.17 and support Xunit.ParallelAlgorithm in RunSetttings...

dependencies
.NET

Bumps [FastBertTokenizer](https://github.com/georg-jung/FastBertTokenizer) from 0.5.18-alpha to 1.0.28. Release notes Sourced from FastBertTokenizer's releases. v1.0.28 Highlights The API surface is considered stable now (except the parts explicitly marked as experimental). Support for...

dependencies
.NET

Hi, I ran into an issue: I need to get the string representations of the tokens returned by the Encode method. An alternative would be to get the offsets. Is...