Access the Unicode character database

The unicode-data package family

This repository provides packages to use the Unicode character database (UCD):

  • unicode-data for general character properties.
  • unicode-data-names for character names and aliases.
  • unicode-data-scripts for character scripts.
  • unicode-data-security for security mechanisms.

The Haskell data structures are generated programmatically from the UCD files. The latest Unicode version supported by these libraries is 15.0.0.

unicode-data

unicode-data provides Haskell APIs to efficiently access the Unicode character database. Performance is the primary goal in the design of this package.

See the Haddock documentation for the full API reference.
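As a minimal sketch of the API, the program below looks up the general category of each character in a string with `generalCategory` from `Unicode.Char.General`. It assumes unicode-data is a build dependency; the helper `categories` is hypothetical and exists only for illustration.

```haskell
-- Minimal sketch: general category lookup with unicode-data.
-- Assumes the unicode-data package is a build dependency.
import qualified Unicode.Char.General as G

-- | Hypothetical helper: pair each character with its general category.
categories :: String -> [(Char, G.GeneralCategory)]
categories = map (\c -> (c, G.generalCategory c))

main :: IO ()
main = mapM_ print (categories "aZ1 ")
```

The qualified import avoids clashes with Data.Char, which also exports a `GeneralCategory` type and a `generalCategory` function.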

unicode-data-names

unicode-data-names provides Haskell APIs to efficiently access the Unicode character names from the Unicode character database.

See the Haddock documentation for the full API reference.
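A minimal sketch of a name lookup follows. It assumes unicode-data-names is a dependency and that `Unicode.Char.General.Names` exports `name :: Char -> Maybe String`; the `describe` helper and its "U+XXXX NAME" output format are hypothetical choices for illustration.

```haskell
-- Minimal sketch: format a character as "U+XXXX NAME".
-- Assumes unicode-data-names is a dependency and that
-- Unicode.Char.General.Names exports name :: Char -> Maybe String.
import Data.Char (ord)
import Data.Maybe (fromMaybe)
import Text.Printf (printf)
import Unicode.Char.General.Names (name)

-- | Hypothetical helper: code point in hexadecimal plus the character
-- name, or a placeholder when the character has no name.
describe :: Char -> String
describe c = printf "U+%04X %s" (ord c) (fromMaybe "<no name>" (name c))

main :: IO ()
main = mapM_ (putStrLn . describe) "A\x3B1"
```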

unicode-data-scripts

unicode-data-scripts provides Haskell APIs to efficiently access the Unicode character scripts from the Unicode character database.

See the Haddock documentation for the full API reference.
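The sketch below lists the distinct scripts used in a string. It assumes unicode-data-scripts is a dependency providing `Unicode.Char.General.Scripts` with `script :: Char -> Script`, and that `Script` has `Eq` and `Show` instances; `scriptsUsed` is a hypothetical helper.

```haskell
-- Minimal sketch: list the distinct scripts used in a string.
-- Assumes unicode-data-scripts is a dependency providing
-- Unicode.Char.General.Scripts with script :: Char -> Script.
import Data.List (nub)
import Unicode.Char.General.Scripts (Script (..), script)

-- | Hypothetical helper: scripts in order of first occurrence.
scriptsUsed :: String -> [Script]
scriptsUsed = nub . map script

main :: IO ()
main = print (scriptsUsed "ab \x3B1")
```

Spaces, digits and most punctuation belong to the Common script, so a check for mixed-script text would typically ignore Common (and Inherited) before counting scripts.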

unicode-data-security

unicode-data-security provides Haskell APIs to efficiently access the Unicode security mechanisms database.

See the Haddock documentation for the full API reference.

Performance

unicode-data is up to 5 times faster than the corresponding functions in base (Data.Char).

Unicode database version update

See unicode-data’s guide.

Unicode version in some major libraries

The following sections track the Unicode versions used by some major libraries. The unicode-data packages do not depend on the Unicode version used by these libraries, so character properties may differ when the packages are used together.

GHC / base

GHC version   base version   Unicode version
8.8           4.13           12.0
8.10.[1-4]    4.14.[0-1]     12.0
8.10.5+       4.14.2+        13.0
9.0.[1-2]     4.15.0         12.1
9.2.[1-6]     4.16.0         14.0
9.4.[1-4]     4.17.0         14.0
9.6.[1-3]     4.18.[0-1]     15.0
9.6.[4-5]     4.18.2+        15.1
9.8.1         4.19.0         15.1
9.10.1        4.20.0         15.1
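Because these versions can differ, it may help to check them at runtime. The sketch below prints the Unicode version reported by base and by unicode-data and compares their major and minor components. It assumes base 4.15 or later, which we believe exports `GHC.Unicode.unicodeVersion`, and a unicode-data release that exports `Unicode.Char.unicodeVersion`; `sameMajorMinor` is a hypothetical helper.

```haskell
-- Minimal sketch: compare the Unicode versions of base and unicode-data.
-- Assumes base >= 4.15 (GHC.Unicode.unicodeVersion) and a unicode-data
-- release exporting Unicode.Char.unicodeVersion.
import Data.Version (showVersion, versionBranch)
import qualified GHC.Unicode as Base
import qualified Unicode.Char as UD

-- | Hypothetical helper: True when two version branches agree on
-- their first two components (major and minor).
sameMajorMinor :: [Int] -> [Int] -> Bool
sameMajorMinor a b = take 2 a == take 2 b

main :: IO ()
main = do
  putStrLn ("base:         Unicode " ++ showVersion Base.unicodeVersion)
  putStrLn ("unicode-data: Unicode " ++ showVersion UD.unicodeVersion)
  let same = sameMajorMinor (versionBranch Base.unicodeVersion)
                            (versionBranch UD.unicodeVersion)
  putStrLn (if same
              then "Major and minor versions match."
              else "Versions differ: some character properties may disagree.")
```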

text

text version   Unicode version
1.2.5.0        13.0
2.0.[0-2]      14.0
2.1.[0-1]      14.0

Licensing

The unicode-data* packages are open source and available under the liberal Apache-2.0 license.

Contributing

As an open project, we welcome contributions.