Abydos NLP/IR library for Python
======
Abydos
======

+------------------+------------------------------------------------------+
| CI & Test Status | |travis| |circle| |azure| |semaphore| |coveralls|    |
+------------------+------------------------------------------------------+
| Code Quality     | |codeclimate| |scrutinizer| |codacy| |codefactor|    |
+------------------+------------------------------------------------------+
| Dependencies     | |requires| |snyk| |pyup| |cii| |black|               |
+------------------+------------------------------------------------------+
| Local Analysis   | |pylint| |flake8| |pydocstyle| |sloccount| |mypy|    |
+------------------+------------------------------------------------------+
| Usage            | |docs| |mybinder| |license| |sourcerank| |zenodo|    |
+------------------+------------------------------------------------------+
| Contribution     | |openhub| |gh-commits| |gh-issues| |gh-stars|        |
+------------------+------------------------------------------------------+
| PyPI             | |pypi| |pypi-dl| |pypi-ver|                          |
+------------------+------------------------------------------------------+
| conda-forge      | |conda| |conda-dl| |conda-platforms|                 |
+------------------+------------------------------------------------------+

.. |travis| image:: https://travis-ci.org/chrislit/abydos.svg?branch=master
    :target: https://travis-ci.org/chrislit/abydos
    :alt: Travis-CI Build Status

.. |circle| image:: https://circleci.com/gh/chrislit/abydos/tree/master.svg?style=shield
    :target: https://circleci.com/gh/chrislit/abydos/tree/master
    :alt: Circle-CI Build Status

.. |azure| image:: https://dev.azure.com/chrislit/abydos/_apis/build/status/chrislit.abydos?branchName=master
    :target: https://dev.azure.com/chrislit/abydos/_build/latest?definitionId=1
    :alt: Azure Pipelines Build Status

.. |semaphore| image:: https://semaphoreci.com/api/v1/chrislit/abydos/branches/master/shields_badge.svg
    :target: https://semaphoreci.com/chrislit/abydos
    :alt: Semaphore Build Status

.. |coveralls| image:: https://coveralls.io/repos/github/chrislit/abydos/badge.svg?branch=master
    :target: https://coveralls.io/github/chrislit/abydos?branch=master
    :alt: Coverage Status

.. |codeclimate| image:: https://codeclimate.com/github/chrislit/abydos/badges/gpa.svg
    :target: https://codeclimate.com/github/chrislit/abydos
    :alt: Code Climate

.. |scrutinizer| image:: https://scrutinizer-ci.com/g/chrislit/abydos/badges/quality-score.png?b=master
    :target: https://scrutinizer-ci.com/g/chrislit/abydos/?branch=master
    :alt: Scrutinizer

.. |codacy| image:: https://api.codacy.com/project/badge/Grade/db79f2c31ea142fb9b5938abe87b0854
    :target: https://www.codacy.com/app/chrislit/abydos?utm_source=github.com&utm_medium=referral&utm_content=chrislit/abydos&utm_campaign=Badge_Grade
    :alt: Codacy

.. |codefactor| image:: https://www.codefactor.io/repository/github/chrislit/abydos/badge
    :target: https://www.codefactor.io/repository/github/chrislit/abydos
    :alt: CodeFactor

.. |requires| image:: https://requires.io/github/chrislit/abydos/requirements.svg?branch=master
    :target: https://requires.io/github/chrislit/abydos/requirements/?branch=master
    :alt: Requirements Status

.. |snyk| image:: https://snyk.io/test/github/chrislit/abydos/badge.svg?targetFile=requirements.txt
    :target: https://snyk.io/test/github/chrislit/abydos?targetFile=requirements.txt
    :alt: Known Vulnerabilities

.. |pyup| image:: https://pyup.io/repos/github/chrislit/abydos/shield.svg
    :target: https://pyup.io/repos/github/chrislit/abydos/
    :alt: Updates

.. |cii| image:: https://bestpractices.coreinfrastructure.org/projects/1598/badge
    :target: https://bestpractices.coreinfrastructure.org/projects/1598
    :alt: CII Best Practices

.. |black| image:: https://img.shields.io/badge/code%20style-black-000000.svg
    :target: https://github.com/ambv/black
    :alt: black

.. |pylint| image:: https://img.shields.io/badge/Pylint-9.13/10-yellowgreen.svg
    :target: #
    :alt: Pylint Score

.. |flake8| image:: https://img.shields.io/badge/flake8-0-brightgreen.svg
    :target: #
    :alt: flake8 Errors

.. |pydocstyle| image:: https://img.shields.io/badge/pydocstyle-0-brightgreen.svg
    :target: #
    :alt: pydocstyle Errors

.. |sloccount| image:: https://img.shields.io/badge/SLOCCount-40,079-blue.svg
    :target: #
    :alt: SLOCCount

.. |mypy| image:: https://img.shields.io/badge/mypy-1.87%25%20imprecise-1F5082.svg
    :target: #
    :alt: mypy Imprecision

.. |docs| image:: https://readthedocs.org/projects/abydos/badge/?version=latest
    :target: https://abydos.readthedocs.org/en/latest/
    :alt: Documentation Status

.. |mybinder| image:: https://img.shields.io/badge/launch-binder-579aca.svg?logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAFkAAABZCAMAAABi1XidAAAB8lBMVEX///9XmsrmZYH1olJXmsr1olJXmsrmZYH1olJXmsr1olJXmsrmZYH1olL1olJXmsr1olJXmsrmZYH1olL1olJXmsrmZYH1olJXmsr1olL1olJXmsrmZYH1olL1olJXmsrmZYH1olL1olL0nFf1olJXmsrmZYH1olJXmsq8dZb1olJXmsrmZYH1olJXmspXmspXmsr1olL1olJXmsrmZYH1olJXmsr1olL1olJXmsrmZYH1olL1olLeaIVXmsrmZYH1olL1olL1olJXmsrmZYH1olLna31Xmsr1olJXmsr1olJXmsrmZYH1olLqoVr1olJXmsr1olJXmsrmZYH1olL1olKkfaPobXvviGabgadXmsqThKuofKHmZ4Dobnr1olJXmsr1olJXmspXmsr1olJXmsrfZ4TuhWn1olL1olJXmsqBi7X1olJXmspZmslbmMhbmsdemsVfl8ZgmsNim8Jpk8F0m7R4m7F5nLB6jbh7jbiDirOEibOGnKaMhq+PnaCVg6qWg6qegKaff6WhnpKofKGtnomxeZy3noG6dZi+n3vCcpPDcpPGn3bLb4/Mb47UbIrVa4rYoGjdaIbeaIXhoWHmZYHobXvpcHjqdHXreHLroVrsfG/uhGnuh2bwj2Hxk17yl1vzmljzm1j0nlX1olL3AJXWAAAAbXRSTlMAEBAQHx8gICAuLjAwMDw9PUBAQEpQUFBXV1hgYGBkcHBwcXl8gICAgoiIkJCQlJicnJ2goKCmqK+wsLC4usDAwMjP0NDQ1NbW3Nzg4ODi5+3v8PDw8/T09PX29vb39/f5+fr7+/z8/Pz9/v7+zczCxgAABC5JREFUeAHN1ul3k0UUBvCb1CTVpmpaitAGSLSpSuKCLWpbTKNJFGlcSMAFF63iUmRccNG6gLbuxkXU66JAUef/9LSpmXnyLr3T5AO/rzl5zj137p136BISy44fKJXuGN/d19PUfYeO67Znqtf2KH33Id1psXoFdW30sPZ1sMvs2D060AHqws4FHeJojLZqnw53cmfvg+XR8mC0OEjuxrXEkX5ydeVJLVIlV0e10PXk5k7dYeHu7Cj1j+49uKg7uLU61tGLw1lq27ugQYlclHC4bgv7VQ+TAyj5Zc/UjsPvs1sd5cWryWObtvWT2EPa4rtnWW3JkpjggEpbOsPr7F7EyNewtpBIslA7p43HCsnwooXTEc3UmPmCNn5lrqTJxy6nRmcavGZVt/3Da2pD5NHvsOHJCrdc1G2r3DITpU7yic7w/7Rxnjc0kt5GC4djiv2Sz3Fb2iEZg41/ddsFDoyuYrIkmFehz0HR2thPgQqMyQYb2OtB0WxsZ3BeG3+wpRb1vzl2UYBog8FfGhttFKjtAclnZYrRo9ryG9uG/FZQU4AEg8ZE9LjGMzTmqKXPLnlWVnIlQQTvxJf8ip7VgjZjyVPrjw1te5otM7RmP7xm+sK2Gv9I8Gi++BRbEkR9EBw8zRUcKxwp73xkaLiqQb+kGduJTNHG72zcW9LoJgqQxpP3/Tj//c3yB0tqzaml05/+orHLksVO+95kX7/7qgJvnjlrfr2Ggsyx0eoy9uPzN5SPd86aXggOsEKW2Prz7du3VID3/tzs/sSRs2w7ovVHKtjrX2pd7ZMlTxAYfBAL9jiDwfLkq55Tm7ifhMlTGPyCAs7RFRhn47JnlcB9RM5T97ASuZXIcVNuUDIndpDbdsfrqsOppeXl5Y+XVKdjFCTh+zGaVuj0d9zy05PPK3QzBamxdwtTCrzyg/2Rvf2EstUjordGwa/kx9mSJLr8mLLtCW8HHGJc2R5hS219IiF6PnTusOqcMl57gm0Z8kanKMAQg0qSyuZfn7zItsbGyO9QlnxY0eCuD1XL2ys/MsrQhltE7Ug0uFOzufJFE2PxBo/YAx8XPPdDwWN0MrDRYIZF0mSMKCNHgaIVFoBbNoLJ7tEQDKxGF0kcLQimojCZopv0OkNOyWCCg9XMVAi7ARJzQdM2QUh0gmBozjc3Skg6dSBRqDGYSUOu66Zg+I2fNZs/M3/f/Grl/XnyF1Gw3VKCez0PN5IUfFLqvgUN4C0qNqYs5YhPL+aVZYDE4IpUk57oSFnJm4FyCqqOE0jhY2SMyLFoo56zyo6becOS5UVDdj7Vih0zp+tcMhwRpBeLyqtIjlJKAIZSbI8SGSF3k0pA3mR5tHuwPFoa7N7reoq2bqCsAk1HqCu5uvI1n6JuRXI+S1Mco54YmYTwcn6Aeic+kssXi8XpXC4V3t7/ADuTNKaQJdScAAAAAElFTkSuQmCC
    :target: https://mybinder.org/v2/gh/chrislit/abydos/master?filepath=binder
    :alt: Binder

.. |license| image:: https://img.shields.io/badge/License-GPL%20v3+-blue.svg?logo=gnu
    :target: https://www.gnu.org/licenses/gpl-3.0
    :alt: License: GPL v3.0+

.. |sourcerank| image:: https://img.shields.io/librariesio/sourcerank/pypi/abydos.svg
    :target: https://libraries.io/pypi/abydos
    :alt: Libraries.io SourceRank

.. |zenodo| image:: https://zenodo.org/badge/DOI/10.5281/zenodo.3603514.svg
    :target: https://doi.org/10.5281/zenodo.3603514
    :alt: Zenodo

.. |openhub| image:: https://www.openhub.net/p/abydosnlp/widgets/project_thin_badge.gif
    :target: https://www.openhub.net/p/abydosnlp
    :alt: OpenHUB

.. |gh-commits| image:: https://img.shields.io/github/commit-activity/y/chrislit/abydos.svg?logo=github
    :target: https://github.com/chrislit/abydos/graphs/commit-activity
    :alt: GitHub Commits

.. |gh-issues| image:: https://img.shields.io/github/issues-closed/chrislit/abydos.svg?logo=github
    :target: https://github.com/chrislit/abydos/issues?q=
    :alt: GitHub Issues Closed

.. |gh-stars| image:: https://img.shields.io/github/stars/chrislit/abydos.svg?logo=github
    :target: https://github.com/chrislit/abydos/stargazers
    :alt: GitHub Stars

.. |pypi| image:: https://img.shields.io/pypi/v/abydos.svg?logo=python&logoColor=white
    :target: https://pypi.python.org/pypi/abydos
    :alt: PyPI

.. |pypi-dl| image:: https://img.shields.io/pypi/dm/abydos.svg?logo=python&logoColor=white
    :target: https://pypi.python.org/pypi/abydos
    :alt: PyPI downloads/month

.. |pypi-ver| image:: https://img.shields.io/pypi/pyversions/abydos.svg?logo=python&logoColor=white
    :target: https://pypi.python.org/pypi/abydos
    :alt: PyPI versions

.. |conda| image:: https://img.shields.io/conda/vn/conda-forge/abydos.svg?logo=conda-forge
    :target: https://anaconda.org/conda-forge/abydos
    :alt: conda-forge

.. |conda-dl| image:: https://img.shields.io/conda/dn/conda-forge/abydos.svg?logo=conda-forge
    :target: https://anaconda.org/conda-forge/abydos
    :alt: conda-forge downloads

.. |conda-platforms| image:: https://img.shields.io/conda/pn/conda-forge/abydos.svg?logo=conda-forge
    :target: https://anaconda.org/conda-forge/abydos
    :alt: conda-forge platforms

|

.. image:: https://raw.githubusercontent.com/chrislit/abydos/master/abydos-small.png
    :target: https://github.com/chrislit/abydos
    :alt: abydos
    :align: right

|
| `Abydos NLP/IR library <https://github.com/chrislit/abydos>`_
| Copyright 2014-2020 by Christopher C. Little

Abydos is a library of phonetic algorithms, string distance measures & metrics, stemmers, and string fingerprinters including:

- Phonetic algorithms

  - Robert C. Russell's Index
  - American Soundex
  - Refined Soundex
  - Daitch-Mokotoff Soundex
  - Kölner Phonetik
  - NYSIIS
  - Match Rating Algorithm
  - Metaphone
  - Double Metaphone
  - Caverphone
  - Alpha Search Inquiry System
  - Fuzzy Soundex
  - Phonex
  - Phonem
  - Phonix
  - SfinxBis
  - phonet
  - Standardized Phonetic Frequency Code
  - Statistics Canada
  - Lein
  - Roger Root
  - Oxford Name Compression Algorithm (ONCA)
  - Eudex phonetic hash
  - Haase Phonetik
  - Reth-Schek Phonetik
  - FONEM
  - Parmar-Kumbharana
  - Davidson's Consonant Code
  - SoundD
  - PSHP Soundex/Viewex Coding
  - an early version of Henry Code
  - Norphone
  - Dolby Code
  - Phonetic Spanish
  - Spanish Metaphone
  - MetaSoundex
  - SoundexBR
  - NRL English-to-phoneme
  - Beider-Morse Phonetic Matching

- String distance metrics

  - Levenshtein distance
  - Optimal String Alignment distance
  - Levenshtein-Damerau distance
  - Hamming distance
  - Tversky index
  - Sørensen–Dice coefficient & distance
  - Jaccard similarity coefficient & distance
  - overlap similarity & distance
  - Tanimoto coefficient & distance
  - Minkowski distance & similarity
  - Manhattan distance & similarity
  - Euclidean distance & similarity
  - Chebyshev distance
  - cosine similarity & distance
  - Jaro distance
  - Jaro-Winkler distance (incl. the strcmp95 algorithm variant)
  - Longest common substring
  - Ratcliff-Obershelp similarity & distance
  - Match Rating Algorithm similarity
  - Normalized Compression Distance (NCD) & similarity
  - Monge-Elkan similarity & distance
  - Matrix similarity
  - Needleman-Wunsch score
  - Smith-Waterman score
  - Gotoh score
  - Length similarity
  - Prefix, Suffix, and Identity similarity & distance
  - Modified Language-Independent Product Name Search (MLIPNS) similarity & distance
  - Bag distance
  - Editex distance
  - Eudex distances
  - Sift4 distance
  - Baystat distance & similarity
  - Typo distance
  - Indel distance
  - Synoname

- Stemmers

  - the Lovins stemmer
  - the Porter and Porter2 (Snowball English) stemmers
  - Snowball stemmers for German, Dutch, Norwegian, Swedish, and Danish
  - CLEF German, German plus, and Swedish stemmers
  - Caumann's German stemmer
  - UEA-Lite Stemmer
  - Paice-Husk Stemmer
  - Schinke Latin stemmer
  - S stemmer

- String Fingerprints

  - string fingerprint
  - q-gram fingerprint
  - phonetic fingerprint
  - Pollock & Zamora's skeleton key
  - Pollock & Zamora's omission key
  - Cisłak & Grabowski's occurrence fingerprint
  - Cisłak & Grabowski's occurrence halved fingerprint
  - Cisłak & Grabowski's count fingerprint
  - Cisłak & Grabowski's position fingerprint
  - Synoname Toolcode

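
The catalogue above only names the algorithms. As a rough illustration of what each family does, the following pure-Python sketch implements one simple member of each: American Soundex (phonetic), Levenshtein distance plus bigram Jaccard and Sørensen–Dice similarity (string distance), the S stemmer's three plural-stripping rules (stemming), and simplified string and q-gram fingerprints in the style of OpenRefine's key-collision keys (without its accent folding). It was written for illustration and is not Abydos's implementation or API: the function names are hypothetical, inputs are assumed to be plain ASCII, and details such as q-gram padding or Unicode handling may differ from what Abydos does.

```python
import re

# American Soundex digit classes; vowels, h, w and y get no digit.
_SOUNDEX_CODES = {
    ch: digit
    for digit, letters in (
        ("1", "bfpv"), ("2", "cgjkqsxz"), ("3", "dt"),
        ("4", "l"), ("5", "mn"), ("6", "r"),
    )
    for ch in letters
}


def soundex(word):
    """Return the 4-character American Soundex code of an ASCII name."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    digits = []
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        # Same-digit letters separated by h or w collapse to one digit;
        # a vowel (or y) between them resets the comparison.
        if ch not in "hw":
            prev = code
    return (letters[0].upper() + "".join(digits) + "000")[:4]


def levenshtein(src, tar):
    """Minimum number of single-character insertions, deletions and
    substitutions turning src into tar (two-row dynamic programming)."""
    prev_row = list(range(len(tar) + 1))
    for i, s_ch in enumerate(src, start=1):
        row = [i]
        for j, t_ch in enumerate(tar, start=1):
            row.append(min(
                prev_row[j] + 1,                   # delete s_ch
                row[j - 1] + 1,                    # insert t_ch
                prev_row[j - 1] + (s_ch != t_ch),  # substitute or match
            ))
        prev_row = row
    return prev_row[-1]


def _bigrams(text):
    """Set of overlapping 2-character substrings (no padding)."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def jaccard_sim(src, tar):
    """|A & B| / |A | B| over the two bigram sets."""
    a, b = _bigrams(src), _bigrams(tar)
    return len(a & b) / len(a | b) if a | b else 1.0


def dice_sim(src, tar):
    """Sorensen-Dice: 2|A & B| / (|A| + |B|) over the two bigram sets."""
    a, b = _bigrams(src), _bigrams(tar)
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 1.0


def s_stem(word):
    """S stemmer: only the first matching rule is applied."""
    if word.endswith("ies") and not word.endswith(("eies", "aies")):
        return word[:-3] + "y"
    if word.endswith("es") and not word.endswith(("aes", "ees", "oes")):
        return word[:-1]  # "es" -> "e"
    if word.endswith("s") and not word.endswith(("us", "ss")):
        return word[:-1]
    return word


def string_fingerprint(phrase):
    """Lowercase, drop punctuation, then sort and de-duplicate tokens."""
    tokens = re.sub(r"[^\w\s]", "", phrase.lower()).split()
    return " ".join(sorted(set(tokens)))


def qgram_fingerprint(phrase, q=2):
    """Sorted, de-duplicated q-grams of the phrase after removing
    whitespace and punctuation."""
    text = re.sub(r"[\W_]", "", phrase.lower())
    return "".join(sorted({text[i:i + q] for i in range(len(text) - q + 1)}))
```

For example, ``soundex("Robert")`` and ``soundex("Rupert")`` both give ``'R163'``, ``levenshtein("kitten", "sitting")`` gives ``3``, and ``string_fingerprint("The quick brown fox.")`` gives ``'brown fox quick the'``.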
Installation
============

Required libraries:

- NumPy
- deprecation

Optional libraries (all available on PyPI, some available on conda or conda-forge):

- `SyllabiPy <http://syllabipy.com/>`_
- `NLTK <https://www.nltk.org/>`_
- `PyLZSS <https://github.com/rumbah/pylzss>`_
- `paq <https://github.com/observerss/paq>`_

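
To check which of these optional libraries are already available, you can ask Python's import machinery via ``importlib.util.find_spec``, which locates a top-level module without executing it. This is a small sketch, not part of Abydos: the ``OPTIONAL_MODULES`` mapping and the ``available_optional_libraries`` function are hypothetical, and the import names assumed for each package (``syllabipy``, ``nltk``, ``lzss``, ``paq``) may need adjusting.

```python
import importlib.util

# Hypothetical mapping from library name to the module name it is assumed
# to install; check each package's documentation if a result looks wrong.
OPTIONAL_MODULES = {
    "SyllabiPy": "syllabipy",
    "NLTK": "nltk",
    "PyLZSS": "lzss",
    "paq": "paq",
}


def available_optional_libraries(modules=None):
    """Map each library name to whether its top-level module is findable.

    find_spec() returns None when no module of that name is installed.
    """
    if modules is None:
        modules = OPTIONAL_MODULES
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in modules.items()
    }


if __name__ == "__main__":
    for name, found in sorted(available_optional_libraries().items()):
        print("{}: {}".format(name, "found" if found else "missing"))
```

The sketch uses ``str.format`` rather than f-strings so that it also runs on Python 3.5.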
To install Abydos (master) from GitHub source::

    git clone https://github.com/chrislit/abydos.git --recursive
    cd abydos
    python setup.py install

If your default ``python`` command calls Python 2.7 but you want to install for Python 3, you may instead need to call::

    python3 setup.py install

To install Abydos (latest release) from PyPI using pip::

    pip install abydos

To install from `conda-forge <https://anaconda.org/conda-forge/abydos>`_::

    conda install -c conda-forge abydos

Abydos should run on Python 3.5-3.8.

Testing & Contributing
======================

To run the whole test-suite, just call tox::

    tox

The tox setup has the following environments: black, py37, doctest, regression, fuzz, pylint, pydocstyle, flake8, doc8, docs, sloccount, badges, & build. So if you only want to generate documentation (in HTML, EPUB, & PDF formats), just call::

    tox -e docs

To run only Flake8 and generate its reports, call::

    tox -e flake8

Contributions such as bug reports, PRs, suggestions, desired new features, etc.
are welcome through GitHub
`Issues <https://github.com/chrislit/abydos/issues>`_ &
`Pull requests <https://github.com/chrislit/abydos/pulls>`_.