python-wordsegment issues

Using with Additional corpus of spelling mistakes.

3

I’m considering using this as a service in an app for the disabled people we support, who would use it to communicate. We see a lot of users who...

willwade

Support for Other Languages

2

The LDC has published the Web 1T 5-gram, 10 European Languages corpus at https://catalog.ldc.upenn.edu/LDC2009T25. Are there any plans to support these languages? If not, can I jump in and contribute? Would...

ykhatami

Bump wheel from 0.29.0 to 0.38.1

Bumps [wheel](https://github.com/pypa/wheel) from 0.29.0 to 0.38.1. Changelog (sourced from wheel's changelog), Release Notes. UNRELEASED: Updated vendored packaging to 22.0. 0.38.4 (2022-11-09): Fixed PKG-INFO conversion in bdist_wheel mangling UTF-8 header values...

dependabot[bot]

dependencies

Moved the metadata out of `setup.py` into `setup.cfg`.

2

Added `pyproject.toml`. Replaced importing the version variable with reading it from the file using `read_version`. If we drop `python3 ./setup.py test`, then `setup.py` can be removed completely since now (to...

KOLANICH

RecursionError on segment call

6

Hi, I'm having trouble with the following code:

```
import wordsegment
wordsegment.load()
text = "The article went on to say, “For in the pizza shops rich and poor harmoniously congregate; they...
```

irmo322

Add CHUNK_SIZE attribute to customize isegment()

Allows for customization in #33.

grantjenks
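To illustrate why a chunk size matters for a recursive segmenter, here is a minimal sketch. It does not use wordsegment itself: the vocabulary, the cost function, `MAX_WORD_LEN`, `search`, and `isegment_chunked` are all hypothetical stand-ins, and it only assumes that a setting like `CHUNK_SIZE` bounds how much text is handed to the recursive search at once.

```python
from functools import lru_cache

WORDS = {"the", "pizza", "shops"}  # toy vocabulary, not the real corpus
MAX_WORD_LEN = 8                   # hypothetical cap on candidate word length


def search(text):
    """Recursively find the cheapest split; recursion depth grows with len(text)."""

    @lru_cache(maxsize=None)
    def best(start):
        if start == len(text):
            return 0, ()
        options = []
        for end in range(start + 1, min(len(text), start + MAX_WORD_LEN) + 1):
            word = text[start:end]
            # Toy cost: known words are cheap, unknown strings are expensive.
            cost = 1 if word in WORDS else 10 + len(word)
            rest_cost, rest_words = best(end)
            options.append((cost + rest_cost, (word,) + rest_words))
        return min(options)

    return list(best(0)[1])


def isegment_chunked(text, chunk_size=100, carry=5):
    """Segment chunk by chunk, re-searching the last few words with the next chunk."""
    prefix = ""
    for offset in range(0, len(text), chunk_size):
        words = search(prefix + text[offset:offset + chunk_size])
        # The tail may end in a word cut off by the chunk boundary, so hold it back.
        prefix = "".join(words[-carry:])
        yield from words[:-carry]
    yield from search(prefix)
```

Calling `search` directly on a very long string can exceed Python's default recursion limit, while `isegment_chunked` keeps each recursive call bounded by roughly `chunk_size` characters plus the carried-over words.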

feature_request(mode): preserve all punctuation marks

3

### 1. Summary It would be nice if WordSegment, at least in CLI mode, had an option to preserve all punctuation marks: `.`, `,`, `’` and so on. ###...

Kristinita

Support for maintaining original case

1

- commit 1: test coverage for maintaining original character casing
- commit 2: optional cmd line arg for maintaining case in file input (defaults to original, lowercased segment output)...

esilgard
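One way to picture case preservation is to map the lowercased segments back onto the characters of the original text. The sketch below is not the code from this pull request. `restore_case` and `ALPHABET` are hypothetical names, and it assumes the segmenter lowercases its input and drops everything except ASCII letters and digits.

```python
ALPHABET = set("abcdefghijklmnopqrstuvwxyz0123456789")


def restore_case(original, words):
    """Map lowercased segments back onto the characters of the original text."""
    # Keep only the characters the segmenter is assumed to have kept.
    kept = [ch for ch in original if ch.lower() in ALPHABET]
    restored, position = [], 0
    for word in words:
        restored.append("".join(kept[position:position + len(word)]))
        position += len(word)
    return restored
```

For example, `restore_case("ThisIsATest", ["this", "is", "a", "test"])` gives back the segments with their original capitalization.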

Please allow separation of numbers from text

1

"a frail 88-year old man" is being outputed as ["a","frail88","year","old"] This doesn't help at all. Having numbers in a block of text is so common in any domain. It's sad...

prabhatM
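A possible workaround, sketched under assumptions, is to split the text into letter runs and digit runs before segmenting, so digits never get glued to neighboring words. `segment_with_numbers` is a hypothetical helper, and `segment` stands for any callable that turns a string into a list of words.

```python
import re


def segment_with_numbers(text, segment):
    """Segment letter runs with `segment`, passing digit runs through unchanged."""
    pieces = []
    for run in re.findall(r"[A-Za-z]+|[0-9]+", text):
        if run.isdigit():
            pieces.append(run)           # keep numbers as their own tokens
        else:
            pieces.extend(segment(run))  # delegate letters to the segmenter
    return pieces
```

The trade-off is that each letter run is segmented in isolation, so any context a segmenter would otherwise use across a number is lost.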

Correctly merge lowercase and uppercase bigrams

Some entries in the wordsegment/bigrams.txt file used to be duplicated. In particular, each bigram was lowercased, but since some bigrams appeared in both uppercase and lowercase forms, the same bigram appeared...

kvakil
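The duplicate problem can be sketched as follows. This assumes each line has the form `bigram<TAB>count` and that duplicates should be merged by summing their counts. The description above is truncated, so both the merge rule and the helper name `merge_bigrams` are assumptions rather than the actual change.

```python
from collections import Counter


def merge_bigrams(lines):
    """Sum the counts of bigrams that differ only by letter case."""
    totals = Counter()
    for line in lines:
        bigram, count = line.rstrip("\n").split("\t")
        totals[bigram.lower()] += float(count)
    return dict(totals)
```

By contrast, a plain `dict` built directly from lines with repeated keys keeps only the last count for each key.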
