normalise issues

UserWarning re: LabelPropagation

1

`UserWarning: Trying to unpickle estimator LabelPropagation from version 0.18 when using version 0.22. This might lead to breaking code or invalid results. Use at your own risk`

bbookman

Module not found 'sklearn.semi_supervised.label_propagation'

6

- normalise was successfully installed using pip
- The normalise function is not recognized in the code
- I tried "import normalise" and received the following error: 'sklearn.semi_supervised.label_propagation'. Installing this module :...

dimaelzein

FutureWarning re: sklearn.semi_supervised.label_propagation

2

` FutureWarning: The sklearn.semi_supervised.label_propagation module is deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.semi_supervised. Anything that...

bbookman
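The three reports above all trace back to the old module path `sklearn.semi_supervised.label_propagation`. One possible workaround, shown here only as a sketch, is to alias the old path to wherever the code now lives before importing normalise. The helper name `alias_module` is hypothetical, and the private target `sklearn.semi_supervised._label_propagation` is an assumption about newer scikit-learn layouts that may change without notice. Aliasing only addresses the missing-module error; it does not make an estimator pickled under 0.18 safe to use under a later version.

```python
import importlib
import sys


def alias_module(old_name: str, new_name: str) -> bool:
    """Make imports of `old_name` resolve to the module at `new_name`.

    Returns True if the alias is in place, False if `new_name`
    cannot be imported in this environment.
    """
    try:
        target = importlib.import_module(new_name)
    except ImportError:
        return False
    # Do not overwrite a real module that is already loaded under old_name.
    sys.modules.setdefault(old_name, target)
    return True


# Assumption: the implementation now lives in a private module with a
# leading underscore. If it does not exist, this call just returns False.
alias_module(
    "sklearn.semi_supervised.label_propagation",
    "sklearn.semi_supervised._label_propagation",
)
```

Run the alias before `import normalise` so that both the import and any unpickling see the old name. Re-training and re-pickling the model under the installed scikit-learn version would be the more robust fix.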

Warning: Careful using a custom tokenizer...

I tried to use the spaCy tokenizer, nltk `word_tokenize`, `sacremoses` `MosesTokenizer`, nltk `TreebankWordTokenizer`, and nltk `TweetTokenizer`. For this example, `"inch BBL, unquote, cost $29.95"` they will all output `['inch', 'BBL',...

PetrochukM
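The excerpt above is truncated, so the exact failure is not shown here. Assuming the concern is that general-purpose tokenizers split a currency symbol from its amount, a minimal regex tokenizer that keeps `$29.95` as one token could look like the sketch below. The function name `simple_tokenize` and the pattern are hypothetical and are not part of normalise.

```python
import re

# Hypothetical pattern: a currency amount stays whole, words and decimal
# numbers stay whole, and any other punctuation becomes its own token.
_TOKEN = re.compile(r"[$£€]\d+(?:\.\d+)?|\w+(?:\.\d+)?|[^\w\s]")


def simple_tokenize(text):
    """Split text into tokens, keeping amounts such as '$29.95' intact."""
    return _TOKEN.findall(text)


tokens = simple_tokenize("inch BBL, unquote, cost $29.95")
# tokens == ['inch', 'BBL', ',', 'unquote', ',', 'cost', '$29.95']
```

The traceback in the IndexError report in this listing shows normalise calling `tokenizer(text)`, so a callable of this shape is the kind of thing its `tokenizer` parameter would receive. Whether this token layout is what normalise expects is untested here.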

IndexError: list index out of range

IndexError Traceback (most recent call last) in ()
      7 }
      8
----> 9 pprint(normalise(cleaned_corpus))
7 frames
/usr/local/lib/python3.6/dist-packages/normalise/normalisation.py in normalise(text, tokenizer, verbose, variety, user_abbrevs)
    155 return insert(tokenizer(text), verbose=verbose, variety=variety, user_abbrevs=user_abbrevs)
    156...

NouamaneTazi

inaccuracy

Classifier to figure out if text is AmE or BrE (in order to better do dates)

1

Could use spelling variations, lexical items, and instances of dates where the ordering is unambiguous (e.g. '15/2/96').

EFord36

enhancement
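The cues named in this request can be illustrated with a simple vote count rather than a trained classifier. This is a rough sketch only: the word lists are tiny hypothetical samples, the function name `guess_variety` is made up, and the returned labels "BrE" and "AmE" are chosen for illustration rather than taken from normalise.

```python
import re

# Numeric dates such as 15/2/96 or 2/15/1996.
DATE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{2}|\d{4})\b")

# Hypothetical sample word lists; a real system would need far more.
BRE_SPELLINGS = {"colour", "normalise", "centre"}
AME_SPELLINGS = {"color", "normalize", "center"}


def guess_variety(text):
    """Return 'BrE', 'AmE', or None when the evidence is tied or absent."""
    bre = ame = 0
    for first, second, _year in DATE.findall(text):
        a, b = int(first), int(second)
        # A field above 12 cannot be a month, which fixes the ordering.
        if 12 < a <= 31 and 1 <= b <= 12:
            bre += 1  # day/month/year
        elif 12 < b <= 31 and 1 <= a <= 12:
            ame += 1  # month/day/year
    for word in re.findall(r"[a-z]+", text.lower()):
        bre += word in BRE_SPELLINGS
        ame += word in AME_SPELLINGS
    if bre > ame:
        return "BrE"
    if ame > bre:
        return "AmE"
    return None
```

For example, `guess_variety("Paid on 15/2/96")` returns "BrE" because 15 can only be a day, while an ambiguous date such as 3/4/96 contributes no vote either way.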

Add functionality to be able to disable modules

US and international phone numbers, e.g. +44 (0)1223 760812, (905) 513-7480

Unable to expand scientific formats, e.g. 4.321768×10^3, 10^−27, −5.3×10^4; they get deleted

gen_candidates('abbrv') doesn't return anything (should return at least 'abbreviation')
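For the scientific-format report above, a first step toward expansion would be recognising such tokens and separating mantissa from exponent. The sketch below covers only the three shapes quoted in that title, including the Unicode minus sign (−) and multiplication sign (×). The name `parse_scientific` is hypothetical and nothing here reflects how normalise handles these tokens internally.

```python
import re

# Optional signed mantissa followed by ×, then 10^ and a signed exponent.
SCI = re.compile(
    r"^(?:(?P<mantissa>[-−]?\d+(?:\.\d+)?)×)?10\^(?P<exponent>[-−]?\d+)$"
)


def parse_scientific(token):
    """Return (mantissa_or_None, exponent) for tokens like '−5.3×10^4'.

    The mantissa is kept as a string so its digits are preserved exactly.
    Returns None if the token is not in one of the supported shapes.
    """
    match = SCI.match(token)
    if match is None:
        return None
    mantissa = match.group("mantissa")
    if mantissa is not None:
        # Normalise the Unicode minus sign to an ASCII hyphen-minus.
        mantissa = mantissa.replace("−", "-")
    exponent = int(match.group("exponent").replace("−", "-"))
    return mantissa, exponent
```

The parsed parts could then be verbalised separately, for instance a mantissa, the words "times ten to the", and the exponent, though that wording is only one possible choice.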
