
Return a list of the most probable segmentations.

Open rafaveguim opened this issue 7 years ago • 3 comments

It would be great if wordsegment could return a ranked list of the most probable segmentations, not just the single best one. For instance:

> ws.rank("nobodyelse")
[ ["nobody", "else"], ["no", "body", "else"], ...]

or

> ws.probabilities("nobodyelse")
[
  [["nobody", "else"], ["no", "body", "else"], ...],
  [0.727362, 0.0012372, ...]
]
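
To make the request concrete, here is a minimal sketch of how `rank` and `probabilities` could behave. It does not use wordsegment itself. The unigram counts, `TOTAL`, the unknown-word penalty, and the function bodies are all hypothetical, and the "probabilities" are only the top-n scores renormalized to sum to 1, which is one possible answer to what "most probable" means here.

```python
import math

# Hypothetical toy unigram counts; a real model would use a large corpus table.
TOY_COUNTS = {"nobody": 50, "else": 200, "no": 400, "body": 100}
TOTAL = 1000.0  # assumed corpus size for this toy example


def word_logprob(word):
    """Log probability of one word under the toy unigram model."""
    count = TOY_COUNTS.get(word)
    if count is None:
        # Assumed penalty: unknown words get less likely as they get longer.
        return -math.log(TOTAL * 10 ** len(word))
    return math.log(count / TOTAL)


def splits(text):
    """Yield every way to cut text into contiguous, non-empty pieces."""
    if not text:
        yield []
        return
    for i in range(1, len(text) + 1):
        for rest in splits(text[i:]):
            yield [text[:i]] + rest


def rank(text, n=3):
    """Return the n best (log_score, segmentation) pairs, best first.

    Brute force: enumerates all 2**(len(text) - 1) segmentations, so it is
    only suitable for short inputs such as passwords.
    """
    scored = [(sum(word_logprob(w) for w in seg), seg) for seg in splits(text)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]


def probabilities(text, n=3):
    """Return (segmentations, weights), weights renormalized over the top n."""
    top = rank(text, n)
    best = top[0][0]
    weights = [math.exp(score - best) for score, _ in top]
    norm = sum(weights)
    return [seg for _, seg in top], [w / norm for w in weights]


segmentations, weights = probabilities("nobodyelse", n=2)
print(segmentations)
print(weights)
```

With these toy counts, `["nobody", "else"]` ranks first and `["no", "body", "else"]` second. A downstream model could then weight the features derived from each segmentation by its renormalized weight instead of committing to a single answer.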

rafaveguim, Mar 02 '18 14:03

Why? What's your use case? And how do you define "most probable"?

grantjenks, Mar 02 '18 23:03

Your algorithm is probabilistic, or at least it uses some sort of score. If I'm using wordsegment as part of a language modelling pipeline, I may want to propagate that measure of uncertainty to later stages.

I'm using it to model language in passwords. After a password is segmented, part-of-speech and semantic tags are inferred. The model sees these features and updates its beliefs. The devil is in the ambiguity: therapistfinder has two meanings depending on segmentation ("therapist finder" or "the rapist finder"). Ideally, the model should account for both.

rafaveguim, Mar 03 '18 05:03

Ok, pull request welcome.

grantjenks, Mar 09 '18 01:03