python-wordsegment
python-wordsegment copied to clipboard
Return a list of the most probable segmentations.
It would be great if wordsegment returned that. For instance:
> ws.rank("nobodyelse")
[ ["nobody", "else"], ["no", "body", "else"], ...]
or
> ws.probabilities("nobodyelse")
[
[ ["nobody", "else"], ["no", "body", "else"], ...],
[ 0.727362, 0.0012372, ...]
]
Why? What's your use case? And how do you define "most probable"?
Your algorithm is probabilistic, or at least it uses some sort of score. If I'm using wordsegment as part of a language modelling pipeline, I may want to propagate that measure of uncertainty.
I'm using it to model language in passwords. After a password is segmented, part-of-speech and semantic tags are inferred. The model sees these features and updates its beliefs. The devil is in ambiguity: therapistfinder has two meanings depending on segmentation. Ideally, the model should account for both.
Ok, pull request welcome.