duckling_old add a confidence key

add a confidence key

Open gnardari opened this issue 7 years ago • 4 comments

I'm trying to add a confidence key to duckling's output.

{:dim :number, :body "43", :value {:type "value", :value 43}, :start 0, :end 2, :confidence 1.0}

I'm using Math/exp to convert the log probability into a plain probability:

https://github.com/gnardari/duckling/blob/confidence/src/duckling/engine.clj#L236

I'm trying to normalize the probability with

P(d|c) / P(d), where
P(d) = P(d|c1) + P(d|c2) + ... + P(d|cn) and
P(d|c) = P(x1|c) . P(x2|c) ... P(xn|c) . P(c)

https://github.com/gnardari/duckling/blob/confidence/src/duckling/ml/naivebayes.clj#L21

but I couldn't get it right. Maybe someone here could help me.

Mar 10 '17 19:03 gnardari

You don't need to add the confidence key. It's enough simply not to remove the log-prob key that is already there (see select-winners in core.clj).

Besides, log-prob is numerically much nicer than plain probabilities, because products of probabilities like P(d|c) = P(x1|c) . P(x2|c) ... P(xn|c) . P(c) can underflow rather quickly.
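
To illustrate the underflow point, here is a minimal Python sketch. It is not Duckling's code: the function names (log_joint, normalize) and the feature probabilities are made up for illustration. It shows a product of many small probabilities collapsing to 0.0, while the same quantity stays usable as a sum of logs. It also shows one common way to get a normalized [0,1] value out of per-class log scores without leaving log space (log-sum-exp).

```python
import math

def log_joint(log_likelihoods, log_prior):
    """log P(x1|c) + ... + log P(xn|c) + log P(c), i.e. the product kept in log space."""
    return sum(log_likelihoods) + log_prior

def normalize(log_joints):
    """Turn per-class log scores into values that sum to 1, using log-sum-exp."""
    m = max(log_joints)  # subtracting the max keeps exp() away from underflow
    log_total = m + math.log(sum(math.exp(s - m) for s in log_joints))
    return [math.exp(s - log_total) for s in log_joints]

# Hypothetical numbers: 400 features, each with a small per-class probability.
naive_product = 1.0
for _ in range(400):
    naive_product *= 0.01  # underflows to exactly 0.0 long before the loop ends

# The same kind of products, kept as sums of logs, stay finite and comparable.
log_score_a = log_joint([math.log(0.02)] * 400, math.log(0.5))
log_score_b = log_joint([math.log(0.01)] * 400, math.log(0.5))
posteriors = normalize([log_score_a, log_score_b])

print(naive_product)             # the plain product is no longer usable
print(log_score_a, log_score_b)  # the log scores are ordinary finite floats
print(posteriors)
```

Note that the normalized values only sum to 1 across the classes of a single classifier. They say nothing about how one parse compares to another parse with a different span.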

Mar 10 '17 23:03 justinasvd

I see your point, but log probabilities are not as intuitive for API users as [0,1] IMO.

Mar 11 '17 12:03 gnardari

I think that those API users who would care about confidence and -- more importantly! -- know how to use it correctly would prefer plain log-prob.

Consider the sentence "Wake me up at five am tomorrow". Duckling yields these parses:

1. number ("five"), log-prob: -0.14
2. distance ("five"), log-prob: -2.31
3. volume ("five"), log-prob: -2.20
4. temperature ("five"), log-prob: -2.29
5. time ("at five am tomorrow"), log-prob: -18.26

If you selected a parse simply by max(log-prob) or even max(exp(log-prob)), you would have to say that the winning parse is #1 and that the user wants to see a number. This is clearly incorrect. So instead of using the log-prob of a whole parse, you would probably want another metric: some kind of confidence measure per character. For instance, log-prob * exp(-(end - start)) is a metric that favors longer parses, and parse #5 would then win.
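
A small Python sketch of that comparison, using the five parses listed above. The start and end offsets are not given in the list, so this sketch derives them from the position of each body in the sentence. The helper names (span, length_weighted) are illustrative only and not part of Duckling.

```python
import math

sentence = "Wake me up at five am tomorrow"

# (dim, body, log-prob), copied from the parses listed above.
parses = [
    ("number", "five", -0.14),
    ("distance", "five", -2.31),
    ("volume", "five", -2.20),
    ("temperature", "five", -2.29),
    ("time", "at five am tomorrow", -18.26),
]

def span(body):
    """Assumed character offsets: first occurrence of the body in the sentence."""
    start = sentence.index(body)
    return start, start + len(body)

def length_weighted(log_prob, start, end):
    """log-prob * exp(-(end - start)): longer spans pull the score toward 0."""
    return log_prob * math.exp(-(end - start))

by_log_prob = max(parses, key=lambda p: p[2])
by_weighted = max(parses, key=lambda p: length_weighted(p[2], *span(p[1])))

print(by_log_prob[0])  # number
print(by_weighted[0])  # time
```

Because every log-prob is negative, multiplying by exp(-(end - start)) shrinks the score of a long parse toward zero, which is why the 19-character time parse overtakes the 4-character ones.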

Moreover, anyone who wants to compute confidence would also have to take into account whether the parse is latent or not. More likely than not, you would want to disfavor latent parses.

Summa summarum: I don't think that you can please everyone by adding a :confidence key, so it would be best not to do it. A more constructive and more general approach would be to expose :log-prob and let people do whatever they want with it.

Mar 12 '17 09:03 justinasvd

Thanks for your feedback. I decided to use something close to what you suggested in my fork. Maybe someone from Wit can comment on this issue, since it looks like their version of Duckling running in production has a confidence key with a [0,1] interval.

Mar 18 '17 18:03 gnardari
