python-crfsuite
'bias' feature in example
I'm very new to CRFs, so I apologize if my issue is just ignorance... I was going through the example and noticed that word2features() adds a 'bias' string to the beginning of each feature set. Does this have a purpose? Since every feature set contains that same 'bias' string, it seems the end result should be the same without it (or I'm just totally not getting it). I tried looking through the docs here and the crfsuite docs and couldn't find anything that would indicate its purpose.
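For reference, this is roughly what the tutorial's word2features() looks like (condensed here; the real example includes more features such as suffixes and POS tags):

```python
def word2features(sent, i):
    """Condensed sketch of the tutorial's feature extractor.

    The point of interest is the constant 'bias' entry that is
    added to every token's feature set.
    """
    word = sent[i][0]
    features = [
        'bias',                               # same string for every token
        'word.lower=' + word.lower(),
        'word.istitle=%s' % word.istitle(),
        'word.isdigit=%s' % word.isdigit(),
    ]
    if i == 0:
        features.append('BOS')                # beginning of sentence
    if i == len(sent) - 1:
        features.append('EOS')                # end of sentence
    return features
```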
'bias' is a feature that captures the proportion of each label in the training set.
Intuitively, if you have no feature other than 'bias' in your model (so your features are just indicator functions of the current label), then the learned weight will be higher for labels that appear more often. When you predict, you will always return the label with the highest weight, which is the one that appears most often during training.
In a 'real' CRF, it is just a way to express that some labels are rare in themselves and others are not, so the model can take that into account (for example, you can imagine a language in which verbs are mostly avoided but nouns are not; you would express that with a lower 'bias' weight for verb labels than for noun labels).
I hope it is clear...
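To make the intuition concrete, here is a minimal, hypothetical pycrfsuite experiment (the labels and data are made up for illustration): a model trained with only the 'bias' feature should just predict the majority label.

```python
import pycrfsuite

# Toy training data: label 'B' appears three times as often as 'A'.
# Every token has exactly one feature: the constant 'bias'.
xseq = [['bias'] for _ in range(8)]
yseq = ['B', 'B', 'B', 'A', 'B', 'B', 'B', 'A']

trainer = pycrfsuite.Trainer(verbose=False)
trainer.append(xseq, yseq)
trainer.train('bias_only.crfsuite')

tagger = pycrfsuite.Tagger()
tagger.open('bias_only.crfsuite')

# With no other information, every token should get the
# majority label 'B'.
print(tagger.tag([['bias'], ['bias'], ['bias']]))
```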
@Pantamis, thanks for replying, but I still don't understand. Did you look at the example? The function just adds 'bias' to the beginning of every feature set regardless.
Ok, I will try to be more explicit.
CRFsuite uses transition features of the form I(y_{t-1}=a, y_t=b) and state features of the form f(x)·I(y_t=a). The transition features are created automatically.
With word2features() you specify the f(x) part of the state features. For a feature like word.istitle(), the log potentials look like I(x.istitle()=True/False)·I(y_t=a), where a takes each possible label value.
The 'bias' feature does not depend on x, so by adding it you are adding to your feature set all the features of the form const·I(y_t=a), where a ranges over the labels.
So the model learns those per-label weights as if the labels were drawn independently from a fixed probability distribution.
Maybe it'd help if you answered the original question... If you simply remove the 'bias' from the feature set, is the end result still the same? Likewise, if you change it to 'bill', will it make any difference?
"Likewise, if you change it to 'bill'
will it make any difference?" : No, it is just a name for the feature, you can search for a state feature of the form (a float) B-ORG biais
when you print them in the example (with any label name instead of B-ORG
and 'bill'
if you change it)
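For instance, after training you can list the per-label weights of the 'bias' feature through tagger.info(). A short sketch, assuming a trained model file named 'conll2002-esp.crfsuite' as in the tutorial:

```python
import pycrfsuite

tagger = pycrfsuite.Tagger()
tagger.open('conll2002-esp.crfsuite')  # model file name from the tutorial

info = tagger.info()

# info.state_features maps (attribute, label) -> weight.
# The constant 'bias' attribute gets one weight per label.
for (attr, label), weight in info.state_features.items():
    if attr == 'bias':
        print('%0.6f %-6s %s' % (weight, label, attr))
```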
"If you simply remove the 'bias'
from the feature set is the end result still the same ?" : Honestly i find this question really hard to answer, maybe someone else can help.
If you keep it then the model learns some weights which means what I said before (the higher the weights associated with B-ORG biais
, the higher the proportion of B-ORG
in the training set). But if you remove it then the other features will have different weights to compensate (the high weight of B-ORG biais
will increasing all other weights about B-ORG
instead).
I am not sure. What happens if you try to run the example after removing 'bias' in word2features()?
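One way to check is to train the model twice, once with and once without the constant feature, and compare the scores. A sketch of that experiment; the data shape here is an assumption (lists of (features, labels) pairs, as sent2features()/sent2labels() in the tutorial would produce), not code from the example:

```python
import pycrfsuite

def train_and_eval(model_path, use_bias, train_data, test_data):
    """Train a model and report per-token accuracy.

    train_data/test_data: lists of (xseq, yseq) pairs, where xseq is a
    list of per-token feature lists. Hypothetical shape, for illustration.
    """
    def strip_bias(xseq):
        # Drop the constant feature from every token.
        return [[f for f in token if f != 'bias'] for token in xseq]

    trainer = pycrfsuite.Trainer(verbose=False)
    for xseq, yseq in train_data:
        trainer.append(xseq if use_bias else strip_bias(xseq), yseq)
    trainer.train(model_path)

    tagger = pycrfsuite.Tagger()
    tagger.open(model_path)
    correct = total = 0
    for xseq, yseq in test_data:
        pred = tagger.tag(xseq if use_bias else strip_bias(xseq))
        correct += sum(p == y for p, y in zip(pred, yseq))
        total += len(yseq)
    return correct / total

# With the tutorial's data prepared as (features, labels) pairs:
# acc_with    = train_and_eval('with_bias.crfsuite', True, train_data, test_data)
# acc_without = train_and_eval('without_bias.crfsuite', False, train_data, test_data)
```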
I applied this to sentence boundary disambiguation. When I removed the bias, my model was more aggressive in segmenting sentences, but when I added bias = 1 it performed better. I don't know the reason why, though.