python-crfsuite
'bias' feature in example
I'm very new to CRFs, so I apologize if my issue is just ignorance... I was going through the example and noticed that word2features() adds a 'bias' string to the beginning of each feature set. Does this have a purpose? Since every feature set contains that same 'bias' string, it seems the end result should be the same without it (or I'm just totally not getting it). I tried looking through the docs here and the crfsuite docs and couldn't find anything that would indicate its purpose.
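For reference, this is roughly what the tutorial's word2features() looks like (condensed here; the real example includes more features such as suffixes and POS tags):

```python
def word2features(sent, i):
    """Condensed sketch of the tutorial's feature extractor.

    The point of interest is the constant 'bias' entry that is
    added to every token's feature set.
    """
    word = sent[i][0]
    features = [
        'bias',                               # same string for every token
        'word.lower=' + word.lower(),
        'word.istitle=%s' % word.istitle(),
        'word.isdigit=%s' % word.isdigit(),
    ]
    if i == 0:
        features.append('BOS')                # beginning of sentence
    if i == len(sent) - 1:
        features.append('EOS')                # end of sentence
    return features
```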
'bias' is a feature that captures the proportion of each label in the training set.
Intuitively, if you have no feature other than 'bias' in your model (so your features are just indicator functions of the current label), then the learned weight will be higher for labels that appear more often. When you predict, you will always return the label with the highest weight, which is the one that appears most often during training.
In a 'real' CRF, it is just a way to express that some labels are rare in themselves and others are not, so the model can take that into account (for example, you can imagine a language in which verbs are mostly avoided but nouns are not; you would express that with a lower 'bias' weight for verb labels than for noun labels).
I hope it is clear...
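To make the intuition concrete, here is a minimal, hypothetical pycrfsuite experiment (the labels and data are made up for illustration): a model trained with only the 'bias' feature should just predict the majority label.

```python
import pycrfsuite

# Toy training data: label 'B' appears three times as often as 'A'.
# Every token has exactly one feature: the constant 'bias'.
xseq = [['bias'] for _ in range(8)]
yseq = ['B', 'B', 'B', 'A', 'B', 'B', 'B', 'A']

trainer = pycrfsuite.Trainer(verbose=False)
trainer.append(xseq, yseq)
trainer.train('bias_only.crfsuite')

tagger = pycrfsuite.Tagger()
tagger.open('bias_only.crfsuite')

# With no other information, every token should get the
# majority label 'B'.
print(tagger.tag([['bias'], ['bias'], ['bias']]))
```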
@Pantamis, thanks for replying, but I still don't understand. Did you look at the example? The function just adds 'bias' to the beginning of every feature set regardless.
Ok, I will try to be more explicit.
CRFsuite uses transition features of the form I(y_{t-1}=a, y_t=b) and state features of the form f(x)·I(y_t=a). The transition features are created automatically.
With word2features() you specify the f(x) part of the state features. For a feature like word.istitle(), the log potentials look like I(x.istitle()=True/False)·I(y_t=a), where a takes each possible label value.
The 'bias' feature does not depend on x, so by adding it you are adding to your feature set all the features of the form const·I(y_t=a), where a ranges over the labels.
So the model learns those per-label weights as if the labels were drawn independently from a fixed probability distribution.
Maybe it'd help if you answered the original question... If you simply remove the 'bias' from the feature set, is the end result still the same? Likewise, if you change it to 'bill', will it make any difference?
"Likewise, if you change it to 'bill'
will it make any difference?" : No, it is just a name for the feature, you can search for a state feature of the form (a float) B-ORG biais
when you print them in the example (with any label name instead of B-ORG
and 'bill'
if you change it)
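For instance, after training you can list the per-label weights of the 'bias' feature through tagger.info(). A short sketch, assuming a trained model file named 'conll2002-esp.crfsuite' as in the tutorial:

```python
import pycrfsuite

tagger = pycrfsuite.Tagger()
tagger.open('conll2002-esp.crfsuite')  # model file name from the tutorial

info = tagger.info()

# info.state_features maps (attribute, label) -> weight.
# The constant 'bias' attribute gets one weight per label.
for (attr, label), weight in info.state_features.items():
    if attr == 'bias':
        print('%0.6f %-6s %s' % (weight, label, attr))
```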
"If you simply remove the 'bias'
from the feature set is the end result still the same ?" : Honestly i find this question really hard to answer, maybe someone else can help.
If you keep it then the model learns some weights which means what I said before (the higher the weights associated with B-ORG biais
, the higher the proportion of B-ORG
in the training set). But if you remove it then the other features will have different weights to compensate (the high weight of B-ORG biais
will increasing all other weights about B-ORG
instead).
I am not sure. What happens if you try to run the example after removing 'bias' in word2features()?
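One way to check is to train the model twice, once with and once without the constant feature, and compare the scores. A sketch of that experiment; the data shape here is an assumption (lists of (features, labels) pairs, as sent2features()/sent2labels() in the tutorial would produce), not code from the example:

```python
import pycrfsuite

def train_and_eval(model_path, use_bias, train_data, test_data):
    """Train a model and report per-token accuracy.

    train_data/test_data: lists of (xseq, yseq) pairs, where xseq is a
    list of per-token feature lists. Hypothetical shape, for illustration.
    """
    def strip_bias(xseq):
        # Drop the constant feature from every token.
        return [[f for f in token if f != 'bias'] for token in xseq]

    trainer = pycrfsuite.Trainer(verbose=False)
    for xseq, yseq in train_data:
        trainer.append(xseq if use_bias else strip_bias(xseq), yseq)
    trainer.train(model_path)

    tagger = pycrfsuite.Tagger()
    tagger.open(model_path)
    correct = total = 0
    for xseq, yseq in test_data:
        pred = tagger.tag(xseq if use_bias else strip_bias(xseq))
        correct += sum(p == y for p, y in zip(pred, yseq))
        total += len(yseq)
    return correct / total

# With the tutorial's data prepared as (features, labels) pairs:
# acc_with    = train_and_eval('with_bias.crfsuite', True, train_data, test_data)
# acc_without = train_and_eval('without_bias.crfsuite', False, train_data, test_data)
```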
I applied this to sentence boundary disambiguation. When I removed the bias, my model was more aggressive in segmenting sentences, but when I added bias = 1 it performed better. I don't know the reason why, though.