entity-fishing
Check customisation
The customisation is not behaving as expected (or maybe I didn't understand it). Here is an example:
POST /customisation
value:
{
"customisation": {
"wikipedia": [
105942, 1499966, 4105431
],
"lang": "fr",
"texts": [
"Place de la République, Hotel Moderne, vaste batisse où étaient logées les petites souris grises, d’autres disent « les Salamandres », jeunes allemandes en uniforme. Elles partent et elles ne pouvait emporter qu’un léger bagage à la main. Jetons aux combinards de Vichy et de Washington, en défi, une tête de traître."
],
"description": "customisation for the ww2 french liberation"
}
}
name: ww2fr
But then when analysing the sentence:
Jetons aux combinards de Vichy et de Washington, en défi, une tête de traître.
Vichy is not recognised as the Régime de Vichy but as Vichy the town, even though I added the Wikipedia ID of the Régime de Vichy to the customisation.
It's a very stupid bug: when processing the relatedness, the entity in the context was ignored if it was exactly the same as the candidate to be disambiguated :/ So in practice it was doing the exact opposite of what is intended in this case. Fixed in branch 0.3.0.
Still to be tested when merged!
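The bug described above can be illustrated with a minimal Java sketch. The class, the method names, the page IDs, the flat relatedness function and the averaging scheme are all hypothetical and are not taken from the entity-fishing code base; the sketch only shows why skipping a context entity that equals the candidate cancels the effect of a customisation.

```java
import java.util.List;
import java.util.function.ToDoubleBiFunction;

// Hypothetical sketch, not entity-fishing code: a candidate is scored by its
// average relatedness to the entities of the customisation context.
class RelatednessSketch {

    // Buggy variant: a context entity identical to the candidate is skipped,
    // so the one entry that should favour the candidate contributes nothing.
    static double buggyScore(int candidate, List<Integer> context,
                             ToDoubleBiFunction<Integer, Integer> relatedness) {
        double sum = 0.0;
        int count = 0;
        for (int contextEntity : context) {
            if (contextEntity == candidate) {
                continue;
            }
            sum += relatedness.applyAsDouble(candidate, contextEntity);
            count++;
        }
        return count == 0 ? 0.0 : sum / count;
    }

    // Fixed variant: an identical context entity counts as maximally related.
    static double fixedScore(int candidate, List<Integer> context,
                             ToDoubleBiFunction<Integer, Integer> relatedness) {
        double sum = 0.0;
        for (int contextEntity : context) {
            sum += (contextEntity == candidate)
                    ? 1.0
                    : relatedness.applyAsDouble(candidate, contextEntity);
        }
        return context.isEmpty() ? 0.0 : sum / context.size();
    }

    public static void main(String[] args) {
        List<Integer> context = List.of(100, 200);                 // made-up page IDs
        ToDoubleBiFunction<Integer, Integer> flat = (a, b) -> 0.1; // made-up relatedness
        System.out.println("buggy: " + buggyScore(100, context, flat));
        System.out.println("fixed: " + fixedScore(100, context, flat));
    }
}
```

With the buggy variant, a candidate listed in the context scores no better than one that is not listed; with the fixed variant it is favoured, which is the behaviour a customisation is meant to produce.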
Maybe I'm missing something, but the customisation doesn't seem to be used at all in the branch 0.0.3.
This issue should now be fixed. It needs some thoughtful testing.
We need two small improvements:
- add validation of the JSON before storing the customisation
- improve the return messages when a customisation is successfully added
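As a rough idea of what these two improvements could look like, here is a minimal Java sketch. It assumes the customisation JSON has already been parsed into a `Map` by whatever JSON library the service uses, and that the expected shape is the one with a `wikipedia` array and a `language.lang` string. The class name, method names and message wording are hypothetical, not the actual service code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: validates an already-parsed customisation before it is stored.
class CustomisationValidator {

    // Returns the list of problems found; an empty list means the value is acceptable.
    static List<String> validate(Map<String, Object> customisation) {
        List<String> errors = new ArrayList<>();

        Object wikipedia = customisation.get("wikipedia");
        if (!(wikipedia instanceof List)) {
            errors.add("'wikipedia' must be an array of page ids");
        } else {
            for (Object id : (List<?>) wikipedia) {
                if (!(id instanceof Integer) && !(id instanceof Long)) {
                    errors.add("invalid wikipedia page id: " + id);
                }
            }
        }

        Object language = customisation.get("language");
        if (!(language instanceof Map)
                || !(((Map<?, ?>) language).get("lang") instanceof String)) {
            errors.add("'language.lang' must be a string such as \"en\"");
        }
        return errors;
    }

    // Builds an explicit return message for the client (assumed wording).
    static String message(String name, List<String> errors) {
        return errors.isEmpty()
                ? "customisation '" + name + "' added"
                : "customisation '" + name + "' rejected: " + String.join("; ", errors);
    }

    public static void main(String[] args) {
        Map<String, Object> value = Map.of(
                "wikipedia", List.of(4576770, 2154892, 29044100),
                "language", Map.of("lang", "en"),
                "description", "customisation test");
        System.out.println(message("test", validate(value)));
    }
}
```

Returning every problem at once, rather than stopping at the first one, lets the client correct the whole payload in a single round trip.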
Here is another example. The idea was to pick some very strange terms and see whether they appeared in the result.
customisation:
{
"wikipedia": [
4576770,
2154892,
29044100
],
"language": {
"lang": "en"
},
"description": "customisation test"
}
query:
{
"text": "The president of Washington went for a trip to Vichy. \n His friend Josephine was so happy to see him in the new Mercedes.\n Charles Dickens went with him on a ride on the Canyon with the German Army. ",
"language": {
"lang": "en"
},
"mentions": [
"ner",
"wikipedia"
],
"nbest": false,
"sentence": false,
"customisation": "washington"
}
The result is very strange:

OK, if I understand correctly, the customisation is used to build the context, which is carried along and used as a feature by the selector to choose among the candidates:
double prob = selector.getProbability(candidate.getNerdScore(),
    candidate.getLabel().getLinkProbability(),
    candidate.getWikiSense().getPriorProbability(),
    words.size(),
    candidate.getRelatednessScore(),
    context.contains(candidate),
    isNe,
    tf * idf,
    dice);
What I don't understand is what happens when none of the candidates selected for a specific mention is included in the context: in that case it seems that the customisation is not taken into consideration.
That means the customisation might not work if a user decides to bias the context with low-probability terms. In the example a few comments above we tried something like that.
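The concern above can be made concrete with a small diagnostic, written here as a hedged Java sketch. It assumes, following the reading above, that the customisation influences the selector mainly through `context.contains(candidate)`. The class and method names are hypothetical, the candidate IDs 1, 2 and 3 are made up, and listing a customisation ID as a candidate for "Canyon" is purely for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical diagnostic, not part of entity-fishing.
class ContextCoverageSketch {

    // Returns the mentions for which no candidate belongs to the customisation
    // context, i.e. for which context.contains(candidate) is false every time.
    static Set<String> mentionsUntouchedByContext(
            Map<String, List<Integer>> candidatesByMention, Set<Integer> contextPageIds) {
        Set<String> untouched = new TreeSet<>();
        for (Map.Entry<String, List<Integer>> entry : candidatesByMention.entrySet()) {
            boolean anyInContext = false;
            for (Integer candidate : entry.getValue()) {
                if (contextPageIds.contains(candidate)) {
                    anyInContext = true;
                    break;
                }
            }
            if (!anyInContext) {
                untouched.add(entry.getKey());
            }
        }
        return untouched;
    }

    public static void main(String[] args) {
        // Page IDs of the customisation shown earlier in this thread.
        Set<Integer> context = Set.of(4576770, 2154892, 29044100);
        // Made-up candidate lists; real ones come from the candidate generation step.
        Map<String, List<Integer>> candidates = Map.of(
                "Vichy", List.of(1, 2),
                "Canyon", List.of(3, 4576770));
        System.out.println(mentionsUntouchedByContext(candidates, context));
    }
}
```

If a mention ends up in the returned set, the context-membership feature is identical for all of its candidates, so it cannot change their ranking.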
Another issue to consider: for the text given above, there are also type errors. For instance, the mention 'Vichy', which is recognised as the city, has the type "PERSON".

The customisation was tested as follows:
- Check whether Nerd's documentation has been updated for customisation.
Test case:
- According to the documentation, an error is returned if we POST a customisation name that already exists. To check this, a customisation called "test" ("customisation": "test") was created first.
- Result when a customisation with the same name is created again:

- Result: Pass
- The customisations created by users are then used in the disambiguation service.
a. Test case: a new customisation is created by a user with the POST command.
- The new customisation is created with the parameters "name: test" and "value:" as follows:
{
"wikipedia": [
30557304,
53372101,
1814443,
3540824,
512737
],
"language" : {"lang":"en"},
"description": "customisation test"
}
- Result: Pass
b. Test case: check whether the new customisation was created successfully.
- With the GET command, the service returned the response in JSON format.
- Result: Pass
c. Test case: check whether the customisation is used when processing the text.
{
"text": "The president of Washington went for a trip to Vichy. \n His friend Josephine was so happy to see him in the new Mercedes.\n Charles Dickens went with him on a ride on the Canyon with the German Army. ",
"language": {
"lang": "en"
},
"mentions": [
"ner",
"wikipedia"
],
"nbest": false,
"sentence": false,
"customisation": "test"
}
- None of these IDs is ever used. The query always appears to be processed with "customisation": "generic" instead of "customisation": "test", which was created for this test case.
-- Result of "customisation": "generic"

-- Result of "customisation": "test"

*) The result of the customisation in JSON format:
{
"runtime": 1033,
"nbest": false,
"text": "The president of Washington went for a trip to Vichy. \n His friend Josephine was so happy to see him in the new Mercedes.\n Charles Dickens went with him on a ride on the Canyon with the German Army. ",
"language": {
"lang": "en",
"conf": 0
},
"global_categories": [
{
"weight": 0.03132133013477851,
"source": "wikipedia-en",
"category": "Ghost story writers",
"page_id": 28796407
},
{
"weight": 0.03132133013477851,
"source": "wikipedia-en",
"category": "English short story writers",
"page_id": 1095761
},
{
"weight": 0.03132133013477851,
"source": "wikipedia-en",
"category": "Charles Dickens",
"page_id": 723464
}
],
"entities": [
{
"rawName": "Washington",
"type": "PERSON",
"offsetStart": 17,
"offsetEnd": 27,
"nerd_score": 0.8,
"nerd_selection_score": 0
},
{
"rawName": "Vichy",
"type": "PERSON",
"offsetStart": 47,
"offsetEnd": 52,
"nerd_score": 0.7854,
"nerd_selection_score": 0.5977,
"wikipediaExternalRef": 51169,
"wikidataId": "Q93351",
"domains": [
"Biology",
"Sociology"
]
},
{
"rawName": "Josephine",
"type": "PERSON",
"offsetStart": 67,
"offsetEnd": 76,
"nerd_score": 0.8,
"nerd_selection_score": 0
},
{
"rawName": "Mercedes",
"type": "PERSON",
"offsetStart": 112,
"offsetEnd": 120,
"nerd_score": 0.8,
"nerd_selection_score": 0
},
{
"rawName": "Charles Dickens",
"offsetStart": 123,
"offsetEnd": 138,
"nerd_score": 1,
"nerd_selection_score": 0.809,
"wikipediaExternalRef": 5884,
"wikidataId": "Q5686",
"domains": [
"Biology",
"Administration"
]
},
{
"rawName": "German",
"type": "ORGANISATION",
"offsetStart": 186,
"offsetEnd": 192,
"nerd_score": 0.8,
"nerd_selection_score": 0
},
{
"rawName": "Army",
"type": "ORGANISATION",
"offsetStart": 193,
"offsetEnd": 197,
"nerd_score": 0.8,
"nerd_selection_score": 0
}
]
}
- Result: Fail
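The criterion behind this verdict can be written down as a small check, sketched below in Java. It assumes the expectation is that at least one page ID of the customisation shows up as a `wikipediaExternalRef` in the response. That is only the criterion used in this test, since a customisation might bias the disambiguation without any of its IDs appearing in the output. The class and method names are hypothetical; the IDs are those of the "test" customisation and of the JSON response above.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical test helper, not part of entity-fishing.
class CustomisationUsageCheck {

    // Returns the customisation page IDs that appear among the returned entities.
    static Set<Integer> usedCustomisationIds(Set<Integer> customisationIds,
                                             List<Integer> returnedPageIds) {
        Set<Integer> used = new LinkedHashSet<>(returnedPageIds);
        used.retainAll(customisationIds);
        return used;
    }

    public static void main(String[] args) {
        // Page IDs of the "test" customisation.
        Set<Integer> customisation = Set.of(30557304, 53372101, 1814443, 3540824, 512737);
        // wikipediaExternalRef values found in the response (Vichy, Charles Dickens).
        List<Integer> returned = List.of(51169, 5884);

        Set<Integer> used = usedCustomisationIds(customisation, returned);
        System.out.println(used.isEmpty() ? "Result: Fail" : "Result: Pass " + used);
    }
}
```

Running the same check against the response obtained with "customisation": "generic" would show whether the two outputs differ at all.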
The last two cases illustrate the observation in my last comment. Looking at the code, it seems that a very unlikely Wikipedia entity in the customisation won't be taken into consideration unless that candidate is selected, and unlikely entities are not selected.
@tantikristanti could you add in the previous comment the whole queries, the JSON responses and the customisation JSON as plain text?
@kermitt2 could you have a look at it?