entity-fishing
Implement selective sentence range for processing
To be checked whether
- I'm doing something wrong, or
- the documentation needs to be modified.
Here are the issues:
- Ranges do not seem to be supported:
e.g. in the following query:
{
"onlyNER": false,
"nbest": false,
"text": "We are heading to Washington. The cat is on the Table in Milan.",
"processSentence": [0-1],
"sentences": [
{
"offsetStart": 0,
"offsetEnd": 29
},
{
"offsetStart": 29,
"offsetEnd": 63
}
]
}
the "processSentence":[0-1]
would result in
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Unexpected character ('-' (code 45)): was expecting comma to separate Array entries
Two solutions: either we (a) allow only integers, like [0,1,2,3], or (b) change that item to a list of strings, like ["0","1-2"].
- The processSentence parameter seems to be ignored. For example, the following query should return only Washington:
{
"onlyNER": false,
"nbest": false,
"text": "We are heading to Washington. The cat is on the Table in Milan.",
"processSentence": [0],
"sentences": [
{
"offsetStart": 0,
"offsetEnd": 29
},
{
"offsetStart": 29,
"offsetEnd": 63
}
]
}
After discussing with @lfoppiano, we agree that we need to change the documentation to state that ranges of values are not supported.
The problem comes from the mapping of the JSON query to the NerdQuery object. The processSentence attribute of this object accepts a list of integers, so it cannot process ranges.
Fixing the code to support ranges implies changing the input query so that processSentence is a list of strings, and internally transforming it into a list of integers by parsing the items and the ranges.
Since the API is supposed to be used by services, they can easily generate the list of integers when calling the service.
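To make the string-based option concrete, here is a minimal sketch of the expansion step, usable either on the client side or after the query has been mapped. `SentenceRanges` and `expand` are hypothetical names chosen for illustration and are not part of entity-fishing; the sketch assumes items of the form "0" or "1-2" with non-negative indices.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of entity-fishing. */
final class SentenceRanges {
    private SentenceRanges() {}

    /**
     * Expands items such as "0" or "1-2" into a flat list of sentence indices.
     * Assumes non-negative indices; malformed items raise an IllegalArgumentException
     * (NumberFormatException is a subclass of it).
     */
    static List<Integer> expand(List<String> specs) {
        List<Integer> indices = new ArrayList<>();
        for (String spec : specs) {
            String s = spec.trim();
            int dash = s.indexOf('-');
            if (dash < 0) {
                // Single index, e.g. "5"
                indices.add(Integer.parseInt(s));
                continue;
            }
            // Inclusive range, e.g. "1-2"
            int from = Integer.parseInt(s.substring(0, dash).trim());
            int to = Integer.parseInt(s.substring(dash + 1).trim());
            if (from > to) {
                throw new IllegalArgumentException("Invalid range: " + spec);
            }
            for (int i = from; i <= to; i++) {
                indices.add(i);
            }
        }
        return indices;
    }
}
```

With this sketch, the items "0", "1-2", "5" would expand to the indices 0, 1, 2 and 5, which is the integer list the current API already accepts.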
The idea of the specific sentence processing was related to (real-time) interactive disambiguation on edited text, so for just processing a sentence which has been changed (possibly two at most). I don't remember where the non-implemented range comes from, probably a bad idea from me too late in the night.
Still, it would make sense to have the possibility to specify a range of sentences, it's a minor change quick to implement. This could be expressed as:
"processSentence": ["0", "1-2", "5"],
or in a more structured manner, and better/cleaner:
"processSentence": [
{ "index": 0},
{ "from": 1, "to": 2},
{ "index": 5}
],
which simply requires adding a small POJO to the NerdQuery class.
Dealing with ranges would make sense for 'human readable' interaction; in the real-time process the client takes care of it.
The change would still mean keeping the list of sentences and, in addition, adding a layer to interpret the information from the query (parsing and extracting the ranges, then transforming them into a sequence).
While discussing, we thought it was not worth the effort, since the client can easily generate the sequence.
You don't need to parse the query, this is done automatically by Jackson; you only need to add a POJO to store the range info. Then the range info could be used to filter the sentences to process (transformation into a sequence is a bit weird).
I think this is a reasonable addition, but not a priority at all of course - I would leave the issue open.
When I talk about 'parsing', I'm referring to what happens after the query has been mapped to the NerdQuery.
So even a POJO won't be enough: the inputs are strings and may be ranges, so they would still need to be parsed and the ranges expanded into a list of integers anyway.
My point is that it might not be worth adding this to an API (not meant to be human readable/writable) when the client can deal with it :-)
Let's leave it open then; anyway, the documentation describes the feature as it is.
With a POJO approach, the values will be mapped to integers automatically by Jackson (no string input). I think it's not necessary to expand them into a list of integers; just add a small filter logic in the POJO to check whether a given index is covered by the ranges specified in the POJO. But indeed, that part needs to be written one way or another.
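As a rough sketch of that filter logic, assuming the structured form with "index", "from" and "to" fields: `SentenceSelector`, `covers` and `anyCovers` are hypothetical names, not existing entity-fishing classes or methods. The class follows plain bean conventions (no-argument constructor, getters and setters), which is what Jackson's default binding relies on, but it has not been checked against the actual NerdQuery mapping.

```java
import java.util.List;

/** Hypothetical POJO for illustration only; not part of entity-fishing. */
final class SentenceSelector {
    private Integer index; // set for entries like {"index": 0}
    private Integer from;  // set for entries like {"from": 1, "to": 2}
    private Integer to;

    public Integer getIndex() { return index; }
    public void setIndex(Integer index) { this.index = index; }
    public Integer getFrom() { return from; }
    public void setFrom(Integer from) { this.from = from; }
    public Integer getTo() { return to; }
    public void setTo(Integer to) { this.to = to; }

    /** True if this entry selects the given sentence position (ranges are inclusive). */
    public boolean covers(int sentenceIndex) {
        if (index != null) {
            return index == sentenceIndex;
        }
        return from != null && to != null
                && from <= sentenceIndex && sentenceIndex <= to;
    }

    /** True if any entry in the list selects the given sentence position. */
    static boolean anyCovers(List<SentenceSelector> selectors, int sentenceIndex) {
        for (SentenceSelector selector : selectors) {
            if (selector.covers(sentenceIndex)) {
                return true;
            }
        }
        return false;
    }
}
```

The processing loop would then call `anyCovers` for each sentence position and skip the sentences that are not selected, so no expanded list of integers is ever built.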