superfastmatch
Search fails to find known match
Here's a script to reproduce the problem. Note that it uses the sfmtest database, not the default superfastmatch database.
lorem="Lorem ipsum dolor sit amet, consectetur adipiscing elit."
curl --data-urlencode "title=Lorem ipsum" --data-urlencode "text=$lorem" http://localhost:8080/document/1/1/
# Wait for the document to be processed
sleep 3
# Show document in the database
mongoexport --db sfmtest --collection documents
# Show document via the API
curl http://localhost:8080/document/1/1/
# False negative search via the API
curl --data-urlencode "text=$lorem" http://localhost:8080/search/
# False negative search via the command line
echo "$lorem" > /tmp/loremipsum.txt
./superfastmatch-linux search /tmp/loremipsum.txt
echo $?
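The fixed `sleep 3` in the script above is only a guess at how long queue processing takes. A minimal sketch of a polling alternative follows; `wait_until` is a hypothetical helper, not part of superfastmatch, and the usage line assumes the server answers a document that is not yet stored with a non-2xx status, which this report does not confirm. A retrievable document also does not prove that indexing has finished, so this only rules out the simplest timing problem.

```shell
#!/usr/bin/env bash
# wait_until MAX_TRIES DELAY CMD...
# Hypothetical helper: rerun CMD until it exits 0 or MAX_TRIES attempts are used up.
wait_until() {
  local tries="$1" delay="$2" i=0
  shift 2
  while [ "$i" -lt "$tries" ]; do
    # Discard the command's output; only its exit status matters here.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Usage (assumes the server from the script above is listening on localhost:8080
# and that curl --fail reports a missing document as an error):
# wait_until 10 1 curl --fail --silent http://localhost:8080/document/1/1/
```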
Running the reproduction script with bash -x gives:
$ bash -x searchtest.sh
+ lorem='Lorem ipsum dolor sit amet, consectetur adipiscing elit.'
+ curl --data-urlencode 'title=Lorem ipsum' --data-urlencode 'text=Lorem ipsum dolor sit amet, consectetur adipiscing elit.' http://localhost:8080/document/1/1/
{"id":"5130fd227da2802b10000009","command":"Add Document","source":null,"target":{"doctype":1,"docid":1},"sourceRange":"","targetRange":"","status":"Queued","error":"","success":true}
+ sleep 3
+ mongoexport --db sfmtest --collection documents
connected to: 127.0.0.1
{ "_id" : { "doctype" : 1, "docid" : 1 }, "title" : "Lorem ipsum", "text" : "Lorem ipsum dolor sit amet, consectetur adipiscing elit.", "length" : 56, "valid" : true, "meta" : {}, "associations" : null }
exported 1 records
+ curl http://localhost:8080/document/1/1/
{"id":{"doctype":1,"docid":1},"title":"Lorem ipsum","text":"Lorem ipsum dolor sit amet, consectetur adipiscing elit.","characters":56,"valid":true}
+ curl --data-urlencode 'text=Lorem ipsum dolor sit amet, consectetur adipiscing elit.' http://localhost:8080/search/
{"success":false,"totalRows":0}
+ echo 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.'
+ ./superfastmatch-linux search /tmp/loremipsum.txt
+ echo 0
0
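Since the command-line search exits with status 0 even though nothing was found, the exit code alone cannot flag the false negative. The sketch below checks the API response body instead. `has_match` is a hypothetical helper name, and it assumes a successful search reports a `totalRows` value greater than zero in the same compact JSON shape as the failing response above; the report does not show what a successful response from the new API looks like.

```shell
#!/usr/bin/env bash
# has_match RESPONSE
# Hypothetical helper: exit 0 if the JSON response reports at least one row.
# Assumes the compact shape seen in the log: {"success":...,"totalRows":N}
has_match() {
  local response="$1" rows
  # Pull out the integer that follows "totalRows": (empty if the field is absent).
  rows=$(printf '%s' "$response" |
    sed -n 's/.*"totalRows"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p')
  [ -n "$rows" ] && [ "$rows" -gt 0 ]
}

# Check the response captured in the log above.
if has_match '{"success":false,"totalRows":0}'; then
  echo "match found"
else
  echo "false negative: no rows returned"
fi
```

In the reproduction script, the argument could be the captured output of the search request, for example `has_match "$(curl --silent --data-urlencode "text=$lorem" http://localhost:8080/search/)"`.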
A comparable script works against the old API:
lorem="Lorem ipsum dolor sit amet, consectetur adipiscing elit."
curl --data-urlencode "title=Lorem ipsum" --data-urlencode "text=$lorem" http://localhost:8080/document/1/1/
# Wait for the document to be processed
sleep 3
# Show document via the API
curl http://localhost:8080/document/1/1/
# Search via the API (the old API finds the match)
curl --data-urlencode "text=$lorem" http://localhost:8080/search/
It yields this output:
$ bash -x oldsearchtest.sh
+ lorem='Lorem ipsum dolor sit amet, consectetur adipiscing elit.'
+ curl --data-urlencode 'title=Lorem ipsum' --data-urlencode 'text=Lorem ipsum dolor sit amet, consectetur adipiscing elit.' http://localhost:8080/document/1/1/
{
"success" : true,
"queued" : [
{
"id" : 3,
"status" : "Queued",
"action" : "Add Document",
"priority" : 2,
"doctype" : 1,
"docid" : 1,
"source" : "",
"target" : ""
}
]
}
+ sleep 3
+ curl http://localhost:8080/document/1/1/
{
"success" : true,
"documents" :{
"metaData" :{
"fields" : []
},
"rows" :[
]
},
"characters": 56,
"docid": 1,
"doctype": 1,
"title": "Lorem ipsum",
"text" : "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
}+ curl --data-urlencode 'text=Lorem ipsum dolor sit amet, consectetur adipiscing elit.' http://localhost:8080/search/
{
"success" : true,
"documents" :{
"metaData" :{
"fields" : ["characters","docid","doctype","fragment_count","title"]
},
"rows" :[
{
"fragments" : [[0,0,55,3698939003]],
"characters": 56,
"docid": 1,
"doctype": 1,
"title": "Lorem ipsum",
"fragment_count": 1
}
]
}}