Search fails to find known match

dvogel opened this issue.

Here's a script to reproduce the problem. Note that it uses the sfmtest database, not the default superfastmatch database.

lorem="Lorem ipsum dolor sit amet, consectetur adipiscing elit."
curl --data-urlencode "title=Lorem ipsum" --data-urlencode "text=$lorem" http://localhost:8080/document/1/1/

# Wait until the document has been processed
sleep 3

# Show document in the database
mongoexport --db sfmtest --collection documents

# Show document via the API
curl http://localhost:8080/document/1/1/

# False negative search via the API
curl --data-urlencode "text=$lorem" http://localhost:8080/search/

# False negative search via the command line
echo "$lorem" > /tmp/loremipsum.txt
./superfastmatch-linux search /tmp/loremipsum.txt
echo $?
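The fixed `sleep 3` assumes the queued "Add Document" command always finishes within three seconds, which can itself produce a spurious miss on a slow machine. Below is a minimal sketch of polling instead. `poll_until` is a hypothetical helper, not part of superfastmatch. Treating `"valid":true` in the GET /document/1/1/ response as the "processed" signal is an assumption based on the output shown below, not documented API behaviour.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of superfastmatch): run a command repeatedly
# until its output matches a grep pattern, or give up after TRIES attempts.
# Usage: poll_until PATTERN TRIES DELAY CMD [ARGS...]
poll_until() {
  local pattern=$1 tries=$2 delay=$3
  shift 3
  local i out
  for ((i = 1; i <= tries; i++)); do
    # Only the command's output matters; ignore its exit status.
    out=$("$@" 2>/dev/null) || true
    if grep -q -- "$pattern" <<<"$out"; then
      printf '%s\n' "$out"   # echo the matching response back to the caller
      return 0
    fi
    sleep "$delay"
  done
  return 1                   # pattern never appeared
}

# Possible replacement for "sleep 3" (assumption: "valid":true marks a processed document):
# poll_until '"valid":true' 10 1 curl -s http://localhost:8080/document/1/1/
```

Polling on the document itself keeps the repro independent of how long the queue takes, so a failing search afterwards cannot be blamed on timing.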

Output when run with bash -x:

$ bash -x searchtest.sh 
+ lorem='Lorem ipsum dolor sit amet, consectetur adipiscing elit.'
+ curl --data-urlencode 'title=Lorem ipsum' --data-urlencode 'text=Lorem ipsum dolor sit amet, consectetur adipiscing elit.' http://localhost:8080/document/1/1/
{"id":"5130fd227da2802b10000009","command":"Add Document","source":null,"target":{"doctype":1,"docid":1},"sourceRange":"","targetRange":"","status":"Queued","error":"","success":true}
+ sleep 3
+ mongoexport --db sfmtest --collection documents
connected to: 127.0.0.1
{ "_id" : { "doctype" : 1, "docid" : 1 }, "title" : "Lorem ipsum", "text" : "Lorem ipsum dolor sit amet, consectetur adipiscing elit.", "length" : 56, "valid" : true, "meta" : {}, "associations" : null }
exported 1 records
+ curl http://localhost:8080/document/1/1/
{"id":{"doctype":1,"docid":1},"title":"Lorem ipsum","text":"Lorem ipsum dolor sit amet, consectetur adipiscing elit.","characters":56,"valid":true}
+ curl --data-urlencode 'text=Lorem ipsum dolor sit amet, consectetur adipiscing elit.' http://localhost:8080/search/
{"success":false,"totalRows":0}
+ echo 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.'
+ ./superfastmatch-linux search /tmp/loremipsum.txt

+ echo 0
0
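The trace shows two ways the miss can slip past a script: the API answers `{"success":false,"totalRows":0}`, and the CLI prints nothing yet exits with status 0. A minimal sketch of explicit checks is below. Both function names are hypothetical. Treating anything other than `"success": true` as a miss, and empty CLI output as "no match", are assumptions drawn from the output in this issue, not documented behaviour.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed only if a /search/ response contains "success": true.
# The regex tolerates both the compact new-API JSON and the spaced old-API JSON.
search_succeeded() {
  grep -Eq '"success"[[:space:]]*:[[:space:]]*true' <<<"$1"
}

# Hypothetical check for the CLI: its exit status was 0 even on a miss, so
# treat empty or whitespace-only output as "no match".
cli_found_match() {
  [[ -n "${1//[[:space:]]/}" ]]
}

# Possible use in the repro script:
# resp=$(curl -s --data-urlencode "text=$lorem" http://localhost:8080/search/)
# search_succeeded "$resp" || echo "API false negative: $resp" >&2
# out=$(./superfastmatch-linux search /tmp/loremipsum.txt)
# cli_found_match "$out" || echo "CLI printed no match" >&2
```

With these checks the repro fails loudly on the new API while the old-API search response shown below still passes `search_succeeded`.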

A comparable script works against the old API:

lorem="Lorem ipsum dolor sit amet, consectetur adipiscing elit."
curl --data-urlencode "title=Lorem ipsum" --data-urlencode "text=$lorem" http://localhost:8080/document/1/1/

# Wait until the document has been processed
sleep 3

# Show document via the API
curl http://localhost:8080/document/1/1/

# Search via the API (this one finds the match)
curl --data-urlencode "text=$lorem" http://localhost:8080/search/

Run with bash -x, it yields this output:

$ bash -x oldsearchtest.sh 
+ lorem='Lorem ipsum dolor sit amet, consectetur adipiscing elit.'
+ curl --data-urlencode 'title=Lorem ipsum' --data-urlencode 'text=Lorem ipsum dolor sit amet, consectetur adipiscing elit.' http://localhost:8080/document/1/1/
{
    "success" : true,
    "queued" : [
                  {
                    "id"        : 3,
                    "status"    : "Queued",
                    "action"    : "Add Document",
                    "priority"  : 2,
                    "doctype"   : 1,
                    "docid"     : 1,
                    "source"    : "",
                    "target"    : ""
                  }                
               ]
}
+ sleep 3
+ curl http://localhost:8080/document/1/1/
{
    "success" : true,
    "documents" :{
        "metaData"  :{
                        "fields"      : []
        },
        "rows"      :[
                     ]
    },
    "characters": 56,
    "docid": 1,
    "doctype": 1,
    "title": "Lorem ipsum",
    "text"      : "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
}+ curl --data-urlencode 'text=Lorem ipsum dolor sit amet, consectetur adipiscing elit.' http://localhost:8080/search/
{
    "success" : true,
    "documents" :{
        "metaData"  :{
                        "fields"      : ["characters","docid","doctype","fragment_count","title"]
        },
        "rows"      :[
                        {
                          "fragments" : [[0,0,55,3698939003]],
                          "characters": 56,
                          "docid": 1,
                          "doctype": 1,
                          "title": "Lorem ipsum",
                          "fragment_count": 1
                        }
                     ]
    }}

Mar 01 '13 19:03 dvogel