spark
spark copied to clipboard
Searching `Patient` by `Identifier` is slow.
Is your feature request related to a problem? Please describe. I have 100w patient data,and use identifier to search very slowly.
GET /fhir/Patient?identifier=test
Doing a test on one of my test servers with the search below returns in about 140-150 milliseconds. There are 405 patient entries in the database.
GET https://spark.incendi.no/fhir/Patient?identifier=12345
The server is a Ubuntu 20.04 with 1 CPU core and 2GB of memory.
Yes, when the amount of data is still small, the query efficiency is OK, but when the amount of data becomes more, the efficiency is very low. My server is Ubuntu 18.04 with 8 CPU core and 32GB of memory. 100w patient entries search need 3s.
So we are talking about a million patient entries. I was not aware of the suffix w meaning 10 000.
Have you tried adding these indexes? https://github.com/FirelyTeam/spark/blob/r4/master/scripts/CreateIndexes.js
If you are lacking the index at line 2 adding this should help with the lack of performance you are seeing, could be it needs a slight modification since you are effectively not including system when querying the identifier.
As always when working with indexes. Profiling your database should help further pinpointing the issue. Also please test these indexes in a test environment and backup your database before running that script.
Yes,I add a index to Identifier,But it has little effect on the query. I think the problem is here https://github.com/FirelyTeam/spark/blob/6bdd48f30ea13d7b1110f28cb2b8b0d6ae2c8c93/src/Spark.Mongo/Search/Searcher/CriteriaMongoExtensions.cs#L321-L383
It's the slow query log.
{
"s": "I",
"c": "COMMAND",
"id": 51803,
"ctx": "conn581",
"msg": "Slow query",
"attr": {
"type": "command",
"ns": "spark.searchindex",
"command": {
"find": "searchindex",
"filter": {
"internal_level": 0,
"internal_resource": "Patient",
"$or": [
{
"identifier": {
"$elemMatch": {
"code": "test"
}
}
},
{
"identifier": {
"$not": {
"$type": 4
}
},
"identifier.code": "test"
},
{
"$and": [
{
"identifier": {
"$type": 2
}
},
{
"identifier": "test"
}
]
}
]
},
"projection": {
"internal_selflink": 1
},
"$db": "spark",
"lsid": {
"id": {
"$uuid": "748eacb6-cce5-4088-9e40-67b08adea3ed"
}
}
},
"planSummary": "IXSCAN { internal_level: 1, internal_resource: 1 }",
"keysExamined": 1001234,
"docsExamined": 1001234,
"cursorExhausted": true,
"numYields": 1001,
"nreturned": 0,
"queryHash": "F3A3EA5B",
"planCacheKey": "E06A389B",
"reslen": 106,
"locks": {
"ReplicationStateTransition": {
"acquireCount": {
"w": 1002
}
},
"Global": {
"acquireCount": {
"r": 1002
}
},
"Database": {
"acquireCount": {
"r": 1002
}
},
"Collection": {
"acquireCount": {
"r": 1002
}
},
"Mutex": {
"acquireCount": {
"r": 1
}
}
},
"storage": { },
"protocol": "op_msg",
"durationMillis": 2469
}
}
This query used '$or', so it didn't hit the identifier index.