spark Searching `Patient` by `Identifier` is slow.

Is your feature request related to a problem? Please describe. I have 100w patient data,and use identifier to search very slowly.

GET /fhir/Patient?identifier=test

May 06 '22 08:05 whyfate

Doing a test on one of my test servers with the search below returns in about 140-150 milliseconds. There are 405 patient entries in the database.

GET https://spark.incendi.no/fhir/Patient?identifier=12345

The server is a Ubuntu 20.04 with 1 CPU core and 2GB of memory.

May 06 '22 09:05 kennethmyhra

Yes, when the amount of data is still small, the query efficiency is OK, but when the amount of data becomes more, the efficiency is very low. My server is Ubuntu 18.04 with 8 CPU core and 32GB of memory. 100w patient entries search need 3s.

May 06 '22 10:05 whyfate

So we are talking about a million patient entries. I was not aware of the suffix w meaning 10 000.

Have you tried adding these indexes? https://github.com/FirelyTeam/spark/blob/r4/master/scripts/CreateIndexes.js

If you are lacking the index at line 2 adding this should help with the lack of performance you are seeing, could be it needs a slight modification since you are effectively not including system when querying the identifier.

As always when working with indexes. Profiling your database should help further pinpointing the issue. Also please test these indexes in a test environment and backup your database before running that script.

May 06 '22 19:05 kennethmyhra

Yes,I add a index to Identifier,But it has little effect on the query. I think the problem is here https://github.com/FirelyTeam/spark/blob/6bdd48f30ea13d7b1110f28cb2b8b0d6ae2c8c93/src/Spark.Mongo/Search/Searcher/CriteriaMongoExtensions.cs#L321-L383

May 07 '22 00:05 whyfate

It's the slow query log.

{
    "s": "I", 
    "c": "COMMAND", 
    "id": 51803, 
    "ctx": "conn581", 
    "msg": "Slow query", 
    "attr": {
        "type": "command", 
        "ns": "spark.searchindex", 
        "command": {
            "find": "searchindex", 
            "filter": {
                "internal_level": 0, 
                "internal_resource": "Patient", 
                "$or": [
                    {
                        "identifier": {
                            "$elemMatch": {
                                "code": "test"
                            }
                        }
                    }, 
                    {
                        "identifier": {
                            "$not": {
                                "$type": 4
                            }
                        }, 
                        "identifier.code": "test"
                    }, 
                    {
                        "$and": [
                            {
                                "identifier": {
                                    "$type": 2
                                }
                            }, 
                            {
                                "identifier": "test"
                            }
                        ]
                    }
                ]
            }, 
            "projection": {
                "internal_selflink": 1
            }, 
            "$db": "spark", 
            "lsid": {
                "id": {
                    "$uuid": "748eacb6-cce5-4088-9e40-67b08adea3ed"
                }
            }
        }, 
        "planSummary": "IXSCAN { internal_level: 1, internal_resource: 1 }", 
        "keysExamined": 1001234, 
        "docsExamined": 1001234, 
        "cursorExhausted": true, 
        "numYields": 1001, 
        "nreturned": 0, 
        "queryHash": "F3A3EA5B", 
        "planCacheKey": "E06A389B", 
        "reslen": 106, 
        "locks": {
            "ReplicationStateTransition": {
                "acquireCount": {
                    "w": 1002
                }
            }, 
            "Global": {
                "acquireCount": {
                    "r": 1002
                }
            }, 
            "Database": {
                "acquireCount": {
                    "r": 1002
                }
            }, 
            "Collection": {
                "acquireCount": {
                    "r": 1002
                }
            }, 
            "Mutex": {
                "acquireCount": {
                    "r": 1
                }
            }
        }, 
        "storage": { }, 
        "protocol": "op_msg", 
        "durationMillis": 2469
    }
}

This query used '$or', so it didn't hit the identifier index.

May 07 '22 01:05 whyfate

spark spark copied to clipboard

Searching `Patient` by `Identifier` is slow.

spark
spark copied to clipboard