As a user, I want to get results in the same order as they appear in the input.
I think this is possible by setting up a queue using a linked list and a listener.
It looks like this approach is not foolproof, so I am going to drop it in favor of just running one job.
Having a second attempt.
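For comparison, a common way to keep input order while still running jobs concurrently is to tag each job with its input index and reassemble results by that index, instead of relying on completion order. This is a minimal Python sketch, not gnverifier's actual Go implementation or the linked-list approach described above; `verify_name` is a hypothetical stand-in for a single verification request.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def verify_name(name: str) -> str:
    # Hypothetical stand-in for one verification request;
    # a real call would query a name-verification service.
    return name.upper()


def verify_in_order(names: list, jobs: int = 4) -> list:
    """Verify names concurrently but return results in input order."""
    results = [""] * len(names)
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # Remember each future's input index, so completion order doesn't matter.
        futures = {pool.submit(verify_name, n): i for i, n in enumerate(names)}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    return results


if __name__ == "__main__":
    names = ["Homo sapiens", "Bubo bubo", "Pomatomus saltatrix"]
    for name, result in zip(names, verify_in_order(names)):
        print(name, "->", result)
```

In Python specifically, `ThreadPoolExecutor.map` already yields results in input order; the explicit index is shown because it is the general technique and carries over to other languages.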
As an alternative, gnverifier could allow an additional column in the input file that specifies a unique input ID, enabling matching after the job completes, e.g.:
```
f8158395-d663-4f0f-b38d-1c6ecb16a8ca, "g:Homo sp:sapiens"
```
And the response could include:
```
{
  "responseId": "16f235a0-e4a3-529c-9b83-bd15fe722110", # Potentially change from "id" for added clarity
  "inputId": "f8158395-d663-4f0f-b38d-1c6ecb16a8ca",
  "name": "Homo sapiens",
  "cardinality": 2,
  "matchType": "FacetedSearch",
  ...
}
```
This way, an explicit map between the inputs and responses can be maintained, which could be preferable to simply matching both by indices.
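The postprocessing this proposal would enable can be sketched as follows. This assumes the proposed (not currently existing) `inputId` field in each response and the two-column `id, "query"` input format from the example line; everything else (function names, the trimmed-down response) is hypothetical. Matching is done by ID only, so the order in which responses arrive does not matter.

```python
from __future__ import annotations

import csv
import io
import json


def read_inputs(text: str) -> dict[str, str]:
    """Parse 'id, "query"' lines into {input_id: query}, keeping input order."""
    # skipinitialspace lets a quoted query follow ", " as in the example input line.
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    return {row[0]: row[1] for row in reader if row}


def match_responses(inputs: dict[str, str], responses: list[dict]) -> list[tuple[str, str, dict | None]]:
    """Pair each (input_id, query) with its response via the proposed 'inputId' field."""
    by_id = {r["inputId"]: r for r in responses}
    # None marks an input that received no response.
    return [(input_id, query, by_id.get(input_id)) for input_id, query in inputs.items()]


if __name__ == "__main__":
    input_text = 'f8158395-d663-4f0f-b38d-1c6ecb16a8ca, "g:Homo sp:sapiens"\n'
    response_json = '[{"inputId": "f8158395-d663-4f0f-b38d-1c6ecb16a8ca", "name": "Homo sapiens"}]'
    for input_id, query, resp in match_responses(read_inputs(input_text), json.loads(response_json)):
        print(input_id, query, "->", resp["name"] if resp else None)
```

Returning a list rather than a dict keyed by query string keeps duplicate queries distinct, since the ID, not the query text, is the key.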
Hi @thompsonmj, thank you for the feedback! Do you mean using postprocessing to sort results by the unique IDs?
Not necessarily to sort, but postprocessing to match the results to each input query string.
I assume the reason for keeping results in the same order as the query strings is to match them up.
Since the query strings (e.g. "g:Homo sp:sapiens", "n:Homo sapiens", "Homo sapiens", or "tx:Animalia sp:sapiens") give different values for "name" in the response, it isn't clear how to map responses back to query strings, because the "id" UUID is generated from the "name" field rather than from the query string.
I think I understand your point, @thompsonmj. Do I understand correctly that you use the command-line gnfinder tool?
~~Yep! Via the Docker container. It is extremely fast for long lists of names, which is quite nice.~~
Sorry, I misread your reply. I use the CLI tool gnverifier.
Oops, my bad: I was working on gnfinder yesterday and made a typo. I did mean gnverifier, of course, @thompsonmj.
It looks like you use the names file in a way I did not expect. I did not think it would be useful for people to run a file of 'FacetedSearch' names in bulk, because such searches quite often return many results and would probably require manual intervention to separate the useful results from the rest. Good to know that such a use case exists!
Can you check whether `gnverifier -j 1 ...` does the trick for you, @thompsonmj?
Yes, setting just 1 job gives results back in the same order as they were entered. At ~200 names/sec, the speed is still excellent even for long lists!
Though I feel multiple concurrent jobs with the ability to map results to input strings would be helpful if #115 (optional vernacular names) gets implemented for gnverifier. With the Global Names Resolver API, asking for vernaculars adds considerable overhead. Assuming a similar performance penalty would come with vernaculars in gnverifier, the speed boost from concurrent jobs would be helpful to offset this, and mapping would be needed if the order becomes mixed up.
> Looks like you use file with names in a way I did not expect ...
We have a long list of organisms, at widely varying levels of taxonomic specificity, for which we want fully resolved taxonomic hierarchies. Our preferred data source is GBIF, but it doesn't appear in all results, so in those cases we do some further reconciliation among the tied top-scoring results. I'm still working out how to get results as pinpointed as possible using gnverifier's advanced queries, based on the information we have for each organism.