Use of Set for dictionaries
Initially, I used a js object and hasOwnProperty to do the hashmap lookups and then later used Set and has().
It would be nice to standardize this, I'm just not familiar with the performance of Set vs. Object, I think if Set is faster/the same then we should use it.
I think one benefit of Set is that Object can possibly have issues with numeric keys?
cc/ @Joxit thoughts?
I did some quick performance testing and it seems that Set performs better and the performance is more linear as the size of the dictionary increases:
https://jsperf.com/set-vs-object-as-sets/15
I also did some quick perf (Node v8.16.0) testing and it found that :
- Memory print :
Setuses less memory thanObject(for localities Set = 59.9MB and Object = 61MB) - ops/sec :
obj[value]faster thanset.has(value)faster thanobj.hasOwnProperty(value):thinking:
That's strange, in your test obj[value] seems to be the slowest.
I think it's because those benchmarks (which I copied from someone else) also have a value check ( ^ 0 or !!) which means they are doing two operations.
The problem with something like obj[key] is that it returns false for falsy values such as 0 (although in this case they are all strings so it probably doesn't matter).
It looks like they are pretty similar so let's just go all-in on Set? I like that it doesn't coerse the keys to strings and all the other things that Object does which are weird.
For the prefix checks, I will write a little FST memory structure which will make those much faster, in the meantime they can just use iterators and it will be slow.
Performance overall is pretty good, although some complex queries are getting near 10ms which is not great.
Added FST in https://github.com/pelias/parser/pull/17
I tried your branch, it use much more memory (415MB) and seems to be slower than Set (4 times slower than Set) :thinking:
Here is the test : https://gist.github.com/Joxit/32ccd5b5f63b474f30e707d804cbda25
Maybe in the future if we will want to add some metadata e.g if the token is Rue then it may be French. If it's Boulevard it may be English, French, Spanish....