node-geoip-native
node-geoip-native copied to clipboard
More speed and less memory usage via code generation
Hi,
I've made some more progressive enhancements within my fork of 'node-geoip-native'. The basic ideas:
- generating JS code from CSV file in 'build phase' or manual
- generated sources can be used via normal require statement
- thus theres no warmup needed anymore
- thus the memory usage drops heavily
- split the IPs and Countries in two separate arrays,
- thus the memory usage drops heavily
- using more compact version of binary search
- fixed all tests with lookup errors (see unit tests)
In general, I've measured ca. 10% performance increase during lookup and a drop from ca. 50MB to ca. 30MB heap usage.
Please, check out this branch 'code_generation' and tell me what you think.
Tanks in advance Martin
Hello Martin! I have just realized, that geoip return incorrect results sometimes, so I will try your fork instead. Thank you for sharing! Also, latest ip database from MaxMind has different size: http://dev.maxmind.com/geoip/geolite
Looks like your version returns incorrect results as well. For example, check against ip: 63.155.159.123, should be US, but return Japan.
Jup, it seems there are still wrong lookups. Thanks for pointing this out. I've added this 63.155.159.123 to the unit tests (lookuoTest.js) and try to fix it.
Looks like it gives wrong results all the time. :)
Some examples: "63.150.141.128","63.156.193.255","1066831232","1067237887","US","United States" "63.156.194.0","63.156.195.255","1067237888","1067238399","JP","Japan" 63.155.159.123 returns Japan.
"93.32.0.0","93.71.255.255","1562378240","1564999679","IT","Italy" "93.72.0.0","93.79.255.255","1564999680","1565523967","UA","Ukraine" 93.79.113.99 returns Italy.
"138.187.0.0","138.191.255.255","2327511040","2327838719","CH","Switzerland" "138.192.0.0","138.193.255.255","2327838720","2327969791","US","United States" 138.188.103.143 return US.
"213.87.0.0","213.88.127.255","3579248640","3579346943","RU","Russian Federation" "213.88.128.0","213.88.184.255","3579346944","3579361535","SE","Sweden" 213.87.133.200 returns Sweden.
Please, verify my last commit. I figured out, what was wrong. The binary search algorithm, I've implemented before assumed, that the target_ip exists within all records. But thats simply not the case. Thus I've switched to use another binary search approach ('Deferred detection of equality'). This way, the midPoint can be exactly the index, we're search for OR it can be index-1 OR it can be index+1. So I've added this two additional checks. Maybe they can be simplified ;-) The good news: :100: % tests are green
I'm curious, what do you think about this approach "using code generation" in general?
I didn't dig too much into the code, but your variant is working 50% faster for me. Generation of arrays is not a problem and should be done only once. Just need to check the "ip - country" accuracy, but seems fine for now. Thank you!
Uncovered two more bugs: The last and second last IP-range are not correctly detected. I'm working on a fix.
huh, I hope not too many of my players live in these IP-ranges. :)
Done. I've fixed the two bugs. In general, the new lookupTests contains 302 unique tests, which give me a better feeling now :-)
I would be glad, if you consider to accept this pull request.