Iterating the database

Open leus opened this issue 8 years ago • 7 comments

As described in #68, we need to convert the data in the GeoIP2 database to a faster memory structure. In order to do this, I'm thinking of iterating the search tree in the same way the Perl library does it. We are adding the following to the Reader class:

    interface ReaderIterationCallback {
        void OnNode(Reader reader, int nodeNumber);

        void OnData(Reader reader, ByteBuffer buffer, int nodeNumber, long ipNumber, int depth) throws IOException;
    }

    void iterateSearchTree(ReaderIterationCallback callback)
            throws IOException {
        ByteBuffer buffer = this.getBufferHolder().get();
        iterateSearchTree(buffer, 0, 0, 1,
                this.metadata.getIpVersion() == 4 ? 32 : 128, callback);
    }

    void iterateSearchTree(ByteBuffer buffer, int nodeNumber, long ipNum, int depth, int maxDepth,
                           ReaderIterationCallback callback)
            throws IOException {
        callback.OnNode(this, nodeNumber);
        for (int i = 0; i < 2; i++) {
            int value = readNode(buffer, nodeNumber, i);

            // We ignore empty branches of the search tree
            if (value == this.metadata.getNodeCount()) {
                continue;
            }

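            if (i == 1) {
                // Use a long literal: with an int, 1 << 31 is negative and
                // sign-extends when OR'ed into ipNum. A long still holds only
                // 64 bits, so a 128-bit (IPv6) tree needs a wider type.
                ipNum = ipNum | (1L << (maxDepth - depth));
            }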

            if (value <= this.metadata.getNodeCount()) {
                iterateSearchTree(buffer, value, ipNum, depth + 1, maxDepth, callback);
            } else {
                callback.OnData(this, buffer, value, ipNum, depth);
            }
        }
    }

This appears to be working OK, but I would like to ask the following:

  1. Does this approach look "right"? (feel free to comment on anything that smells funny to you)
  2. What is the right way to get the "ranges" of IPs that match a single destination, once we have the full iteration?

leus avatar Aug 18 '17 14:08 leus

It looks good!

To get the IP network/range, you can take ipNum as the network address and use depth as the prefix length (the mask). You can see an example doing this (Perl again!) here. That function is the equivalent of an OnData() callback.
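As a rough illustration of the "ipNum plus depth" idea for an IPv4-only tree, here is a minimal sketch. The class and method names (`Ipv4NetworkSketch`, `toCidr`, `lastAddress`) are hypothetical and not part of the library; it assumes ipNum holds the 32 address bits accumulated during iteration and depth is the prefix length.

```java
final class Ipv4NetworkSketch {
    /** Builds the 32-bit netmask for a prefix length between 0 and 32. */
    static long mask(int depth) {
        if (depth < 0 || depth > 32) {
            throw new IllegalArgumentException("prefix length out of range: " + depth);
        }
        // Shifting a long by up to 32 is well defined; keep only the low 32 bits.
        return (0xFFFFFFFFL << (32 - depth)) & 0xFFFFFFFFL;
    }

    /** Formats the network that contains ipNum as dotted-quad CIDR notation. */
    static String toCidr(long ipNum, int depth) {
        long network = ipNum & mask(depth);
        return String.format("%d.%d.%d.%d/%d",
                (network >>> 24) & 0xFF, (network >>> 16) & 0xFF,
                (network >>> 8) & 0xFF, network & 0xFF, depth);
    }

    /** Returns the last address of the range, for callers that want start/end pairs. */
    static long lastAddress(long ipNum, int depth) {
        long m = mask(depth);
        return (ipNum & m) | (~m & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        // 192.168.1.55 reached at depth 24 belongs to 192.168.1.0/24.
        System.out.println(toCidr(0xC0A80137L, 24));
    }
}
```

The first address of the range is simply `ipNum & mask(depth)`, so an OnData() implementation could store either the CIDR string or the first/last pair, whichever suits the target structure.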

To read in the actual data record, I believe you can use the existing resolveDataPointer() method.

I'd be interested in including your addition to this library if you find it is useful to you.

horgh avatar Aug 18 '17 16:08 horgh

Great!

Given that I don't know much about IPv6, I wonder what kind of pitfalls this approach may have. Our database says "6" as version; what are the implications of this? What special precautions should I implement to deal with IPv4 vs IPv6 here?

leus avatar Aug 18 '17 18:08 leus

The databases can contain both IPv4 and IPv6 addresses. As you have one with ip_version 6, that is the case for you.

The code you included accounts for this already (the getIpVersion() call), at least as far as iteration goes.

I believe the main thing you will need to do is to treat ipNum values from 0 to 2^32-1 as IPv4 in the callback you define. The rest you should treat as IPv6.

Other than that, you may want to do something special for the two pointers to the root of the IPv4 space at ::ffff:0:0/96 and 2002::/16, although you don't necessarily have to do anything.

Also, depending on how you're using the mask, you may want to scale it down for the IPv4 addresses.

There is a section in the spec with a little more information on this too.
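The points above could be sketched roughly as follows. This is an illustration under stated assumptions, not the library's API: `MixedTreeSketch` and its methods are hypothetical names, the address is accumulated in a `BigInteger` because a `long` cannot hold 128 bits, and depth is taken to be the prefix length within the 128-bit tree.

```java
import java.math.BigInteger;

final class MixedTreeSketch {
    private static final BigInteger IPV4_LIMIT = BigInteger.ONE.shiftLeft(32);

    /** BigInteger counterpart of ipNum | (1 << (maxDepth - depth)). */
    static BigInteger withBit(BigInteger ipNum, int depth, int maxDepth) {
        return ipNum.setBit(maxDepth - depth);
    }

    /** True when the network lies inside ::/96, where the IPv4 space sits in an IPv6 tree. */
    static boolean isIpv4(BigInteger address, int prefixLength) {
        return prefixLength >= 96 && address.compareTo(IPV4_LIMIT) < 0;
    }

    /** Formats the network, scaling the prefix down by 96 for IPv4 entries. */
    static String describe(BigInteger address, int prefixLength) {
        if (isIpv4(address, prefixLength)) {
            long v4 = address.longValue();
            return String.format("%d.%d.%d.%d/%d",
                    (v4 >>> 24) & 0xFF, (v4 >>> 16) & 0xFF,
                    (v4 >>> 8) & 0xFF, v4 & 0xFF, prefixLength - 96);
        }
        return formatIpv6(address) + "/" + prefixLength;
    }

    /** Writes the eight 16-bit groups in hex, without "::" compression. */
    static String formatIpv6(BigInteger address) {
        StringBuilder sb = new StringBuilder();
        for (int group = 7; group >= 0; group--) {
            int bits = address.shiftRight(group * 16).intValue() & 0xFFFF;
            sb.append(Integer.toHexString(bits));
            if (group > 0) {
                sb.append(':');
            }
        }
        return sb.toString();
    }
}
```

The `prefixLength >= 96` check matters because a short IPv6 prefix whose set bits have not yet been reached (for example ::/64) also has a numeric value below 2^32 but is not an IPv4 network. The aliased roots at ::ffff:0:0/96 and 2002::/16 are not handled here; a callback would see the IPv4 subtree again through each of them unless it skips those prefixes.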

horgh avatar Aug 18 '17 21:08 horgh

Any chance of getting this patched in?

re-thc avatar May 29 '18 11:05 re-thc

any update on this?

indraneelb1903 avatar Jul 15 '19 18:07 indraneelb1903

Any plans on merging this soon?

dfmario avatar Oct 09 '19 00:10 dfmario

Please, merge.

vasily-kirichenko avatar Mar 12 '20 08:03 vasily-kirichenko

As of maxmind-db 3.1.0, you can use that library to iterate over the database. Given that it is a lower-level operation, I don't think we will expose it through geoip2, but you can decode to the model classes from geoip2.

See this example.

oschwald avatar Jan 11 '24 15:01 oschwald