
Add IP address logical type

Open asfimport opened this issue 7 years ago • 4 comments

IP addresses can be represented much more compactly as a 64-bit integer, which makes storage more efficient and lets consumers do equality or subnet (range) comparisons using long-integer arithmetic.
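As a rough illustration of the integer arithmetic described here, the sketch below handles IPv4 only: an IPv4 address is 32 bits wide and fits in a 64-bit long, while an IPv6 address is 128 bits and would need a wider representation. The helper names (`ipv4_to_int`, `in_subnet`, `subnet_range`) are made up for this example and are not part of any Parquet API.

```python
import ipaddress
from typing import Tuple

IPV4_ALL_ONES = 0xFFFFFFFF  # 32-bit mask


def ipv4_to_int(text: str) -> int:
    """Convert dotted-quad IPv4 text to its 32-bit unsigned integer value."""
    return int(ipaddress.IPv4Address(text))


def prefix_mask(prefix_len: int) -> int:
    """Build the network mask for a prefix length between 0 and 32."""
    return (IPV4_ALL_ONES << (32 - prefix_len)) & IPV4_ALL_ONES


def in_subnet(addr: int, network: int, prefix_len: int) -> bool:
    """Subnet membership as a mask-and-compare on plain integers."""
    mask = prefix_mask(prefix_len)
    return (addr & mask) == (network & mask)


def subnet_range(network: int, prefix_len: int) -> Tuple[int, int]:
    """Inclusive [low, high] integer range covered by the subnet."""
    mask = prefix_mask(prefix_len)
    low = network & mask
    high = low | (~mask & IPV4_ALL_ONES)
    return low, high
```

With the range form, a subnet filter becomes an ordinary `low <= value <= high` predicate on an integer column, which is the kind of comparison that min/max column statistics can help with.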

Reporter: Tristan Stevens

Note: This issue was originally created as PARQUET-1363. Please see the migration documentation for further details.

asfimport avatar Jul 30 '18 15:07 asfimport

Uwe Korn / @xhochy: [~tmgstev] You would probably need two types: IPv4 and IPv6

asfimport avatar Jul 30 '18 15:07 asfimport

Tristan Stevens: I guess you've got two options: you could have two types, or you could use the IPv6 representation of IPv4 addresses (i.e. zero-padded). My first thought was for one type, but the same argument goes for having both 32-bit integers and 64-bit integers and arguing that the first 2^32 values can be represented by zero-padding the leading 32 bits!
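A small sketch of what the single-type option could look like, assuming every value is stored as 16 big-endian bytes. It shows two ways to widen an IPv4 address: plain zero padding as suggested above, and the IPv4-mapped form `::ffff:a.b.c.d` defined in RFC 4291. Both function names are hypothetical.

```python
import ipaddress


def to_16_bytes_zero_padded(text: str) -> bytes:
    """Widen any address to 16 bytes by zero-padding the high-order bits."""
    addr = ipaddress.ip_address(text)
    return int(addr).to_bytes(16, "big")


def to_16_bytes_mapped(text: str) -> bytes:
    """Widen IPv4 to the IPv4-mapped IPv6 form (::ffff:a.b.c.d)."""
    addr = ipaddress.ip_address(text)
    if addr.version == 4:
        # Put 0xFFFF in the 16 bits just above the 32-bit IPv4 value.
        addr = ipaddress.IPv6Address((0xFFFF << 32) | int(addr))
    return addr.packed
```

One trade-off to keep in mind: with plain zero padding, `192.0.2.1` produces the same bytes as the IPv6 address `::c000:201`, so the original address family cannot be recovered from the value alone. The mapped form reserves a distinct prefix for IPv4.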

asfimport avatar Jul 30 '18 16:07 asfimport

In our project, we are rethinking our own storage of IP addresses in Parquet, looking for a representation that might be more upstreamable.

Our current implementation maximizes compatibility with Postgres, and simply stores a Postgres struct (basically https://doxygen.postgresql.org/structinet__struct.html ) in a Parquet BYTE_ARRAY. We're looking for a representation that ideally preserves these characteristics:

  • unified type for IPv4 and IPv6: application code is much simpler when you don't need different types for these.
  • subnet mask length: really handy for those address-in-subnet checks.
  • address family flag: while one could pack IPv4 addresses as zero-padded IPv6 addresses, it would certainly be nice to avoid the shim code needed to adapt the values to/from application code.
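To make the three characteristics above concrete, here is a minimal sketch of one possible byte layout for a BYTE_ARRAY value: one byte for the address family, one byte for the prefix length, then the 4 or 16 address bytes in network order. This layout, the `IpValue` type, and the `encode`/`decode`/`contains` helpers are all assumptions for illustration; they are not the Postgres struct layout and not a proposed Parquet encoding.

```python
import ipaddress
import struct
from typing import NamedTuple

FAMILY_V4 = 4  # hypothetical family tags, chosen to match the IP version
FAMILY_V6 = 6


class IpValue(NamedTuple):
    family: int      # FAMILY_V4 or FAMILY_V6
    prefix_len: int  # subnet mask length in bits
    address: bytes   # 4 or 16 bytes, network byte order


def encode(text: str) -> bytes:
    """Serialize 'addr' or 'addr/prefix' text into the sketch layout."""
    iface = ipaddress.ip_interface(text)
    header = struct.pack("BB", iface.version, iface.network.prefixlen)
    return header + iface.ip.packed


def decode(raw: bytes) -> IpValue:
    """Parse the sketch layout, validating family and address length."""
    family, prefix_len = struct.unpack_from("BB", raw)
    address = raw[2:]
    expected = {FAMILY_V4: 4, FAMILY_V6: 16}.get(family)
    if expected is None or len(address) != expected:
        raise ValueError("malformed IP value")
    return IpValue(family, prefix_len, address)


def contains(subnet: IpValue, host: IpValue) -> bool:
    """Address-in-subnet check using the stored prefix length."""
    if subnet.family != host.family:
        return False
    shift = len(subnet.address) * 8 - subnet.prefix_len
    subnet_bits = int.from_bytes(subnet.address, "big") >> shift
    host_bits = int.from_bytes(host.address, "big") >> shift
    return subnet_bits == host_bits
```

For example, `contains(decode(encode("10.0.0.0/8")), decode(encode("10.1.2.3")))` evaluates to `True`, and application code never has to branch on separate IPv4 and IPv6 types.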

We could just stick with what we have, but feel like we should be able to do better than just dumping in a Postgres struct.

Anyway, we wanted to see if anyone else had strong opinions, and this GitHub issue seems to be where it's at.

daniel-awake avatar Sep 30 '25 22:09 daniel-awake

Closing in favor of https://github.com/apache/parquet-format/issues/521. Parquet-format is where format-level changes (i.e. types) should be addressed.

emkornfield avatar Dec 05 '25 23:12 emkornfield