pinot
pinot copied to clipboard
Add Support for IP Address Function
Label = feature
This PR adds a scalar function isSubnetOff(IP_PREFIX, IP_ADDRESS)
which checks if IP_ADDRESS
is in the subnet of IP_PREFIX
(this function handles both IPv4 and IPv6).
Both IP_PREFIX
and IP_ADDRESS
are STRING
type.
Use Cases
SELECT isSubnetOf('1.2.3.128/26', '1.2.3.129') FROM FOO
--> returns true
SELECT isSubnetOf('64:fa9b::17/64', '64:ffff::17') FROM FOO
--> returns false
Context - One of our observability users wanted subnet-contains and other similar functionality for IP addresses stored as STRINGs. To be used in SQL WHERE clause mostly
Can we add some e2e query execution tests where we use this in query execution (WHERE clause) similar to how you added other transform functions in the past ?
Let's add some tests with meaningful IP addresses and then test out the functionality for both IPv4 and IPv6.
Codecov Report
Merging #9501 (a4093cb) into master (9f5aa89) will increase coverage by
39.29%
. The diff coverage is42.85%
.
@@ Coverage Diff @@
## master #9501 +/- ##
=============================================
+ Coverage 24.62% 63.91% +39.29%
- Complexity 53 4910 +4857
=============================================
Files 1935 1893 -42
Lines 103815 101871 -1944
Branches 15758 15529 -229
=============================================
+ Hits 25560 65107 +39547
+ Misses 75635 31997 -43638
- Partials 2620 4767 +2147
Flag | Coverage Δ | |
---|---|---|
integration2 | ? |
|
unittests1 | 67.40% <42.85%> (?) |
|
unittests2 | 15.76% <0.00%> (?) |
Flags with carried forward coverage won't be shown. Click here to find out more.
Impacted Files | Coverage Δ | |
---|---|---|
...not/common/function/scalar/IpAddressFunctions.java | 42.85% <42.85%> (ø) |
|
...va/org/apache/pinot/core/routing/RoutingTable.java | 0.00% <0.00%> (-100.00%) |
:arrow_down: |
...va/org/apache/pinot/common/config/NettyConfig.java | 0.00% <0.00%> (-100.00%) |
:arrow_down: |
...a/org/apache/pinot/common/metrics/MinionMeter.java | 0.00% <0.00%> (-100.00%) |
:arrow_down: |
...g/apache/pinot/common/metrics/ControllerMeter.java | 0.00% <0.00%> (-100.00%) |
:arrow_down: |
.../apache/pinot/common/metrics/BrokerQueryPhase.java | 0.00% <0.00%> (-100.00%) |
:arrow_down: |
.../apache/pinot/common/metrics/MinionQueryPhase.java | 0.00% <0.00%> (-100.00%) |
:arrow_down: |
...ache/pinot/server/access/AccessControlFactory.java | 0.00% <0.00%> (-100.00%) |
:arrow_down: |
...he/pinot/common/messages/SegmentReloadMessage.java | 0.00% <0.00%> (-100.00%) |
:arrow_down: |
...he/pinot/common/messages/TableDeletionMessage.java | 0.00% <0.00%> (-100.00%) |
:arrow_down: |
... and 1578 more |
:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more
We've been using https://github.com/seancfoley/IPAddress for exactly this type of address parsing. It's very well tested, and seems robust. Wondering why you wouldn't want to use it for this functionality (and eventually other IP address functions supported by Presto, such as ip_prefix
).
Nit - PR description says the function name is isSubnetOf
, but code seems to be using is_subnet_of
.
One more comment - I think Presto handles either VARCHAR or IPADDRESS for the ip address. For one of our use cases, we've got IPv4 stored as longs in the Pinot table, which would be more efficient to query as-is versus using a custom function to convert back to a string. We could extend this support in the future, wondering if there would be a reason we shouldn't plan on being able to do that.
@walterddr - please take a look as well
Based on @kkrugler 's suggestion to explore library, I had asked @SabrinaZhaozyf to do some investigation along the following lines.
-
Do we think library will offer comprehensive IP processing functionality besides
subnet_contains
(which is the current focus of your PR but we want to add more more in future) ? -
Suggest taking a look at IP related functions supported by Kusto / Presto / Snowflake / Postgres to get an idea of IP processing landscape
-
Do we think our impl can be simplified by using the library and be less error prone ?
-
Probably not a good idea to use the library if it only supports specific functionality.
-
Some systems support IP address as a data type (NETWORK_ADDRESS IIRC ). We don't want to go there (yet) but good to check once if using the library will force us down a particular approach
@SabrinaZhaozyf , you may want to share any of your findings here for reference / discussion.
@Jackie-Jiang @walterddr @xiangfu0 - both approaches are implemented here. Can we get consensus on how we move forward ? One thing is that we will have to overload this for different types. Have you seen a similar requirement ?
Based on @kkrugler 's suggestion to explore library, I had asked @SabrinaZhaozyf to do some investigation along the following lines.
- Do we think library will offer comprehensive IP processing functionality besides
subnet_contains
(which is the current focus of your PR but we want to add more more in future) ?- Suggest taking a look at IP related functions supported by Kusto / Presto / Snowflake / Postgres to get an idea of IP processing landscape
- Do we think our impl can be simplified by using the library and be less error prone ?
- Probably not a good idea to use the library if it only supports specific functionality.
- Some systems support IP address as a data type (NETWORK_ADDRESS IIRC ). We don't want to go there (yet) but good to check once if using the library will force us down a particular approach
@SabrinaZhaozyf , you may want to share any of your findings here for reference / discussion.
@Jackie-Jiang @walterddr @xiangfu0 - both approaches are implemented here. Can we get consensus on how we move forward ? One thing is that we will have to overload this for different types. Have you seen a similar requirement ?
I have summarized my research on IP processing functions in other systems and the library in this doc: https://docs.google.com/document/d/1SS04jQCojcvCrrIGJX_aHocUEXXy8rpX0uUhwAren4k/edit?usp=sharing
Please feel free to provide feedback / comments and let's get consensus on how we want to proceed from here.
I was about to suggest the same as Rong. Given that we are going toward s postgres with joins, let's emulate the functions as well.
https://www.postgresql.org/docs/current/functions-net.html