arrow
arrow copied to clipboard
ARROW-17289: [C++] Add type category membership checks
See https://issues.apache.org/jira/browse/ARROW-17289
https://issues.apache.org/jira/browse/ARROW-17289
cc @icexelloss, @westonpace
It sounds like it would be better to document the existing predicates, rather than add new ones with different names.
It sounds like it would be better to document the existing predicates, rather than add new ones with different names.
The existing set of predicates is both confusing and incomplete, so I'd argue for both documenting and adding predicates. I'm not sure what to do with the names, considering backwards compatibility, though. Suppose we agree that a predicate corresponding to PrimitiveTypes should be added, since is_primitive is not the one, what should be the name of this predicate? There should be some naming convention here; perhaps *Types() should have a corresponding check_*(type) predicate.
I added docs that clarify the meaning of and correspondence between *Types() and Is*Type() functions.
The existing set of predicates is both confusing and incomplete, so I'd argue for both documenting and adding predicates
Hmm... can you point out the causes for confusion? Perhaps there's a way to improve that.
Same for incompleteness: is the set of Type::type predicates incomplete? if so, we should add the missing and desired predicates.
Suppose we agree that a predicate corresponding to PrimitiveTypes should be added, since is_primitive is not the one
Why is is_primitive not the one? It seems like it is returning the desired information, is it not? (is it missing some types perhaps?)
Hmm... can you point out the causes for confusion? Perhaps there's a way to improve that.
Same for incompleteness: is the set of
Type::typepredicates incomplete? if so, we should add the missing and desired predicates.
Sure. Check out this commit:
is_numericis missing and I added it.is_primitiveexists but does not correspond toPrimitiveTypes(), which is confusing. I addedis_primitive_likewhich does, though I'm not sold on the name.- I added the missing
is_binary,is_string,is_temporal, andis_interval. - There are other
is_*()primitives that do not correspond to any*Types()function -is_base_binary_likeis one example.
So, part of the confusion is the unexpected non-correspondence of is_primitive and part is due to missing and extra predicates with respect to the set of *Types() functions. In this PR, I extended the set of is_*() predicates to cover the correspondences and added Is*Type() predicates that correspond to *Types() functions as expected, while maintaining backwards-compatibility, and I added docs to clarify the meaning and correspondences.
Why is
is_primitivenot the one? It seems like it is returning the desired information, is it not? (is it missing some types perhaps?)
As compared to PrimitiveTypes() (and is_primitive_like() that I added), is_primitive is missing NA, BINARY, STRING, LARGE_BINARY and LARGE_STRING while it adds DURATION, INTERVAL_MONTHS, INTERVAL_MONTH_DAY_NANO, and INTERVAL_DAY_TIME.
Ok, so there are two different areas here:
- complete and/or fix the set of predicates operating on a
Type::typeparameter - add a separate set of predicates operating on a
const DataType¶meter
I think the former (case number 1 above) is the most interesting to solve, because it directly affects readability.
Would you like to defer number 2 to a separate JIRA (and potentially PR, if people agree it's a worthwhile thing to do)?
Also, for the record, functions such as PrimitiveTypes() were added for testing originally, and they were exposed publicly because there didn't seem to be a strong reason against it (the main use case is to be able to write testing loops or parameterized tests e.g. for (const auto& type : PrimitiveTypes()) ...).
Now to answer the problems with the current predicates:
- +1 to adding
is_numeric,is_binary,is_string,is_temporalandis_interval - for
is_primitive: is "primitive" any type that's not nested nested/compound; whether orType::NAshould be included is a bit contentious, but otherwise all temporals, numerics (including decimals) and binary-likes should probably be included - I think
is_primitive_likeshould probably be avoided; granted, we have otheris_*_likefunctions, but they turn out to be confusing, so we should probably not reuse that pattern - therefore I would be in favor of changing
is_primitiveas outlined above (meaning a slight compatibility break unfortunately)
Also note that the is_* predicates can handle parameterized types while the *Types() functions by essence cannot really handle them fully if there is a large or infinite number of parametrizations possible.
Does that make sense?
Also @lidavidm what do you think?
Note #13753 removes the *Types() anyways.
I'm -1 on the PR as is, but I think we should add the missing is_*(Type::type id) functions, and I think it's fine to also add overloads that take const DataType&. (This is probably overkill, but those who rely on IDE autocompletion may also appreciate DataType::is_integer etc.)
"primitive" is hard to define and it depends on what the user is after. I might favor a set of more specific predicates like the ones we have for template metaprogramming, e.g. has_c_type (which I think is roughly what the proposed is_primitive is: NA and nested types don't have a C type, but binary-likes do).
OK, so I'll avoid changes around the "primitive" functions and fix this PR to only:
- add the missing
is_numeric,is_binary,is_string,is_temporalandis_intervalpredicates - add overloads (same names) for all
is_*predicates that acceptconst DataType - test these predicates
Anyone knows what the errors in the Dev jobs mean?
I think you can ignore those, some files need to be bumped post-release
@pitrou, is this good to go?
Sorry for the delay @rtpsw . I rebased from master and did a couple of very minor changes.
Will wait for CI now...
Benchmark runs are scheduled for baseline = cef68940c68ac2f3167cc6cafe5eefdd9f7fab79 and contender = f0688d01c465417e6f3515f9344154ad6f47ba22. f0688d01c465417e6f3515f9344154ad6f47ba22 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished :arrow_down:25.0% :arrow_up:0.0%] ec2-t3-xlarge-us-east-2
[Failed :arrow_down:1.46% :arrow_up:0.44%] test-mac-arm
[Failed] ursa-i9-9960x
[Failed] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] f0688d01 ec2-t3-xlarge-us-east-2
[Failed] f0688d01 test-mac-arm
[Failed] f0688d01 ursa-i9-9960x
[Finished] f0688d01 ursa-thinkcentre-m75q
[Finished] cef68940 ec2-t3-xlarge-us-east-2
[Finished] cef68940 test-mac-arm
[Failed] cef68940 ursa-i9-9960x
[Failed] cef68940 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java