New lint `clippy::allow_attributes`
Is your feature request related to a problem or challenge?
Lint reference: https://rust-lang.github.io/rust-clippy/master/index.html?search=allow_att#allow_attributes
Checks for usage of the
#[allow]attribute and suggests replacing it with the#[expect]attribute (See RFC 2383)
#[expect]attributes suppress the lint emission, but emit a warning, if the expectation is unfulfilled. This can be useful to be notified when the lint is no longer triggered.
We have a quite a few occurrences of #[allow] in our codebase, but should prefer #[expect] instead for the reasons above.
Describe the solution you'd like
Similar to #18503 we should roll this lint our gradually through crates before finally enabling at the workspace level so we don't have one big PR.
Describe alternatives you've considered
Don't enable this lint if we don't see value or if the false positive rate is too high (see below).
Additional context
I tested it for some of our current #[allow]s and it has some false positives, especially for dead_code and deprecated.
Example 1
https://github.com/apache/datafusion/blob/c8d26ba012471e6aece9430642d6a8a923bc344c/datafusion/core/benches/data_utils/mod.rs#L37-L40
- If I switch this to
#[expect(dead_code)]then clippy will flag it asthis lint expectation is unfulfilled. But if I remove the lint then the dead code warning pops up anyway 🤔
Example 2
https://github.com/apache/datafusion/blob/c8d26ba012471e6aece9430642d6a8a923bc344c/datafusion/execution/src/disk_manager.rs#L117-L136
- Switching this to
#[expect(deprecated)]causes similar to above, but also flagsNewOSvariant as using deprecated variant. Something to do with howDefaultis derived?
(I say false positive but it seems more like a bug in clippy 🤔)
Maybe there are solutions for the above, I haven't looked into it much yet.
Another thing to keep in mind is how features interact, for example here:
https://github.com/apache/datafusion/blob/c8d26ba012471e6aece9430642d6a8a923bc344c/datafusion/datasource-parquet/src/reader.rs#L290-L304
- This
#[allow]is for whenparquet_encryptionfeature is not enabled, but if we change it to#[expect]blindly then I believe it'll cause issues whenparquet_encryptionis enabled, so we need to be careful
Tasks
- [x] datafusion https://github.com/apache/datafusion/pull/19133
- [x] datafusion-catalog https://github.com/apache/datafusion/pull/18973
- [x] datafusion-catalog-listing https://github.com/apache/datafusion/pull/19133
- [x] datafusion-common https://github.com/apache/datafusion/pull/18988
- [x] datafusion-common-runtime https://github.com/apache/datafusion/pull/18988
- [x] datafusion-datasource https://github.com/apache/datafusion/pull/19068
- [x] datafusion-datasource-arrow https://github.com/apache/datafusion/pull/19068
- [x] datafusion-datasource-avro https://github.com/apache/datafusion/pull/19068
- [x] datafusion-datasource-csv https://github.com/apache/datafusion/pull/19068
- [x] datafusion-datasource-json https://github.com/apache/datafusion/pull/19068
- [x] datafusion-datasource-parquet https://github.com/apache/datafusion/pull/19068
- [x] datafusion-doc https://github.com/apache/datafusion/pull/19133
- [x] datafusion-execution https://github.com/apache/datafusion/pull/19133
- [x] datafusion-expr https://github.com/apache/datafusion/pull/19133
- [x] datafusion-expr-common https://github.com/apache/datafusion/pull/19133
- [ ] datafusion-ffi
- [x] datafusion-functions https://github.com/apache/datafusion/pull/18986
- [x] datafusion-functions-aggregate https://github.com/apache/datafusion/pull/18986
- [x] datafusion-functions-aggregate-common https://github.com/apache/datafusion/pull/18986
- [x] datafusion-functions-nested https://github.com/apache/datafusion/pull/18986
- [x] datafusion-functions-table https://github.com/apache/datafusion/pull/18986
- [x] datafusion-functions-window https://github.com/apache/datafusion/pull/18986
- [x] datafusion-functions-window-common https://github.com/apache/datafusion/pull/18986
- [ ] datafusion-macros
- [ ] datafusion-optimizer
- [x] datafusion-physical-expr https://github.com/apache/datafusion/pull/18983
- [x] datafusion-physical-expr-adapter https://github.com/apache/datafusion/pull/19185
- [x] datafusion-physical-expr-common https://github.com/apache/datafusion/pull/19185
- [x] datafusion-physical-optimizer https://github.com/apache/datafusion/pull/19185
- [x] datafusion-physical-plan https://github.com/apache/datafusion/pull/18983
- [ ] datafusion-proto
- [x] datafusion-proto-common https://github.com/apache/datafusion/pull/19133
- [ ] datafusion-pruning
- [ ] datafusion-session
- [ ] datafusion-spark
- [ ] datafusion-sql
- [ ] datafusion-substrait
- [ ] Enable it globally after all the above tasks are finished
Note: we would welcome PRs that can do multiple crates in a single PR (so long as the diff isn't too large). Felt a bit overwhelmed by the volume and speed of PRs that came in for #18503 😅
Thoughts @2010YOUY01 ?
If there is a good way to deal with the mentioned feature flag issue, I think we should add this lint rule.
If there is a good way to deal with the mentioned feature flag issue, I think we should add this lint rule.
It would look something like this:
fn get_metadata<'a>(
&'a mut self,
#[cfg_attr(not(feature = "parquet_encryption"), expect(unused_variables))]
options: Option<&'a ArrowReaderOptions>,
) -> BoxFuture<'a, parquet::errors::Result<Arc<ParquetMetaData>>> {
Doable, but a little verbose.
If there is a good way to deal with the mentioned feature flag issue, I think we should add this lint rule.
It would look something like this:
fn get_metadata<'a>( &'a mut self, #[cfg_attr(not(feature = "parquet_encryption"), expect(unused_variables))] options: Option<&'a ArrowReaderOptions>, ) -> BoxFuture<'a, parquet::errors::Result<Arc<ParquetMetaData>>> {Doable, but a little verbose.
I think this marker describes the arg quite precisely: 'this arg is only used in parquet_encryption' feature.
So I'm +1 for this lint. Maybe we should add comment like the above explanation, to deal with the verbosity.
I've marked it as good first issue, but note to anyone interested, please do not assign this issue to yourself as it is meant to be a tracking issue. Just pick a crate (or crates) that is not yet done/in-progress and leave a comment with your intention to work on those crates (to avoid overlap).
Hello, if it's okay I am taking the physical-plan.
Hi! If no one else is working on datafusion-functions-nested, I can take that.
Hi all, I would like to take the following crates:
- [x]
datafusion-catalog - [x]
datafusion-listing - [x]
datafusion-common - [x]
datafusion-common-runtime - [x]
datafusion-datasource - [x]
datafusion-datasource-arrow - [x]
datafusion-datasource-avro - [x]
datafusion-datasource-csv - [x]
datafusion-datasource-json - [x]
datafusion-datasource-parquet - [x]
datafusion-doc - [x]
datafusion-execution - [x]
datafusion-expr - [x]
datafusion-expr-common - [ ]
datafusion-ffi
@Jefffrey, for datafusion-functions-nested it suffices to add #![deny(clippy::allow_attributes)] as there is no usage of #[allow]. The same applies to datafusion-functions-table. If it's ok, I'll create a PR to enforce clippy::allow_attributes for all datafusion-functions-* crates as it looks like usage of #[allow] is low and to avoid raising multiple 1-liner PRs.
@Jefffrey, for
datafusion-functions-nestedit suffices to add#![deny(clippy::allow_attributes)]as there is no usage of#[allow]. The same applies todatafusion-functions-table. If it's ok, I'll create a PR to enforceclippy::allow_attributesfor alldatafusion-functions-*crates as it looks like usage of#[allow]is low and to avoid raising multiple 1-liner PRs.
That sounds good, thanks
I would like to raise the fact that warn also does work for this case, so if there is specific need, you might also resort to it.
I would like to raise the fact that warn also does work for this case, so if there is specific need, you might also resort to it.
I think our CI would flag any warns in the clippy lints anyway 🤔
Hi team. If no one else is working on it, I'll take the following crates:
- [x] datafusion-physical-expr-adapter
- [x] datafusion-physical-expr-common
- [x] datafusion-physical-optimizer
Hi all. When trying to use #[expect] in https://github.com/apache/datafusion/blob/6746007826ebd3fcb5614bf87183674435bbb134/datafusion/physical-optimizer/src/aggregate_statistics.rs#L43-L98 I get the following error when running sh ci/scripts/rust_clippy.sh:
error: this lint expectation is unfulfilled
--> datafusion/physical-optimizer/src/aggregate_statistics.rs:44:14
|
44 | #[expect(clippy::only_used_in_recursion)]
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: `-D unfulfilled-lint-expectations` implied by `-D warnings`
= help: to override `-D warnings` add `#[allow(unfulfilled_lint_expectations)]`
error: could not compile `datafusion-physical-optimizer` (lib) due to 1 previous error
Took a look at https://github.com/apache/datafusion/pull/15635 and If I run cargo clippy -p datafusion-physical-optimizer -- -D warnings, I don't get any errors as lint expectation is fulfilled. But if I run cargo clippy cargo clippy -p datafusion-physical-optimizer -p datafusion -- -D warnings, I get the same error as above. Tried with other package combinations and seems like clippy::only_used_in_recursion is only unfulfilled when datafusion is included, thus the unfulfilled lint error when running sh ci/scripts/rust_clippy.sh.
Checked for references of optmize_plan in datafusion crate and it's used in:
- https://github.com/apache/datafusion/blob/6746007826ebd3fcb5614bf87183674435bbb134/datafusion/core/tests/physical_optimizer/aggregate_statistics.rs#L73-L74
- https://github.com/apache/datafusion/blob/6746007826ebd3fcb5614bf87183674435bbb134/datafusion/core/tests/physical_optimizer/aggregate_statistics.rs#L272-L273
- https://github.com/apache/datafusion/blob/6746007826ebd3fcb5614bf87183674435bbb134/datafusion/core/tests/physical_optimizer/aggregate_statistics.rs#L318-L319
Anyone has any insights as to why clippy::only_used_in_recursion is only unfulfilled when datafusion package is included?
Some things I've tried:
- Commented the
optmize_planreferences indatafusionto check if it had any effect on the lint fulfillment but lint error persists, - Removed
clippy::only_used_in_recursionbut then I get
when runningerror: parameter is only used in recursion --> datafusion/physical-optimizer/src/aggregate_statistics.rs:48:9 | 48 | context: &OptimizerContext, | ^^^^^^^ help: if this is intentional, prefix it with an underscore: `_context` | note: parameter used here --> datafusion/physical-optimizer/src/aggregate_statistics.rs:88:47 | 88 | self.optimize_plan(child, context).map(Transformed::yes) | ^^^^^^^ ... 94 | self.optimize_plan(child, context).map(Transformed::yes) | ^^^^^^^ = help: for further information visit https://rust-lang.github.io/rust-clippy/rust-1.91.0/index.html#only_used_in_recursion = note: `-D clippy::only-used-in-recursion` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(clippy::only_used_in_recursion)]` error: could not compile `datafusion-physical-optimizer` (lib) due to 1 previous errorcargo clippy -p datafusion-physical-optimizer -- -D warningsand no errors when runningsh ci/scripts/rust_clippy.sh.
Some solutions I'm thinking about:
- Try to implement
optimize_planwithout recursion, - Try to define a helper recursion function within
optimize_planso that it only takes onplanvariables and then prefixcontextwith an underscore. - Maybe there is a simple solution involving some clippy configuration that I'm unable to see right now.
Thanks for checking this @carlosahs
Sounds a bit complex, perhaps the easiest way forward is locally allow the allow, e.g.
impl PhysicalOptimizerRule for AggregateStatistics {
#[cfg_attr(feature = "recursive_protection", recursive::recursive)]
#[allow(clippy::allow_attributes)]
#[allow(clippy::only_used_in_recursion)] // See https://github.com/rust-lang/rust-clippy/issues/14566
fn optimize_plan(
&self,
plan: Arc<dyn ExecutionPlan>,
context: &OptimizerContext,
) -> Result<Arc<dyn ExecutionPlan>> {
Makes sense. For now, raised PR with local allow. Thanks @Jefffrey! Tried to do iterative approach but it's indeed a bit of complex work.
I'm tackling:
-
datafusion-spark
-
datafusion-sql
-
datafusion-substrait
@Jefffrey I opened a PR with changes to three of the crates, do take a look when you get the time.
@Jefffrey Opened PR for few other crates:
- datafusion-ffi
- datafusion-macros
- datafusion-optimizer
@Jefffrey I looked into the reamining three and I think they don't require the changes.
-
datafusion-proto: We should keep the parquetmax_statistics_sizelint as#[allow(deprecated)]because the deprecation isn’t emitted in all builds, and using expect causes unfulfilled-lint failures; thre are no other changes needed. -
datafusion-pruning: The two#[allow(dead_code)]in pruning predicates because they cover parquet-only code paths and would risk unfulfilled expectations when the feature is disabled. -
datafusion-session: no allow/expect present.
@Jefffrey I looked into the reamining three and I think they don't require the changes.
* `datafusion-proto`: We should keep the parquet `max_statistics_size` lint as `#[allow(deprecated)]` because the deprecation isn’t emitted in all builds, and using expect causes unfulfilled-lint failures; thre are no other changes needed. * `datafusion-pruning`: The two `#[allow(dead_code)]` in pruning predicates because they cover parquet-only code paths and would risk unfulfilled expectations when the feature is disabled. * `datafusion-session`: no allow/expect present.
We should still apply the deny at their lib.rs level, and any exceptions we handle on a case by case basis, so I still expect PRs for these crates.
But it sounds like these can be fixed if we ensure the lint is expected only when certain features are enabled, see: https://github.com/apache/datafusion/issues/18881#issuecomment-3567846975
@kumarUjjawal, let's remember that the purpose of expect is to be identical to allow, but will cause a lint to be emitted when the code it decorates does not raise a lint warning. With this in mind I want to highlight the following:
datafusion-proto: We should remove the#[allow(deprecated)]lints asmax_statistics_sizeno longer exists (ref).expectis essentially helping us to identify that we don't need to add thedeprecatedattribute because there are no more deprecated fields forParquetOptions: https://github.com/apache/datafusion/blob/fedddbcae49937ecc76bd5f2bf6acd00e9d15067/datafusion/common/src/config.rs#L656-L840. Timeline ofmax_statistics_size:- https://github.com/apache/datafusion/pull/14188.
- https://github.com/apache/datafusion/pull/16690.
datafusion-pruning: For this one I think we need to first figure out ifthis is only used byparquetfeature right nowclaims are still valid.pruning_predicate.rswas created from logic initially living inlib.rsas part of this refactor: https://github.com/apache/datafusion/pull/16587. And that PR was from Jun 27. It's likely claim is no longer valid sinceexpectis failing with unfulfilled lint expectation errors, in which case I think we should remove the#![allow(dead_code)]datafusion-session: I think it's best to enable#![deny(clippy::allow_attributes)]for future use cases. Case fromdatafusion-protosuggests ifexpectwas used instead ofallow, thendeprecatedattributes would've been removed too whenmax_statistics_sizewas removed.
My suggestion when facing -D unfulfilled-lint-expectations errors is to first identify possible root causes, then track down the root cause, and finally document why an allow needs to be removed based on such findings.
Opened a PR for:
- datafusion-proto
- datafusion-pruning
- datafusion-session