datafusion icon indicating copy to clipboard operation
datafusion copied to clipboard

New lint `clippy::allow_attributes`

Open Jefffrey opened this issue 1 month ago • 16 comments

Is your feature request related to a problem or challenge?

Lint reference: https://rust-lang.github.io/rust-clippy/master/index.html?search=allow_att#allow_attributes

Checks for usage of the #[allow] attribute and suggests replacing it with the #[expect] attribute (See RFC 2383)

#[expect] attributes suppress the lint emission, but emit a warning, if the expectation is unfulfilled. This can be useful to be notified when the lint is no longer triggered.

We have a quite a few occurrences of #[allow] in our codebase, but should prefer #[expect] instead for the reasons above.

Describe the solution you'd like

Similar to #18503 we should roll this lint our gradually through crates before finally enabling at the workspace level so we don't have one big PR.

Describe alternatives you've considered

Don't enable this lint if we don't see value or if the false positive rate is too high (see below).

Additional context

I tested it for some of our current #[allow]s and it has some false positives, especially for dead_code and deprecated.

Example 1

https://github.com/apache/datafusion/blob/c8d26ba012471e6aece9430642d6a8a923bc344c/datafusion/core/benches/data_utils/mod.rs#L37-L40

  • If I switch this to #[expect(dead_code)] then clippy will flag it as this lint expectation is unfulfilled. But if I remove the lint then the dead code warning pops up anyway 🤔

Example 2

https://github.com/apache/datafusion/blob/c8d26ba012471e6aece9430642d6a8a923bc344c/datafusion/execution/src/disk_manager.rs#L117-L136

  • Switching this to #[expect(deprecated)] causes similar to above, but also flags NewOS variant as using deprecated variant. Something to do with how Default is derived?

(I say false positive but it seems more like a bug in clippy 🤔)

Maybe there are solutions for the above, I haven't looked into it much yet.

Another thing to keep in mind is how features interact, for example here:

https://github.com/apache/datafusion/blob/c8d26ba012471e6aece9430642d6a8a923bc344c/datafusion/datasource-parquet/src/reader.rs#L290-L304

  • This #[allow] is for when parquet_encryption feature is not enabled, but if we change it to #[expect] blindly then I believe it'll cause issues when parquet_encryption is enabled, so we need to be careful

Tasks

  • [x] datafusion https://github.com/apache/datafusion/pull/19133
  • [x] datafusion-catalog https://github.com/apache/datafusion/pull/18973
  • [x] datafusion-catalog-listing https://github.com/apache/datafusion/pull/19133
  • [x] datafusion-common https://github.com/apache/datafusion/pull/18988
  • [x] datafusion-common-runtime https://github.com/apache/datafusion/pull/18988
  • [x] datafusion-datasource https://github.com/apache/datafusion/pull/19068
  • [x] datafusion-datasource-arrow https://github.com/apache/datafusion/pull/19068
  • [x] datafusion-datasource-avro https://github.com/apache/datafusion/pull/19068
  • [x] datafusion-datasource-csv https://github.com/apache/datafusion/pull/19068
  • [x] datafusion-datasource-json https://github.com/apache/datafusion/pull/19068
  • [x] datafusion-datasource-parquet https://github.com/apache/datafusion/pull/19068
  • [x] datafusion-doc https://github.com/apache/datafusion/pull/19133
  • [x] datafusion-execution https://github.com/apache/datafusion/pull/19133
  • [x] datafusion-expr https://github.com/apache/datafusion/pull/19133
  • [x] datafusion-expr-common https://github.com/apache/datafusion/pull/19133
  • [ ] datafusion-ffi
  • [x] datafusion-functions https://github.com/apache/datafusion/pull/18986
  • [x] datafusion-functions-aggregate https://github.com/apache/datafusion/pull/18986
  • [x] datafusion-functions-aggregate-common https://github.com/apache/datafusion/pull/18986
  • [x] datafusion-functions-nested https://github.com/apache/datafusion/pull/18986
  • [x] datafusion-functions-table https://github.com/apache/datafusion/pull/18986
  • [x] datafusion-functions-window https://github.com/apache/datafusion/pull/18986
  • [x] datafusion-functions-window-common https://github.com/apache/datafusion/pull/18986
  • [ ] datafusion-macros
  • [ ] datafusion-optimizer
  • [x] datafusion-physical-expr https://github.com/apache/datafusion/pull/18983
  • [x] datafusion-physical-expr-adapter https://github.com/apache/datafusion/pull/19185
  • [x] datafusion-physical-expr-common https://github.com/apache/datafusion/pull/19185
  • [x] datafusion-physical-optimizer https://github.com/apache/datafusion/pull/19185
  • [x] datafusion-physical-plan https://github.com/apache/datafusion/pull/18983
  • [ ] datafusion-proto
  • [x] datafusion-proto-common https://github.com/apache/datafusion/pull/19133
  • [ ] datafusion-pruning
  • [ ] datafusion-session
  • [ ] datafusion-spark
  • [ ] datafusion-sql
  • [ ] datafusion-substrait
  • [ ] Enable it globally after all the above tasks are finished

Note: we would welcome PRs that can do multiple crates in a single PR (so long as the diff isn't too large). Felt a bit overwhelmed by the volume and speed of PRs that came in for #18503 😅

Jefffrey avatar Nov 22 '25 10:11 Jefffrey

Thoughts @2010YOUY01 ?

Jefffrey avatar Nov 22 '25 10:11 Jefffrey

If there is a good way to deal with the mentioned feature flag issue, I think we should add this lint rule.

2010YOUY01 avatar Nov 23 '25 08:11 2010YOUY01

If there is a good way to deal with the mentioned feature flag issue, I think we should add this lint rule.

It would look something like this:

    fn get_metadata<'a>(
        &'a mut self,
        #[cfg_attr(not(feature = "parquet_encryption"), expect(unused_variables))]
        options: Option<&'a ArrowReaderOptions>,
    ) -> BoxFuture<'a, parquet::errors::Result<Arc<ParquetMetaData>>> {

Doable, but a little verbose.

Jefffrey avatar Nov 23 '25 11:11 Jefffrey

If there is a good way to deal with the mentioned feature flag issue, I think we should add this lint rule.

It would look something like this:

fn get_metadata<'a>(
    &'a mut self,
    #[cfg_attr(not(feature = "parquet_encryption"), expect(unused_variables))]
    options: Option<&'a ArrowReaderOptions>,
) -> BoxFuture<'a, parquet::errors::Result<Arc<ParquetMetaData>>> {

Doable, but a little verbose.

I think this marker describes the arg quite precisely: 'this arg is only used in parquet_encryption' feature.

So I'm +1 for this lint. Maybe we should add comment like the above explanation, to deal with the verbosity.

2010YOUY01 avatar Nov 24 '25 02:11 2010YOUY01

I've marked it as good first issue, but note to anyone interested, please do not assign this issue to yourself as it is meant to be a tracking issue. Just pick a crate (or crates) that is not yet done/in-progress and leave a comment with your intention to work on those crates (to avoid overlap).

Jefffrey avatar Nov 24 '25 12:11 Jefffrey

Hello, if it's okay I am taking the physical-plan.

YuraLitvinov avatar Nov 24 '25 19:11 YuraLitvinov

Hi! If no one else is working on datafusion-functions-nested, I can take that.

carlosahs avatar Nov 27 '25 03:11 carlosahs

Hi all, I would like to take the following crates:

  • [x] datafusion-catalog
  • [x] datafusion-listing
  • [x] datafusion-common
  • [x] datafusion-common-runtime
  • [x] datafusion-datasource
  • [x] datafusion-datasource-arrow
  • [x] datafusion-datasource-avro
  • [x] datafusion-datasource-csv
  • [x] datafusion-datasource-json
  • [x] datafusion-datasource-parquet
  • [x] datafusion-doc
  • [x] datafusion-execution
  • [x] datafusion-expr
  • [x] datafusion-expr-common
  • [ ] datafusion-ffi

chakkk309 avatar Nov 27 '25 14:11 chakkk309

@Jefffrey, for datafusion-functions-nested it suffices to add #![deny(clippy::allow_attributes)] as there is no usage of #[allow]. The same applies to datafusion-functions-table. If it's ok, I'll create a PR to enforce clippy::allow_attributes for all datafusion-functions-* crates as it looks like usage of #[allow] is low and to avoid raising multiple 1-liner PRs.

carlosahs avatar Nov 27 '25 21:11 carlosahs

@Jefffrey, for datafusion-functions-nested it suffices to add #![deny(clippy::allow_attributes)] as there is no usage of #[allow]. The same applies to datafusion-functions-table. If it's ok, I'll create a PR to enforce clippy::allow_attributes for all datafusion-functions-* crates as it looks like usage of #[allow] is low and to avoid raising multiple 1-liner PRs.

That sounds good, thanks

Jefffrey avatar Nov 28 '25 00:11 Jefffrey

I would like to raise the fact that warn also does work for this case, so if there is specific need, you might also resort to it.

YuraLitvinov avatar Nov 28 '25 14:11 YuraLitvinov

I would like to raise the fact that warn also does work for this case, so if there is specific need, you might also resort to it.

I think our CI would flag any warns in the clippy lints anyway 🤔

Jefffrey avatar Nov 29 '25 01:11 Jefffrey

Hi team. If no one else is working on it, I'll take the following crates:

  • [x] datafusion-physical-expr-adapter
  • [x] datafusion-physical-expr-common
  • [x] datafusion-physical-optimizer

carlosahs avatar Dec 03 '25 13:12 carlosahs

Hi all. When trying to use #[expect] in https://github.com/apache/datafusion/blob/6746007826ebd3fcb5614bf87183674435bbb134/datafusion/physical-optimizer/src/aggregate_statistics.rs#L43-L98 I get the following error when running sh ci/scripts/rust_clippy.sh:

error: this lint expectation is unfulfilled
  --> datafusion/physical-optimizer/src/aggregate_statistics.rs:44:14
   |
44 |     #[expect(clippy::only_used_in_recursion)]
   |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: `-D unfulfilled-lint-expectations` implied by `-D warnings`
   = help: to override `-D warnings` add `#[allow(unfulfilled_lint_expectations)]`

error: could not compile `datafusion-physical-optimizer` (lib) due to 1 previous error

Took a look at https://github.com/apache/datafusion/pull/15635 and If I run cargo clippy -p datafusion-physical-optimizer -- -D warnings, I don't get any errors as lint expectation is fulfilled. But if I run cargo clippy cargo clippy -p datafusion-physical-optimizer -p datafusion -- -D warnings, I get the same error as above. Tried with other package combinations and seems like clippy::only_used_in_recursion is only unfulfilled when datafusion is included, thus the unfulfilled lint error when running sh ci/scripts/rust_clippy.sh.

Checked for references of optmize_plan in datafusion crate and it's used in:

  • https://github.com/apache/datafusion/blob/6746007826ebd3fcb5614bf87183674435bbb134/datafusion/core/tests/physical_optimizer/aggregate_statistics.rs#L73-L74
  • https://github.com/apache/datafusion/blob/6746007826ebd3fcb5614bf87183674435bbb134/datafusion/core/tests/physical_optimizer/aggregate_statistics.rs#L272-L273
  • https://github.com/apache/datafusion/blob/6746007826ebd3fcb5614bf87183674435bbb134/datafusion/core/tests/physical_optimizer/aggregate_statistics.rs#L318-L319

Anyone has any insights as to why clippy::only_used_in_recursion is only unfulfilled when datafusion package is included?

Some things I've tried:

  • Commented the optmize_plan references in datafusion to check if it had any effect on the lint fulfillment but lint error persists,
  • Removed clippy::only_used_in_recursion but then I get
    error: parameter is only used in recursion
      --> datafusion/physical-optimizer/src/aggregate_statistics.rs:48:9
       |
    48 |         context: &OptimizerContext,
       |         ^^^^^^^ help: if this is intentional, prefix it with an underscore: `_context`
       |
    note: parameter used here
      --> datafusion/physical-optimizer/src/aggregate_statistics.rs:88:47
       |
    88 |                     self.optimize_plan(child, context).map(Transformed::yes)
       |                                               ^^^^^^^
    ...
    94 |                 self.optimize_plan(child, context).map(Transformed::yes)
       |                                           ^^^^^^^
       = help: for further information visit https://rust-lang.github.io/rust-clippy/rust-1.91.0/index.html#only_used_in_recursion
       = note: `-D clippy::only-used-in-recursion` implied by `-D warnings`
       = help: to override `-D warnings` add `#[allow(clippy::only_used_in_recursion)]`
    
    error: could not compile `datafusion-physical-optimizer` (lib) due to 1 previous error
    
    when running cargo clippy -p datafusion-physical-optimizer -- -D warnings and no errors when running sh ci/scripts/rust_clippy.sh.

Some solutions I'm thinking about:

  • Try to implement optimize_plan without recursion,
  • Try to define a helper recursion function within optimize_plan so that it only takes on plan variables and then prefix context with an underscore.
  • Maybe there is a simple solution involving some clippy configuration that I'm unable to see right now.

carlosahs avatar Dec 06 '25 18:12 carlosahs

Thanks for checking this @carlosahs

Sounds a bit complex, perhaps the easiest way forward is locally allow the allow, e.g.

impl PhysicalOptimizerRule for AggregateStatistics {
    #[cfg_attr(feature = "recursive_protection", recursive::recursive)]
    #[allow(clippy::allow_attributes)]
    #[allow(clippy::only_used_in_recursion)] // See https://github.com/rust-lang/rust-clippy/issues/14566
    fn optimize_plan(
        &self,
        plan: Arc<dyn ExecutionPlan>,
        context: &OptimizerContext,
    ) -> Result<Arc<dyn ExecutionPlan>> {

Jefffrey avatar Dec 07 '25 03:12 Jefffrey

Makes sense. For now, raised PR with local allow. Thanks @Jefffrey! Tried to do iterative approach but it's indeed a bit of complex work.

carlosahs avatar Dec 07 '25 18:12 carlosahs

I'm tackling:

  • datafusion-spark

  • datafusion-sql

  • datafusion-substrait

kumarUjjawal avatar Dec 13 '25 12:12 kumarUjjawal

@Jefffrey I opened a PR with changes to three of the crates, do take a look when you get the time.

kumarUjjawal avatar Dec 13 '25 13:12 kumarUjjawal

@Jefffrey Opened PR for few other crates:

  • datafusion-ffi
  • datafusion-macros
  • datafusion-optimizer

kumarUjjawal avatar Dec 13 '25 18:12 kumarUjjawal

@Jefffrey I looked into the reamining three and I think they don't require the changes.

  • datafusion-proto: We should keep the parquet max_statistics_size lint as #[allow(deprecated)] because the deprecation isn’t emitted in all builds, and using expect causes unfulfilled-lint failures; thre are no other changes needed.

  • datafusion-pruning: The two #[allow(dead_code)] in pruning predicates because they cover parquet-only code paths and would risk unfulfilled expectations when the feature is disabled.

  • datafusion-session: no allow/expect present.

kumarUjjawal avatar Dec 13 '25 19:12 kumarUjjawal

@Jefffrey I looked into the reamining three and I think they don't require the changes.

* `datafusion-proto`: We should keep the parquet `max_statistics_size` lint as `#[allow(deprecated)]` because the deprecation isn’t emitted in all builds, and using expect causes unfulfilled-lint failures; thre are no other changes needed.

* `datafusion-pruning`: The two `#[allow(dead_code)]` in pruning predicates because they cover parquet-only code paths and would risk unfulfilled expectations when the feature is disabled.

* `datafusion-session`: no allow/expect present.

We should still apply the deny at their lib.rs level, and any exceptions we handle on a case by case basis, so I still expect PRs for these crates.

But it sounds like these can be fixed if we ensure the lint is expected only when certain features are enabled, see: https://github.com/apache/datafusion/issues/18881#issuecomment-3567846975

Jefffrey avatar Dec 14 '25 03:12 Jefffrey

@kumarUjjawal, let's remember that the purpose of expect is to be identical to allow, but will cause a lint to be emitted when the code it decorates does not raise a lint warning. With this in mind I want to highlight the following:

  • datafusion-proto: We should remove the #[allow(deprecated)] lints as max_statistics_size no longer exists (ref). expect is essentially helping us to identify that we don't need to add the deprecated attribute because there are no more deprecated fields for ParquetOptions: https://github.com/apache/datafusion/blob/fedddbcae49937ecc76bd5f2bf6acd00e9d15067/datafusion/common/src/config.rs#L656-L840. Timeline of max_statistics_size:
    1. https://github.com/apache/datafusion/pull/14188.
    2. https://github.com/apache/datafusion/pull/16690.
  • datafusion-pruning: For this one I think we need to first figure out if this is only used by parquet feature right now claims are still valid. pruning_predicate.rs was created from logic initially living in lib.rs as part of this refactor: https://github.com/apache/datafusion/pull/16587. And that PR was from Jun 27. It's likely claim is no longer valid since expect is failing with unfulfilled lint expectation errors, in which case I think we should remove the #![allow(dead_code)]
  • datafusion-session: I think it's best to enable #![deny(clippy::allow_attributes)] for future use cases. Case from datafusion-proto suggests if expect was used instead of allow, then deprecated attributes would've been removed too when max_statistics_size was removed.

My suggestion when facing -D unfulfilled-lint-expectations errors is to first identify possible root causes, then track down the root cause, and finally document why an allow needs to be removed based on such findings.

carlosahs avatar Dec 14 '25 03:12 carlosahs

Opened a PR for:

  • datafusion-proto
  • datafusion-pruning
  • datafusion-session

kumarUjjawal avatar Dec 16 '25 08:12 kumarUjjawal