flink icon indicating copy to clipboard operation
flink copied to clipboard

[FLINK-37125] [conf] fixed env.log.max java.lang.NumberFormatException

Open sehajsandhu123 opened this issue 11 months ago • 8 comments

What is the purpose of the change

Fix for NumberFormatException in Log Configuration This PR resolves an issue where the log.max parameter in config.yaml was treated as a string, causing a java.lang.NumberFormatException when parsed by the DefaultRolloverStrategy plugin.

Changes: Updated the config.sh script to clean up MAX_LOG_FILE_NUMBER by removing leading and trailing single quotes before parsing.

Testing: Verified correct parsing of config env.log.max in a real Hadoop 3.3.6 cluster. Confirmed the history server starts without errors after the change.

Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not applicable)

sehajsandhu123 avatar Jan 21 '25 08:01 sehajsandhu123

CI report:

  • 7a21dd39851cc83976c1b0ca24409e67966021e1 Azure: FAILURE
Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

flinkbot avatar Jan 21 '25 10:01 flinkbot

Reviewed by Chi on 23/01/2025 Go back to the submitter with review comments.

davidradl avatar Jan 24 '25 16:01 davidradl

@davidradl The reason for tolerating strings is that ConfigurationParserUtils.loadAndModifyConfiguration inherently processes all YAML configuration values as strings due to how Configuration.toFileWritableMap serializes data using YamlParserUtils.toYAMLString. This results in numeric values being wrapped as strings (e.g., '10').

This fix handles this behaviour by removing unnecessary quotes to ensure the value is interpreted correctly. Adding an error check here might result in false positives, as numeric values are always converted to strings in config.yaml

sehajsandhu123 avatar Jan 24 '25 17:01 sehajsandhu123

Hey @davidradl, Just checking in on this PR—are there any updates from the review, or any changes you'd like me to make?

sehajsandhu123 avatar Feb 06 '25 13:02 sehajsandhu123

@rmetzger could you merge this PR? it relates to an issue you worked on a while back

FLINK-19865

sehajsandhu123 avatar Feb 11 '25 11:02 sehajsandhu123

Hi @dawidwys , Could you review and merge this PR for [FLINK-37125]? It fixes a NumberFormatException in env.log.max by ensuring correct parsing. The PR is already approved by @davidradl .

Thanks!

sehajsandhu123 avatar Feb 25 '25 10:02 sehajsandhu123

This PR is being marked as stale since it has not had any activity in the last 90 days. If you would like to keep this PR alive, please leave a comment asking for a review. If the PR has merge conflicts, update it with the latest from the base branch.

If you are having difficulty finding a reviewer, please reach out to the community, contact details can be found here: https://flink.apache.org/what-is-flink/community/

If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 30 days, it will be automatically closed.

github-actions[bot] avatar May 27 '25 06:05 github-actions[bot]

@ferenc-csaky If you see these tickets, you'll see that similar issues have been handled in a targeted manner before, such as in FLINK-4040, FLINK-33721, FLINK-32290

In each case, the fix was localized to avoid widespread changes and potential regressions, I considered updating Configuration.toFileWritableMap to handle type conversions globally. However, this would require significant changes to downstream methods and could introduce unintended side effects.

The change has been tested in a Hadoop 3.3.6 cluster and verified to resolve the error, while not breaking any other functionality, so I went forward with it

sehajsandhu123 avatar May 27 '25 14:05 sehajsandhu123

@flinkbot run azure

ferenc-csaky avatar Jul 01 '25 15:07 ferenc-csaky

Hi @ferenc-csaky,

Most of the CI tests are now passing, and the PR has received the necessary approvals. Do you think it’s good to merge at this point, or is there anything else you’d like me to address?

sehajsandhu123 avatar Jul 02 '25 13:07 sehajsandhu123

@sehajsandhu123 I'd rather not merge this without a successful CI run, can you rebase your commit to the latest master ant force push the PR branch pls? I think that should most probably fix it.

ferenc-csaky avatar Jul 02 '25 13:07 ferenc-csaky

@flinkbot run azure

sehajsandhu123 avatar Jul 02 '25 14:07 sehajsandhu123

I’ve just rebased the branch onto the latest master and force-pushed the updates. Could you please trigger a new CI run when you get a chance? Thanks for your help

sehajsandhu123 avatar Jul 02 '25 14:07 sehajsandhu123

It's happening after every commited change automatically, but it takes some time to get triggered. Already running for now though. I'll keep an eye on it.

ferenc-csaky avatar Jul 02 '25 14:07 ferenc-csaky

The E2E run in the CI failed because of an Ubuntu package manager network issue, today I had networking probelms with apt-get commands myself, so completely unrelated to this change, hence merging this.

ferenc-csaky avatar Jul 02 '25 17:07 ferenc-csaky