server icon indicating copy to clipboard operation
server copied to clipboard

MDEV-21978 make my_vsnprintf to use gcc-compatible format extensions

Open ParadoxV5 opened this issue 1 year ago • 8 comments

  • The Jira issue number for this PR is: MDEV-21978
  • Google Summer of Code (GSoC) 2024 milestone 1
    • Despite I’m not a beginner at open-source coding, this GSoC still manages to stretch my comfort zone :chart_with_upwards_trend:. Workflow, code style, and even the organization size for starters are all new to me.
    • GSoC milestone 2 & project status: #3360
  • Achievement: Discovered that emojis break Buildbot. :sweat_smile:

Description

This is the first but primary part of MDEV-21978. It adds the following suffix-based (MariaDB) alternatives to my_vsnprintf’s MySQL extensions:

Old New Brief Description
%`s %sQ[^I] quote with `
%b %sB fill with the binary sequence without terminating on \0
%T %sT use ... for any truncation
%sS opt-in escapes for any conflicts with the above
%M %iE[^u] print this errno and its corresponding strerror()
(%dE has no effect and thus serves as %iE’s escape.)

[^I]: Zulip discussion agreed to change this from JIRA’s %sI [^u]: Zulip discussion suggested to change this from JIRA’s %uE.

%d and %i were synonyms in plain C/C++ printf. This PR’s design utilizes that (but breaks it) to skip %iI and in turn %dE. This design reduces the impact of the suffix design (see § side-effects) by leaving the more popular %d alone. %i solos %iE here and, if we make more extensions in the future, will solo those as well. OTOH, no other format specifiers have a less-popular synonym, so %s (plus any specifiers extended in the future) gets an anti-conflict escape %sS[^X].

[^X]: JIRA author @vuvova designed this from SQL select '''': https://mariadb.zulipchat.com/#narrow/stream/118759-general/topic/MDEV-21978.20make.20my_vsnprintf.20to.20use.20gcc-compatible.20format/near/438173635

Note: Unlike what the JIRA requires:

All error messages and all usages of my_vsnprintf should be changed to use the new syntax. One way to do it is to disable old syntax conditonally[sic], only in debug builds. All gcc printf format checks should be enabled.

This PR will only migrate usages immediately affected by the addition of suffixes. To divide the project for reviewers’ convenience (and to mark Google Summer of Code milestones), I have divided the following housekeeping to a separate follow-up PR at #3360 (MDEV-21978 & GSoC part 2):

  1. (Re)activate GCC printf checks
  2. Utilize those GCC checks to port my_vsnprintf usages to the new preferred syntax
  3. Delete the old, practically deprecated extension syntaxes (e.g., %M) because printf checks are sure to compilain their tests.
    • To help keep the above housekeeping chores a sequential process, I’ve left the old format intact for now, although only on a best-effort basis.

What problem is the patch trying to solve?

Development tools such as GCC can check printf formats and arguments and catch mistakes. Howëver, our format extensions inherited from MySQL are incompatible with these tools – as listed above, they utilized specifiers that’re undefined in the C Standard.

Porting them to Standard C vsnprintf would lose the platform consistency of our in-house substitute. Therefore, the middle ground is to instead (eventually) switch to a standard-compatible set for those extensions. The JIRA says that the Linux kernel has already gone this route.

Do you think this patch might introduce side-effects in other parts of the server?

Potentially.

The introduction of suffixes makes %s and %i start consuming certain successive chars (one of QBTSE) which may’ve been originally intended for literal printing. As such, in addition to those extension alternatives, the design provides %sS and %d as synonyms for %s and %i for escaping. For example, Data Class: %sType could now result in Data Class: Structype. It now needs an escape: Data Class: %sSType (or better, a space).

In recognition of this (potential) uh regression?, I’ve bumped the snprintf service’s major version. The CI tests apparently found no mistakes, though. If there’re affected usages, I will iron them out during the rest of the GSoC while I track printf usages to migrate to the new syntax.

non-goal

my_vsnprintf.c’s code seems to have some age. Howëver, intensive refactoring for optimization is outside of this ticket’s scope.

Non-exhaustive list of possible improvements (in no particular order)
  • There may be some repeated logic in my_vsnprintf_ex and process_arg we could extract to additional helper functions.
  • union constructs reüse the same RAM space for any argument type.
  • Input-tracking variables could be in the loops’ scope rather than (file-)global.
  • We could repurpose process_str_arg’s length_arg rather than simply copying it to length.
  • process_str_arg’s if (nice_cut) has some mergeable code.
  • Besides the one in this PR, there are more if-else chains in process_int_arg and my_vsnprintf_ex that could instead use switch-case.
  • process_str_arg’s if (left_fill) currently doesn’t cover %sQ and doesn’t have an else section.
  • 0 is a flag, not a part of the field length… it shouldn’t be get_length’s concern…
  • process_arg could do some preprocessing to minimize duplicated code.

Release Notes

checks existing notes along the lines of:

  • my_snprintf_service extensions %`s, %b, %M and %T are now %sQ, %sB, %iE and %sT respectively, which are fully compatible with printf checks such as GCC -Wformat.
    • There is also a new escape %sS synonymous to %s.
    • %dE isn’t added; use it to escape %iE.
    • This applies to both my_snprintf and all other service functions that utilize it.

The snprintf family is an internal component, and at most exported to plugins. FWIW, this PR alone should bring no user-visible changes, and code quality brags can wait ’til Part 2. I have updated the (only?) documentation in service_my_snprintf.h. I have searched printf on https://mariadb.com/kb/en/ and found no relevant pages.

How can this PR be tested?

We can test the new syntax with the same strategy as the old.

In fact, the new tests I’ve included in this PR for the new syntax, I crafted them by modifying a copy-paste of existing my_vsnprintf-t.c lines. Note that these tests are no more comprehensive as the existing originals (which, uh, weren’t very throughout, unfortunately). I’ve additionally tailored a semi-unique test to confirm that %sB ignores \0-termination.

PR quality check

  • [x] This is a new feature or a refactoring, and the PR is based against the latest MariaDB development branch.
    • In order to fit this new feature, I had to also do a wee bit of refactoring so the implementation fit inside existing %s and %i branches.
  • ~~This is a bug fix, and the PR is based against the earliest maintained branch in which the bug can be reproduced.~~
  • [x] I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • [x] For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

ParadoxV5 avatar Jun 05 '24 02:06 ParadoxV5