icu4x icon indicating copy to clipboard operation
icu4x copied to clipboard

Diplomat: delete default constructor in C++ for transparent structs

Open echeran opened this issue 1 year ago • 9 comments

Background

When writing tests for Diplomat code added for Collator, a problem occurred with the new CollatorOptions. Currently, CollatorOptions is an options bag struct in Rust that meant to be mutated directly before passed to a constructor. Int Diplomat, it gets transpiled to a transparent struct in C++. The generated C++ code looks like:

struct ICU4XCollatorOptions {
 public:
  ICU4XCollatorStrength strength;
  ICU4XCollatorAlternateHandling alternate_handling;
  ICU4XCollatorCaseFirst case_first;
  ICU4XCollatorMaxVariable max_variable;
  ICU4XCollatorCaseLevel case_level;
  ICU4XCollatorNumeric numeric;
  ICU4XCollatorBackwardSecondLevel backward_second_level;
};

Then, in my test code, I wrote the faulty code

...
  ICU4XLocale locale = ICU4XLocale::create("en").ok().value();
  ICU4XCollatorOptions options;
  ICU4XCollator collator = ICU4XCollator::try_new(dp, locale, options).ok().value();
  ICU4XOrdering actual = collator.compare(manna, manana);
...

"Faulty" as in, it caused segmentation faults because ICU4XCollatorOptions options; was not being initialized, as the valgrind utility pointed out.

Proposal

Instead we want:

struct ICU4XCollatorOptions {
  ICU4XCollatorOptions() = delete;
 public:
...
};

This disallows the default constructor, which forces users to initialize all fields of a transparent struct upfront when instaniating it anew. Then, they would be forced to do the safe thing in the calling code:

...
  ICU4XLocale locale = ICU4XLocale::create("en").ok().value();
  ICU4XCollatorOptions options {
    strength: ICU4XCollatorStrength::Auto,
    alternate_handling: ICU4XCollatorAlternateHandling::Auto,
    case_first: ICU4XCollatorCaseFirst::Auto,
    max_variable: ICU4XCollatorMaxVariable::Auto,
    case_level: ICU4XCollatorCaseLevel::Auto,
    numeric: ICU4XCollatorNumeric::Auto,
    backward_second_level: ICU4XCollatorBackwardSecondLevel::Auto,
  };
  ICU4XCollator collator = ICU4XCollator::try_new(dp, locale, options).ok().value();
...

echeran avatar Sep 06 '22 18:09 echeran

Ideally we should be able to make the default constructor populate the fields with default values. But in the short term it may be advisable to delete the default constructor since it creates uninitialized memory.

sffc avatar Sep 06 '22 20:09 sffc

Yes, let's do that.

Manishearth avatar Sep 06 '22 20:09 Manishearth

Actually, wait, why is this broken? Each of these is an enum with a zero value of Auto, no? C++ ought to be generating the right default initializer here?

At least, that was the intent: we do this with options bags elsewhere too.

Manishearth avatar Sep 06 '22 20:09 Manishearth

POD types like enums and integers are undefined until you set them in C++.

sffc avatar Sep 06 '22 20:09 sffc

If we want C++ to initialize the values to zero, that's code we need to write (or which Diplomat needs to generate).

sffc avatar Sep 06 '22 21:09 sffc

https://stackoverflow.com/questions/1069621/are-members-of-a-c-struct-initialized-to-0-by-default

sffc avatar Sep 06 '22 21:09 sffc

I think what we should do is:

  1. In docs, encourage people to use the initializer syntax ICU4XCollatorOptions options = {};
  2. Delete the default constructor to prevent ICU4XCollatorOptions options; which creates unitialized memory and easily creates undefined behavior

sffc avatar Sep 06 '22 21:09 sffc

Ah, right, {} is different from the default constructor. Let's do that then.

Manishearth avatar Sep 06 '22 21:09 Manishearth

Hmm, I tried doing this and it disables brace-initialization like return ICU4XFixedDecimalFormatOptions{ .grouping_strategy = ......}

Manishearth avatar Sep 06 '22 22:09 Manishearth

@Manishearth Make sure this issue is tracked correctly.

sffc avatar Oct 17 '22 21:10 sffc