icu4x
icu4x copied to clipboard
Diplomat: delete default constructor in C++ for transparent structs
Background
When writing tests for Diplomat code added for Collator, a problem occurred with the new CollatorOptions. Currently, CollatorOptions is an options bag struct in Rust that meant to be mutated directly before passed to a constructor. Int Diplomat, it gets transpiled to a transparent struct in C++. The generated C++ code looks like:
struct ICU4XCollatorOptions {
public:
ICU4XCollatorStrength strength;
ICU4XCollatorAlternateHandling alternate_handling;
ICU4XCollatorCaseFirst case_first;
ICU4XCollatorMaxVariable max_variable;
ICU4XCollatorCaseLevel case_level;
ICU4XCollatorNumeric numeric;
ICU4XCollatorBackwardSecondLevel backward_second_level;
};
Then, in my test code, I wrote the faulty code
...
ICU4XLocale locale = ICU4XLocale::create("en").ok().value();
ICU4XCollatorOptions options;
ICU4XCollator collator = ICU4XCollator::try_new(dp, locale, options).ok().value();
ICU4XOrdering actual = collator.compare(manna, manana);
...
"Faulty" as in, it caused segmentation faults because ICU4XCollatorOptions options; was not being initialized, as the valgrind utility pointed out.
Proposal
Instead we want:
struct ICU4XCollatorOptions {
ICU4XCollatorOptions() = delete;
public:
...
};
This disallows the default constructor, which forces users to initialize all fields of a transparent struct upfront when instaniating it anew. Then, they would be forced to do the safe thing in the calling code:
...
ICU4XLocale locale = ICU4XLocale::create("en").ok().value();
ICU4XCollatorOptions options {
strength: ICU4XCollatorStrength::Auto,
alternate_handling: ICU4XCollatorAlternateHandling::Auto,
case_first: ICU4XCollatorCaseFirst::Auto,
max_variable: ICU4XCollatorMaxVariable::Auto,
case_level: ICU4XCollatorCaseLevel::Auto,
numeric: ICU4XCollatorNumeric::Auto,
backward_second_level: ICU4XCollatorBackwardSecondLevel::Auto,
};
ICU4XCollator collator = ICU4XCollator::try_new(dp, locale, options).ok().value();
...
Ideally we should be able to make the default constructor populate the fields with default values. But in the short term it may be advisable to delete the default constructor since it creates uninitialized memory.
Yes, let's do that.
Actually, wait, why is this broken? Each of these is an enum with a zero value of Auto, no? C++ ought to be generating the right default initializer here?
At least, that was the intent: we do this with options bags elsewhere too.
POD types like enums and integers are undefined until you set them in C++.
If we want C++ to initialize the values to zero, that's code we need to write (or which Diplomat needs to generate).
https://stackoverflow.com/questions/1069621/are-members-of-a-c-struct-initialized-to-0-by-default
I think what we should do is:
- In docs, encourage people to use the initializer syntax
ICU4XCollatorOptions options = {}; - Delete the default constructor to prevent
ICU4XCollatorOptions options;which creates unitialized memory and easily creates undefined behavior
Ah, right, {} is different from the default constructor. Let's do that then.
Hmm, I tried doing this and it disables brace-initialization like return ICU4XFixedDecimalFormatOptions{ .grouping_strategy = ......}
@Manishearth Make sure this issue is tracked correctly.