AVRO-4035 [C++] Add doc strings to generated classes
What is the purpose of the change
As requested by AVRO-4035, this change uses the doc field from the Avro schema when generating code.
That is, this schema:
{
  "type": "record",
  "doc": "Top level Doc.\nWith multiple lines",
  "name": "RootRecord",
  "fields": [
    {
      "name": "mylong",
      "doc": "mylong field doc.",
      "type": "long"
    },
generates the following when run through avrogencpp:
// Top level Doc.
// With multiple lines
struct RootRecord {
    typedef _bigrecord_Union__0__ myunion_t;
    typedef _bigrecord_Union__1__ anotherunion_t;
    // mylong field doc.
    int64_t mylong;
Note: I decided to use line comments (//) rather than block comments (/* or /**) as they make the logic for handling escapes simpler, and make it easy to properly indent the entire comment when generating comments for record fields.
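To illustrate why line comments keep the logic simple, here is a minimal sketch of the idea. It is not the actual avrogencpp code: the function name docToLineComments and its signature are hypothetical. Each line of the doc string gets its own // prefix and the caller's indent, so nothing inside the text (such as a stray */) needs escaping.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not the real avrogencpp implementation): renders a
// schema "doc" string as C++ line comments, one comment per input line,
// each prefixed with the given indent.
std::string docToLineComments(const std::string &doc, const std::string &indent) {
    std::string out;
    if (doc.empty()) {
        return out;  // no doc field: emit nothing
    }
    std::istringstream in(doc);
    std::string line;
    while (std::getline(in, line)) {
        // Every line gets its own "//", so indentation is uniform and
        // the comment can never be closed early by the doc text.
        out += indent + "// " + line + "\n";
    }
    return out;
}
```

Calling it with an empty indent would produce the record-level comment, and calling it with a four-space indent would produce the field-level comment.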
Verifying this change
This change is already covered by existing tests - avrogencpp is used to compile several schemas which include doc fields. The generated code is still valid, and one schema was adjusted to test the edge case of a doc field containing newlines.
Documentation
- Does this pull request introduce a new feature? yes
- If yes, how is the feature documented? I have updated the Jira issue to contain the release note: avrogencpp will now include the doc fields in schemas in the generated code for records
If the schema has "doc": "Top level Doc.\r\nWith multiple lines", then it might be useful to trim the \r from the comment and assume that the ostream will expand \n back to \r\n if that's the convention on the operating system.
If the schema has "doc": "First paragraph.\n\nSecond paragraph.", then it would be nicer to output the blank line as just // rather than with a trailing space // .
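The two suggestions above could be sketched roughly as follows. This is an illustration under assumptions, not the code in the pull request: renderDoc is a hypothetical name, and it removes every \r in a line rather than only a trailing one.

```cpp
#include <algorithm>
#include <sstream>
#include <string>

// Hypothetical helper: renders a doc string as line comments, dropping
// carriage returns and emitting a bare "//" for blank lines.
std::string renderDoc(const std::string &doc, const std::string &indent) {
    std::string out;
    std::istringstream in(doc);
    std::string line;
    while (std::getline(in, line)) {
        // Strip '\r' so that "\r\n" in the schema does not leak into the output.
        line.erase(std::remove(line.begin(), line.end(), '\r'), line.end());
        if (line.empty()) {
            out += indent + "//\n";  // blank line: no trailing space
        } else {
            out += indent + "// " + line + "\n";
        }
    }
    return out;
}
```

With this sketch, a doc of "First paragraph.\n\nSecond paragraph." yields three comment lines, the middle one being just //.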
A trailing backslash might cause trouble. If the schema has "doc": "Do not use the backslash character \\", and you generate C++
// Do not use the backslash character \
struct RootRecord {
...
then the preprocessor will treat the struct RootRecord { line as part of the comment. This cannot be fixed by doubling the backslash, and with GCC it cannot be fixed by appending space characters either (see "Escaped Newlines" in Using the GNU Compiler Collection (GCC)). I guess the code generator could detect that the last line of the comment ends with a backslash (optionally followed by whitespace), and inject another comment line:
// Do not use the backslash character \
//
struct RootRecord {
...
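The detection step mentioned above (a backslash optionally followed by whitespace) might look like the following sketch. The function name endsWithBackslash is hypothetical and not taken from avrogencpp.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: returns true if the last non-whitespace character
// of the line is a backslash, i.e. the line would splice with the next
// line once written out as a "//" comment.
bool endsWithBackslash(const std::string &line) {
    auto it = line.rbegin();
    // Skip trailing whitespace, since GCC ignores it after a backslash.
    while (it != line.rend() && std::isspace(static_cast<unsigned char>(*it))) {
        ++it;
    }
    return it != line.rend() && *it == '\\';
}
```

A generator could call this on each comment line and take corrective action only when it returns true.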
"I guess the code generator could detect that the last line of the comment ends with a backslash (optionally followed by whitespace), and inject another comment line"
Unfortunately, injecting another comment line still results in compiler warnings or errors, depending on the compiler options:
/home/gerrit/Desktop/avro/lang/c++/build/bigrecord.hh:34:1: error: multi-line comment [-Werror=comment]
34 | // with trailing backslash\
| ^
I decided to append the text "(backslash)" to comment lines that end with a backslash. Not ideal, but generating code that triggers warnings isn't good either.
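A rough sketch of that workaround is shown below. The function name neutralizeTrailingBackslash is hypothetical, and the exact marker text and spacing are assumptions; the point is only that the line no longer ends in a backslash, so GCC has nothing to splice or warn about.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: if the last non-whitespace character of a comment
// line is a backslash, append a marker so that the line no longer ends
// with a backslash (plus optional whitespace).
std::string neutralizeTrailingBackslash(const std::string &line) {
    std::size_t end = line.size();
    // Ignore trailing whitespace when looking for the backslash.
    while (end > 0 && std::isspace(static_cast<unsigned char>(line[end - 1]))) {
        --end;
    }
    if (end > 0 && line[end - 1] == '\\') {
        return line + "(backslash)";  // marker text is an assumption
    }
    return line;
}
```

Lines that do not end in a backslash pass through unchanged, so the function can be applied to every comment line before it is written.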