avro
AVRO-2539: fix nullable types for avro to thrift
Make sure you have checked all steps below.
Jira
- [x] My PR addresses the following Avro Jira issues and references them in the PR title. For example, "AVRO-1234: My Avro PR"
- https://issues.apache.org/jira/browse/AVRO-XXX
- In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.
Tests
- [x] My PR adds the following unit tests OR does not need testing for this extremely good reason:
Commits
- [x] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
- Subject is separated from body by a blank line
- Subject is limited to 50 characters (not including Jira issue reference)
- Subject does not end with a period
- Subject uses the imperative mood ("add", not "adding")
- Body wraps at 72 characters
- Body explains "what" and "why", not "how"
Documentation
- [x] In case of new functionality, my PR adds documentation that describes how to use it.
- All the public functions and classes in the PR contain Javadoc that explains what they do
I'm not sure there is an actual program error: the first time, Travis failed on the Java tests while the Windows shell build succeeded; now the Java tests pass, but the shell build failed.
Retriggered build, now shell is green too.
Sorry for the long delay (vacation) in activity on this issue.
I have probably misunderstood some points of the Avro spec.
My idea was that if a field in the Thrift schema is marked as optional, then in the Avro schema we should get `["null", "type"]` instead of `["type", "null"]`, because an optional field may be null. However, there are also fields that are not optional but nullable (strings, maps, lists, etc.), which may be null as well, so such fields should keep `["type", "null"]` in the resulting Avro schema, as currently implemented.
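To make the proposed ordering concrete, here is a minimal sketch in Python that builds Avro field definitions as plain dictionaries. The helper name `thrift_field_to_avro` is hypothetical and is not part of Avro or of this PR. It relies on the long-standing Avro spec rule that a union field's default must match the first branch of the union, which is why `"null"` has to come first when the default is null (newer spec versions may relax this).

```python
import json


def thrift_field_to_avro(name, avro_type, optional):
    """Hypothetical helper: build an Avro field dict for one Thrift field."""
    if optional:
        # Optional field: put "null" first so that a null default is valid
        # under the "default matches the first union branch" rule.
        return {"name": name, "type": ["null", avro_type], "default": None}
    # Nullable but not optional: keep the existing ordering and add no default.
    return {"name": name, "type": [avro_type, "null"]}


print(json.dumps(thrift_field_to_avro("f2", "string", optional=True)))
print(json.dumps(thrift_field_to_avro("f1", "string", optional=False)))
```

Only the optional case carries a `default` key here, which is the property that matters when a reader schema contains a field the writer never wrote.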
Another point is getting the default value of each field. After some research and experiments, I found that getting the actual default value isn't trivial for some complex types (nested records, unions, ...), and that an attempt to get the default value of a primitive (int, short, ...) may return an incorrect result: not the default value defined in the Thrift schema, but the default value of the Java type. This corner case for primitive types has a workaround, such as using Java beans when compiling the Thrift classes, but it is not possible to force all users to do this.
The origin of my issue is this sample:
struct ThriftMsgV1 {
  1: required string f1,
  2: optional string f2
}
struct ThriftMsgV2 {
  3: optional string f3
}
I assume that MsgV2 is BACKWARD compatible with MsgV1, because BACKWARD compatibility allows me to delete fields and add optional fields, but when I test schema compatibility using the BACKWARD type, I get an error.
The question is: is it right to add a default NULL value for all optional fields without considering their actual default value, or is there another way?
I believe I have explained my thoughts clearly :)
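The failure described above can be modeled with a simplified sketch of one Avro schema-resolution rule: a field that exists in the reader schema but not in the writer schema must have a default. The function `missing_defaults` and the dictionaries below are illustrative assumptions, not the real Avro compatibility checker, and the field types assumed for MsgV1 are a guess at what the current conversion produces.

```python
def missing_defaults(reader_fields, writer_fields):
    """Return names of reader fields absent from the writer with no default."""
    writer_names = {f["name"] for f in writer_fields}
    return [
        f["name"]
        for f in reader_fields
        if f["name"] not in writer_names and "default" not in f
    ]


# Writer: ThriftMsgV1 (old data). Reader: ThriftMsgV2 (new schema).
v1 = [
    {"name": "f1", "type": "string"},
    {"name": "f2", "type": ["string", "null"]},
]
v2_no_default = [{"name": "f3", "type": ["string", "null"]}]
v2_null_default = [{"name": "f3", "type": ["null", "string"], "default": None}]

print(missing_defaults(v2_no_default, v1))    # f3 cannot be filled in
print(missing_defaults(v2_null_default, v1))  # nothing is missing
```

In this model, dropping `f1` and `f2` is harmless because the reader simply ignores writer fields it does not know, while `f3` resolves only when it carries a default.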
Unfortunately, the forked repository with this branch is gone and I can't modify this branch anymore. Because of that, I've created another PR from an up-to-date `master` branch:
https://github.com/apache/avro/pull/2396