json-avro-converter

field [fieldName] is expected to be one of these: RECORD, NULL, for nested record with non defined nullable values

gadaldo opened this issue 9 years ago • 3 comments

I get `Could not evaluate union, field [fieldName] is expected to be one of these: RECORD, NULL. If this is a complex type, check if offending field: trafficSource.adwordsClickInfo adheres to schema.` when I have nested records in which some of the nullable fields are not specified.

schema sample:

```json
{
    "type": "record",
    "name": "Root",
    "fields": [
        {
            "name": "field1",
            "type": [
                "long",
                "null"
            ]
        },
        {
            "name": "nestedRecord",
            "type": [
                {
                    "type": "record",
                    "namespace": "root",
                    "name": "NestedRecord",
                    "fields": [
                        {
                            "name": "nested1",
                            "type": [
                                "long",
                                "null"
                            ]
                        },
                        {
                            "name": "nested2",
                            "type": [
                                "long",
                                "null"
                            ]
                        }
                    ]
                },
                "null"
            ]
        }
    ]
}
```

and a JSON string such as:

```json
{
    "field1" : 10999859003,
    "nestedRecord":
    {
        "nested1" : 123321321
    }
}
```

I think the problem occurs when the conversion recurses into nested records: at level 0 missing nullable fields are skipped, but inside a nested record they are not.
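To make the symptom concrete, here is a minimal, self-contained sketch (not json-avro-converter's actual algorithm; the helper name and the pure-Python schema walk are mine) that recursively checks which record fields are absent from the JSON and have no schema default. This mirrors the Avro rule at play: a missing field is acceptable only if the schema supplies a default for it.

```python
import json

# Hypothetical helper, for illustration only: walk an Avro schema (as parsed
# JSON) alongside a datum and collect the paths of record fields that are
# absent from the datum and carry no "default" in the schema.
def missing_without_default(schema, datum, path=""):
    problems = []
    # Unwrap a union such as ["long", "null"] or [<record>, "null"]:
    # a null datum always matches the "null" branch, so nothing to report.
    if isinstance(schema, list):
        branches = [b for b in schema if b != "null"]
        if datum is None or not branches:
            return problems
        schema = branches[0]
    if isinstance(schema, dict) and schema.get("type") == "record":
        for field in schema["fields"]:
            fpath = f"{path}.{field['name']}" if path else field["name"]
            if field["name"] not in datum:
                if "default" not in field:
                    problems.append(fpath)
            else:
                problems += missing_without_default(
                    field["type"], datum[field["name"]], fpath)
    return problems

# The schema and datum from this issue, verbatim.
schema = json.loads('''{"type":"record","name":"Root","fields":[
  {"name":"field1","type":["long","null"]},
  {"name":"nestedRecord","type":[{"type":"record","namespace":"root",
    "name":"NestedRecord","fields":[
      {"name":"nested1","type":["long","null"]},
      {"name":"nested2","type":["long","null"]}]},"null"]}]}''')

datum = json.loads('{"field1":10999859003,"nestedRecord":{"nested1":123321321}}')

print(missing_without_default(schema, datum))  # → ['nestedRecord.nested2']
```

The sketch reports `nestedRecord.nested2` as the offending path, which matches the field the converter complains about: the nested field is missing and no default is defined for it.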

Thank you

gadaldo avatar Oct 27 '16 11:10 gadaldo

Hey @gadaldo, in this case the error json-avro-converter throws is correct, because `nested2` is not defined in your sample JSON and the schema provides no default for it. Avro should only accept this datum in one of those two cases, and I've verified that it does in both with the code below:

**(using a default value)**

```groovy
    def 'should convert nested nullable records'() {
        given:
        def schema = '''
            {
                "type": "record",
                "name": "Root",
                "fields": [
                    {
                        "name": "field1",
                        "type": [
                            "long",
                            "null"
                        ]
                    },
                    {
                        "name": "nestedRecord",
                        "type": [
                            {
                                "type": "record",
                                "namespace": "root",
                                "name": "NestedRecord",
                                "fields": [
                                    {
                                        "name": "nested1",
                                        "type": [
                                            "long",
                                            "null"
                                        ]
                                    },
                                    {
                                        "name": "nested2",
                                        "type": [
                                            "long",
                                            "null"
                                        ], "default": 42
                                    }
                                ]
                            },
                            "null"
                        ]
                    }
                ]
            }
        '''

        def json = '''
            {
                "field1" : 10999859003,
                "nestedRecord":
                {
                    "nested1" : 123321321, "nested2":42
                }
            }
        '''

        when:
        def result = converter.convertToJson(converter.convertToAvro(json.bytes, schema), schema)

        then:
        toMap(result) == toMap(json)
    }
```

**(using a provided value)**
```groovy
    def 'should convert nested nullable records2'() {
        given:
        def schema = '''
            {
                "type": "record",
                "name": "Root",
                "fields": [
                    {
                        "name": "field1",
                        "type": [
                            "long",
                            "null"
                        ]
                    },
                    {
                        "name": "nestedRecord",
                        "type": [
                            {
                                "type": "record",
                                "namespace": "root",
                                "name": "NestedRecord",
                                "fields": [
                                    {
                                        "name": "nested1",
                                        "type": [
                                            "long",
                                            "null"
                                        ]
                                    },
                                    {
                                        "name": "nested2",
                                        "type": [
                                            "long",
                                            "null"
                                        ]
                                    }
                                ]
                            },
                            "null"
                        ]
                    }
                ]
            }
        '''

        def json = '''
            {
                "field1" : 10999859003,
                "nestedRecord":
                {
                    "nested1" : 123321321, "nested2":43
                }
            }
        '''

        when:
        def result = converter.convertToJson(converter.convertToAvro(json.bytes, schema), schema)

        then:
        toMap(result) == toMap(json)
    }
```

jghoman avatar Dec 16 '16 02:12 jghoman

At level 0 of the tree it does; the problem only appears when the algorithm recurses. Anyway, I created my own version because I needed it: the JSON comes from a TableRow object when reading from BigQuery (with BigQueryIO), and I have to transform it into Avro. It's a feature Google provides behind the scenes, but they don't want to expose the API since they do a further intermediate transformation to proto, as documented here. So I created my own converter based on this algorithm, but I don't know whether you can close the issue. Thank you anyway.

gadaldo avatar Dec 16 '16 10:12 gadaldo

Has this issue been fixed? Please let me know how to resolve it.

jainanuj07 avatar Jan 20 '20 19:01 jainanuj07