flink-http-connector
Cannot use ignore parse errors for json responses
We want to enable ignore-parse-errors on the json format so that parse errors in JSON responses are ignored. There does not seem to be a way to configure this.
The README says:
Additionally, it is possible to pass query format options from table's DDL. This can be done by using option like so: 'lookup-request.format.customFormatName.customFormatProperty' = 'propertyValue', for example 'lookup-request.format.customFormatName.fail-on-missing-field' = 'true'.
It is important that customFormatName part match SerializationFormatFactory identifier used for custom format implementation. In this case, the fail-on-missing-field will be passed to SerializationFormatFactory::createEncodingFormat( DynamicTableFactory.Context context, ReadableConfig formatOptions) method in ReadableConfig object.
With default configuration, Flink-Json format is used for GenericGetQueryCreator, all options defined in [json-format](https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/formats/json/) can be passed through table DDL. For example 'lookup-request.format.json.fail-on-missing-field' = 'true'. In this case, format identifier is json.
I see unit tests indicating that this is valid:
+ "'format' = 'json',"
+ "'lookup-request.format' = 'json',"
+ "'lookup-request.format.json.fail-on-missing-field' = 'true',"
Unfortunately, this does not work. In the Flink tests I see:
+ "'value.format'='test-format'\n"
+ "'value.test-format.delimiter'='|'\n"
+ "'value.test-format.fail-on-missing'='true'");
The problem is that the validation logic wants
'lookup-request.format.json.ignore-parse-errors' = 'true',
but what actually works is 'lookup-request.json.ignore-parse-errors' = 'true',
I am investigating whether we can change the validation to allow the option through without the word format in the prefix, and amend the README to remove the word format from the keys of format options such as ignore-parse-errors.
Unfortunately, the existing encoding code relies on the word format in the key, as used in the JUnit test.
- The existing custom format support passes custom format options to the QueryFactory, which can pass them through to the encoder. This is tied to the format-aware queries. The way you specify the format config options contains the word .format. in the key.
- The above mechanism does not match the way that Flink and the Kafka connector specify custom formats. The Kafka connector uses value and key as the prefixes for the formats it uses. The way you specify the format config options does not contain the word .format. in the key.
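The difference between the two key shapes can be sketched as plain prefix stripping. This is a self-contained illustration only, not the connector's or Flink's actual code; the class name `FormatOptionPrefixSketch` and the method `stripPrefix` are hypothetical, and the `lookup-response.json.` prefix is the one proposed in this issue.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FormatOptionPrefixSketch {

    /** Returns the options whose keys start with the prefix, with the prefix removed. */
    public static Map<String, String> stripPrefix(Map<String, String> tableOptions, String prefix) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : tableOptions.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                result.put(entry.getKey().substring(prefix.length()), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        // Existing request-side shape: the key contains ".format." before the format identifier.
        options.put("lookup-request.format", "json");
        options.put("lookup-request.format.json.fail-on-missing-field", "true");
        // Flink/Kafka-style shape (assumed here for the response side): no ".format." in the key.
        options.put("lookup-response.format", "json");
        options.put("lookup-response.json.ignore-parse-errors", "true");

        // Options that would reach the encoder under the existing convention.
        System.out.println(stripPrefix(options, "lookup-request.format.json."));
        // Options that would reach the decoder under the Flink/Kafka-style convention.
        System.out.println(stripPrefix(options, "lookup-response.json."));
    }
}
```

Whichever prefix the validation expects decides which of these keys is accepted, so a key written in one shape yields an empty option set when it is read with the other prefix.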
I propose we add a new prefix called lookup-response. The http connector will pass the options under this prefix to the decoder. We can define the config option without the .format. in the name. It will look like:
'format' = 'json',
'lookup-response.format' = 'json',
'lookup-response.json.ignore-parse-errors' = 'true',
I have a fix that implements this, and I have added unit tests. If ignore-parse-errors is specified on the lookup-response, continue-on-error is either false or true, and metadata columns are defined, then the values will be:
status code - 200
headers - those from the response
error string - null
Ideally we should reimplement, or add a consistent implementation to, the existing custom formats so that they follow the same config name pattern. This could be done as a follow-on action after this PR if requested or required.
I am now thinking that a simpler fix will be to enable the format options at the top level. This will mean that we can configure 'format' = 'json', 'json.ignore-parse-errors' = 'true', and this will work for the responses. The existing case with the lookup-request prefix will be unaltered.
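For illustration, a table definition using the top-level option might look like the sketch below, written as a Java string in the same style as the test snippets above. The table name, columns, and URL are hypothetical, and the 'json.ignore-parse-errors' key reflects the proposal in this issue rather than confirmed behaviour.

```java
public class TopLevelFormatOptionsDdl {

    // Hypothetical table, columns, and URL; only the WITH keys matter for this sketch.
    public static final String DDL =
            "CREATE TABLE Customers (\n"
                    + "  id STRING,\n"
                    + "  msg STRING\n"
                    + ") WITH (\n"
                    + "  'connector' = 'rest-lookup',\n"
                    + "  'url' = 'http://localhost:8080/client',\n"
                    // Proposed: top-level format options apply to decoding the response.
                    + "  'format' = 'json',\n"
                    + "  'json.ignore-parse-errors' = 'true',\n"
                    // Existing: request-side options keep the lookup-request.format prefix.
                    + "  'lookup-request.format' = 'json',\n"
                    + "  'lookup-request.format.json.fail-on-missing-field' = 'true'\n"
                    + ")";

    public static void main(String[] args) {
        System.out.println(DDL);
    }
}
```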