sdk
Option in SQL Targets to coerce types based on observed record shape
This would be a simple-to-use option for end users who are trying to deal with "rogue" or incorrect type declarations in the upstream tap.
For instance, if the tap incorrectly defines one of its fields as an integer, but it receives a string, then we could give the user an option of auto-expanding the data type to be inclusive of the declared type and also the observed type. Since a string column can hold integers as well as strings, expanding the data type to a string type will allow the load to complete successfully.
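As a rough illustration of the expansion described above, here is a minimal sketch that widens a single JSON Schema property when an observed value does not fit the declared type. The helper names `json_type_of` and `widen_property` are hypothetical and not part of the SDK, and falling back to `string` for any mismatch is an assumption made for simplicity.

```python
def json_type_of(value):
    """Map a Python value to its JSON Schema type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def widen_property(declared, value):
    """Return a property schema wide enough for the declared type and the value.

    Hypothetical helper: nulls are ignored here, and any other mismatch
    (except integer/number) falls back to a string type.
    """
    declared_types = declared.get("type", [])
    if isinstance(declared_types, str):
        declared_types = [declared_types]

    observed = json_type_of(value)
    if observed == "null" or observed in declared_types:
        return declared  # value already fits (or is null, which we skip)
    if observed == "integer" and "number" in declared_types:
        return declared  # a number column already holds integers

    nullable = ["null"] if "null" in declared_types else []
    if observed == "number" and "integer" in declared_types:
        return {"type": ["number"] + nullable}  # integer widens to number
    return {"type": ["string"] + nullable}  # string can hold everything else


# The tap declares an integer, but a string arrives.
print(widen_property({"type": "integer"}, "abc"))
```

With this sketch, a field declared as `integer` that receives `"abc"` is widened to a string type, while values that already fit leave the declared schema unchanged.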
Implementation-wise, if this were built in the tap or mapper layer, it would normally result in a new SCHEMA message being emitted whenever a record is observed that does not fit the declared schema. If built in the target, however, there is no need to emit a SCHEMA message. Instead, the Sink class would, per batch, expand its data type negotiation to cover (1) the declared type, (2) the target column's existing type, and (3) the data types observed in the records. This negotiation exists today, but it only considers the first two factors.
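The three-way negotiation could look roughly like the sketch below. This is not the SDK's current implementation: the `WIDENING_ORDER` list, the function names, and the `coerce_observed` flag are all hypothetical, and a single linear widening order is a simplification of what a real SQL target would need.

```python
# Hypothetical widening order, narrowest to widest. Real targets would
# need a richer, dialect-specific compatibility table.
WIDENING_ORDER = ["BOOLEAN", "INTEGER", "FLOAT", "VARCHAR"]


def observed_sql_type(value):
    """Guess the narrowest type in WIDENING_ORDER that can hold the value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "FLOAT"
    return "VARCHAR"  # strings and anything else are stored as text


def negotiate_column_type(declared, existing, batch_values, coerce_observed=False):
    """Pick the widest type among the declared, existing, and observed types.

    `existing` is None when the column does not exist yet. Observed types
    are only considered when the (hypothetical) opt-in flag is set.
    """
    candidates = [declared]
    if existing is not None:
        candidates.append(existing)
    if coerce_observed:
        candidates.extend(
            observed_sql_type(v) for v in batch_values if v is not None
        )
    return max(candidates, key=WIDENING_ORDER.index)


batch = [1, 2, "not-a-number"]
print(negotiate_column_type("INTEGER", None, batch))
print(negotiate_column_type("INTEGER", None, batch, coerce_observed=True))
```

With the flag off, only the first two factors are considered and the column stays `INTEGER`; with it on, the rogue string in the batch widens the column to `VARCHAR`.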
Note:
- This probably would not be 'on' by default, since there are performance and stability reasons not to do this. However, the feature would exist as an option to make sure data lands properly.
cc @radbrt