PyAirbyte
🐛 Bug: Warnings look like errors: `Could not determine airbyte type from JSON schema`
We've received this a few times now - when PyAirbyte cannot determine a type with confidence, it prints a warning message but users often are led to believe this is fatal. We should improve the messaging here so it is more clear that this is a warning - or we should remove the messaging altogether.
Raised (most recently) here:
- https://github.com/airbytehq/PyAirbyte/issues/212
Background:
Whenever PyAirbyte determines a column type, it uses a heuristic to try to determine the "best" column type given the input data type. However, PyAirbyte always applies a "failsafe" data type when none can be determined otherwise. This is normally a string, JSON, or Variant column type, and in most cases the sync will succeed regardless - with no real loss of fidelity.
Example cases:
A classic example of this is a data type that is defined as `anyOf(string, object)`. Since databases generally don't support a column type that would equally support either a string or an object (aka dictionary/struct), the cache will determine the best-available failover type, which would probably be `VARCHAR` - meaning object-type inputs would be converted to string. This works fine in practice; while we might lose some "data type fidelity" we generally have zero loss of "data fidelity".
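To make the fallback concrete, here is a minimal sketch of the idea. It is not PyAirbyte's actual implementation. The names `SIMPLE_TYPES`, `FAILSAFE_TYPE`, `resolve_column_type`, and `to_column_value` are hypothetical, and the type mapping is an assumption chosen only for illustration.

```python
import json
from typing import Any

# Hypothetical mapping from simple JSON schema types to SQL column types.
SIMPLE_TYPES = {
    "string": "VARCHAR",
    "integer": "BIGINT",
    "number": "DECIMAL",
    "boolean": "BOOLEAN",
}

# Assumed fail-safe type, used when no confident match exists.
FAILSAFE_TYPE = "VARCHAR"


def resolve_column_type(schema: dict[str, Any]) -> tuple[str, bool]:
    """Return (sql_type, used_failsafe) for a JSON schema property."""
    json_type = schema.get("type")
    if isinstance(json_type, str) and json_type in SIMPLE_TYPES:
        return SIMPLE_TYPES[json_type], False
    # Covers anyOf(...) and anything else that is ambiguous or unknown.
    return FAILSAFE_TYPE, True


def to_column_value(value: Any) -> Any:
    """Serialize objects and arrays to JSON text so they fit a string column."""
    if isinstance(value, (dict, list)):
        return json.dumps(value)
    return value


ambiguous = {"anyOf": [{"type": "string"}, {"type": "object"}]}
sql_type, used_failsafe = resolve_column_type(ambiguous)
stored = to_column_value({"id": 1, "tags": ["a", "b"]})
```

In this sketch the object is stored as JSON text, so `json.loads(stored)` recovers the original value: the column type is less specific, but the data is still there.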
Clarified messaging:
The right thing to do here (apart from adding more advanced heuristics) would be to improve messaging so that the user knows that this is not a failure condition, and that we do have a valid typing fail-safe that we are applying for this condition. Alternatively, we could eliminate this messaging entirely.
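As one possible shape for the clarified messaging, the sketch below uses Python's standard `warnings` module with a dedicated warning category. This is only an assumption about how it could look. `TypeFallbackWarning` and `warn_type_fallback` are hypothetical names, not existing PyAirbyte APIs, and the message wording is a suggestion.

```python
import warnings


class TypeFallbackWarning(UserWarning):
    """Hypothetical category for non-fatal type fallback notices."""


def warn_type_fallback(column_name: str, fallback_type: str) -> str:
    """Emit a warning that states clearly that the sync will continue."""
    message = (
        f"Could not determine a specific type for column '{column_name}' "
        f"from its JSON schema. Falling back to '{fallback_type}'. "
        "This is a warning, not an error: the sync will continue."
    )
    # stacklevel=2 attributes the warning to the caller, not this helper.
    warnings.warn(message, TypeFallbackWarning, stacklevel=2)
    return message


# Users who prefer no messaging at all could silence the category:
# warnings.simplefilter("ignore", TypeFallbackWarning)
```

A dedicated category would cover both options discussed here: the default message says explicitly that nothing failed, and users who find it noisy can filter it out without hiding other warnings.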
cc @marcosmarxm , @bindipankhudi
The same error appears with source-stripe... will it be fixed?