
🐛 Bug: Warnings look like errors: `Could not determine airbyte type from JSON schema`

Open · aaronsteers opened this issue 9 months ago · 1 comment

We've received reports of this several times now: when PyAirbyte cannot determine a type with confidence, it prints a warning message, but users are often led to believe the problem is fatal. We should either make it clearer that this is only a warning or remove the message altogether.

Raised (most recently) here:

  • https://github.com/airbytehq/PyAirbyte/issues/212

Background:

Whenever PyAirbyte determines a column type, it uses a heuristic to pick the "best" column type for the input data type. When no type can be determined this way, PyAirbyte always applies a "fail-safe" data type instead. This is normally a string, JSON, or Variant column type, and in most cases the sync succeeds regardless, with no real loss of data fidelity.
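A minimal sketch of the heuristic-plus-fail-safe idea described above, in Python since PyAirbyte is a Python library. The names `infer_sql_type`, `SIMPLE_TYPE_MAP`, and `FAILSAFE_SQL_TYPE` are hypothetical and are not PyAirbyte's actual API; the real mapping lives inside PyAirbyte's cache implementations and may differ. The point it illustrates is that an unrecognized schema produces a log message and a fallback type, never an exception.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger("type_mapping_sketch")

# Hypothetical mapping from simple JSON Schema "type" values to SQL column types.
SIMPLE_TYPE_MAP: Dict[str, str] = {
    "string": "VARCHAR",
    "integer": "BIGINT",
    "number": "DOUBLE",
    "boolean": "BOOLEAN",
    "object": "JSON",
    "array": "JSON",
}

# Hypothetical fail-safe; a real cache might use JSON or VARIANT instead.
FAILSAFE_SQL_TYPE = "VARCHAR"


def infer_sql_type(json_schema: Dict[str, Any]) -> str:
    """Return a SQL column type for a JSON schema; never raises."""
    schema_type = json_schema.get("type")

    # JSON Schema allows e.g. "type": ["null", "integer"]. Nullability is
    # handled separately, so drop "null" and keep a single remaining type.
    if isinstance(schema_type, list):
        non_null = [t for t in schema_type if t != "null"]
        if len(non_null) == 1:
            schema_type = non_null[0]

    if isinstance(schema_type, str) and schema_type in SIMPLE_TYPE_MAP:
        return SIMPLE_TYPE_MAP[schema_type]

    # No confident match (e.g. anyOf, or several non-null types):
    # fall back to the fail-safe type rather than failing the sync.
    logger.warning(
        "Could not determine a precise SQL type for schema %r; using fail-safe type %s.",
        json_schema,
        FAILSAFE_SQL_TYPE,
    )
    return FAILSAFE_SQL_TYPE
```

For example, `{"type": ["null", "integer"]}` maps to `BIGINT`, while `{"anyOf": [{"type": "string"}, {"type": "object"}]}` has no confident match and falls back to `VARCHAR`.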

Example cases:

A classic example is a data type defined as `anyOf(string, object)`. Since databases generally don't support a column type that can hold either a string or an object (aka dictionary/struct), the cache falls back to the best available fail-safe type, probably VARCHAR, which means object-type inputs are converted to strings. This works fine in practice: we may lose some "data type fidelity", but we generally have zero loss of "data fidelity".
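To make the "type fidelity vs. data fidelity" distinction concrete, here is a small sketch of how values from an `anyOf(string, object)` field could be stored in a VARCHAR column. The function `to_varchar` is hypothetical, and serializing objects with `json.dumps` is an assumption about how such a conversion might work, not a description of PyAirbyte's internals.

```python
import json
from typing import Any, Optional


def to_varchar(value: Any) -> Optional[str]:
    """Coerce a value from an anyOf(string, object) field into a string column value."""
    if value is None:
        return None  # keep SQL NULL as NULL
    if isinstance(value, str):
        return value  # strings are stored unchanged
    # Objects (dicts) and arrays are serialized to JSON text.
    return json.dumps(value)


# Illustrative records: one string value and one object value.
records = [
    {"metadata": "plain text"},
    {"metadata": {"plan": "pro", "seats": 5}},
]
column = [to_varchar(r["metadata"]) for r in records]
print(column)  # ['plain text', '{"plan": "pro", "seats": 5}']

# The object can be recovered exactly from its stored text.
restored = json.loads(column[1])
print(restored == records[1]["metadata"])  # True
```

The data survives the round trip, but the column itself no longer records whether a value was originally a string or an object. That lost distinction is the "data type fidelity" the issue refers to.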

Clarified messaging:

The right thing to do here (apart from adding more advanced heuristics) would be to improve the messaging so that users know this is not a failure condition and that a valid fail-safe type is being applied. Alternatively, we could eliminate this messaging entirely.
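One possible shape for the clarified message, as a hedged sketch: the helper `report_type_fallback`, its parameters, and the wording are all hypothetical. It names the stream and property, states the fallback type, and says explicitly that this is not an error. The `level` parameter defaults to `logging.INFO`, which sits between the two options above (clarify vs. remove). Pass `level=logging.WARNING` to keep the current severity.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger("type_fallback_sketch")


def report_type_fallback(
    stream: str,
    prop: str,
    json_schema: Dict[str, Any],
    fallback_type: str,
    level: int = logging.INFO,
) -> str:
    """Log (and return) a message that makes the non-fatal nature explicit."""
    message = (
        f"Could not determine a precise column type for '{stream}.{prop}' "
        f"from JSON schema {json_schema!r}. Values will be stored as "
        f"{fallback_type} instead. This is not an error: the sync will continue."
    )
    # Pass the message as an argument so any '%' inside the schema repr is
    # not treated as a format directive by the logging module.
    logger.log(level, "%s", message)
    return message


# Illustrative call; the stream and property names are made up.
report_type_fallback(
    "charges",
    "metadata",
    {"anyOf": [{"type": "string"}, {"type": "object"}]},
    "VARCHAR",
)
```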

cc @marcosmarxm , @bindipankhudi

aaronsteers · May 01 '24 02:05

The same error appears with source-stripe... will it be fixed?

gan-pit · Jun 11 '24 10:06