Error when migrating data from MySQL to DuckDB
Hi! I'm trying to build a PoC with PyAirbyte and ran into some strange errors.
Environment: Ubuntu 22, Python 3.11
Errors
- When I run the script, it complains about Docker but continues to run:
AirbyteSubprocessFailedError: Subprocess failed.
Run Args: ['docker', 'run', '--rm', '-i', '--volume', '/home/aprytkov/test-airbyte/source-mysql:/local/', '--volume', '/tmp:/tmp', 'airbyte/source-mysql:latest', 'discover', '--config', '/tmp/tmp8hzu648g.json']
- Full log: https://gist.github.com/Arkronus/5c5c424d3c458c2a20f4011397453683
Here is my code
import airbyte as ab


def main():
    # connectors_list = ab.get_available_connectors()
    # print(f"Available connectors: {connectors_list}")

    source_name = 'source-mysql'
    mysql_source: ab.Source = ab.get_source(source_name, install_if_missing=True)

    # Public MySQL database: https://docs.rfam.org/en/latest/database.html
    mysql_source.set_config(
        config={
            'host': 'mysql-rfam-public.ebi.ac.uk',
            'port': 4497,
            'database': 'Rfam',
            'username': 'rfamro',
            'replication_method': {
                'method': 'STANDARD'
            }
        }
    )

    # List the streams (tables) the source exposes
    streams = mysql_source.get_available_streams()
    print('Streams')
    print(streams)

    destination_name = 'destination-duckdb'
    duckdb_destination: ab.Destination = ab.get_destination(destination_name, install_if_missing=True)
    print(duckdb_destination.docs_url)
    duckdb_destination.set_config(
        config={
            'destination_path': 'db.duckdb'
        }
    )

    # Read two streams from MySQL and write them to DuckDB
    result = mysql_source.read(streams=['family', 'clan'])
    duckdb_destination.write(result)
    print("Load complete")


if __name__ == "__main__":
    main()
@Arkronus - thanks for raising this.
The interesting bit I see in the logs is this:
Message: class io.airbyte.protocol.models.v0.ConfiguredAirbyteCatalog schema violation: Validation error(s) :
streams.0.cursor_field: Null value is not allowed. (code: 1021)
From: streams.0.<items>.<#/definitions/ConfiguredAirbyteStream>.cursor_field.<nullable>
streams.1.cursor_field: Null value is not allowed. (code: 1021)
From: streams.1.<items>.<#/definitions/ConfiguredAirbyteStream>.cursor_field.<nullable>
I believe this connector may require the catalog to declare a cursor field for each of the tables/streams being selected. If you switch replication_method to a CDC-based method, it may resolve this, since the CDC methods have built-in cursor handling. Let me know if this helps!
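To make the validation error concrete, here is a minimal sketch in plain Python (no PyAirbyte calls) of what the connector seems to be rejecting: a configured catalog in which `cursor_field` is explicitly null. The helper names `streams_with_null_cursor` and `fill_null_cursors` and the example catalog are hypothetical, and replacing null with an empty list is an assumption about what the schema accepts for full-refresh streams, not a confirmed fix.

```python
from typing import Any


def streams_with_null_cursor(catalog: dict[str, Any]) -> list[str]:
    """Return names of configured streams whose cursor_field is explicitly None."""
    return [
        entry["stream"]["name"]
        for entry in catalog.get("streams", [])
        if "cursor_field" in entry and entry["cursor_field"] is None
    ]


def fill_null_cursors(catalog: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the catalog with null cursor_field values replaced by [].

    Assumption: an empty list is acceptable for streams with no cursor.
    """
    patched_streams = []
    for entry in catalog.get("streams", []):
        patched = dict(entry)  # shallow copy so the input is not mutated
        if "cursor_field" in patched and patched["cursor_field"] is None:
            patched["cursor_field"] = []
        patched_streams.append(patched)
    return {**catalog, "streams": patched_streams}


# Hypothetical catalog shaped like the one in the error message above
example_catalog = {
    "streams": [
        {
            "stream": {"name": "family"},
            "sync_mode": "full_refresh",
            "destination_sync_mode": "overwrite",
            "cursor_field": None,
        },
        {
            "stream": {"name": "clan"},
            "sync_mode": "full_refresh",
            "destination_sync_mode": "overwrite",
            "cursor_field": None,
        },
    ]
}

print(streams_with_null_cursor(example_catalog))
print(streams_with_null_cursor(fill_null_cursors(example_catalog)))
```

The first call lists both streams, matching `streams.0` and `streams.1` in the log; after patching, the list is empty. Whether PyAirbyte exposes a hook to adjust the catalog before it is passed to the connector is a separate question that this sketch does not answer.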
Hello!
I get exactly the same error. Switching the replication method between STANDARD and CDC makes no difference.
Environment:
- Python 3.11
- source-mysql 3.11.1
- MySQL server 5.6