Error when migrating data from mysql to duckdb

Open Arkronus opened this issue 1 year ago • 2 comments

Hi! I'm trying to build a PoC with PyAirbyte and am getting some strange errors.

Environment: Ubuntu 22, Python 3.11

Errors

  1. When I run the script, it complains about Docker but continues to run:
AirbyteSubprocessFailedError: Subprocess failed.
    Run Args: ['docker', 'run', '--rm', '-i', '--volume', '/home/aprytkov/test-airbyte/source-mysql:/local/', '--volume', '/tmp:/tmp', 'airbyte/source-mysql:latest', 'discover', '--config', '/tmp/tmp8hzu648g.json']
  2. Full log: https://gist.github.com/Arkronus/5c5c424d3c458c2a20f4011397453683
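
Since the failing subprocess above is a `docker run` call, it may be worth ruling out the Docker setup itself before looking at the connector. This is a minimal sketch using only the Python standard library. `docker_status` is a hypothetical helper, not part of PyAirbyte, and the status strings are made up for illustration:

```python
import shutil
import subprocess


def docker_status(timeout_s: float = 10.0) -> str:
    """Return 'missing', 'daemon-unreachable', or 'ok' (hypothetical helper)."""
    docker_path = shutil.which("docker")
    if docker_path is None:
        # The docker CLI is not on PATH at all.
        return "missing"
    try:
        # `docker version` talks to the daemon and exits non-zero if it cannot.
        completed = subprocess.run(
            [docker_path, "version"],
            capture_output=True,
            timeout=timeout_s,
            check=False,
        )
    except (subprocess.TimeoutExpired, OSError):
        return "daemon-unreachable"
    return "ok" if completed.returncode == 0 else "daemon-unreachable"


print(docker_status())
```

If this prints `ok`, the Docker installation is probably not the cause and the connector's own output in the full log is the better place to look.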

Here is my code

import airbyte as ab

def main():
    # connectors_list = ab.get_available_connectors()
    # print(f"Available connectors: {connectors_list}")
    
    source_name = 'source-mysql'
    
    mysql_source: ab.Source = ab.get_source(source_name, install_if_missing=True)
    
    # Public MySQL database: https://docs.rfam.org/en/latest/database.html
    mysql_source.set_config(
        config={
            "host":"mysql-rfam-public.ebi.ac.uk",
            'port': 4497,
            'database': 'Rfam',
            'username': 'rfamro',
            'replication_method': {
                'method': 'STANDARD'
            }
        }
    )
    
    streams = mysql_source.get_available_streams()
    print('Streams')
    print(streams)
    
    destination_name = 'destination-duckdb'

    duckdb_destination: ab.Destination = ab.get_destination(destination_name, install_if_missing=True)
    print(duckdb_destination.docs_url)
    duckdb_destination.set_config(
        config={
            'destination_path': 'db.duckdb'
        }
    )

    result = mysql_source.read(streams=['family','clan'])
    duckdb_destination.write(result)
    print("Load complete")

if __name__ == "__main__":
    main()

Arkronus avatar Dec 27 '24 18:12 Arkronus

@Arkronus - thanks for raising this.

The interesting bit I see in the logs is this:

Message: class io.airbyte.protocol.models.v0.ConfiguredAirbyteCatalog schema violation: Validation error(s) :
streams.0.cursor_field: Null value is not allowed. (code: 1021)
From: streams.0.<items>.<#/definitions/ConfiguredAirbyteStream>.cursor_field.<nullable>
streams.1.cursor_field: Null value is not allowed. (code: 1021)
From: streams.1.<items>.<#/definitions/ConfiguredAirbyteStream>.cursor_field.<nullable>

I believe this connector may require that the catalog specify a cursor field for each of the tables/streams being selected. If you switch replication_method to a CDC-based method, it may resolve this, since the CDC method(s) have built-in cursor handling. Let me know if this helps!
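
To make the validation message a bit more concrete, here is a rough sketch of the check it appears to describe: each entry under `streams` in the configured catalog has a `cursor_field`, and the connector rejects entries where it is null. The dictionary layout follows the field names in the error (`streams.N.cursor_field`). `find_null_cursor_fields` and the example catalog are hypothetical, for illustration only, and not something PyAirbyte exposes. Whether an empty list passes the connector's validation is an assumption here.

```python
from typing import Any


def find_null_cursor_fields(configured_catalog: dict[str, Any]) -> list[str]:
    """Return the names of configured streams whose cursor_field is null."""
    offenders = []
    for entry in configured_catalog.get("streams", []):
        # Only an explicit null is flagged, mirroring "Null value is not allowed".
        if "cursor_field" in entry and entry["cursor_field"] is None:
            offenders.append(entry["stream"]["name"])
    return offenders


# Illustrative catalog only; the real one is generated at read time.
example_catalog = {
    "streams": [
        {
            "stream": {"name": "family"},
            "sync_mode": "full_refresh",
            "destination_sync_mode": "overwrite",
            "cursor_field": None,  # the shape the error message complains about
        },
        {
            "stream": {"name": "clan"},
            "sync_mode": "full_refresh",
            "destination_sync_mode": "overwrite",
            "cursor_field": [],  # assumed to be acceptable; not confirmed
        },
    ]
}

print(find_null_cursor_fields(example_catalog))
```

In the log from this issue, both `streams.0` and `streams.1` are flagged, so both selected streams would show up in a check like this.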

aaronsteers avatar Jan 24 '25 21:01 aaronsteers

Hello!

I've got exactly the same error. Whether the replication method is STANDARD or CDC, it doesn't change anything.

Environment:

  • Python 3.11
  • source-mysql 3.11.1
  • MySQL server 5.6

nicob3y avatar Feb 04 '25 00:02 nicob3y