vertica-python icon indicating copy to clipboard operation
vertica-python copied to clipboard

vertica_python 1.1.1 jsonparser (reject_on_duplicates=true) not working

Open mlvdheiden opened this issue 3 years ago • 3 comments

running:

cop_cmd = "COPY {} FROM STDIN parser fjsonparser(reject_on_duplicate=true, reject_on_empty_key=true, flatten_arrays=true);".format(tab_name, options) result = cur.copy(cop_cmd, json.dumps(entities)) does NOT reject on duplicates, instead it creates the same table records again.

sample entities:

{'ucmdbId': '4c5ce50bcfa0eaf69f742751bbadd330', 'globalId': '4c5ce50bcfa0eaf69f742751bbadd330', 'type': 'node', 'properties': {'lic_operational2advanced': False, 'root_candidatefordeletetime': '2022-08-10T16:00:01.909Z', 'data_operationisnew': False, 'lic_type_basic': False, 'lic_type_asset': False, 'lic_type_udf': False, 'root_class': 'node', 'last_discovered_by_probe': '10.208.233.79', 'lic_type_udi': False, 'TenantsUses': ['System Default Tenant'], 'display_label': '35.82.239.78 DefaultDomain', 'data_operationstate': 0, 'host_key': '35.82.239.78 DefaultDomain', 'lic_type_premium': False, 'data_allow_auto_discovery': True, 'last_discovered_time': '2022-07-19T17:24:41.610Z', 'root_actualdeletetime': '2022-08-30T16:00:01.909Z', 'data_teststate': 0, 'default_gateway_ip_address_type': 'IPv4', 'data_changecorrstate': 0, 'last_modified_time': '2022-07-12T16:00:01.842Z', 'create_time': '2022-07-11T17:24:46.936Z', 'TenantOwner': 'System Default Tenant', 'data_changestate': 0, 'global_id': '4c5ce50bcfa0eaf69f742751bbadd330', 'lic_type_management': False, 'data_testisnew': False, 'root_lastaccesstime': '2022-07-21T16:00:01.909Z', 'lic_type_operational': False, 'root_iscandidatefordeletion': False, 'data_source': 'UCMDBDiscovery: Network Connectivity Data Analyzer', 'last_discovered_by': 'Network Connectivity Data Analyzer', 'data_changeisnew': False, 'data_testcorrstate': 0, 'track_changes': False, 'host_iscomplete': False, 'data_operationcorrstate': 0, 'is_save_persistency': False, 'data_adminstate': 0, 'lic_type_full': False, 'root_enableageing': False, 'data_updated_by': "Enrichment: Enrichment's rule: Disable Aging"}, 'attributesQualifiers': None, 'displayLabel': None, 'label': 'ConfigurationItem'}

where ucmdbid is a Uuid and globalid is a Uuid

mlvdheiden avatar Jul 22 '22 14:07 mlvdheiden

@mlvdheiden Looks like this is not a problem of vertica-python itself. Please refer to Vertica Documentation to check your COPY syntax and json data (Trying to do the copy on vsql is also a way to debug). If you believe this is a bug of vertica-python itself, please provide a complete reproducer. Thanks.

sitingren avatar Jul 25 '22 11:07 sitingren

Thx for your reply. Here the complete sequence:

a. create the connection: db_config = {'host': 'xxxxxxxxxx', 'port': '5433', 'user': 'dbadmin', 'password': 'xxxxxxxx, 'database': 'xxxxxxxx', 'ssl': False, 'autocommit': True, 'use_prepared_statements': False, 'type': 'vertica', 'connection_load_balance': False, 'session_label': 'Streamer', 'connection_timeout': 1860000}

conn = vertica_python.connect(db_config)

b. create table: cur = conn.cursor() tab_name = 'test' un_col = 'globalid' stmt = 'CREATE FLEX TABLE %s(%s VARCHAR(200) NOT NULL, '
'%s DATE);' % (tab_name, un_col, 'loaded_at') try ... result = cur.execute(stmt) ..except...

b. run the copy command: tab_name='test' options = 'REJECT_ON_DUPLICATE=TRUE,FLATTEN_ARRAYS=TRUE' cur = conn.cursor() cop_cmd = "COPY {} FROM STDIN parser fjsonparser(reject_on_duplicate=true, reject_on_empty_key=true, flatten_arrays=true REJECTED DATA AS TABLE);".format(tab_name, options, tab_name+'reject') try ... result = cur.copy(cop_cmd, json.dumps(entities)) ... except...

If you run this command twice with the same data (see sample from first thread) it will append data to the existing table and NOT reject.

Do I eventually misunderstand the option handling for COPY?

mlvdheiden avatar Jul 25 '22 14:07 mlvdheiden

@mlvdheiden Please read up to Vertica Documentation or ask Vertica support for clarifications. Your question is unrelated to vertica-python. In the Vertica Documentation example, it shows reject_on_duplicate=true will reject a row if your json row data is {"ucmdbId": ..., "globalId": "4c5ce50bcfa0eaf69f742751bbadd330", "globalId": "4c5ce50bcfa0eaf69f742751bbadd330", ...}, not running the same row data twice.

sitingren avatar Jul 26 '22 10:07 sitingren