tap-mongodb
KeyError: '_id' CRITICAL ('Exception writing records', KeyError('_id'))
Mongo to Redshift. I got this error; what could be the possible reasons?
ERROR Exception writing records
Traceback (most recent call last):
File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/postgres.py", line 300, in write_batch
{'version': target_table_version})
File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/sql_base.py", line 808, in write_batch_helper
for table_batch in denest.to_table_batches(schema, key_properties, records):
File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/denest.py", line 21, in to_table_batches
key_properties)
File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/denest.py", line 47, in _get_streamed_table_schemas
key_prop_schemas[key] = schema['properties'][key]
KeyError: '_id'
CRITICAL ('Exception writing records', KeyError('_id'))
Traceback (most recent call last):
File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/postgres.py", line 300, in write_batch
{'version': target_table_version})
File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/sql_base.py", line 808, in write_batch_helper
for table_batch in denest.to_table_batches(schema, key_properties, records):
File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/denest.py", line 21, in to_table_batches
key_properties)
File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/denest.py", line 47, in _get_streamed_table_schemas
key_prop_schemas[key] = schema['properties'][key]
KeyError: '_id'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/macbookpro/.virtualenvs/target-redshift/bin/target-redshift", line 10, in <module>
sys.exit(cli())
File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_redshift/__init__.py", line 55, in cli
main(args.config)
File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_redshift/__init__.py", line 49, in main
target_tools.main(redshift_target)
File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/target_tools.py", line 28, in main
stream_to_target(input_stream, target, config=config)
File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/target_tools.py", line 77, in stream_to_target
raise e
File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/target_tools.py", line 64, in stream_to_target
line
File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/target_tools.py", line 152, in _line_handler
state_tracker.flush_stream(line_data['stream'])
File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/stream_tracker.py", line 41, in flush_stream
self._write_batch_and_update_watermarks(stream)
File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/stream_tracker.py", line 67, in _write_batch_and_update_watermarks
self.target.write_batch(stream_buffer)
File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_redshift/redshift.py", line 72, in write_batch
return PostgresTarget.write_batch(self, nullable_stream_buffer)
File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/postgres.py", line 309, in write_batch
raise PostgresError(message, ex)
target_postgres.exceptions.PostgresError: ('Exception writing records', KeyError('_id'))
@dmosorast again your help is needed
@all I have also added a detailed reference here: https://stackoverflow.com/questions/62793871/keyerror-id-critical-exception-writing-records-keyerror-id-singer-i
@waleedarshad-vf Did you find the solution?
@bonysimon007 No, we moved to stitchdata, which is the enterprise version, and that worked.
From the looks of this article, my best guess is that it stems from the schema not being specified in catalog.json:
https://titanwolf.org/Network/Articles/Article?AID=b4eb664e-70f4-430c-b568-bdd0d32bcf35#gsc.tab=0
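For what it's worth, the line the traceback points at in denest.py is the lookup of each key property in the stream's JSON schema, so a SCHEMA message that declares _id as a key property but does not define it under properties will fail exactly like this. A simplified sketch of the mismatch (the schema and key_properties values here are hypothetical, not pulled from the failing run):

# Simplified view of what target-postgres does with a SCHEMA message:
# it copies the JSON-schema definition of every key property.
schema = {'type': 'object', 'properties': {}}   # tap emitted no definition for '_id'
key_properties = ['_id']

key_prop_schemas = {}
for key in key_properties:
    # Same lookup as denest.py line 47 in the traceback above
    key_prop_schemas[key] = schema['properties'][key]   # raises KeyError: '_id'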
I managed to get around this by using https://github.com/transferwise/pipelinewise-tap-mongodb and https://github.com/transferwise/pipelinewise-target-postgres
I wrote some Python code to edit the catalog produced by tap-mongodb --config config/singer/tap-mongodb-config.json --discover > catalog.json:
import json
import pathlib

input_file = 'catalog.json'
output_file = 'mongo/catalog.json'

with open(input_file) as f:
    data = json.load(f)

for stream in data['streams']:
    stream_metadata = stream['metadata'][0]
    stream_metadata['metadata']['selected'] = True
    stream_metadata['metadata']['replication-method'] = 'FULL_TABLE'
    # https://github.com/singer-io/getting-started/blob/master/docs/DISCOVERY_MODE.md
    # https://github.com/singer-io/tap-mongodb/issues/48
    stream['schema']['properties'] = {
        "_id": {
            "type": [
                "string",
                "null"
            ]
        },
        "document": {
            "type": [
                "object",
                "array",
                "string",
                "null"
            ]
        }
    }

pathlib.Path(output_file).parent.mkdir(parents=True, exist_ok=True)
with open(output_file, 'w') as json_file:
    json.dump(data, json_file)
This will sync the entire document into a single JSONB column in PostgreSQL.
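As a quick sanity check before running the sync (just a sketch against the edited catalog written above, assuming the standard Singer catalog layout), you can confirm every stream now declares _id in its schema properties:

import json

# Path written by the catalog-editing script above
with open('mongo/catalog.json') as f:
    catalog = json.load(f)

for stream in catalog['streams']:
    props = stream['schema'].get('properties', {})
    if '_id' not in props:
        # A stream like this is what triggered the original KeyError('_id')
        print(f"{stream['tap_stream_id']} is missing '_id' in its schema properties")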
If you are then looking to expand the JSONB column into a full table in PostgreSQL, this can be done with SQL statements. Note that if the MongoDB document keys use camelCase, this causes some funny behaviour because of the snake_case convention in PostgreSQL. I was able to define the json_keys_to_snake_case function given here: https://rudism.com/converting-json-keys-to-snake-case-in-postgresql/.
CREATE TEMPORARY TABLE snake_case (document jsonb);
INSERT INTO snake_case SELECT json_keys_to_snake_case FROM <SCHEMA_NAME>.<TABLE_NAME> m, json_keys_to_snake_case(m.document);
INSERT INTO <SCHEMA_NAME>.<TABLE_NAME> (
_id,
document_key1,
document_key2
)
SELECT
t."_id",
t."document_key1",
t."document_key2"
FROM
snake_case m
CROSS JOIN LATERAL
jsonb_to_record(m.document) as t(
_id TEXT,
document_key1 <DATA_TYPE>,
document_key2 <DATA_TYPE>
);
DROP TABLE IF EXISTS snake_case;
Where <SCHEMA_NAME> is the schema target-postgres sent the data to, <TABLE_NAME> is the table the data is in, and document_key1 and document_key2 are example keys from the MongoDB collection.
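If you need to see which keys actually exist in the document column before filling in those placeholders, something like this works (a sketch only; psycopg2 and the connection details are assumptions, and the schema/table names are the same placeholders as above):

import psycopg2

# Connection details are placeholders -- substitute your own
conn = psycopg2.connect(host='localhost', dbname='warehouse',
                        user='postgres', password='postgres')

with conn, conn.cursor() as cur:
    # jsonb_object_keys() expands each document's top-level keys into rows
    cur.execute("""
        SELECT DISTINCT jsonb_object_keys(document)
        FROM <SCHEMA_NAME>.<TABLE_NAME>
    """)
    for (key,) in cur.fetchall():
        print(key)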