
invalid_records_* doesn't ignore psycopg2 encoding errors

Open · seajhawk opened this issue 1 year ago · 1 comment

I'm getting the following error:

target_postgres.exceptions.PostgresError: ('Exception writing records', QueryCanceled("COPY from stdin failed: error in .read() call: UnicodeEncodeError 'utf-8' codec can't encode character '\\udc81' in position 184: surrogates not allowed\nCONTEXT: COPY tmp_b02cfd05_c98b_477c_950e_cb03ed693bab, line 23990\n")) cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres

I was hoping that the invalid_records_* settings would help with this kind of problem, but that doesn't seem to be the case.

Is it supposed to help?

If not, do you have any suggestions for avoiding this kind of problem?

seajhawk, Jan 29 '23 01:01

This looks like the COPY into postgres is failing because your data contains an invalid character, \udc81. That is a lone surrogate, which can't be encoded as UTF-8, hence the UnicodeEncodeError in the .read() call. I don't think the target's validation checks cover this type of issue, which would explain why the invalid_records_* settings don't help. Maybe check your data source and the tap to make sure you're not sending invalid data over?
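
One way to act on that suggestion is to check for and scrub lone surrogates before the records reach the target. The following is a minimal sketch only, not part of target-postgres or any tap: the helper names (`has_lone_surrogates`, `scrub_surrogates`, `scrub_record`) are hypothetical, and where you would hook them in (the tap, or a small filter between tap and target) is an assumption. As a hint for tracking down the source, \udc81 is what Python produces when the byte 0x81 is decoded with `errors="surrogateescape"`, so the upstream data may not be valid UTF-8 to begin with.

```python
import re

# In a Python str, any code point in U+D800..U+DFFF is a lone surrogate,
# and none of them can be encoded as UTF-8.
_SURROGATES = re.compile("[\ud800-\udfff]")


def has_lone_surrogates(text: str) -> bool:
    """Return True if text contains a character UTF-8 cannot encode."""
    return _SURROGATES.search(text) is not None


def scrub_surrogates(text: str, replacement: str = "\ufffd") -> str:
    """Replace each lone surrogate (default: U+FFFD REPLACEMENT CHARACTER)."""
    return _SURROGATES.sub(replacement, text)


def scrub_record(value, replacement: str = "\ufffd"):
    """Recursively scrub every string key and value in a decoded JSON record."""
    if isinstance(value, str):
        return scrub_surrogates(value, replacement)
    if isinstance(value, dict):
        return {
            scrub_record(k, replacement): scrub_record(v, replacement)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [scrub_record(item, replacement) for item in value]
    return value  # numbers, booleans, None pass through unchanged


if __name__ == "__main__":
    record = {"name": "caf\udc81", "tags": ["ok", "b\udc81d"], "n": 1}
    clean = scrub_record(record)
    # After scrubbing, encoding to UTF-8 no longer raises UnicodeEncodeError.
    clean["name"].encode("utf-8")
    print(has_lone_surrogates(record["name"]), has_lone_surrogates(clean["name"]))
```

Replacing the character loses information, so it may be better to use `has_lone_surrogates` first to find which source rows are affected and fix the encoding there.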

laurentS, Feb 13 '23 10:02