target-postgres
Fix decimal precision (decimal.InvalidOperation decimal.DivisionImpossible error)
Problem
Error message (this is from target-snowflake but the problem is in the shared code):
target-snowflake | CRITICAL [<class 'decimal.DivisionImpossible'>]
target-snowflake | INFO MillisLoggingCursor: 72 millis spent executing: ROLLBACK
target-snowflake | Traceback (most recent call last):
target-snowflake | File "/project/.meltano/loaders/target-snowflake/venv/bin/target-snowflake", line 8, in <module>
target-snowflake | sys.exit(cli())
target-snowflake | File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.7/site-packages/target_snowflake/__init__.py", line 57, in cli
target-snowflake | main(args.config)
target-snowflake | File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.7/site-packages/target_snowflake/__init__.py", line 51, in main
target-snowflake | target_tools.main(target)
target-snowflake | File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.7/site-packages/target_postgres/target_tools.py", line 28, in main
target-snowflake | stream_to_target(input_stream, target, config=config)
target-snowflake | File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.7/site-packages/target_postgres/target_tools.py", line 77, in stream_to_target
target-snowflake | raise e
target-snowflake | File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.7/site-packages/target_postgres/target_tools.py", line 64, in stream_to_target
target-snowflake | line
target-snowflake | File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.7/site-packages/target_postgres/target_tools.py", line 141, in _line_handler
target-snowflake | state_tracker.handle_record_message(line_data['stream'], line_data)
target-snowflake | File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.7/site-packages/target_postgres/stream_tracker.py", line 63, in handle_record_message
target-snowflake | self.streams[stream].add_record_message(line_data)
target-snowflake | File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.7/site-packages/target_postgres/singer_stream.py", line 145, in add_record_message
target-snowflake | self.validator.validate(record_message['record'])
target-snowflake | File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.7/site-packages/jsonschema/validators.py", line 129, in validate
target-snowflake | for error in self.iter_errors(*args, **kwargs):
target-snowflake | File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.7/site-packages/jsonschema/validators.py", line 105, in iter_errors
target-snowflake | for error in errors:
target-snowflake | File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.7/site-packages/jsonschema/_validators.py", line 304, in properties_draft4
target-snowflake | schema_path=property,
target-snowflake | File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.7/site-packages/jsonschema/validators.py", line 121, in descend
target-snowflake | for error in self.iter_errors(instance, schema):
target-snowflake | File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.7/site-packages/jsonschema/validators.py", line 105, in iter_errors
target-snowflake | for error in errors:
target-snowflake | File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.7/site-packages/jsonschema/_validators.py", line 127, in multipleOf
target-snowflake | failed = instance % dB
target-snowflake | decimal.InvalidOperation: [<class 'decimal.DivisionImpossible'>]
meltano | Loading failed (1): decimal.InvalidOperation: [<class 'decimal.DivisionImpossible'>]
meltano | ELT could not be completed: Loader failed
The problem is that Singer's official tap-postgres uses minimum, maximum, and multipleOf to effectively report the precision and scale of the column.
For example,
{
  "type": "SCHEMA",
  "stream": "foobar",
  "schema": {
    "type": "object",
    "properties": {
      "id": {
        "type": ["integer"],
        "minimum": -2147483648,
        "maximum": 2147483647
      },
      "sample_numeric": {
        "type": ["null", "number"],
        "exclusiveMaximum": true,
        "maximum": 100000000000000000000000000000000000000000000000000000000000000,
        "multipleOf": 1E-38,
        "exclusiveMinimum": true,
        "minimum": -100000000000000000000000000000000000000000000000000000000000000
      }
    }
  },
  "key_properties": ["id"],
  "bookmark_properties": []
}
{
  "type": "RECORD",
  "stream": "foobar",
  "record": {
    "id": 1,
    "sample_numeric": 0.000913808181253534
  },
  "version": 1596488217992,
  "time_extracted": "2020-08-03T20:56:57.992931Z"
}
This breaks JSON Schema validation: when validating multipleOf, jsonschema computes Decimal('0.000913808181253534') % Decimal('1E-38'), which raises decimal.InvalidOperation because Python's default decimal context precision (28 significant digits) is too small for the division.
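The failure is easy to reproduce in isolation with only the standard library's decimal module; this minimal sketch uses the values from the RECORD message above:

```python
from decimal import Decimal, InvalidOperation, localcontext

value = Decimal('0.000913808181253534')
step = Decimal('1E-38')

# The integer quotient value / step has about 35 digits, more than the
# default 28-digit context allows, so the modulo raises
# decimal.InvalidOperation (DivisionImpossible).
try:
    value % step
except InvalidOperation as exc:
    print('default precision failed:', exc)

# With a wider context the same check succeeds: the remainder is 0,
# i.e. the record value really is a valid multiple of 1E-38.
with localcontext() as ctx:
    ctx.prec = 40
    print('remainder:', value % step)
```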
Solution
This is a well-known problem that has bitten several targets, and there are several possible solutions. One is simply to set the precision to something arbitrarily high, like 40. Another, which pipelinewise-target-postgres uses, is to reject any precision higher than some threshold. Here, I ported a solution I wrote for meltano/target-postgres, which raises Python's decimal precision to whatever the schema requires; that solution was later ported to meltano/target-snowflake. I am open to other approaches, but I would request that whatever lands here also be ported to datamill-co/target-snowflake.
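The schema-driven approach can be sketched roughly as follows. This is a simplified illustration rather than the code from the PR: required_precision is a hypothetical helper, and real schemas need more defensive handling of missing or float-typed bounds.

```python
from decimal import Decimal, getcontext

def required_precision(prop):
    """Estimate the decimal precision needed to validate ``multipleOf``
    for one numeric schema property (hypothetical helper, not the PR code)."""
    multiple_of = Decimal(str(prop.get('multipleOf', 1)))
    maximum = Decimal(str(prop.get('maximum', 1)))
    # Significant digits left of the decimal point in the largest value,
    # e.g. 1E62 -> 63 digits.
    integer_digits = maximum.adjusted() + 1
    # Digits of scale implied by multipleOf, e.g. 1E-38 -> 38.
    scale_digits = max(-multiple_of.as_tuple().exponent, 0)
    return integer_digits + scale_digits

# Schema property from the example above: |value| < 1E62, step 1E-38.
prop = {'multipleOf': '1E-38',
        'maximum': 10 ** 62}
needed = required_precision(prop)  # 63 integer digits + 38 scale digits = 101

# Only ever widen the context, before jsonschema validation runs.
if needed > getcontext().prec:
    getcontext().prec = needed
```

Because the precision is derived from the SCHEMA message itself, a new tap version that advertises a wider numeric type would automatically get a wider validation context.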
Why not make the default precision configurable in the target configuration? Does this PR ensure that the column precision is changed, e.g. when a new tap version sends data with a higher precision?
Why not make the default precision configurable in the target configuration?
That would be a fine solution too. I can send an updated PR for this approach if you like.
Does this PR ensure that the column precision is changed, e.g. when a new tap version sends data with a higher precision?
This doesn't affect the database column types, only the in-Python JSON Schema validation of the incoming data. But if the schema is updated with a new precision, this code will recognize it.
A PR for a config option would be great 👍
@hz-lschick done 👍
I seem to have hit the same problem with decimal.InvalidOperation: [<class 'decimal.DivisionImpossible'>]. Is there a plan to merge this?
Hi @r-nyq, we are looking into it and will try to merge it soon.