
Fix decimal precision (decimal.InvalidOperation decimal.DivisionImpossible error)

Open · rabidaudio opened this issue on Aug 16, 2021 · 6 comments

Problem

Error message (this is from target-snowflake but the problem is in the shared code):

target-snowflake   | CRITICAL [<class 'decimal.DivisionImpossible'>]
target-snowflake   | INFO MillisLoggingCursor: 72 millis spent executing: ROLLBACK
target-snowflake   | Traceback (most recent call last):
target-snowflake   |   File "/project/.meltano/loaders/target-snowflake/venv/bin/target-snowflake", line 8, in <module>
target-snowflake   |     sys.exit(cli())
target-snowflake   |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.7/site-packages/target_snowflake/__init__.py", line 57, in cli
target-snowflake   |     main(args.config)
target-snowflake   |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.7/site-packages/target_snowflake/__init__.py", line 51, in main
target-snowflake   |     target_tools.main(target)
target-snowflake   |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.7/site-packages/target_postgres/target_tools.py", line 28, in main
target-snowflake   |     stream_to_target(input_stream, target, config=config)
target-snowflake   |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.7/site-packages/target_postgres/target_tools.py", line 77, in stream_to_target
target-snowflake   |     raise e
target-snowflake   |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.7/site-packages/target_postgres/target_tools.py", line 64, in stream_to_target
target-snowflake   |     line
target-snowflake   |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.7/site-packages/target_postgres/target_tools.py", line 141, in _line_handler
target-snowflake   |     state_tracker.handle_record_message(line_data['stream'], line_data)
target-snowflake   |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.7/site-packages/target_postgres/stream_tracker.py", line 63, in handle_record_message
target-snowflake   |     self.streams[stream].add_record_message(line_data)
target-snowflake   |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.7/site-packages/target_postgres/singer_stream.py", line 145, in add_record_message
target-snowflake   |     self.validator.validate(record_message['record'])
target-snowflake   |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.7/site-packages/jsonschema/validators.py", line 129, in validate
target-snowflake   |     for error in self.iter_errors(*args, **kwargs):
target-snowflake   |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.7/site-packages/jsonschema/validators.py", line 105, in iter_errors
target-snowflake   |     for error in errors:
target-snowflake   |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.7/site-packages/jsonschema/_validators.py", line 304, in properties_draft4
target-snowflake   |     schema_path=property,
target-snowflake   |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.7/site-packages/jsonschema/validators.py", line 121, in descend
target-snowflake   |     for error in self.iter_errors(instance, schema):
target-snowflake   |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.7/site-packages/jsonschema/validators.py", line 105, in iter_errors
target-snowflake   |     for error in errors:
target-snowflake   |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.7/site-packages/jsonschema/_validators.py", line 127, in multipleOf
target-snowflake   |     failed = instance % dB
target-snowflake   | decimal.InvalidOperation: [<class 'decimal.DivisionImpossible'>]
meltano            | Loading failed (1): decimal.InvalidOperation: [<class 'decimal.DivisionImpossible'>]
meltano            | ELT could not be completed: Loader failed
ELT could not be completed: Loader failed

The problem is that Singer's official tap-postgres uses minimum, maximum, and multipleOf to effectively report the precision and scale of the column.

For example,

{
  "type": "SCHEMA",
  "stream": "foobar",
  "schema": {
    "type": "object",
    "properties": {
      "id": {
        "type": ["integer"],
        "minimum": -2147483648,
        "maximum": 2147483647
      },
      "sample_numeric": {
        "type": ["null", "number"],
        "exclusiveMaximum": true,
        "maximum": 100000000000000000000000000000000000000000000000000000000000000,
        "multipleOf": 1E-38,
        "exclusiveMinimum": true,
        "minimum": -100000000000000000000000000000000000000000000000000000000000000
      }
    }
  },
  "key_properties": ["id"],
  "bookmark_properties": []
}

{
  "type": "RECORD",
  "stream": "foobar",
  "record": {
    "id": 1,
    "sample_numeric": 0.000913808181253534
  },
  "version": 1596488217992,
  "time_extracted": "2020-08-03T20:56:57.992931Z"
}

This breaks JSON Schema validation: when validating multipleOf, the validator tries to compute Decimal('0.000913808181253534') % Decimal('1E-38'), which fails because Python's default decimal precision (28 significant digits) is too small for the remainder operation.
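
A minimal reproduction using only the standard library and the values from the schema and record above:

from decimal import Decimal, getcontext

# The default context carries 28 significant digits.
print(getcontext().prec)  # 28

# The integer quotient 0.000913808181253534 / 1E-38 has 35 digits,
# more than the context allows, so the remainder operation raises
# decimal.InvalidOperation: [<class 'decimal.DivisionImpossible'>]
Decimal('0.000913808181253534') % Decimal('1E-38')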

Solution

This is a well-known problem that has bitten several targets, and there are several possible solutions. One is simply to set the precision to something arbitrarily high, like 40. Another, used by pipelinewise-target-postgres, is to simply disallow precision higher than some threshold.
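
For the first option, a sketch (the value 100 here is my own choice, sized to the tap-postgres bounds shown above, not something from the existing code):

from decimal import getcontext

# Raise the thread-local decimal precision before any validation runs.
# A value near the schema's maximum (1E62) with multipleOf 1E-38 needs
# roughly 100 significant digits, so a fixed 40 would still not be safe.
getcontext().prec = 100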

Here, I ported a solution I wrote for meltano/target-postgres, which sets Python's decimal precision to as large as it needs to be to match the schema. That solution was later ported to meltano/target-snowflake. I'm open to other solutions, though. I would request that any solution also be ported to datamill-co/target-snowflake.
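
The idea, roughly, is to inspect each incoming SCHEMA message and grow the decimal context so the multipleOf check can never overflow. A sketch of that approach (the function name and exact digit arithmetic here are illustrative, not the actual patch, which also handles nested schemas):

from decimal import Decimal, getcontext

def update_decimal_precision(schema):
    """Grow the decimal context so that instance % multipleOf cannot
    raise DivisionImpossible for any value this schema allows."""
    for prop in schema.get('properties', {}).values():
        multiple_of = prop.get('multipleOf')
        maximum = prop.get('maximum')
        if multiple_of is None or maximum is None:
            continue
        # adjusted() gives the exponent of the most significant digit,
        # so this is the digit count of the largest possible quotient,
        # plus a couple of digits of headroom.
        needed = (Decimal(str(maximum)).adjusted()
                  - Decimal(str(multiple_of)).adjusted() + 2)
        if needed > getcontext().prec:
            getcontext().prec = needed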

rabidaudio, Aug 16, 2021

Why not make the default precision configurable in the target configuration? Does this PR make sure that the column precision is changed, e.g. when a new tap version sends data with a higher precision?

ghost, Aug 17, 2021

Why not make the default precision configurable in the target configuration?

That would be a fine solution too. I can send an updated PR for this approach if you like.

Does this PR make sure that the column precision is changed, e.g. when a new tap version sends data with a higher precision?

This doesn't affect the database column types, only the in-Python JSON Schema validation of the incoming data. But if the schema were updated with a new precision, this code would recognize it.

rabidaudio, Aug 20, 2021

A PR for a config option would be great 👍

ghost, Aug 20, 2021

@hz-lschick done 👍

rabidaudio, Aug 23, 2021

I seem to be hitting the same problem with decimal.InvalidOperation: [<class 'decimal.DivisionImpossible'>]. Is there a plan to merge this?

r-nyq, Mar 7, 2022

Hi @r-nyq, we are looking into it and will try to merge it soon.

ericboucher, Apr 19, 2022