
Use orjson for json_dumps

Open EugeneChung opened this issue 9 months ago • 5 comments

What type of PR is this?

  • [x] Refactor
  • [x] Feature
  • [ ] Bug Fix
  • [ ] New Query Runner (Data Source)
  • [ ] New Alert Destination
  • [ ] Other

Description

Following the discussion in https://github.com/getredash/redash/pull/7339#issuecomment-2684176762, I updated utils.json_dumps to use orjson for improved serialization performance.

Key implementation details:

  • orjson 3.10.15 is used, as it's the last version compatible with Python 3.8.
  • Pre-processing: Before calling orjson.dumps, data is pre-processed recursively using the existing custom JSONEncoder to maintain compatibility with Redash's current serialization specifications.
    • For instance, datetime serialization differs:
      • With the custom JSONEncoder: {"time": "2024-03-01T15:30:45.123"}
      • With orjson: {"time": "2024-03-01T15:30:45.123456"}
    • Note: Unlike the standard json module, orjson does not allow overriding serialization behavior for supported native types. The provided default function isn't called for these built-in supported types.
  • Option Mapping:
    • Default options are set to orjson.OPT_NON_STR_KEYS | orjson.OPT_UTC_Z. orjson always emits UTF-8 without escaping non-ASCII characters, which aligns with ensure_ascii=False behavior.
    • The sort_keys parameter maps directly to OPT_SORT_KEYS.
  • Testing: Added pytest cases to validate behavior aligned with the existing JSONEncoder specifications.
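
The pre-processing step described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the PR's actual code: the helper name preprocess is hypothetical, and it covers only containers and datetime values, truncating microseconds to milliseconds so the output matches the custom JSONEncoder example above.

```python
import datetime


def preprocess(value):
    """Recursively rewrite values before handing them to a JSON serializer.

    Hypothetical helper for illustration only; the real implementation
    delegates to Redash's existing custom JSONEncoder.
    """
    if isinstance(value, dict):
        # Keys are left as-is; only the values are converted.
        return {key: preprocess(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [preprocess(item) for item in value]
    if isinstance(value, datetime.datetime):
        text = value.isoformat()
        if value.microsecond:
            # Keep millisecond precision: drop the last three fractional digits
            # while preserving any UTC offset that follows them.
            text = text[:23] + text[26:]
        return text
    return value


row = {"time": datetime.datetime(2024, 3, 1, 15, 30, 45, 123456)}
print(preprocess(row))  # {'time': '2024-03-01T15:30:45.123'}
```

Because the datetime is already a plain string by the time the serializer sees it, the serializer's own native datetime formatting never applies, which is the point of pre-processing.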

How is this tested?

  • [X] Unit tests (pytest, jest)
  • [ ] E2E Tests (Cypress)
  • [X] Manually
  • [ ] N/A

On Athena and Trino, the query select 1.0, cast('NaN' as double), cast('Infinity' as double), cast('-Infinity' as double) returns 1.0, null, null, null, as expected.
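
For context, a small standard-library sketch of why non-finite floats matter: the json module writes NaN and Infinity as bare tokens, which are not valid JSON, whereas orjson serializes them as null. The helper replace_non_finite below is hypothetical and only emulates that null output; it is not part of this PR.

```python
import json
import math


def replace_non_finite(value):
    """Replace NaN and +/-Infinity with None (hypothetical helper, illustration only)."""
    if isinstance(value, float) and not math.isfinite(value):
        return None
    if isinstance(value, list):
        return [replace_non_finite(item) for item in value]
    if isinstance(value, dict):
        return {key: replace_non_finite(item) for key, item in value.items()}
    return value


row = [1.0, float("nan"), float("inf"), float("-inf")]

stdlib_output = json.dumps(row)
sanitized_output = json.dumps(replace_non_finite(row))

print(stdlib_output)     # [1.0, NaN, Infinity, -Infinity]
print(sanitized_output)  # [1.0, null, null, null]
```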

Related Tickets & Documents

  • Issue: https://github.com/getredash/redash/issues/6992
  • Previous PRs:
    • https://github.com/getredash/redash/pull/7339
    • https://github.com/getredash/redash/pull/7348

EugeneChung avatar Mar 31 '25 14:03 EugeneChung

Basic test works

SELECT 'NaN'::float AS not_a_number, 'Inf'::float AS inf, now() AS date

eradman avatar Mar 31 '25 16:03 eradman

(venv) ~/project/private/redash git:[master]
ruff check --fix tests/test_utils.py
warning: The top-level linter settings are deprecated in favour of their counterparts in the `lint` section. Please update the following options in `pyproject.toml`:
  - 'ignore' -> 'lint.ignore'
  - 'select' -> 'lint.select'
  - 'mccabe' -> 'lint.mccabe'
  - 'per-file-ignores' -> 'lint.per-file-ignores'
All checks passed!
(venv) ~/project/private/redash git:[master]
ruff --version
ruff 0.11.2

EugeneChung avatar Apr 01 '25 05:04 EugeneChung

Is there a difference in the way key-value pairs are formatted?

- Options: {"dbname":"testdb1","host":"example.com"}
+ Options: {"dbname": "testdb1", "host": "example.com"}

https://github.com/getredash/redash/actions/runs/14329251848/job/40418005004?pr=7391

eradman avatar Apr 11 '25 19:04 eradman

@eradman Yes. As I commented, orjson always uses compact separators. I'm going to fix the test.
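
A minimal sketch of the formatting difference, using only the standard library as a stand-in: passing separators=(",", ":") to json.dumps yields the same compact layout that orjson always produces, while the default separators insert a space after each comma and colon. The options dictionary is taken from the test output quoted above.

```python
import json

options = {"dbname": "testdb1", "host": "example.com"}

# Default json.dumps separators are (", ", ": ").
spaced = json.dumps(options)

# Compact separators, matching the layout orjson emits.
compact = json.dumps(options, separators=(",", ":"))

print(spaced)   # {"dbname": "testdb1", "host": "example.com"}
print(compact)  # {"dbname":"testdb1","host":"example.com"}
```

Tests that compare serialized strings byte for byte are sensitive to this difference; comparing the parsed objects instead avoids it.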

EugeneChung avatar Apr 12 '25 00:04 EugeneChung

All previously failing tests now pass.

EugeneChung avatar Apr 12 '25 01:04 EugeneChung