redpanda icon indicating copy to clipboard operation
redpanda copied to clipboard

serde/json: experimental asynchronous streaming parser

Open nvartolomei opened this issue 8 months ago • 5 comments

Backports Required

  • [x] none - not a bug fix
  • [ ] none - this is a backport
  • [ ] none - issue does not exist in previous branches
  • [ ] none - papercut/not impactful enough to backport
  • [ ] v25.1.x
  • [ ] v24.3.x
  • [ ] v24.2.x

Release Notes

  • none

nvartolomei avatar May 01 '25 21:05 nvartolomei

:tada: Snyk checks have passed. No issues have been found so far.

:white_check_mark: security/snyk check is complete. No issues have been found. (View Details)

:white_check_mark: license/snyk check is complete. No issues have been found. (View Details)

secpanda avatar May 01 '25 21:05 secpanda

:tada: Snyk checks have passed. No issues have been found so far.

:white_check_mark: security/snyk check is complete. No issues have been found. (View Details)

:white_check_mark: license/snyk check is complete. No issues have been found. (View Details)

secpanda avatar May 01 '25 21:05 secpanda

:tada: Snyk checks have passed. No issues have been found so far.

:white_check_mark: security/snyk check is complete. No issues have been found. (View Details)

:white_check_mark: license/snyk check is complete. No issues have been found. (View Details)

secpanda avatar May 01 '25 21:05 secpanda

CI test results

test results on build#65426
test_id test_kind job_url test_status passed
rptest.tests.scaling_up_test.ScalingUpTest.test_scaling_up_with_recovered_topic ducktape https://buildkite.com/redpanda/redpanda/builds/65426#01968e36-91bb-4151-a04a-89f0792f5bd0 FLAKY 15/21
rptest.tests.tiered_storage_model_test.TieredStorageTest.test_tiered_storage.cloud_storage_type_and_url_style=.CloudStorageType.S3.1.virtual_host.test_case=.TS_Read==True.TS_TxRangeMaterialized==True ducktape https://buildkite.com/redpanda/redpanda/builds/65426#01968e49-4fe2-4eb2-8555-d34d355d66ba FAIL 0/1
rptest.tests.tiered_storage_pause_test.TestTieredStoragePause.test_safe_pause_resume.allow_gaps_topic_level=None.allow_gaps_cluster_level=True ducktape https://buildkite.com/redpanda/redpanda/builds/65426#01968e36-91bc-447a-ac0e-afa5ff963584 FLAKY 20/21
rptest.tests.upgrade_test.UpgradeBackToBackTest.test_upgrade_with_all_workloads.single_upgrade=False ducktape https://buildkite.com/redpanda/redpanda/builds/65426#01968e49-4fe1-4d99-a59a-a0dcf8ac7cb9 FLAKY 12/21
rptest.transactions.consumer_offsets_test.VerifyConsumerOffsetsThruUpgrades.test_consumer_group_offsets.versions_to_upgrade=1 ducktape https://buildkite.com/redpanda/redpanda/builds/65426#01968e36-91bd-4185-be09-f6254efc44ab FLAKY 20/21
rptest.transactions.producers_api_test.ProducersAdminAPITest.test_producers_state_api_during_load ducktape https://buildkite.com/redpanda/redpanda/builds/65426#01968e49-4fe1-4d99-a59a-a0dcf8ac7cb9 FLAKY 20/21
test results on build#65790
test_class test_method test_arguments test_kind job_url test_status passed reason
AvailabilityTests test_recovery_after_catastrophic_failure ducktape https://buildkite.com/redpanda/redpanda/builds/65790#0196bbeb-0108-435b-82e8-a8528e9952bc FLAKY 20/21 upstream reliability is '99.16666666666667'. current run reliability is '95.23809523809523'. drift is 3.92857 and the allowed drift is set to 50. The test should PASS
DatalakeTransactionTests test_with_transactions {"cloud_storage_type": 1, "compaction": false} ducktape https://buildkite.com/redpanda/redpanda/builds/65790#0196bbeb-0108-4f0d-bc8e-a526beeb58be FLAKY 20/21 upstream reliability is '100.0'. current run reliability is '95.23809523809523'. drift is 4.7619 and the allowed drift is set to 50. The test should PASS
PartitionBalancerTest test_fuzz_admin_ops ducktape https://buildkite.com/redpanda/redpanda/builds/65790#0196bbe1-1c0f-4d1a-81bc-90be291f327f FLAKY 18/21 upstream reliability is '91.43968871595331'. current run reliability is '85.71428571428571'. drift is 5.7254 and the allowed drift is set to 50. The test should PASS
RandomNodeOperationsTest test_node_operations {"cloud_storage_type": 1, "enable_failures": false, "mixed_versions": false, "with_chunked_compaction": true, "with_iceberg": true, "with_tiered_storage": false} ducktape https://buildkite.com/redpanda/redpanda/builds/65790#0196bbe1-1c0e-41e0-941b-68d342181163 FLAKY 20/21 upstream reliability is '98.10426540284361'. current run reliability is '95.23809523809523'. drift is 2.86617 and the allowed drift is set to 50. The test should PASS
test results on build#67813
test_class test_method test_arguments test_kind job_url test_status passed reason
DatalakeDelayedEnablementTest test_enabling_iceberg_in_existing_cluster {"catalog_type": "rest_jdbc", "cloud_storage_type": 1} ducktape https://buildkite.com/redpanda/redpanda/builds/67813#0197a32e-2297-458a-b103-d89d064d5d55 FLAKY 12/21 upstream reliability is '72.1419185282523'. current run reliability is '57.14285714285714'. drift is 14.99906 and the allowed drift is set to 50. The test should PASS
NodesDecommissioningTest test_decommissioning_working_node {"delete_topic": true, "tick_interval": 3600000} ducktape https://buildkite.com/redpanda/redpanda/builds/67813#0197a335-b082-4471-8887-68d5504a778d FLAKY 20/21 upstream reliability is '100.0'. current run reliability is '95.23809523809523'. drift is 4.7619 and the allowed drift is set to 50. The test should PASS
NodesDecommissioningTest test_recycle_all_nodes {"auto_assign_node_id": false} ducktape https://buildkite.com/redpanda/redpanda/builds/67813#0197a335-b081-470f-8ffe-01dc5808f377 FLAKY 20/21 upstream reliability is '100.0'. current run reliability is '95.23809523809523'. drift is 4.7619 and the allowed drift is set to 50. The test should PASS
TieredStorageTest test_tiered_storage {"cloud_storage_type_and_url_style": [1, "virtual_host"], "test_case": {"name": "(TS_Read == True, TS_TxRangeMaterialized == True, SpilloverManifestUploaded == True)"}} ducktape https://buildkite.com/redpanda/redpanda/builds/67813#0197a335-b081-470f-8ffe-01dc5808f377 FAIL 0/1 The test has failed across all retries

vbotbuildovich avatar May 02 '25 02:05 vbotbuildovich

sorry @dotnwat didn't mean to request a review from you :)

rockwotj avatar May 12 '25 17:05 rockwotj

I'm taking over responsibility for this PR. We'd like to merge it soon so we can build some Iceberg with JSON schema capabilities on top of it. I've addressed much of the review feedback, and I think the updated PR is merge-worthy, but there are a few other issues that could be addressed. I'll outline them:

  1. Error handling: The parser does not provide good error messages. It just fails. It's important to improve the error handling so we provide at least a description and position for parsing errors in all cases.
  2. UTF-8 validation: We do not validate that the text is validly UTF-8 encoded. This is important, and should be straightforward to add.
  3. Parse any type at top-level: The updated JSON standard allows any type at the top level, but we accept only objects and arrays. In practice top-level values will be arrays or objects but we should try to match the standard.
  4. Depth limits: We do not bail on deeply nested structures. It's important to add this for safety.
  5. Number handling: There's a couple of alternate proposals for how to handle number parsing. It'd be nice to simplify, and the current approach has some known issues with high-precision parsing, but the current scheme works pretty well.
  6. Fuzzing or randomized testing: We should build some randomized testing, at least. Ideally, we would fuzz test the parser.

wdberkeley avatar Jun 24 '25 15:06 wdberkeley

Retry command for Build#67813

please wait until all jobs are finished before running the slash command

/ci-repeat 1
tests/rptest/tests/tiered_storage_model_test.py::TieredStorageTest.test_tiered_storage@{"cloud_storage_type_and_url_style":[1,"virtual_host"],"test_case":{"name":"(TS_Read == True, TS_TxRangeMaterialized == True, SpilloverManifestUploaded == True)"}}

vbotbuildovich avatar Jun 24 '25 20:06 vbotbuildovich

/ci-repeat 1 tests/rptest/tests/tiered_storage_model_test.py::TieredStorageTest.test_tiered_storage@{"cloud_storage_type_and_url_style":[1,"virtual_host"],"test_case":{"name":"(TS_Read == True, TS_TxRangeMaterialized == True, SpilloverManifestUploaded == True)"}}

wdberkeley avatar Jun 24 '25 22:06 wdberkeley

Tests all pass, but the test results job is borked, going to force merge.

rockwotj avatar Jun 25 '25 14:06 rockwotj

@rockwotj @wdberkeley snagged these coverage reports and thought you might be interested in them.

image image

dotnwat avatar Jun 25 '25 22:06 dotnwat