undici icon indicating copy to clipboard operation
undici copied to clipboard

address bad usage of TST

Open ywave620 opened this issue 11 months ago • 5 comments

This relates to performance of parseHeaders

Rationale

Inserting keys to TST in order yields the worst performance of this data structure. It is similar to inserting keys to a binary search tree in order makes it degenerate to a link list.

Randomize the insertion order simply works, but we do not want any potential instable performance, so I take a snapshot of one randomized insertion order that works well

WeChatWorkScreenshot_ea6ff026-5e0d-44b5-8dce-4da5858291c9

Changes

Randomize the insertion order of the well known header keys

Features

N/A

Bug Fixes

N/A

Breaking Changes and Deprecations

N/A

Status

  • [x] I have read and agreed to the Developer's Certificate of Origin
  • [x] Tested
  • [x] Benchmarked (optional)
  • [S] Documented
  • [ ] Review ready
  • [ ] In review
  • [ ] Merge ready

ywave620 avatar Feb 07 '25 08:02 ywave620

@metcoder95 could you help move this forward

ywave620 avatar Feb 12 '25 12:02 ywave620

If not properly shuffled to balance the left and right sides, it will cause a bad case and further degrade performance. This is similar to the problem with binary trees.

tsctx avatar Feb 12 '25 13:02 tsctx

If not properly shuffled to balance the left and right sides, it will cause a bad case and further degrade performance. This is similar to the problem with binary trees.

Given the array is sorted, randomly retrieving item from it and inserting to a TST/Binary search tree should yield a balanced tree. And the bottom line is that we definitely don't want to insert items to a TST in order because this is the worst case of this data structure.

Maybe I can print out the TST and do a comparison? I did a non formal one before and it turned out that when searching for Via we actually saved a lot of node traversing

ywave620 avatar Feb 14 '25 02:02 ywave620

Seems CI is not happy, can you try to rebase with main and adjust?

metcoder95 avatar Feb 14 '25 08:02 metcoder95

However I would prefer if:

  1. a script to generate a "working well" snapshot would be placed in scripts
  2. a test that verifies that the snapshots contains all keys is added

mcollina avatar Aug 22 '25 19:08 mcollina