charset-normalizer-rs

Improvements : Speed

Open chris-ha458 opened this issue 9 months ago • 16 comments

As per our discussion in #2, the following speed improvements have been suggested:

  • calc coherence & mess in threads
  • or calc mess for plugins in threads (or some async?)
  • or something other...
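To make the first suggestion concrete, here is a minimal sketch of running the two measures side by side with `std::thread::scope`. The `mess_ratio` and `coherence_ratio` functions are hypothetical stand-ins I made up for illustration; they are not the crate's actual detectors, and this uses plain std threads rather than rayon or async.

```rust
use std::thread;

// Hypothetical stand-in for a "mess" measure: the share of characters
// that are neither alphanumeric nor whitespace.
fn mess_ratio(text: &str) -> f32 {
    let total = text.chars().count();
    if total == 0 {
        return 0.0;
    }
    let odd = text
        .chars()
        .filter(|c| !c.is_alphanumeric() && !c.is_whitespace())
        .count();
    odd as f32 / total as f32
}

// Hypothetical stand-in for a "coherence" measure: the share of
// alphabetic characters.
fn coherence_ratio(text: &str) -> f32 {
    let total = text.chars().count();
    if total == 0 {
        return 0.0;
    }
    let letters = text.chars().filter(|c| c.is_alphabetic()).count();
    letters as f32 / total as f32
}

// Runs both measures on separate scoped threads and returns (mess, coherence).
// Scoped threads may borrow `text` because they are joined before the scope ends.
fn score_in_threads(text: &str) -> (f32, f32) {
    thread::scope(|s| {
        let mess = s.spawn(|| mess_ratio(text));
        let coherence = s.spawn(|| coherence_ratio(text));
        (
            mess.join().expect("mess thread panicked"),
            coherence.join().expect("coherence thread panicked"),
        )
    })
}
```

Whether this pays off depends on how expensive each measure is relative to the cost of spawning a thread, so it would need benchmarking before adopting.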

The paths I had in mind were these:

  • Related to threads idea : use Rayon
    • Replace HashMap with the concurrent DashMap (rayon already implements its parallel iterator traits for the std HashMap, so this is not strictly necessary, but it might be worth looking into regardless)
  • ~Replace the hashing algorithm used in HashMap~ with FxHash, aHash, or HighwayHash
    • aHash implemented #14
  • ~Replace sort() with sort_unstable()~ #6
  • ~Identify preallocation opportunities~. For instance, replace Vec::new() with Vec::with_capacity()
    • It seems most of the current new() calls cannot really preallocate, because the final size is not known up front. The default growth strategy is optimized well enough that, unless we have a strong idea of the memory access pattern, preallocating early is not helpful.
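For reference, the hasher, sort, and preallocation items above can be sketched together using only std. Everything here is illustrative: `byte_frequencies` and `FixedState` are hypothetical names, not code from this crate. `FixedState` only marks the slot where a faster `BuildHasher` (for example the one from aHash) would plug in; it still uses the std hasher so the sketch has no dependencies.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::BuildHasherDefault;

// Hypothetical alias: the third type parameter of HashMap is where a
// different hashing algorithm is selected. This one is still std's hasher.
type FixedState = BuildHasherDefault<DefaultHasher>;

// Counts how often each byte occurs, most frequent first.
fn byte_frequencies(data: &[u8]) -> Vec<(u8, usize)> {
    // Here the upper bound is known (at most 256 distinct bytes),
    // so preallocating is justified.
    let mut counts: HashMap<u8, usize, FixedState> =
        HashMap::with_capacity_and_hasher(256, FixedState::default());
    for &b in data {
        *counts.entry(b).or_insert(0) += 1;
    }

    // The final length is known exactly at this point.
    let mut pairs: Vec<(u8, usize)> = Vec::with_capacity(counts.len());
    pairs.extend(counts);

    // The comparator never reports two distinct entries as equal (ties on
    // count fall back to the byte value), so an unstable sort gives the
    // same result as a stable one.
    pairs.sort_unstable_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    pairs
}
```

The preallocation only makes sense here because the bound is known; where the size is a guess, plain `Vec::new()` is likely just as good, as noted above.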

Many of these are low-hanging fruit and come down to refactoring the code into idiomatic Rust. For example, there are many for loops in this code. Iterator-based code is more idiomatic, easier to parallelize with rayon, and interacts better with allocation: pushing items from within a for loop can cause multiple allocations and copies, while collecting an iterator can get by with fewer allocations.
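A small sketch of the loop-versus-iterator point, using made-up function names (`widen_loop`, `widen_iter`) rather than anything from this codebase:

```rust
// Loop version: the Vec starts empty and may reallocate (and copy its
// contents) several times as it grows.
fn widen_loop(bytes: &[u8]) -> Vec<u32> {
    let mut out = Vec::new();
    for &b in bytes {
        out.push(u32::from(b));
    }
    out
}

// Iterator version: a slice iterator followed by `map` reports its exact
// length through `size_hint`, so `collect` can size the Vec up front.
// With rayon as a dependency, `iter()` could become `par_iter()` here
// (not done in this sketch, to keep it std-only).
fn widen_iter(bytes: &[u8]) -> Vec<u32> {
    bytes.iter().map(|&b| u32::from(b)).collect()
}
```

Both return the same values. Note that adapters such as `filter` lose the exact length, so the allocation benefit is smaller there.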

chris-ha458 • Sep 24 '23 23:09