bloom-filters icon indicating copy to clipboard operation
bloom-filters copied to clipboard

Browser friendliness?

Open achingbrain opened this issue 1 year ago • 4 comments

I'd really like to use this module in browsers but the bundle size ends up being very large, just importing the CuckooFilter adds over 50KB of minified code to the bundle.

Using esbuild to bundle just this script:

import { CuckooFilter } from 'bloom-filters'

const filter = new CuckooFilter(1, 2, 3, 4)
console.info(filter.has('hello world'))

Creates a 216KB bundle (63KB minified), and I have to shim some node internals (e.g. the Buffer class) for it to work:

image

The CuckooFilter implementation itself is 8.4KB un-minified so there's a lot of unused code here.

Would you be open to some PRs that make this module more browser friendly?

Some low hanging fruit I can see immediately:

  1. Switch tsc to output ESM for better tree shaking
  2. Replace use of node Buffers with built-in Uint8Arrays
  3. Remove, replace or optimise use of lodash
  4. Remove long dependency and use built-in BigInts

The first is a breaking change but will yield the biggest benefit - the breaking change is that consumers will no longer be able to require the module, they must import it via static or dynamic imports.

The second is breaking where Buffers are used as return values which from what I can see only appears to be in the Invertible Bloom Lookup Table implementation.

The rest are just internal changes so are non-breaking.

achingbrain avatar Apr 27 '24 06:04 achingbrain

One spanner in the works here is that this module uses an older, incompatible version of JavaScript decorators so updating TypeScript is quite painful.

The decorators just add saveAsJSON/fromJSON methods to the filters. Is it necessary to use decorators for this? Could they just be regular methods instead?

achingbrain avatar Apr 27 '24 08:04 achingbrain

Hello 🖖 I'm working on a 4.0.0-alpha.0 available here https://github.com/Callidon/bloom-filters/tree/next/4.0.0. What's new in this version?

  • All packages are updated to latest; means a lot of things were broken but partially fixed now I still need to fix some tests
  • Use @node-rs/xxhash instead of xxhashjs. A very good implementation and a very good boost in performance. Plus! It uses WASM when bundled in a browser 👍
  • I removed the lodash packages to import only specific ones with lodash.xxxx.
  • Switch the package as ESM, it is still in typescript .mts but compiled as .mjs
  • I added examples for building using rspack and webpack
  • I added an example when using pure mjs with node
  • eslint is now very strict, it extends: @typescript-eslint/strict-type-checked
  • use jest instead of mocha; remove all describe to be able to use beforeAll/beforeEach if necessary
  • The hashing library is only used in the Hashing class in a static variable called lib which only exports xxh32 and xxh64 so that the functions are available everywhere by calling this._hashing.lib.xxh[32/64] or Hashing.lib.xxh[32/64]
  • the package manager is now yarn 2 yarn@berry
  • The test suite is now a complete run of:
    • compile with tsc
    • run rspack bundle
    • run webpack bundle
    • run prettier
    • run eslint
    • run jest
  • no more decorators; it is now plain fromJSON and saveAsJSON functions in the different classes.

I would also like to remove the long and buffer dependencies, so if want to help you are welcome to create a PR from this new branch.

folkvir avatar May 03 '24 10:05 folkvir

Let me get a stable state before going on! I need to fix the tests. With the new xxhash package I introduced good bugs dealing with bigints. This will prepare the work for the long package.

folkvir avatar May 03 '24 22:05 folkvir

Update: buffer and long are not used anymore. The draft https://github.com/Callidon/bloom-filters/pull/71 is in progress but is working as expected. Take a look, I will be happy to get feedback, especially on the usage of the wasm which takes around 347kb 😬

folkvir avatar May 16 '24 19:05 folkvir