bun icon indicating copy to clipboard operation
bun copied to clipboard

reduce unnecessary ffi boundary crossing

Open anonrig opened this issue 4 months ago • 1 comments

What does this PR do?

Removes FFI boundary crossing - direct Zig → simdutf instead of Zig → C++ → simdutf

How did you verify your code works?

Existing tests should be sufficient

anonrig avatar Dec 09 '25 03:12 anonrig

Walkthrough

Replaces an external C binding for base64 URL encoding with an internal simdutf-based implementation. The public function signature remains unchanged; only the internal implementation is modified to use the internal library instead of external bindings.

Changes

Cohort / File(s) Summary
Implementation update
src/base64/base64.zig
Removed external C binding declaration for WTF__base64URLEncode and replaced its call with bun.simdutf.base64.encode, while maintaining the same public function signature.
Test coverage
test/js/bun/util/base64-url-safe-simdutf.test.ts
New test suite validating Bun.unsafe.base64.encodeURLSafe with simdutf implementation across multiple scenarios including simple strings, padding validation, URL-safe alphabet usage, and parity with Buffer.toString('base64url').

Suggested reviewers

  • markovejnovic

Pre-merge checks

✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: removing unnecessary FFI boundary crossings by switching to direct simdutf calls from Zig.
Description check ✅ Passed The PR description covers both required template sections: it explains what the change does and how verification was performed, though verification details are minimal.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 2028e21d60ee0410d3ce928a4963ec76a75742d1 and d0243ea8421f9d4b1b22d9b13c9ae23d1ad16a49.

📒 Files selected for processing (2)
  • src/base64/base64.zig (1 hunks)
  • test/js/bun/util/base64-url-safe-simdutf.test.ts (1 hunks)
🧰 Additional context used
📓 Path-based instructions (6)
test/**/*.{js,ts,jsx,tsx}

📄 CodeRabbit inference engine (.cursor/rules/writing-tests.mdc)

test/**/*.{js,ts,jsx,tsx}: Write tests as JavaScript and TypeScript files using Jest-style APIs (test, describe, expect) and import from bun:test Use test.each and data-driven tests to reduce boilerplate when testing multiple similar cases

Files:

  • test/js/bun/util/base64-url-safe-simdutf.test.ts
test/**/*.test.{ts,js,jsx,tsx,mjs,cjs}

📄 CodeRabbit inference engine (test/CLAUDE.md)

test/**/*.test.{ts,js,jsx,tsx,mjs,cjs}: Use bun:test with files that end in *.test.{ts,js,jsx,tsx,mjs,cjs} Do not write flaky tests. Never wait for time to pass in tests; always wait for the condition to be met instead of using an arbitrary amount of time Never use hardcoded port numbers in tests. Always use port: 0 to get a random port Prefer concurrent tests over sequential tests using test.concurrent or describe.concurrent when multiple tests spawn processes or write files, unless it's very difficult to make them concurrent When spawning Bun processes in tests, use bunExe and bunEnv from harness to ensure the same build of Bun is used and debug logging is silenced Use -e flag for single-file tests when spawning Bun processes Use tempDir() from harness to create temporary directories with files for multi-file tests instead of creating files manually Prefer async/await over callbacks in tests When callbacks must be used and it's just a single callback, use Promise.withResolvers to create a promise that can be resolved or rejected from a callback Do not set a timeout on tests. Bun already has timeouts Use Buffer.alloc(count, fill).toString() instead of 'A'.repeat(count) to create repetitive strings in tests, as ''.repeat is very slow in debug JavaScriptCore builds Use describe blocks for grouping related tests Always use await using or using to ensure proper resource cleanup in tests for APIs like Bun.listen, Bun.connect, Bun.spawn, Bun.serve, etc Always check exit codes and test error scenarios in error tests Use describe.each() for parameterized tests Use toMatchSnapshot() for snapshot testing Use beforeAll(), afterEach(), beforeEach() for setup/teardown in tests Track resources (servers, clients) in arrays for cleanup in afterEach()

Files:

  • test/js/bun/util/base64-url-safe-simdutf.test.ts
test/**/*.test.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

test/**/*.test.{ts,tsx}: For single-file tests in Bun test suite, prefer using -e flag over tempDir For multi-file tests in Bun test suite, prefer using tempDir and Bun.spawn Always use port: 0 when spawning servers in tests - do not hardcode ports or use custom random port functions Use normalizeBunSnapshot to normalize snapshot output in tests instead of manual output comparison Never write tests that check for no 'panic', 'uncaught exception', or similar strings in test output - that is not a valid test Use tempDir from harness to create temporary directories in tests - do not use tmpdirSync or fs.mkdtempSync In tests, call expect(stdout).toBe(...) before expect(exitCode).toBe(0) when spawning processes for more useful error messages on failure Do not write flaky tests - do not use setTimeout in tests; instead await the condition to be met since you're testing the CONDITION, not TIME PASSING Verify your test fails with USE_SYSTEM_BUN=1 bun test <file> and passes with bun bd test <file> - tests are not valid if they pass with USE_SYSTEM_BUN=1 Avoid shell commands in tests - do not use find or grep; use Bun's Glob and built-in tools instead Test files must end in .test.ts or .test.tsx and be created in the appropriate test folder structure

Files:

  • test/js/bun/util/base64-url-safe-simdutf.test.ts
src/**/*.{cpp,zig}

📄 CodeRabbit inference engine (.cursor/rules/building-bun.mdc)

src/**/*.{cpp,zig}: Use bun bd or bun run build:debug to build debug versions for C++ and Zig source files; creates debug build at ./build/debug/bun-debug Run tests using bun bd test <test-file> with the debug build; never use bun test directly as it will not include your changes Execute files using bun bd <file> <...args>; never use bun <file> directly as it will not include your changes Enable debug logs for specific scopes using BUN_DEBUG_$(SCOPE)=1 environment variable Code generation happens automatically as part of the build process; no manual code generation commands are required

Files:

  • src/base64/base64.zig
src/**/*.zig

📄 CodeRabbit inference engine (.cursor/rules/building-bun.mdc)

Use bun.Output.scoped(.${SCOPE}, .hidden) for creating debug logs in Zig code

Implement core functionality in Zig, typically in its own directory in src/

src/**/*.zig: Private fields in Zig are fully supported using the # prefix: struct { #foo: u32 }; Use decl literals in Zig for declaration initialization: const decl: Decl = .{ .binding = 0, .value = 0 }; Prefer @import at the bottom of the file (auto formatter will move them automatically)

Be careful with memory management in Zig code - use defer for cleanup with allocators

Files:

  • src/base64/base64.zig
**/*.zig

📄 CodeRabbit inference engine (.cursor/rules/zig-javascriptcore-classes.mdc)

**/*.zig: Expose generated bindings in Zig structs using pub const js = JSC.Codegen.JS<ClassName> with trait conversion methods: toJS, fromJS, and fromJSDirect Use consistent parameter name globalObject instead of ctx in Zig constructor and method implementations Use bun.JSError!JSValue return type for Zig methods and constructors to enable proper error handling and exception propagation Implement resource cleanup using deinit() method that releases resources, followed by finalize() called by the GC that invokes deinit() and frees the pointer Use JSC.markBinding(@src()) in finalize methods for debugging purposes before calling deinit() For methods returning cached properties in Zig, declare external C++ functions using extern fn and callconv(JSC.conv) calling convention Implement getter functions with naming pattern get<PropertyName> in Zig that accept this and globalObject parameters and return JSC.JSValue Access JavaScript CallFrame arguments using callFrame.argument(i), check argument count with callFrame.argumentCount(), and get this with callFrame.thisValue() For reference-counted objects, use .deref() in finalize instead of destroy() to release references to other JS objects

Files:

  • src/base64/base64.zig
🧠 Learnings (16)
📓 Common learnings
Learnt from: Jarred-Sumner
Repo: oven-sh/bun PR: 22278
File: src/ast/E.zig:980-1003
Timestamp: 2025-09-04T02:04:43.094Z
Learning: In Bun's Zig codebase, `as(i32, u8_or_u16_value)` is sufficient for casting u8/u16 to i32 in comparison operations. `intCast` is not required in this context, and the current casting approach compiles successfully.
Learnt from: cirospaciari
Repo: oven-sh/bun PR: 24622
File: src/deps/uws/us_socket_t.zig:112-113
Timestamp: 2025-11-12T04:11:52.293Z
Learning: In Bun's Zig codebase, when passing u32 values to C FFI functions that expect c_uint parameters, no explicit intCast is needed because c_uint is equivalent to u32 on Bun's target platforms and Zig allows implicit coercion between equivalent types. This pattern is used consistently throughout src/deps/uws/us_socket_t.zig in functions like setTimeout, setLongTimeout, and setKeepalive.
Learnt from: fmguerreiro
Repo: oven-sh/bun PR: 23774
File: src/install/PackageManager/updatePackageJSONAndInstall.zig:548-548
Timestamp: 2025-10-24T10:43:09.398Z
Learning: In Bun's Zig codebase, the `as(usize, intCast(...))` cast pattern triggers a Zig compiler bug that causes compilation to hang indefinitely when used in complex control flow contexts (loops + short-circuit operators + optional unwrapping). Avoid this pattern and use simpler alternatives like just `intCast(...)` if type casting is necessary.
📚 Learning: 2025-11-24T18:36:59.706Z
Learnt from: CR
Repo: oven-sh/bun PR: 0
File: src/bun.js/bindings/v8/CLAUDE.md:0-0
Timestamp: 2025-11-24T18:36:59.706Z
Learning: Applies to src/bun.js/bindings/v8/test/v8/v8.test.ts : Add corresponding test cases to test/v8/v8.test.ts using checkSameOutput() function to compare Node.js and Bun output

Applied to files:

  • test/js/bun/util/base64-url-safe-simdutf.test.ts
📚 Learning: 2025-11-24T18:37:30.259Z
Learnt from: CR
Repo: oven-sh/bun PR: 0
File: test/CLAUDE.md:0-0
Timestamp: 2025-11-24T18:37:30.259Z
Learning: Applies to test/**/*.test.{ts,js,jsx,tsx,mjs,cjs} : Use `Buffer.alloc(count, fill).toString()` instead of `'A'.repeat(count)` to create repetitive strings in tests, as ''.repeat is very slow in debug JavaScriptCore builds

Applied to files:

  • test/js/bun/util/base64-url-safe-simdutf.test.ts
📚 Learning: 2025-11-24T18:35:08.612Z
Learnt from: CR
Repo: oven-sh/bun PR: 0
File: .cursor/rules/dev-server-tests.mdc:0-0
Timestamp: 2025-11-24T18:35:08.612Z
Learning: Applies to test/bake/dev/sourcemap.test.ts : Organize source-map tests in sourcemap.test.ts for tests verifying source-maps are correct

Applied to files:

  • test/js/bun/util/base64-url-safe-simdutf.test.ts
📚 Learning: 2025-12-02T05:59:51.485Z
Learnt from: CR
Repo: oven-sh/bun PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-02T05:59:51.485Z
Learning: Applies to test/**/*.test.{ts,tsx} : Verify your test fails with `USE_SYSTEM_BUN=1 bun test <file>` and passes with `bun bd test <file>` - tests are not valid if they pass with `USE_SYSTEM_BUN=1`

Applied to files:

  • test/js/bun/util/base64-url-safe-simdutf.test.ts
📚 Learning: 2025-11-24T18:35:39.205Z
Learnt from: CR
Repo: oven-sh/bun PR: 0
File: .cursor/rules/registering-bun-modules.mdc:0-0
Timestamp: 2025-11-24T18:35:39.205Z
Learning: Add tests for new Bun runtime functionality

Applied to files:

  • test/js/bun/util/base64-url-safe-simdutf.test.ts
📚 Learning: 2025-11-24T18:37:30.259Z
Learnt from: CR
Repo: oven-sh/bun PR: 0
File: test/CLAUDE.md:0-0
Timestamp: 2025-11-24T18:37:30.259Z
Learning: Applies to test/**/*.test.{ts,js,jsx,tsx,mjs,cjs} : Use `bun:test` with files that end in `*.test.{ts,js,jsx,tsx,mjs,cjs}`

Applied to files:

  • test/js/bun/util/base64-url-safe-simdutf.test.ts
📚 Learning: 2025-12-02T05:59:51.485Z
Learnt from: CR
Repo: oven-sh/bun PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-02T05:59:51.485Z
Learning: Applies to test/**/*.test.{ts,tsx} : Use `normalizeBunSnapshot` to normalize snapshot output in tests instead of manual output comparison

Applied to files:

  • test/js/bun/util/base64-url-safe-simdutf.test.ts
📚 Learning: 2025-11-24T18:35:08.612Z
Learnt from: CR
Repo: oven-sh/bun PR: 0
File: .cursor/rules/dev-server-tests.mdc:0-0
Timestamp: 2025-11-24T18:35:08.612Z
Learning: Applies to test/bake/dev/esm.test.ts : Organize ESM tests in esm.test.ts for tests about various ESM features in development mode

Applied to files:

  • test/js/bun/util/base64-url-safe-simdutf.test.ts
📚 Learning: 2025-11-24T18:35:50.422Z
Learnt from: CR
Repo: oven-sh/bun PR: 0
File: .cursor/rules/writing-tests.mdc:0-0
Timestamp: 2025-11-24T18:35:50.422Z
Learning: Applies to test/**/*.{js,ts,jsx,tsx} : Write tests as JavaScript and TypeScript files using Jest-style APIs (`test`, `describe`, `expect`) and import from `bun:test`

Applied to files:

  • test/js/bun/util/base64-url-safe-simdutf.test.ts
📚 Learning: 2025-10-18T05:23:24.403Z
Learnt from: theshadow27
Repo: oven-sh/bun PR: 23798
File: test/js/bun/telemetry-server.test.ts:91-100
Timestamp: 2025-10-18T05:23:24.403Z
Learning: In the Bun codebase, telemetry tests (test/js/bun/telemetry-*.test.ts) should focus on telemetry API behavior: configure/disable/isEnabled, callback signatures and invocation, request ID correlation, and error handling. HTTP protocol behaviors like status code normalization (e.g., 200 with empty body → 204) should be tested in HTTP server tests (test/js/bun/http/), not in telemetry tests. Keep separation of concerns: telemetry tests verify the telemetry API contract; HTTP tests verify HTTP semantics.

Applied to files:

  • test/js/bun/util/base64-url-safe-simdutf.test.ts
📚 Learning: 2025-10-25T17:20:19.041Z
Learnt from: theshadow27
Repo: oven-sh/bun PR: 24063
File: test/js/bun/telemetry/server-header-injection.test.ts:5-20
Timestamp: 2025-10-25T17:20:19.041Z
Learning: In the Bun telemetry codebase, tests are organized into two distinct layers: (1) Internal API tests in test/js/bun/telemetry/ use numeric InstrumentKind enum values to test Zig↔JS injection points and low-level integration; (2) Public API tests in packages/bun-otel/test/ use string InstrumentKind values ("http", "fetch", etc.) to test the public-facing BunSDK and instrumentation APIs. This separation allows internal tests to use efficient numeric enums for refactoring flexibility while the public API maintains a developer-friendly string-based interface.

Applied to files:

  • test/js/bun/util/base64-url-safe-simdutf.test.ts
📚 Learning: 2025-11-24T18:36:59.706Z
Learnt from: CR
Repo: oven-sh/bun PR: 0
File: src/bun.js/bindings/v8/CLAUDE.md:0-0
Timestamp: 2025-11-24T18:36:59.706Z
Learning: Ensure V8 API tests compare identical C++ code output between Node.js and Bun through the test suite validation process

Applied to files:

  • test/js/bun/util/base64-url-safe-simdutf.test.ts
📚 Learning: 2025-10-19T02:44:46.354Z
Learnt from: theshadow27
Repo: oven-sh/bun PR: 23798
File: packages/bun-otel/context-propagation.test.ts:1-1
Timestamp: 2025-10-19T02:44:46.354Z
Learning: In the Bun repository, standalone packages under packages/ (e.g., bun-vscode, bun-inspector-protocol, bun-plugin-yaml, bun-plugin-svelte, bun-debug-adapter-protocol, bun-otel) co-locate their tests with package source code using *.test.ts files. This follows standard npm/monorepo patterns. The test/ directory hierarchy (test/js/bun/, test/cli/, test/js/node/) is reserved for testing Bun's core runtime APIs and built-in functionality, not standalone packages.

Applied to files:

  • test/js/bun/util/base64-url-safe-simdutf.test.ts
📚 Learning: 2025-11-24T18:37:30.259Z
Learnt from: CR
Repo: oven-sh/bun PR: 0
File: test/CLAUDE.md:0-0
Timestamp: 2025-11-24T18:37:30.259Z
Learning: Applies to test/**/*.test.{ts,js,jsx,tsx,mjs,cjs} : When spawning Bun processes in tests, use `bunExe` and `bunEnv` from `harness` to ensure the same build of Bun is used and debug logging is silenced

Applied to files:

  • test/js/bun/util/base64-url-safe-simdutf.test.ts
📚 Learning: 2025-11-24T18:36:59.706Z
Learnt from: CR
Repo: oven-sh/bun PR: 0
File: src/bun.js/bindings/v8/CLAUDE.md:0-0
Timestamp: 2025-11-24T18:36:59.706Z
Learning: Applies to src/bun.js/bindings/v8/src/napi/napi.zig : For each new V8 C++ method, add both GCC/Clang and MSVC mangled symbol names to the V8API struct in src/napi/napi.zig using extern fn declarations

Applied to files:

  • src/base64/base64.zig
🔇 Additional comments (2)
test/js/bun/util/base64-url-safe-simdutf.test.ts (1)

3-99: Verify test exercises new encodeURLSafe path with USE_SYSTEM_BUN=1 and bun bd before submitting

The test suite covers essential cases: URL-safe alphabet (- and _ instead of + and /), padding removal, empty/single-byte inputs, larger payloads, and parity with Buffer.toString("base64url"). Expectations for "Man" → "TWFu", "Woman" → "V29tYW4", and the 0xff pattern match RFC 4648 semantics.

However, per test guidelines, this test must fail when run with USE_SYSTEM_BUN=1 bun test test/js/bun/util/base64-url-safe-simdutf.test.ts and pass with bun bd test test/js/bun/util/base64-url-safe-simdutf.test.ts. If the test passes under USE_SYSTEM_BUN=1, it is not validating your new implementation—it may be exercising legacy code or Node.js compatibility paths instead.

src/base64/base64.zig (1)

101-103: encodeURLSafe now correctly delegates directly to simdutf; verify callsites and buffer sizing

Switching encodeURLSafe to call bun.simdutf.base64.encode(source, dest, true) removes the extra C++ FFI hop and aligns with the PR goal.

Two follow-ups worth verifying:

  1. Consider whether delegating to the existing simdutfEncodeUrlSafe helper would reduce duplication and centralize URL-safe behavior in one place.

  2. Scan callsites to ensure all buffers are sized appropriately (e.g. via urlSafeEncodeLen or simdutfEncodeLenUrlSafe) and that nothing relied on quirks of the old FFI implementation.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot] avatar Dec 09 '25 03:12 coderabbitai[bot]