AFFiNE refactor(editor): optimize code block highlighting

Limitation: if a new line is added, the subsequent lines need retokenization. (Will be fixed in the next PR.)

Summary by CodeRabbit

New Features
- Introduced dynamic, on-demand syntax highlighting for code blocks, providing more accurate and responsive tokenization.
- Added support for incremental tokenization with state preservation, improving performance when editing large code blocks.
- Added new syntax highlighting integration using Shiki for enhanced theme and language support.
- Added a new property to display line indices within code blocks for improved context.
Improvements
- Enhanced syntax highlighting precision by aligning tokens with their exact positions within code lines.
- Improved handling of language and theme changes with cancellation support for ongoing language loading.
- Added line-based token retrieval and rendering enhancements for inline code units.
- Introduced a new method to retrieve full line content for better text processing.
Bug Fixes
- Resolved issues with outdated or incorrect token highlighting during code editing.
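
The cancellation of in-flight language loading mentioned under Improvements might look roughly like the sketch below. It is not taken from the PR: `LatestLoadGuard`, `loadLanguage`, and `fetchGrammar` are hypothetical names, and the only real API assumed is the standard `AbortController`. The idea is that starting a new load aborts the previous one, so a slow result for an old language cannot overwrite a newer choice.

```typescript
// Hypothetical helper: keeps only the most recent load "alive".
class LatestLoadGuard {
  private current: AbortController | null = null;

  // Starts a new load and cancels whichever one was still pending.
  begin(): AbortSignal {
    this.current?.abort();
    this.current = new AbortController();
    return this.current.signal;
  }
}

// Hypothetical loader: `fetchGrammar` stands in for whatever actually
// loads a language definition.
async function loadLanguage(
  lang: string,
  signal: AbortSignal,
  fetchGrammar: (lang: string) => Promise<string>
): Promise<string | null> {
  const grammar = await fetchGrammar(lang);
  // If a newer request started while we were waiting, discard this result.
  return signal.aborted ? null : grammar;
}

const guard = new LatestLoadGuard();
const rustSignal = guard.begin(); // user picks Rust
const goSignal = guard.begin(); // user switches to Go before Rust finishes
```

Here `rustSignal` is already aborted by the time the Go load begins, so a late Rust result would be dropped by `loadLanguage`.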

Jun 14 '25 06:06 golok727

Walkthrough

This change introduces a new tokenizer architecture for code blocks, shifting from static, precomputed token arrays to a dynamic, stateful tokenization model. New classes and interfaces for tokenization and Shiki integration are added, and existing components are refactored to use the tokenizer for line-by-line syntax highlighting with improved state management and caching.
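
As a rough illustration of the stateful, cached model described above, here is a minimal sketch. It is not the PR's implementation: the types, the toy provider (which only understands `/* ... */` comments), and `LineTokenizerSketch` are all hypothetical, and only the `tokenizeLine({lineContent, lineIndex})` and `tokenize(lineContent, state)` call shapes are borrowed from this walkthrough.

```typescript
// Hypothetical shapes; the PR's real types live in tokenizer/types.ts and may differ.
type State = { inBlockComment: boolean };
interface Token { start: number; content: string; kind: 'comment' | 'code' }
interface TokenizationResult { tokens: Token[]; endState: State }
interface TokenProvider {
  initialState(): State;
  tokenize(line: string, state: State): TokenizationResult;
}

// Toy provider: only knows /* ... */ block comments, which is enough to show
// why the end state of one line must feed into the next.
const toyProvider: TokenProvider = {
  initialState: () => ({ inBlockComment: false }),
  tokenize(line, state) {
    const tokens: Token[] = [];
    let inComment = state.inBlockComment;
    let pos = 0;
    while (pos < line.length) {
      const hit = line.indexOf(inComment ? '*/' : '/*', pos);
      // A comment token includes its closing "*/"; a code token stops before "/*".
      const end = hit === -1 ? line.length : inComment ? hit + 2 : hit;
      if (end > pos) {
        tokens.push({ start: pos, content: line.slice(pos, end), kind: inComment ? 'comment' : 'code' });
      }
      if (hit !== -1) inComment = !inComment;
      pos = end;
    }
    return { tokens, endState: { inBlockComment: inComment } };
  },
};

interface CacheEntry { content: string; startState: State; result: TokenizationResult }

class LineTokenizerSketch {
  private cache: CacheEntry[] = [];
  private provider: TokenProvider;
  providerCalls = 0; // exposed only so the caching effect is observable

  constructor(provider: TokenProvider) {
    this.provider = provider;
  }

  tokenizeLine({ lineContent, lineIndex }: { lineContent: string; lineIndex: number }): Token[] {
    // Start from the previous line's end state. Simplification: if that line
    // has not been tokenized yet, fall back to the initial state.
    const prev = lineIndex > 0 ? this.cache[lineIndex - 1] : undefined;
    const startState = prev ? prev.result.endState : this.provider.initialState();
    const cached = this.cache[lineIndex];
    // The cache entry is valid only if both the text and the incoming state match.
    if (cached && cached.content === lineContent &&
        cached.startState.inBlockComment === startState.inBlockComment) {
      return cached.result.tokens;
    }
    this.providerCalls++;
    const result = this.provider.tokenize(lineContent, startState);
    this.cache[lineIndex] = { content: lineContent, startState, result };
    return result.tokens;
  }
}

const tokenizer = new LineTokenizerSketch(toyProvider);
const tokenizeAll = (lines: string[]) =>
  lines.map((lineContent, lineIndex) => tokenizer.tokenizeLine({ lineContent, lineIndex }));

const firstPass = tokenizeAll(['a /* open', 'still comment */ b']);
const callsAfterFirstPass = tokenizer.providerCalls;
tokenizeAll(['a /* open', 'still comment */ b']); // served entirely from cache
const callsAfterSecondPass = tokenizer.providerCalls;
const afterEdit = tokenizeAll(['a', 'still comment */ b']); // line 0 no longer opens a comment
```

Because the cache in this sketch is keyed by line index, inserting a line shifts every later index and the entries after it no longer line up with their content, which is the same kind of limitation noted in the PR description.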

Changes

File(s)	Change Summary
blocksuite/affine/blocks/code/src/code-block-inline.ts	Modified the `renderer` function in `CodeBlockUnitSpecExtension` to accept and bind a `lineIndex` property for rendered components.
blocksuite/affine/blocks/code/src/code-block.ts	Replaced static token arrays with a `tokenizer$` signal and dynamic tokenization; introduced asynchronous language loading, caching, and stateful tokenization using `CodeTokenizer` and `ShikiTokenProvider`. Updated rendering and effects accordingly.
blocksuite/affine/blocks/code/src/highlight/affine-code-unit.ts	Refactored to use the tokenizer for dynamic token retrieval per line, added a `lineIndex` property, and adjusted token offset logic for accurate highlighting within code units. Added a getter for the closest `v-line` element.
blocksuite/framework/std/src/inline/components/v-line.ts	Added a `lineContent` getter to the `VLine` class for retrieving concatenated inserted strings from the line's elements.
blocksuite/affine/blocks/code/src/highlight/shiki.ts	Introduced a new module integrating Shiki syntax highlighting with a custom tokenizer state and provider, enabling incremental and stateful tokenization compatible with the new architecture.
blocksuite/affine/blocks/code/src/tokenizer/index.ts	New index module re-exporting all exports from the new `tokenizer.ts` and `types.ts` modules.
blocksuite/affine/blocks/code/src/tokenizer/tokenizer.ts	Added a new `CodeTokenizer` class implementing stateful, cached, line-by-line tokenization with cache validation and state management.
blocksuite/affine/blocks/code/src/tokenizer/types.ts	Introduced new TypeScript interfaces and types for tokenizer state, tokens, tokenization results, and token providers.
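
The "token offset logic" mentioned for `affine-code-unit.ts` can be pictured with the sketch below. It assumes, without confirmation from the PR, that a code unit may cover only part of a line, so line-level tokens have to be clipped to the unit's range and rebased. `LineToken` and `sliceTokensForUnit` are hypothetical names used only for illustration.

```typescript
interface LineToken { start: number; content: string; color: string }

// Returns the parts of line-level tokens that overlap
// [unitStart, unitStart + unitLength), with `start` rebased so that 0 is the
// first character of the unit.
function sliceTokensForUnit(tokens: LineToken[], unitStart: number, unitLength: number): LineToken[] {
  const unitEnd = unitStart + unitLength;
  const out: LineToken[] = [];
  for (const token of tokens) {
    const tokenEnd = token.start + token.content.length;
    const from = Math.max(token.start, unitStart);
    const to = Math.min(tokenEnd, unitEnd);
    if (from >= to) continue; // no overlap with this unit
    out.push({
      ...token,
      start: from - unitStart,
      content: token.content.slice(from - token.start, to - token.start),
    });
  }
  return out;
}

// Tokens for the line "const answer = 42".
const lineTokens: LineToken[] = [
  { start: 0, content: 'const', color: 'keyword' },
  { start: 5, content: ' answer', color: 'plain' },
  { start: 12, content: ' = ', color: 'plain' },
  { start: 15, content: '42', color: 'number' },
];

// A unit covering 10 characters of that line, starting at offset 3.
const unitTokens = sliceTokensForUnit(lineTokens, 3, 10);
```

The unit's tokens concatenate back to exactly the unit's text, so each highlighted span stays aligned with the characters it colours.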

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CodeBlockComponent
    participant CodeTokenizer
    participant ShikiTokenProvider

    User->>CodeBlockComponent: Edits code or changes language
    CodeBlockComponent->>ShikiTokenProvider: (If needed) Load language/theme
    CodeBlockComponent->>CodeTokenizer: Create or update tokenizer
    loop For each visible line
        CodeBlockComponent->>CodeTokenizer: tokenizeLine({lineContent, lineIndex})
        CodeTokenizer->>ShikiTokenProvider: tokenize(lineContent, state)
        ShikiTokenProvider-->>CodeTokenizer: TokenizationResult (tokens, endState)
        CodeTokenizer-->>CodeBlockComponent: Tokens for line
        CodeBlockComponent->>AffineCodeUnit: Render with tokens and lineIndex
    end

Suggested labels

app:core

Suggested reviewers

L-Sun

📜 Recent review details

Configuration used: CodeRabbit UI Review profile: CHILL Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 519359bfbf5d0afed9df71965c7bb0d143376eb2 and 1ea3c1e9b3bb98f0b2e84ebc04f50cdb1d4d135a.

📒 Files selected for processing (2)

blocksuite/affine/blocks/code/src/code-block-inline.ts (1 hunks)
blocksuite/affine/blocks/code/src/code-block.ts (7 hunks)

🚧 Files skipped from review as they are similar to previous changes (2)

blocksuite/affine/blocks/code/src/code-block-inline.ts
blocksuite/affine/blocks/code/src/code-block.ts

Jun 14 '25 06:06 coderabbitai[bot]

Codecov Report

Attention: Patch coverage is 3.33333% with 87 lines in your changes missing coverage. Please review.

Project coverage is 56.18%. Comparing base (a6edb61) to head (519359b). Report is 1 commit behind head on canary.

Files with missing lines	Patch %	Lines
...uite/affine/blocks/code/src/tokenizer/tokenizer.ts	0.00%	35 Missing :warning:
blocksuite/affine/blocks/code/src/code-block.ts	0.00%	22 Missing :warning:
...fine/blocks/code/src/highlight/affine-code-unit.ts	6.25%	15 Missing :warning:
...ocksuite/affine/blocks/code/src/highlight/shiki.ts	13.33%	13 Missing :warning:
...ksuite/affine/blocks/code/src/code-block-inline.ts	0.00%	1 Missing :warning:
...uite/framework/std/src/inline/components/v-line.ts	0.00%	1 Missing :warning:

Additional details and impacted files

@@            Coverage Diff             @@
##           canary   #12817      +/-   ##
==========================================
- Coverage   56.60%   56.18%   -0.43%     
==========================================
  Files        2667     2669       +2     
  Lines      127713   127784      +71     
  Branches    20211    20146      -65     
==========================================
- Hits        72298    71793     -505     
- Misses      53787    53822      +35     
- Partials     1628     2169     +541

Flag	Coverage Δ
server-test	`78.96% <ø> (-0.93%)`	:arrow_down:
unittest	`31.86% <3.33%> (-0.04%)`	:arrow_down:

Jun 16 '25 01:06 codecov[bot]

Is there any progress on the retokenization issue you mentioned in the PR description?

Jul 03 '25 11:07 L-Sun

@L-Sun This version works better than doing a full retokenization each time. With our current setup, tracking newlines and revalidating the cache isn’t straightforward, so I haven’t found a clean solution yet. This should be solid enough to move forward - feel free to merge.

Jul 04 '25 08:07 golok727
