refactor(editor): optimize code block highlighting
Limitation: If a new line is added, the subsequent lines need retokenization. (Will fix it in the next PR.)
Summary by CodeRabbit
New Features
- Introduced dynamic, on-demand syntax highlighting for code blocks, providing more accurate and responsive tokenization.
- Added support for incremental tokenization with state preservation, improving performance when editing large code blocks.
- Added new syntax highlighting integration using Shiki for enhanced theme and language support.
- Added a lineIndex property to code units so that each unit can retrieve the tokens for its own line.
Improvements
- Enhanced syntax highlighting precision by aligning tokens with their exact positions within code lines.
- Improved handling of language and theme changes with cancellation support for ongoing language loading.
- Added line-based token retrieval and rendering enhancements for inline code units.
- Added a lineContent getter to VLine that returns the full text of a line for tokenization.
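The improvements above mention cancellation support for ongoing language loading. Below is a minimal TypeScript sketch of that pattern using the standard AbortController. The LanguageSwitcher class, the LoadLanguage signature and the fake loader are hypothetical illustrations, not the PR's code or Shiki's API.

```typescript
// Hypothetical loader signature; the PR's real (Shiki-based) loading code may differ.
type LoadLanguage = (lang: string, signal: AbortSignal) => Promise<string>;

class LanguageSwitcher {
  private readonly load: LoadLanguage;
  // Controller for the most recent load; aborted when a newer load starts.
  private controller: AbortController | null = null;

  constructor(load: LoadLanguage) {
    this.load = load;
  }

  // Loads `lang`, cancelling any earlier load that is still in flight.
  // Resolves to null if this call was superseded before it finished.
  async switchTo(lang: string): Promise<string | null> {
    this.controller?.abort();
    const controller = new AbortController();
    this.controller = controller;
    try {
      const grammar = await this.load(lang, controller.signal);
      return controller.signal.aborted ? null : grammar;
    } catch (err) {
      if (controller.signal.aborted) return null; // cancelled, not a real failure
      throw err;
    }
  }
}

// Fake loader for illustration: resolves after 10 ms unless aborted first.
const fakeLoad: LoadLanguage = (lang, signal) =>
  new Promise((resolve, reject) => {
    const timer = setTimeout(() => resolve(`grammar:${lang}`), 10);
    signal.addEventListener('abort', () => {
      clearTimeout(timer);
      reject(new Error(`load of ${lang} aborted`));
    });
  });
```

Resolving a superseded load to null lets the caller discard stale results instead of applying an old grammar after the user has already picked a different language.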
Bug Fixes
- Resolved issues with outdated or incorrect token highlighting during code editing.
Walkthrough
This change introduces a new tokenizer architecture for code blocks, shifting from static, precomputed token arrays to a dynamic, stateful tokenization model. New classes and interfaces for tokenization and Shiki integration are added, and existing components are refactored to use the tokenizer for line-by-line syntax highlighting with improved state management and caching.
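The stateful, cached, line-by-line model can be illustrated with a small sketch. It is not the PR's CodeTokenizer: the Token, TokenizerState and TokenProvider shapes are assumptions, and a toy provider that only tracks multi-line /* ... */ comments stands in for ShikiTokenProvider. A cached line is reused only when both its text and its start state match; otherwise it is retokenized, and its end state becomes the start state of the next line.

```typescript
// Hypothetical types; the PR's types.ts defines its own versions.
interface Token { content: string; kind: 'code' | 'comment' }
interface TokenizerState { equals(other: TokenizerState): boolean }
interface TokenizationResult { tokens: Token[]; endState: TokenizerState }
interface TokenProvider {
  getInitialState(): TokenizerState;
  tokenize(line: string, state: TokenizerState): TokenizationResult;
}

// Toy state: is the line start inside a /* ... */ comment from a previous line?
class CommentState implements TokenizerState {
  readonly inComment: boolean;
  constructor(inComment: boolean) {
    this.inComment = inComment;
  }
  equals(other: TokenizerState): boolean {
    return other instanceof CommentState && other.inComment === this.inComment;
  }
}

// Toy provider standing in for Shiki: splits a line into code and comment tokens.
const commentProvider: TokenProvider = {
  getInitialState: () => new CommentState(false),
  tokenize(line, state) {
    const tokens: Token[] = [];
    let inComment = (state as CommentState).inComment;
    let i = 0;
    while (i < line.length) {
      if (inComment) {
        // Continue a comment opened on an earlier line.
        const close = line.indexOf('*/', i);
        const end = close === -1 ? line.length : close + 2;
        tokens.push({ content: line.slice(i, end), kind: 'comment' });
        inComment = close === -1;
        i = end;
      } else {
        const open = line.indexOf('/*', i);
        if (open === -1) {
          tokens.push({ content: line.slice(i), kind: 'code' });
          break;
        }
        if (open > i) tokens.push({ content: line.slice(i, open), kind: 'code' });
        const close = line.indexOf('*/', open + 2); // skip the opener itself
        const end = close === -1 ? line.length : close + 2;
        tokens.push({ content: line.slice(open, end), kind: 'comment' });
        inComment = close === -1;
        i = end;
      }
    }
    return { tokens, endState: new CommentState(inComment) };
  },
};

interface CacheEntry { content: string; startState: TokenizerState; result: TokenizationResult }

class CodeTokenizer {
  private readonly provider: TokenProvider;
  // Cache keyed by line index, as in an index-based design.
  private readonly cache = new Map<number, CacheEntry>();
  misses = 0; // number of provider calls, to make cache behaviour observable

  constructor(provider: TokenProvider) {
    this.provider = provider;
  }

  tokenizeLine(lines: readonly string[], lineIndex: number): Token[] {
    if (lineIndex < 0 || lineIndex >= lines.length) throw new RangeError('bad line index');
    let state = this.provider.getInitialState();
    let result: TokenizationResult | undefined;
    // Walk from the first line so each line starts from its predecessor's end state.
    for (let i = 0; i <= lineIndex; i++) {
      const content = lines[i];
      const cached = this.cache.get(i);
      if (cached && cached.content === content && cached.startState.equals(state)) {
        result = cached.result; // text and start state unchanged: reuse
      } else {
        result = this.provider.tokenize(content, state);
        this.cache.set(i, { content, startState: state, result });
        this.misses++;
      }
      state = result.endState;
    }
    return result!.tokens;
  }
}
```

Because this cache is keyed by line index, inserting a line moves every later line onto an entry that holds different text, so each of those lines is retokenized. That mirrors the limitation stated at the top of the PR description.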
Changes
| File(s) | Change Summary |
|---|---|
| blocksuite/affine/blocks/code/src/code-block-inline.ts | Modified the renderer function in CodeBlockUnitSpecExtension to accept and bind a lineIndex property for rendered components. |
| blocksuite/affine/blocks/code/src/code-block.ts | Replaced static token arrays with a tokenizer$ signal and dynamic tokenization; introduced asynchronous language loading, caching, and stateful tokenization using CodeTokenizer and ShikiTokenProvider. Updated rendering and effects accordingly. |
| blocksuite/affine/blocks/code/src/highlight/affine-code-unit.ts | Refactored to use the tokenizer for dynamic token retrieval per line, added a lineIndex property, and adjusted token offset logic for accurate highlighting within code units. Added a getter for the closest v-line element. |
| blocksuite/framework/std/src/inline/components/v-line.ts | Added a lineContent getter to the VLine class for retrieving concatenated inserted strings from the line's elements. |
| blocksuite/affine/blocks/code/src/highlight/shiki.ts | Introduced a new module integrating Shiki syntax highlighting with a custom tokenizer state and provider, enabling incremental and stateful tokenization compatible with the new architecture. |
| blocksuite/affine/blocks/code/src/tokenizer/index.ts | New index module re-exporting all exports from the new tokenizer.ts and types.ts modules. |
| blocksuite/affine/blocks/code/src/tokenizer/tokenizer.ts | Added a new CodeTokenizer class implementing stateful, cached, line-by-line tokenization with cache validation and state management. |
| blocksuite/affine/blocks/code/src/tokenizer/types.ts | Introduced new TypeScript interfaces and types for tokenizer state, tokens, tokenization results, and token providers. |
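The affine-code-unit.ts change retrieves tokens per line and adjusts token offsets, because an inline code unit may render only part of a line while the tokenizer works on the full line text (which the new VLine lineContent getter supplies). A hypothetical helper that clips line-level tokens to a unit's [start, end) range could look like the following; the Token shape and the tokensForUnit name are assumptions, not the PR's code.

```typescript
// Hypothetical token shape: text plus a color; offsets are derived by accumulation.
interface Token {
  content: string;
  color: string;
}

// Returns the parts of `lineTokens` that overlap the unit's [start, end)
// character range within the line, trimmed to that range.
function tokensForUnit(lineTokens: readonly Token[], start: number, end: number): Token[] {
  const out: Token[] = [];
  let offset = 0;
  for (const token of lineTokens) {
    const tokenStart = offset;
    const tokenEnd = offset + token.content.length;
    offset = tokenEnd;
    if (tokenEnd <= start) continue; // entirely before the unit
    if (tokenStart >= end) break; // entirely after the unit
    const from = Math.max(start, tokenStart) - tokenStart;
    const to = Math.min(end, tokenEnd) - tokenStart;
    out.push({ content: token.content.slice(from, to), color: token.color });
  }
  return out;
}

// Sample line and tokens; a unit covering characters 3..8 renders "st x ".
const sampleLine = 'const x = 1';
const sampleTokens: Token[] = [
  { content: 'const', color: 'keyword' },
  { content: ' ', color: 'plain' },
  { content: 'x', color: 'variable' },
  { content: ' = ', color: 'plain' },
  { content: '1', color: 'number' },
];
const unitTokens = tokensForUnit(sampleTokens, 3, 8);
```

Working from cumulative offsets keeps each unit's colors consistent with how the whole line was tokenized, even when a token straddles a unit boundary.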
Sequence Diagram(s)
```mermaid
sequenceDiagram
    participant User
    participant CodeBlockComponent
    participant CodeTokenizer
    participant ShikiTokenProvider
    participant AffineCodeUnit
    User->>CodeBlockComponent: Edits code or changes language
    CodeBlockComponent->>ShikiTokenProvider: (If needed) Load language/theme
    CodeBlockComponent->>CodeTokenizer: Create or update tokenizer
    loop For each visible line
        CodeBlockComponent->>CodeTokenizer: tokenizeLine({lineContent, lineIndex})
        CodeTokenizer->>ShikiTokenProvider: tokenize(lineContent, state)
        ShikiTokenProvider-->>CodeTokenizer: TokenizationResult (tokens, endState)
        CodeTokenizer-->>CodeBlockComponent: Tokens for line
        CodeBlockComponent->>AffineCodeUnit: Render with tokens and lineIndex
    end
```
Suggested labels
app:core
Suggested reviewers
- L-Sun
Poem
In the warren where the code lines flow,
A tokenizer hops, with state in tow.
Shiki’s colors shimmer, bright and keen,
Each line is parsed, each token seen.
Caches burrow, deltas gleam—
Syntax highlighting, a coder’s dream!
🐇✨
📜 Recent review details
- Configuration used: CodeRabbit UI
- Review profile: CHILL
- Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between 519359bfbf5d0afed9df71965c7bb0d143376eb2 and 1ea3c1e9b3bb98f0b2e84ebc04f50cdb1d4d135a.
📒 Files selected for processing (2)
- blocksuite/affine/blocks/code/src/code-block-inline.ts (1 hunk)
- blocksuite/affine/blocks/code/src/code-block.ts (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- blocksuite/affine/blocks/code/src/code-block-inline.ts
- blocksuite/affine/blocks/code/src/code-block.ts
Codecov Report
Attention: Patch coverage is 3.33333% with 87 lines in your changes missing coverage. Please review.
Project coverage is 56.18%. Comparing base (a6edb61) to head (519359b). Report is 1 commit behind head on canary.
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##           canary   #12817      +/-   ##
==========================================
- Coverage   56.60%   56.18%   -0.43%
==========================================
  Files        2667     2669       +2
  Lines      127713   127784      +71
  Branches    20211    20146      -65
==========================================
- Hits        72298    71793     -505
- Misses      53787    53822      +35
- Partials     1628     2169     +541
```
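The project coverage figures can be checked against the totals in the diff. This sketch assumes Codecov's coverage ratio of hits / (hits + misses + partials) and its default of reporting two decimals rounded down; under those assumptions the base and head totals reproduce the 56.60% and 56.18% figures. The coveragePercent helper is illustrative only.

```typescript
// Assumption: coverage = hits / (hits + misses + partials),
// reported with two decimals and rounded down (Codecov's default settings).
interface CoverageTotals {
  hits: number;
  misses: number;
  partials: number;
}

function coveragePercent({ hits, misses, partials }: CoverageTotals): number {
  const tracked = hits + misses + partials; // matches the "Lines" row
  // Round down to two decimal places, e.g. 56.1831...% becomes 56.18%.
  return Math.floor((hits / tracked) * 10000) / 100;
}

// Totals copied from the coverage diff for base (canary) and head (#12817).
const base: CoverageTotals = { hits: 72298, misses: 53787, partials: 1628 };
const head: CoverageTotals = { hits: 71793, misses: 53822, partials: 2169 };
```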
| Flag | Coverage Δ | |
|---|---|---|
| server-test | 78.96% <ø> (-0.93%) | :arrow_down: |
| unittest | 31.86% <3.33%> (-0.04%) | :arrow_down: |
Flags with carried forward coverage won't be shown.
Is there any progress on the retokenization issue you mentioned in the PR description?
@L-Sun This version works better than doing a full retokenization each time. With our current setup, tracking newlines and revalidating the cache isn’t straightforward, so I haven’t found a clean solution yet. This should be solid enough to move forward - feel free to merge.