refactor(editor): optimize code block highlighting

Open • golok727 opened this pull request 6 months ago • 1 comment

Limitation: when a new line is added, the lines after it need to be retokenized. (Will fix this in the next PR.)
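The limitation follows from how stateful tokenization works: each line is tokenized starting from the state the previous line ended in, so an inserted line can change the start state of every line below it (and shifts their indices). The sketch below illustrates this with a hypothetical toy tokenizer (`tokenizeToyLine`, `startStates`) whose only state is whether it is inside a `/* ... */` comment; it is not the PR's Shiki-based provider.

```typescript
// Toy stand-in for a stateful tokenizer (hypothetical; the PR's provider wraps Shiki).
// The only state carried from one line to the next is "inside a /* ... */ comment?".
interface ToyState {
  inComment: boolean;
}

interface ToyToken {
  offset: number; // start position within the line
  content: string;
  kind: "comment" | "code";
}

function tokenizeToyLine(
  line: string,
  start: ToyState,
): { tokens: ToyToken[]; end: ToyState } {
  const tokens: ToyToken[] = [];
  let inComment = start.inComment;
  let pos = 0; // start of the current token
  let searchFrom = 0; // where to look for the next delimiter
  while (pos < line.length) {
    if (inComment) {
      const close = line.indexOf("*/", searchFrom);
      const end = close === -1 ? line.length : close + 2;
      tokens.push({ offset: pos, content: line.slice(pos, end), kind: "comment" });
      if (close === -1) break; // the comment continues onto the next line
      inComment = false;
      pos = searchFrom = end;
    } else {
      const open = line.indexOf("/*", searchFrom);
      const end = open === -1 ? line.length : open;
      if (end > pos) {
        tokens.push({ offset: pos, content: line.slice(pos, end), kind: "code" });
      }
      if (open === -1) break;
      inComment = true;
      pos = open;
      searchFrom = open + 2; // so "/*/" does not close itself
    }
  }
  return { tokens, end: { inComment } };
}

// Tokenize a whole block top to bottom and report the state each line starts in.
function startStates(lines: string[]): boolean[] {
  const starts: boolean[] = [];
  let state: ToyState = { inComment: false };
  for (const line of lines) {
    starts.push(state.inComment);
    state = tokenizeToyLine(line, state).end;
  }
  return starts;
}

console.log(startStates(["a = 1", "b = 2", "c = 3"])); // [ false, false, false ]
console.log(startStates(["a = 1", "/* start", "b = 2", "c = 3"])); // [ false, false, true, true ]
```

Inserting `/* start` changes nothing about the text of `b = 2` or `c = 3`, yet both now start inside a comment, so their cached tokens are stale.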

Summary by CodeRabbit

  • New Features

    • Introduced dynamic, on-demand syntax highlighting for code blocks, providing more accurate and responsive tokenization.
    • Added support for incremental tokenization with state preservation, improving performance when editing large code blocks.
    • Added new syntax highlighting integration using Shiki for enhanced theme and language support.
    • Added a new lineIndex property to code units so that each rendered line can retrieve its own tokens.
  • Improvements

    • Enhanced syntax highlighting precision by aligning tokens with their exact positions within code lines.
    • Improved handling of language and theme changes with cancellation support for ongoing language loading.
    • Added line-based token retrieval and rendering enhancements for inline code units.
    • Introduced a new method to retrieve full line content for better text processing.
  • Bug Fixes

    • Resolved issues with outdated or incorrect token highlighting during code editing.
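Aligning tokens with their exact positions matters when a line is rendered by inline units that each cover only part of it. As a minimal sketch, assuming a hypothetical token shape `{ offset, content }` computed for the whole line and a unit that renders the character range [start, end), the unit keeps only the overlapping slices and rebases their offsets. `LineToken` and `tokensForRange` are illustrative names, not the PR's code.

```typescript
// Hypothetical token shape: a run of text starting at `offset` within its line.
interface LineToken {
  offset: number;
  content: string;
  color?: string;
}

// Clip a whole line's tokens to the range [start, end) rendered by one inline
// unit, rebasing offsets so that 0 is the unit's first character.
function tokensForRange(tokens: LineToken[], start: number, end: number): LineToken[] {
  const result: LineToken[] = [];
  for (const token of tokens) {
    const tokenEnd = token.offset + token.content.length;
    if (tokenEnd <= start || token.offset >= end) continue; // no overlap
    const from = Math.max(token.offset, start);
    const to = Math.min(tokenEnd, end);
    result.push({
      ...token,
      offset: from - start,
      content: token.content.slice(from - token.offset, to - token.offset),
    });
  }
  return result;
}

// Tokens for the line "const x = 1" (colors are illustrative).
const lineTokens: LineToken[] = [
  { offset: 0, content: "const", color: "#cf222e" },
  { offset: 5, content: " " },
  { offset: 6, content: "x", color: "#953800" },
  { offset: 7, content: " = " },
  { offset: 10, content: "1", color: "#0550ae" },
];

// A unit rendering characters 4..8 ("t x ") gets only the overlapping slices:
// offsets 0, 1, 2, 3 with contents "t", " ", "x", " ".
console.log(tokensForRange(lineTokens, 4, 8));
```

Each clipped slice keeps the color of the token it came from, so highlighting stays correct even when a token straddles a unit boundary.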

golok727 • Jun 14 '25 06:06

Walkthrough

This change introduces a new tokenizer architecture for code blocks, shifting from static, precomputed token arrays to a dynamic, stateful tokenization model. New classes and interfaces for tokenization and Shiki integration are added, and existing components are refactored to use the tokenizer for line-by-line syntax highlighting with improved state management and caching.

Changes

  • blocksuite/affine/blocks/code/src/code-block-inline.ts: Modified the renderer function in CodeBlockUnitSpecExtension to accept and bind a lineIndex property for rendered components.
  • blocksuite/affine/blocks/code/src/code-block.ts: Replaced static token arrays with a tokenizer$ signal and dynamic tokenization; introduced asynchronous language loading, caching, and stateful tokenization using CodeTokenizer and ShikiTokenProvider. Updated rendering and effects accordingly.
  • blocksuite/affine/blocks/code/src/highlight/affine-code-unit.ts: Refactored to use the tokenizer for dynamic token retrieval per line, added a lineIndex property, and adjusted token offset logic for accurate highlighting within code units. Added a getter for the closest v-line element.
  • blocksuite/framework/std/src/inline/components/v-line.ts: Added a lineContent getter to the VLine class for retrieving the concatenated inserted strings from the line's elements.
  • blocksuite/affine/blocks/code/src/highlight/shiki.ts: Introduced a new module integrating Shiki syntax highlighting with a custom tokenizer state and provider, enabling incremental, stateful tokenization compatible with the new architecture.
  • blocksuite/affine/blocks/code/src/tokenizer/index.ts: New index module re-exporting everything from the new tokenizer.ts and types.ts modules.
  • blocksuite/affine/blocks/code/src/tokenizer/tokenizer.ts: Added a new CodeTokenizer class implementing stateful, cached, line-by-line tokenization with cache validation and state management.
  • blocksuite/affine/blocks/code/src/tokenizer/types.ts: Introduced new TypeScript interfaces and types for tokenizer state, tokens, tokenization results, and token providers.
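A minimal sketch of the stateful, cached, line-by-line model described for tokenizer.ts and types.ts. Everything here is an assumption: `Token`, `TokenProvider`, `LineCachedTokenizer`, and the `getLine` callback are hypothetical names, and the provider is any object that maps (line, start state) to tokens plus an end state. A cached line is reused only when both its text and its incoming state match.

```typescript
// Hypothetical shapes loosely modelled on the PR's description of types.ts;
// the real interfaces and field names may differ.
interface Token {
  offset: number;
  content: string;
  scope: string;
}

interface TokenizationResult<S> {
  tokens: Token[];
  endState: S;
}

interface TokenProvider<S> {
  getInitialState(): S;
  tokenize(line: string, state: S): TokenizationResult<S>;
  equals(a: S, b: S): boolean;
}

interface CacheEntry<S> {
  content: string;
  startState: S;
  result: TokenizationResult<S>;
}

class LineCachedTokenizer<S> {
  private readonly cache: (CacheEntry<S> | undefined)[] = [];
  private readonly provider: TokenProvider<S>;
  // Hypothetical accessor used to recover a line's start state from the lines above it.
  private readonly getLine: (index: number) => string;
  tokenizeCalls = 0; // counts real tokenizer runs (cache misses)

  constructor(provider: TokenProvider<S>, getLine: (index: number) => string) {
    this.provider = provider;
    this.getLine = getLine;
  }

  tokenizeLine({ lineContent, lineIndex }: { lineContent: string; lineIndex: number }): Token[] {
    // Walk the earlier lines to get this line's start state; cached lines are cheap.
    let state = this.provider.getInitialState();
    for (let i = 0; i < lineIndex; i++) {
      state = this.resultFor(i, this.getLine(i), state).endState;
    }
    return this.resultFor(lineIndex, lineContent, state).tokens;
  }

  private resultFor(index: number, content: string, startState: S): TokenizationResult<S> {
    const cached = this.cache[index];
    // Reuse a cached result only if both the text and the incoming state are unchanged.
    if (cached && cached.content === content && this.provider.equals(cached.startState, startState)) {
      return cached.result;
    }
    this.tokenizeCalls++;
    const result = this.provider.tokenize(content, startState);
    this.cache[index] = { content, startState, result };
    return result;
  }
}
```

Because this cache is keyed by line index, inserting a line shifts every later line by one position; their cached text generally no longer matches, so they are tokenized again, which is the limitation noted in the PR description.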

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CodeBlockComponent
    participant CodeTokenizer
    participant ShikiTokenProvider

    User->>CodeBlockComponent: Edits code or changes language
    CodeBlockComponent->>ShikiTokenProvider: (If needed) Load language/theme
    CodeBlockComponent->>CodeTokenizer: Create or update tokenizer
    loop For each visible line
        CodeBlockComponent->>CodeTokenizer: tokenizeLine({lineContent, lineIndex})
        CodeTokenizer->>ShikiTokenProvider: tokenize(lineContent, state)
        ShikiTokenProvider-->>CodeTokenizer: TokenizationResult (tokens, endState)
        CodeTokenizer-->>CodeBlockComponent: Tokens for line
        CodeBlockComponent->>AffineCodeUnit: Render with tokens and lineIndex
    end
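The diagram loads a language only when needed, and the summary mentions cancelling an ongoing language load when the language changes. One way to get that behaviour, sketched here with the standard `AbortController`, is to abort the previous load whenever a new one starts and to ignore the result of an aborted load. `LanguageSwitcher` and its `load` callback are hypothetical; `load` stands in for whatever actually loads the Shiki grammar.

```typescript
// Hypothetical loader: `load` stands in for whatever fetches a Shiki grammar.
// Switching languages aborts the previous in-flight load, so a slow, stale
// load can never overwrite the newer language.
class LanguageSwitcher {
  current: string | null = null;
  private controller: AbortController | null = null;
  private readonly load: (lang: string, signal: AbortSignal) => Promise<void>;

  constructor(load: (lang: string, signal: AbortSignal) => Promise<void>) {
    this.load = load;
  }

  async switchTo(lang: string): Promise<boolean> {
    this.controller?.abort(); // cancel the load for the previous language, if any
    const controller = new AbortController();
    this.controller = controller;
    try {
      await this.load(lang, controller.signal);
    } catch (err) {
      if (controller.signal.aborted) return false; // cancelled: ignore the failure
      throw err;
    }
    // A newer switchTo() may have started while this one was awaiting.
    if (controller.signal.aborted) return false;
    this.current = lang;
    return true;
  }
}

// Usage with a fake loader: "python" is slow, "rust" is fast.
const switcher = new LanguageSwitcher(
  (lang, signal) =>
    new Promise<void>((resolve, reject) => {
      const timer = setTimeout(resolve, lang === "python" ? 50 : 10);
      signal.addEventListener("abort", () => {
        clearTimeout(timer);
        reject(new Error(`load of ${lang} aborted`));
      });
    }),
);

void switcher.switchTo("python"); // superseded below; resolves to false
void switcher.switchTo("rust").then(() => console.log(switcher.current)); // logs "rust"
```

Only the most recent request can commit its language, so rapid language changes cannot leave the block highlighted with a stale grammar.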

Suggested labels

app:core

Suggested reviewers

  • L-Sun

Poem

In the warren where the code lines flow,
A tokenizer hops, with state in tow.
Shiki’s colors shimmer, bright and keen,
Each line is parsed, each token seen.
Caches burrow, deltas gleam—
Syntax highlighting, a coder’s dream!
🐇✨


📜 Recent review details

Configuration used: CodeRabbit UI. Review profile: CHILL. Plan: Pro.

📥 Commits

Reviewing files that changed from the base of the PR and between 519359bfbf5d0afed9df71965c7bb0d143376eb2 and 1ea3c1e9b3bb98f0b2e84ebc04f50cdb1d4d135a.

📒 Files selected for processing (2)
  • blocksuite/affine/blocks/code/src/code-block-inline.ts (1 hunks)
  • blocksuite/affine/blocks/code/src/code-block.ts (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • blocksuite/affine/blocks/code/src/code-block-inline.ts
  • blocksuite/affine/blocks/code/src/code-block.ts

coderabbitai[bot] • Jun 14 '25 06:06

Codecov Report

Attention: Patch coverage is 3.33333% with 87 lines in your changes missing coverage. Please review.

Project coverage is 56.18%. Comparing base (a6edb61) to head (519359b). Report is 1 commit behind head on canary.

Files with missing lines                                Patch %   Missing lines
...uite/affine/blocks/code/src/tokenizer/tokenizer.ts     0.00%   35
blocksuite/affine/blocks/code/src/code-block.ts           0.00%   22
...fine/blocks/code/src/highlight/affine-code-unit.ts     6.25%   15
...ocksuite/affine/blocks/code/src/highlight/shiki.ts    13.33%   13
...ksuite/affine/blocks/code/src/code-block-inline.ts     0.00%    1
...uite/framework/std/src/inline/components/v-line.ts     0.00%    1
Additional details and impacted files
@@            Coverage Diff             @@
##           canary   #12817      +/-   ##
==========================================
- Coverage   56.60%   56.18%   -0.43%     
==========================================
  Files        2667     2669       +2     
  Lines      127713   127784      +71     
  Branches    20211    20146      -65     
==========================================
- Hits        72298    71793     -505     
- Misses      53787    53822      +35     
- Partials     1628     2169     +541     
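The project totals can be checked by hand. Assuming Codecov's usual convention that partial lines count as not covered and that percentages are rounded down to two decimals, coverage is hits / (hits + misses + partials), which reproduces both figures in the diff. It also shows where the drop comes from: the PR adds only 71 lines, but hits fall by 505 while partials rise by 541. Likewise, 87 missing lines at 3.33333% patch coverage is consistent with 90 changed lines, 3 of them hit. `coveragePercent` is a hypothetical helper, not a Codecov API.

```typescript
// Line coverage as Codecov reports it, assuming partial lines count as not
// covered and the percentage is rounded down to two decimals.
function coveragePercent(hits: number, misses: number, partials: number): number {
  const lines = hits + misses + partials;
  return Math.floor((hits / lines) * 10_000) / 100;
}

console.log(coveragePercent(72298, 53787, 1628)); // 56.6  (base: 72298 / 127713 lines)
console.log(coveragePercent(71793, 53822, 2169)); // 56.18 (head: 71793 / 127784 lines)
```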
Flag          Coverage Δ
server-test   78.96% <ø> (-0.93%) ↓
unittest      31.86% <3.33%> (-0.04%) ↓

Flags with carried forward coverage won't be shown.


codecov[bot] • Jun 16 '25 01:06

Is there any progress on the retokenization issue you mentioned in the PR description?

L-Sun • Jul 03 '25 11:07

@L-Sun This version works better than doing a full retokenization each time. With our current setup, tracking newlines and revalidating the cache isn’t straightforward, so I haven’t found a clean solution yet. This should be solid enough to move forward; feel free to merge.

golok727 • Jul 04 '25 08:07
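One possible way to address the newline limitation discussed above (not what the PR implements) is to shift cached entries when lines are inserted instead of leaving them at their old indices, then retokenize from the insertion point only until the states converge: once a line with unchanged text is reached in the same start state it had before the edit, every later cached entry is still valid. In this sketch, `LineState`, `applyLineInsertion`, and the `tokenizeEnd`/`equals` callbacks are hypothetical.

```typescript
// Hypothetical per-line cache entry: the line's text and the states it starts and
// ends in. (A real cache would also keep the line's tokens.)
interface LineState<S> {
  content: string;
  start: S;
  end: S;
}

// After `count` lines are inserted at index `at`, shift the old entries down by
// `count`, then retokenize from `at` until a line with unchanged text is reached
// in the same start state it had before; from there on the old entries still hold.
function applyLineInsertion<S>(
  cache: LineState<S>[], // fully valid entries for the lines before the edit
  lines: string[], // document lines after the edit
  at: number,
  count: number,
  initial: S,
  tokenizeEnd: (line: string, start: S) => S,
  equals: (a: S, b: S) => boolean,
): { cache: LineState<S>[]; retokenized: number } {
  const shifted: (LineState<S> | undefined)[] = [
    ...cache.slice(0, at),
    ...Array.from({ length: count }, () => undefined),
    ...cache.slice(at),
  ];
  let state = at === 0 ? initial : shifted[at - 1]!.end;
  let retokenized = 0;
  for (let i = at; i < lines.length; i++) {
    const old = shifted[i];
    // Converged: the same text entering in the same state yields the same result.
    if (old && old.content === lines[i] && equals(old.start, state)) break;
    const end = tokenizeEnd(lines[i], state);
    shifted[i] = { content: lines[i], start: state, end };
    state = end;
    retokenized++;
  }
  return { cache: shifted as LineState<S>[], retokenized };
}
```

With a toy state that only tracks an open block comment, inserting a plain line retokenizes just that line, while inserting a line that opens a comment retokenizes every line below it, because their start states genuinely change.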