
feat: import .docx files

Open UNIDY2002 opened this issue 8 months ago • 1 comments

Support importing .docx files, as mentioned in https://github.com/toeverything/AFFiNE/issues/10154#issuecomment-2655744757

It uses mammoth to convert the .docx file to HTML, then imports that HTML through the existing HTML import steps.
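The two-step flow described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `importDocxSketch`, `BlobLike`, `ConvertToHtml`, and `ImportHtml` are hypothetical names. The converter is passed in as a parameter so the sketch stays self-contained; in a browser it would presumably be `mammoth.convertToHtml`, which accepts `{ arrayBuffer }` and resolves with `{ value, messages }`.

```typescript
// Result shape that mammoth's convertToHtml resolves with:
// the generated HTML plus any conversion messages (for example warnings).
interface ConvertResult {
  value: string;
  messages: { type: string; message: string }[];
}

// Hypothetical injection point; in real code this would be mammoth.convertToHtml.
type ConvertToHtml = (input: { arrayBuffer: ArrayBuffer }) => Promise<ConvertResult>;

// Hypothetical stand-in for the existing HTML import step; resolves with the new doc id.
type ImportHtml = (html: string) => Promise<string | undefined>;

// Minimal structural type so the sketch does not depend on DOM typings for Blob.
interface BlobLike {
  arrayBuffer(): Promise<ArrayBuffer>;
}

async function importDocxSketch(
  file: BlobLike,
  convertToHtml: ConvertToHtml,
  importHtml: ImportHtml
): Promise<{ docIds: string[]; warnings: string[] }> {
  // Step 1: read the file and convert the .docx bytes to HTML.
  const arrayBuffer = await file.arrayBuffer();
  const { value: html, messages } = await convertToHtml({ arrayBuffer });

  // Step 2: hand the HTML to the regular HTML import path.
  const docId = await importHtml(html);

  return {
    docIds: docId ? [docId] : [],
    warnings: messages.map(m => m.message),
  };
}
```

Returning the converter's messages alongside the doc ids is a design choice of this sketch, so that a caller could surface conversion warnings to the user.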

UNIDY2002 avatar Apr 17 '25 11:04 UNIDY2002

How to use the Graphite Merge Queue

Add either label to this PR to merge it via the merge queue:

  • merge - adds this PR to the back of the merge queue
  • hotfix - for urgent hot fixes, skip the queue and merge this PR next

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

graphite-app[bot] avatar Apr 17 '25 11:04 graphite-app[bot]

[!WARNING]

Rate limit exceeded

@darkskygit has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 4 minutes and 35 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 4d6532da7a26772879ee0ea489013a8f5d6ccbc2 and 8e83125fef5df5671851dec90afe5c07fbf1b543.

⛔ Files ignored due to path filters (1)
  • yarn.lock is excluded by !**/yarn.lock, !**/*.lock
📒 Files selected for processing (4)
  • blocksuite/affine/blocks/root/package.json (1 hunks)
  • blocksuite/affine/shared/package.json (1 hunks)
  • blocksuite/affine/shared/src/adapters/html/html.ts (2 hunks)
  • blocksuite/framework/std/package.json (1 hunks)

Walkthrough

Adds .docx import support: new DocxTransformer using mammoth to convert .docx Blobs to HTML and import via HtmlTransformer; file picker and import dialog accept .docx; mammoth dependency added; transformer exported; i18n keys for UI labels/tooltips added.

Changes

| Cohort / File(s) | Summary |
|---|---|
| File system utilities: blocksuite/affine/shared/src/utils/file/filesys.ts | Added Docx file type entry (MIME + .docx) and extended AcceptTypes to include 'Docx'. |
| Docx transformer: blocksuite/affine/widgets/linked-doc/src/transformers/docx.ts, blocksuite/affine/widgets/linked-doc/src/transformers/index.ts | New docx.ts introduces ImportDocxOptions and DocxTransformer.importDocx, which uses mammoth.convertToHtml and then delegates to HtmlTransformer.importHTMLToDoc; index export added. |
| Dependency: blocksuite/affine/widgets/linked-doc/package.json | Added dependency "mammoth": "^1.11.0". |
| Import dialog integration: packages/frontend/core/src/desktop/dialogs/import/index.tsx | Extended ImportType/AcceptType to include docx; added a UI import option entry and an importConfigs entry that accepts a single .docx and calls DocxTransformer.importDocx. |
| Internationalization: packages/frontend/i18n/src/i18n.gen.ts, packages/frontend/i18n/src/resources/en.json, packages/frontend/i18n/src/resources/zh-Hans.json, packages/frontend/i18n/src/resources/zh-Hant.json | Added com.affine.import.docx and com.affine.import.docx.tooltip keys and updated generated i18n types. |
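For the "File system utilities" change, a file type entry of the kind summarized above might look like the sketch below. The `FilePickerType` shape, the `Html` entry, and the `isAccepted` helper are assumptions for illustration and may not match the real filesys.ts; only the `Docx` name, the `.docx` extension, and the standard Word MIME type are grounded.

```typescript
// Hypothetical entry shape, modeled on the File System Access API picker types.
interface FilePickerType {
  description: string;
  accept: Record<string, string[]>;
}

// Standard MIME type for Word .docx documents.
const DOCX_MIME =
  "application/vnd.openxmlformats-officedocument.wordprocessingml.document";

// Illustrative registry; the real one holds more entries.
const FileTypes: Record<"Html" | "Docx", FilePickerType> = {
  Html: { description: "HTML", accept: { "text/html": [".html", ".htm"] } },
  Docx: { description: "Docx", accept: { [DOCX_MIME]: [".docx"] } },
};

type AcceptTypes = keyof typeof FileTypes;

// Hypothetical helper: checks a file name against the extensions of one entry.
function isAccepted(type: AcceptTypes, fileName: string): boolean {
  const lower = fileName.toLowerCase();
  for (const extensions of Object.values(FileTypes[type].accept)) {
    if (extensions.some(ext => lower.endsWith(ext))) return true;
  }
  return false;
}
```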

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant ImportDialog
    participant FileSystem
    participant DocxTransformer
    participant Mammoth
    participant HtmlTransformer
    participant Collection

    User->>ImportDialog: choose "Import .docx"
    ImportDialog->>FileSystem: request .docx file (accept: .docx)
    FileSystem-->>ImportDialog: return Blob
    ImportDialog->>DocxTransformer: importDocx(collection, schema, blob)
    DocxTransformer->>Mammoth: convertToHtml(blob)
    Mammoth-->>DocxTransformer: HTML
    DocxTransformer->>HtmlTransformer: importHTMLToDoc(html, collection, schema, extensions)
    HtmlTransformer->>Collection: create document
    Collection-->>HtmlTransformer: doc ID
    HtmlTransformer-->>DocxTransformer: doc ID
    DocxTransformer-->>ImportDialog: { docIds }
    ImportDialog-->>User: show imported doc

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Areas to check closely:
    • Error handling and promise flow in docx.ts around mammoth.convertToHtml and HtmlTransformer.importHTMLToDoc.
    • Type declarations: ImportDocxOptions and exported DocxTransformer shape.
    • Import dialog wiring: AcceptType/ImportType updates and single-file acceptance behavior in importConfigs.
    • New dependency in package.json for packaging/build impact.
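On the first review point, one possible way to keep the promise flow explicit is to report which stage failed instead of letting a rejection propagate. The sketch below is only an illustration under that assumption; `ImportOutcome` and `importWithStages` are hypothetical names, and the two callbacks stand in for the mammoth conversion and the HTML import.

```typescript
// Discriminated result: either a created doc id, or the stage that failed and why.
type ImportOutcome =
  | { ok: true; docId: string }
  | { ok: false; stage: "convert" | "import"; reason: string };

async function importWithStages(
  convert: () => Promise<string>, // stand-in for the docx-to-HTML conversion
  importHtml: (html: string) => Promise<string | undefined> // stand-in for the HTML import
): Promise<ImportOutcome> {
  let html: string;
  try {
    html = await convert();
  } catch (err) {
    // Conversion failed, e.g. the file is not a valid .docx archive.
    return { ok: false, stage: "convert", reason: String(err) };
  }

  try {
    const docId = await importHtml(html);
    if (!docId) {
      // The import finished without creating a document.
      return { ok: false, stage: "import", reason: "no document was created" };
    }
    return { ok: true, docId };
  } catch (err) {
    return { ok: false, stage: "import", reason: String(err) };
  }
}
```

A caller could then branch on `ok` and `stage` to show a conversion error separately from an import error.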

Poem

🐰 I hopped a docx into my paws at dawn,
Mammoth spun its HTML with a yawn,
Transformers stitched each paragraph tight,
Pages sprung forward — what a sight!
Now .docx hops home at first light.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat(editor): import docs from docx' clearly and concisely summarizes the main change: adding support for importing documents from .docx files. |


Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot] avatar Nov 15 '25 05:11 coderabbitai[bot]

Codecov Report

❌ Patch coverage is 12.50000% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 56.62%. Comparing base (c302425) to head (8e83125).
⚠️ Report is 5 commits behind head on canary.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...affine/widgets/linked-doc/src/transformers/docx.ts | 16.66% | 5 Missing ⚠️ |
| blocksuite/affine/shared/src/adapters/html/html.ts | 0.00% | 2 Missing ⚠️ |
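The per-file figures above are consistent with the overall 12.5% patch coverage, as the arithmetic sketch below shows. The hit counts are not stated in the report; they are inferred here from the percentages (16.66% with 5 missing suggests 1 of 6 lines covered, 0.00% with 2 missing suggests 0 of 2), so treat them as assumptions. `FileCoverage` and `coveragePercent` are hypothetical names.

```typescript
interface FileCoverage {
  file: string;
  hit: number; // covered patch lines (inferred, not stated in the report)
  missing: number; // uncovered patch lines (from the report)
}

// Coverage as a percentage of hit lines over all tracked lines.
function coveragePercent(hit: number, missing: number): number {
  const total = hit + missing;
  return total === 0 ? 100 : (hit / total) * 100;
}

const patchFiles: FileCoverage[] = [
  { file: "...affine/widgets/linked-doc/src/transformers/docx.ts", hit: 1, missing: 5 },
  { file: "blocksuite/affine/shared/src/adapters/html/html.ts", hit: 0, missing: 2 },
];

const totalHit = patchFiles.reduce((sum, f) => sum + f.hit, 0);
const totalMissing = patchFiles.reduce((sum, f) => sum + f.missing, 0);

// 1 hit out of 8 patch lines, with 7 missing in total.
const patchCoverage = coveragePercent(totalHit, totalMissing);
console.log(patchCoverage); // 12.5
```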
Additional details and impacted files
@@            Coverage Diff             @@
##           canary   #11774      +/-   ##
==========================================
- Coverage   56.63%   56.62%   -0.02%     
==========================================
  Files        2756     2757       +1     
  Lines      137627   137634       +7     
  Branches    21036    21030       -6     
==========================================
- Hits        77947    77933      -14     
+ Misses      57997    57987      -10     
- Partials     1683     1714      +31     
| Flag | Coverage Δ |
|---|---|
| server-test | 77.35% <ø> (-0.03%) ⬇️ |
| unittest | 31.95% <12.50%> (-0.01%) ⬇️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.


codecov[bot] avatar Nov 15 '25 06:11 codecov[bot]