content icon indicating copy to clipboard operation
content copied to clipboard

Migration challenges with SQLite

Open NathanDubord opened this issue 10 months ago • 6 comments

Environment

Version

v3

Reproduction

Can only reproduce with specific hardware

Description

Hi all, we're trying to migrate from Content v2 to Content v3 and have noticed some significant challenges with SQLite and our GitLab architecture. Issues we're facing:

  1. Linux x86-64 runner (32 vCPU's, 128GB memory, and 200GB storage) takes 25% longer to nuxt generate compared to v2 on the same hardware. We believe the issue is something to do with SQLite. Originally the builds were significantly longer, but changing journal mode to WAL helped shave off 20%. When switching to a much less powerful macOS (6 vCPU's, 16GB memory, 50GB storage) runner, we're seeing build times match Content v2.
  2. SQLite errors with larger data. My content directory has a single test.md file that is just the word "test" copy pasted 8000 times. The limit appears to be around 51,980 for yml and 32,875 for md. This is for json strings in a single column, if the content is separated into more columns this error goes away.
yarn dev
yarn run v1.22.22
$ nuxt dev
Nuxt 3.15.4 with Nitro 2.10.4                                                                                                              
  ➜ Local:    http://localhost:3000/
  ➜ Network:  use --host to expose

  ➜ DevTools: press Shift + Option + D in the browser (v1.7.0)                                                                             

✔ Processed 2 collections and 1 files in 60.52ms (0 cached, 1 parsed)                                                                      

 ERROR  [unhandledRejection] near "yml": syntax error                                                                                      
    at Database.exec (node_modules/better-sqlite3/lib/methods/wrappers.js:9:14)
    at Object.exec (node_modules/db0/connectors/better-sqlite3.mjs:27:22)
    at Object.exec (node_modules/@nuxt/content/dist/module.mjs:369:10)
    at processCollectionItems (node_modules/@nuxt/content/dist/module.mjs:2521:8)
    at async node_modules/@nuxt/content/dist/module.mjs:2420:20
    at async initNuxt (node_modules/nuxt/dist/shared/nuxt.CrJjphBv.mjs:5343:3)
    at async NuxtDevServer._load (node_modules/@nuxt/cli/dist/chunks/dev2.mjs:162:5)
    at async NuxtDevServer.load (node_modules/@nuxt/cli/dist/chunks/dev2.mjs:94:7)
    at async NuxtDevServer.init (node_modules/@nuxt/cli/dist/chunks/dev2.mjs:89:5)
    at async Object.run (node_modules/@nuxt/cli/dist/chunks/dev-child.mjs:83:5)

content.config.ts

export default defineContentConfig({
  collections: {
    content: defineCollection({
      type: 'page',
      source: '**',
    }),
  },
});
  1. It's mentioned in the migration documentation that you don't need to modify your content files. However we've had to modify our yml files to efficiently query nested content. Our blog posts for example were in .yml files with 2 objects at the root, "seo" and "content". The content object is getting dropped into the content column of the table as a text field. This makes is significantly slower/harder to query than before as our blog posts can be massive. We've been taking queryable fields out of our content column and creating new columns to index. Is there anything we're missing here?

Overall we're really excited to use Content v3, but have a few roadblocks, please let me know if there's any additional information I can provide!

Additional context

Content v2 to Content v3 Merge Request - https://gitlab.com/gitlab-com/marketing/digital-experience/about-gitlab-com/-/merge_requests/399

NathanDubord avatar Mar 08 '25 03:03 NathanDubord

Hello @NathanDubord, Thanks for the issue description, I have some idea about the source of these errors, but some parts are surprising to me.

Linux x86-64 runner (32 vCPU's, 128GB memory, and 200GB storage) takes 25% longer to nuxt generate compared to v2 on the same hardware. We believe the issue is something to do with SQLite. Originally the builds were significantly longer, but changing journal mode to WAL helped shave off 20%. When switching to a much less powerful macOS (6 vCPU's, 16GB memory, 50GB storage) runner, we're seeing build times match Content v2.

This difference might be related to the SQLite connector we are using. There is an open PR #3230 to adopt built-in adapter which is introduced in Node 22.5+. As I checked, you are using Node 22.11, so if you check with the pr release in both of the runners we can find out if it is related to the SQLite adapter or not.

npm i https://pkg.pr.new/@nuxt/content@3230

Another improvement would be to introduce a flag to disable development cache to improve build speed. This cache database improves performance in development, which prevents unnecessary parses. But in CI it is always empty and checking and filling is not necessary.

https://github.com/nuxt/content/blob/75121d5bf408c050a6544f5a005c19419f280bee/src/module.ts#L307-L309

SQLite errors with larger data. My content directory has a single test.md file that is just the word "test" copy pasted 8000 times. The limit appears to be around 51,980 for yml and 32,875 for md. This is for json strings in a single column, if the content is separated into more columns this error goes away.

This is most likely related to our query generator and splitter. Could you share .nuxt/content/database.sql?

It's mentioned in the migration documentation that you don't need to modify your content files. However we've had to modify our yml files to efficiently query nested content. Our blog posts for example were in .yml files with 2 objects at the root, "seo" and "content". The content object is getting dropped into the content column of the table as a text field. This makes is significantly slower/harder to query than before as our blog posts can be massive. We've been taking queryable fields out of our content column and creating new columns to index. Is there anything we're missing here?

Sorry to hear that, as you explained and what I get from your repository, you have a unique way to write blog posts in yaml files and you use markdown-it to parse and render in runtime. You can still keep the contents structure as is and just modify them just for Content module and querying purpose. using hooks https://content.nuxt.com/docs/advanced/hooks#contentfileafterparse

// nuxt.config.ts
export default defineNuxtConfig({
  hooks: {
    'content:file:afterParse'({ content }) {
      // copy category out of `content` field to store it in separate column and easy filtering
      content.category = content.content.category
    }
  }
})

Also, I am wondering what stopped you from using markdown files and providing SEO data in front-matter and use ContentRenderer to render parsed data? like:

---
title: ....
category: ...
seo:
  title: ...
  description: ...
---

# Markdown

farnabaz avatar Mar 10 '25 17:03 farnabaz

Hello @farnabaz

I have the same issue here, but in development and using markdown files.

It seems that the error is thrown for large files, while processCollectionItems run the insert into queries: https://github.com/nuxt/content/blob/main/src/module.ts#L362

In my example, the error says

ERROR  [unhandledRejection] near "["TairoTable","TairoTableHeading","TairoTableRow","TairoTableCell"]": syntax error

Here is the generated sql that thrown the error: https://pastebin.com/g2Jngw8G For info, the original query size is 70740 (69.08 KB)

Initially I thought it came from my metadata, but removing them raise other errors, while reducing content size make the error disappear

---
title: TairoTable
components:
  - TairoTable
  - TairoTableHeading
  - TairoTableRow
  - TairoTableCell
---

This is really strange, we are far from sqlite limites: https://www.sqlite.org/limits.html

Another point is that the errors are not caught, and nuxt crash/restart continuously

note: I've tried experimental sqlite, but this doesn't change anything

note 2: The issues do not occur with content v3.1.0, and start with content v3.2.0

stafyniaksacha avatar Apr 04 '25 09:04 stafyniaksacha

Hey @stafyniaksacha, From the generated SQL, I can say that the query generator is wrong, and it does not escape newlines (check line 7, 8, 9). But debugging blindly is not ideal, I would appreciate it if you could share your collection and content file? Ideally, a simple repository to reproduce the issue.

farnabaz avatar Apr 04 '25 10:04 farnabaz

Yep, I see those newline, but If I remove the components from the yml, then newlines are added in other place.

Here is the example of new sql output: https://pastebin.com/06x2aEdb

And here is the schema:

const schema = z.object({
  // components: z.string().array(),
  toc: z.boolean(),
  icon: z.object({
    src: z.string(),
    srcDark: z.string().optional(),
  }),
})

export default defineContentConfig({
  collections: {
    docs: defineCollection({
      source: 'documentation/**',
      type: 'page',
      schema,
    }),
  },
})

stafyniaksacha avatar Apr 04 '25 10:04 stafyniaksacha

Could you provide raw markdown?

farnabaz avatar Apr 04 '25 10:04 farnabaz

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

github-actions[bot] avatar Jun 03 '25 11:06 github-actions[bot]