
[Bug]: rspack build gets stuck in CI

Open uladzimirdev opened this issue 8 months ago • 24 comments

System Info

 Binaries:
    Node: 22.13.1 - ~/.volta/tools/image/node/22.13.1/bin/node
    Yarn: 1.22.19 - ~/.volta/tools/image/yarn/1.22.19/bin/yarn
    npm: 10.8.2 - ~/.volta/tools/image/npm/10.8.2/bin/npm
    Watchman: 2024.08.12.00 - /opt/homebrew/bin/watchman
  npmPackages:
    @rspack/cli: ^1.2.8 => 1.2.8
    @rspack/core: ^1.2.8 => 1.2.8
    @rspack/plugin-react-refresh: ^1.0.1 => 1.0.1

Details

To create a prod build I use the command WEBPACK_BUNDLE=production NODE_OPTIONS=--max-old-space-size=8196 rspack build, which basically runs rspack build. After upgrading from v1.1.6 to 1.2.8 (it actually started happening with 1.2.5), this command sometimes gets stuck and produces no additional output for 20 minutes, after which CI fails by timeout. The usual time to build static resources in CI is about 100s.

I don't have a reproduction link, as it only happens from time to time. But here is a link to the GHA job

rspack config

Image

Reproduce link

No response

Reproduce Steps

WEBPACK_BUNDLE=production NODE_OPTIONS=--max-old-space-size=8196 rspack build

uladzimirdev avatar Mar 13 '25 13:03 uladzimirdev

There are many possible reasons for a build getting stuck. If you can provide a repro, I believe we can help solve it within a few days.

JSerFeng avatar Mar 14 '25 06:03 JSerFeng

Hello @uladzimirdev, sorry, we can't investigate the problem further without a reproduction demo. Please provide a repro demo by forking rspack-repro, or provide a minimal GitHub repository yourself. Issues labeled need reproduction will be closed if there is no activity for 14 days.

github-actions[bot] avatar Mar 14 '25 06:03 github-actions[bot]

@JSerFeng clear, I'll try to do it, but so far no luck. Any tips on how to collect debug info?

uladzimirdev avatar Mar 14 '25 07:03 uladzimirdev

@uladzimirdev there are some known potential deadlock bugs we're fixing; we'll release a canary version so you can try whether it fixes your problem.

hardfist avatar Mar 14 '25 09:03 hardfist

thanks @hardfist. I'm trying to find the prerequisites to repro this possible deadlock. You know, we run 2 jobs in CI in parallel (only the env variables are different), and the second job gets stuck maybe 3/100 times, so it's really hard to verify atm.

It's not reproducible locally, no matter how many resources I provide or how many other tasks I run to keep the CPU busy. I've created CPU profiles, but nothing caught my eye. If it were a circular-dependency problem, I'd be able to reproduce it locally.

I had to upgrade the swc packages together with rspack, so maybe there's some problem there.

Rsdoctor didn't show any specific problem

CPU Profile

Image

uladzimirdev avatar Mar 14 '25 11:03 uladzimirdev

Normally a deadlock is caused by the Rust side, so you need a Rust-side profile to debug it. You can generate a CPU profile by following this guide: https://rspack.dev/contribute/development/profiling#samply

You can try @rspack-canary/[email protected] by following this guide to see whether it solves your deadlock problem.

hardfist avatar Mar 14 '25 14:03 hardfist

I've been able to gather some logs from github actions, using TRACE level. The file is huge, I uploaded it to google drive.

Please let me know if I need to upload it somewhere else so you have access to it

uladzimirdev avatar Mar 28 '25 09:03 uladzimirdev

> I've been able to gather some logs from github actions, using TRACE level. The file is huge, I uploaded it to google drive.
>
> Please let me know if I need to upload it somewhere else so you have access to it

Hmm, it seems stuck in the emit_assets phase (which is the last phase of rspack).

hardfist avatar Mar 28 '25 09:03 hardfist

@uladzimirdev this may be caused by https://github.com/web-infra-dev/rspack/pull/9587/files#diff-3fbb9f7dfbadceab7b0c89038c54de9851f25ae957a1f91aea05b0e46da2b209L367, which causes a block_on on a JS function call; it should be fixed in 1.3.0. Can you help test whether it still gets stuck with 1.3.0?

hardfist avatar Mar 28 '25 09:03 hardfist

On Docusaurus repo (currently Rspack 1.2.5) we also encounter this.

Example CI job timeout: https://github.com/facebook/docusaurus/actions/runs/13945874964/job/39032844416

I'm not sure when it started to happen, but probably after 1.2.x, when we also turned on persistent cache.


I've also encountered it locally. I'm not 100% sure but I think it also happened with the dev server.

From what I remember, restarting the dev server or the prod build would consistently lead to the bundling process getting stuck again at the end (100% progress bar; I think the status was "sealing" or "emitting" or something).

I also think that cleaning the node_modules/.cache folder allowed us to restart the dev server / prod build and have it complete successfully. So I assume it might be related to some kind of corrupted persistent cache?
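For reference, the cleanup step is just this (a sketch; the path assumes the persistent cache lives in its default location under node_modules/.cache — adjust if your config writes the cache elsewhere):

```shell
# Remove the on-disk persistent cache so the next dev server / prod build
# starts from a cold cache. The path is an assumption based on the default
# cache location.
rm -rf node_modules/.cache
```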

I'm not 100% sure of all this. I'll try to investigate more the next time I see this issue locally, but I don't know exactly how to trigger it, unfortunately.

slorber avatar Mar 28 '25 16:03 slorber

@hardfist @h-a-n-a FYI, I haven't had any deadlocks since upgrading to 1.3.0. Maybe it's too early to tell, but my issue seems to be resolved.

uladzimirdev avatar Apr 01 '25 11:04 uladzimirdev

@slorber I think we're also running into this while testing this out with our Backstage microsite and Docusaurus: https://github.com/backstage/backstage/pull/29413

Can't seem to reproduce this locally, but it fails consistently in CI.

Tried a lot of things to get something working, including trying 1.3.0.

Happy to grab some debug logs to help work out what's wrong here :pray:

benjdlambert avatar Apr 01 '25 14:04 benjdlambert

Since Backstage is OSS: @benjdlambert, if you hit the deadlock issue in your CI, please ping me so I can investigate it in your repo.

hardfist avatar Apr 01 '25 14:04 hardfist

@hardfist the PR above is using rspack with Docusaurus. Feel free to dig around on that branch and fork it to run some tests. The logs with the failures are there too.

https://github.com/backstage/backstage/pull/29413

benjdlambert avatar Apr 01 '25 14:04 benjdlambert

@benjdlambert in my case it doesn't reproduce consistently, so it might be something else.

Image

Maybe try using Rspack 1.1 and see if it improves; I don't remember having issues with that version.

slorber avatar Apr 01 '25 15:04 slorber

@slorber I actually tried that in a previous commit and got the same issue. So to be honest, I'm not sure at this point what's causing it, or whether it actually is a deadlock. The symptoms are the same as this issue, though.

benjdlambert avatar Apr 01 '25 15:04 benjdlambert

I'm having similar problems, but just while using rsbuild build. Our pipelines started randomly timing out about 3 weeks ago when building multiple projects. It's like rsbuild isn't properly exiting when it completes a build. I can rerun the exact same code on the same agent and it'll succeed 95% of the time. It's not project-specific (I'm running nx run-many to build projects in a monorepo, and I've seen our task time out between different projects). It's like it just hangs and doesn't move on to the next build, eventually timing out. I've turned on all the verbose logs I can and still can't pinpoint an exact cause.
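As a stopgap while this is investigated, a wrapper like the following can make a hung build fail fast with a clear message instead of eating the whole pipeline timeout. This is a sketch: it assumes GNU coreutils timeout is available on the agent, and the nx invocation in the usage comment is just a stand-in for your own build command.

```shell
# Run a command with a hard time limit. GNU coreutils 'timeout' exits with
# status 124 when the limit expires; translate that into a clear message so
# the pipeline log shows a hang rather than a generic job timeout.
run_with_limit() {
  limit=$1
  shift
  timeout "$limit" "$@"
  status=$?
  if [ "$status" -eq 124 ]; then
    echo "command hung and was killed after $limit"
  fi
  return "$status"
}

# hypothetical usage in the pipeline:
#   run_with_limit 10m npx nx run-many -t build
```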

rrussell0 avatar Apr 04 '25 10:04 rrussell0

@rrussell0 most of the deadlock issues are solved in 1.3.0. Can you try upgrading to see whether it's fixed?

hardfist avatar Apr 04 '25 10:04 hardfist

We have been having a similar issue: the build randomly hangs in CI (GitHub Actions) without any errors or anything.

I tried adding a progress handler to get some logs:

import * as rspack from '@rspack/core';

const handler = (percentage: number, message: string, ...args: string[]) => {
  console.info(percentage);
  console.log(message);
  console.log(args);
  console.log('----------------------------------');
};
// the plugin instance is added to the plugins array of the rspack config
new rspack.ProgressPlugin(handler);

And all we get is the following; it seems to be a different random file each time.

Image

Is there a way to enable Rust logging in CI?

jtsorlinis avatar Apr 16 '25 06:04 jtsorlinis

@jtsorlinis try RSPACK_PROFILE=TRACE=layer=logger rspack build
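For wrapped setups (e.g. Docusaurus or rsbuild invoking rspack internally), a CI step along these lines may work — a sketch, assuming rspack reads the variable from the environment of whatever process ultimately spawns it, and with yarn build as a placeholder for the real build command:

```shell
# Export so child processes (e.g. docusaurus -> rspack) inherit the variable,
# then keep only the tail of the output, since full TRACE logs can run to
# gigabytes. 'yarn build' is a placeholder for the actual build command.
export RSPACK_PROFILE='TRACE=layer=logger'
yarn build 2>&1 | tail -n 1000 > rspack-trace-tail.log
```

Note that tail only flushes when the process eventually exits; for a true hang, uploading the full log as a CI artifact may be the only option.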

hardfist avatar Apr 16 '25 07:04 hardfist

@hardfist I tried to do this on the backstage repo for the Docusaurus build, but didn't seem to get any additional logs. I guess it's the rspack build command itself that picks up these env vars.

Is there anything I can add for those builds to get more logs?

https://github.com/backstage/backstage/blob/28228d3623f5f05a1fa49e977476ce0df8792a21/.github/workflows/verify_microsite.yml#L275-L279

benjdlambert avatar Apr 16 '25 09:04 benjdlambert

> @rrussell0 most of the deadlock issues are solved in 1.3.0. Can you try upgrading to see whether it's fixed?

After upgrading Rsbuild in our project, the deadlocks seem to have stopped; at least none in the last week. Thanks!

rrussell0 avatar Apr 16 '25 10:04 rrussell0

Can't see any errors even with trace enabled; it seems to just stop and then time out. Not sure if the logs will help.

I've attached the last 750 lines or so because the whole log was ~2GB

truncated-logs.txt

Just to confirm, this is happening on both 1.2.8 and 1.3.5, and seems to happen at random (80% of builds succeed).

jtsorlinis avatar Apr 16 '25 22:04 jtsorlinis

> Can't see any errors even with trace enabled; it seems to just stop and then time out. Not sure if the logs will help.
>
> I've attached the last 750 lines or so because the whole log was ~2GB
>
> truncated-logs.txt
>
> Just to confirm, this is happening on both 1.2.8 and 1.3.5, and seems to happen at random (80% of builds succeed).

@benjdlambert @jtsorlinis RSPACK_TRACE_LAYER=logger RSPACK_PROFILE=OVERVIEW rspack build can generate a much smaller log in version 1.3.6.

hardfist avatar Apr 24 '25 06:04 hardfist

Hi @hardfist,

Trying with RSPACK_PROFILE=OVERVIEW seems to produce very similar levels of logging to RSPACK_PROFILE=TRACE.

In other news, the build now hangs and times out consistently in GitHub Actions but works fine locally.

jtsorlinis avatar Apr 28 '25 23:04 jtsorlinis

> Hi @hardfist,
>
> Trying with RSPACK_PROFILE=OVERVIEW seems to produce very similar levels of logging to RSPACK_PROFILE=TRACE.
>
> In other news, the build now hangs and times out consistently in GitHub Actions but works fine locally.

Can you upload the trace.json to GitHub artifacts and share it with me?

hardfist avatar Apr 28 '25 23:04 hardfist

@hardfist sorry for not getting back to you. As this was holding up our development, I ended up just moving us back to webpack + swc-loader for the time being.

I'll try to provide you with a trace when I get some spare time.

jtsorlinis avatar May 09 '25 22:05 jtsorlinis

@hardfist I'd like to help here with a trace.json, as our builds fail all of the time and it feels like a pretty good test environment. However, from what I can tell at the moment, the runner just totally crashes while running, so I'm not 100% sure how we're going to get a dump from an unresponsive host.

I'm using rspack through the Docusaurus build, so is there a way I can easily get a trace.json to disk? @slorber, maybe you know of any props that get passed through to rspack that could help debug this?

benjdlambert avatar May 10 '25 19:05 benjdlambert

@benjdlambert does the deadlock happen in Backstage? If it's open source I can debug in your repo. Honestly, it's not easy to debug deadlock problems right now; we're still investigating better solutions for deadlock detection: https://github.com/web-infra-dev/rspack/issues/10327

hardfist avatar May 12 '25 03:05 hardfist

> does the deadlock happen in Backstage? if it's open source I can debug in your repo

I'm not sure it's a deadlock, but it consistently fails every build and it seems like the agent dies. https://github.com/backstage/backstage/pull/29413 is the PR and branch. If I can help, please let me know :pray:

benjdlambert avatar May 12 '25 08:05 benjdlambert