
Nx daemon not available in docker containers

Open hytromo opened this issue 2 years ago • 12 comments

Current Behavior

Currently, the Nx daemon is completely disabled when Nx is running in a Docker container (source).

This causes Nx commands to recalculate the project graph each time instead of using the daemon's cache (source).

As a result, every nx command has to hash all the monorepo files again, which can be very time-consuming for larger monorepos (source).
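To make the cost difference concrete, here is a rough TypeScript sketch. It is our own illustration, not Nx's implementation. It models a long-lived process that hashes the workspace once and then rehashes only the files a watcher reports as changed. Without a daemon, every nx invocation is a fresh process and pays the full `initialize` cost again. `WorkspaceHashCache` and the injected `readFile` callback are hypothetical names chosen for this example.

```typescript
import { createHash } from "node:crypto";

// Hypothetical illustration of daemon-style caching; not the real Nx code.
// File deletions are ignored for brevity.
class WorkspaceHashCache {
  private fileHashes = new Map<string, string>();
  private readFile: (path: string) => string;
  // Counts how many files were actually read and hashed.
  hashCalls = 0;

  constructor(readFile: (path: string) => string) {
    this.readFile = readFile;
  }

  private hashFile(path: string): string {
    this.hashCalls++;
    return createHash("sha256").update(this.readFile(path)).digest("hex");
  }

  // Cold start: hash every file. Without a daemon, each nx command pays this.
  initialize(paths: string[]): void {
    this.fileHashes.clear();
    for (const p of paths) this.fileHashes.set(p, this.hashFile(p));
  }

  // Warm update: rehash only the files a file watcher reported as changed.
  update(changedPaths: string[]): void {
    for (const p of changedPaths) this.fileHashes.set(p, this.hashFile(p));
  }

  // Combine the per-file hashes, in a stable order, into one workspace hash.
  workspaceHash(): string {
    const h = createHash("sha256");
    for (const p of [...this.fileHashes.keys()].sort()) {
      h.update(`${p}:${this.fileHashes.get(p)}\n`);
    }
    return h.digest("hex");
  }
}
```

In a monorepo with thousands of files, the gap between `initialize` (hash everything) and `update` (hash a handful of files) is roughly the startup overhead described in this issue.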

We develop inside a Docker container to ensure common tooling across team members, and this is really slowing us down: each nx command, like nx run <project>:<target>, takes 5+ seconds before it starts running.

Patching the Nx code on the fly so the daemon runs inside a Docker container seems to work fine, so I'm not sure why this restriction was introduced.

Expected Behavior

The Nx daemon should work inside Docker containers, to avoid recalculating the project graph each time, which can be a very expensive operation.

Github Repo

No response

Steps to Reproduce

  1. Start a docker container and mount your large monorepo in it
  2. Try to run an nx target to see the slow startup time
  3. Run nx daemon to see that the daemon hasn't started
  4. Run another nx target and notice how slow the startup is

Nx Report

   Node : 18.12.1
   OS   : linux x64
   npm  : 8.19.2
   
   nx : 15.4.2
   @nrwl/angular : Not Found
   @nrwl/cypress : Not Found
   @nrwl/detox : Not Found
   @nrwl/devkit : 15.4.2
   @nrwl/esbuild : Not Found
   @nrwl/eslint-plugin-nx : Not Found
   @nrwl/expo : Not Found
   @nrwl/express : Not Found
   @nrwl/jest : Not Found
   @nrwl/js : 15.4.2
   @nrwl/linter : 15.4.2
   @nrwl/nest : Not Found
   @nrwl/next : Not Found
   @nrwl/node : Not Found
   @nrwl/nx-cloud : Not Found
   @nrwl/nx-plugin : Not Found
   @nrwl/react : Not Found
   @nrwl/react-native : Not Found
   @nrwl/rollup : Not Found
   @nrwl/schematics : Not Found
   @nrwl/storybook : Not Found
   @nrwl/web : Not Found
   @nrwl/webpack : Not Found
   @nrwl/workspace : 15.4.2
   @nrwl/vite : Not Found
   typescript : 4.8.4
   ---------------------------------------
   Local workspace plugins:
   ---------------------------------------
   Community plugins:

Failure Logs

No response

Additional Information

No response

hytromo, Jan 04 '23

Thanks for this. I've been working on streamlining a build/CI pipeline for a couple of days now and have been wondering whether I was configuring things wrong or if it was intentional behaviour from NX. For some reason googling "docker nx cache" or similar things yields essentially no results so I was beginning to think I was the only one with this issue. Good to know that's not the case.

Would you mind sharing your workaround if it's not too much trouble? Much appreciated.

lega0208, Jan 04 '23

> Would you mind sharing your workaround if it's not too much trouble? Much appreciated.

I haven't found a real workaround, unfortunately. To test out whether the Nx Daemon works inside a docker container I just edited on the fly the Nx code itself.

hytromo, Jan 05 '23

Hi @hytromo / @lega0208

To provide some context, we originally disabled the daemon in Docker because, when running with Nx Cloud, we offload network calls to the daemon.

With this setup, people could get their workspaces/caches into bad states, because the Docker container could shut down as soon as Nx's main process completed, before those background calls had finished.

There is probably a smarter way for us to approach this than flat-out disabling the daemon.

StalkAltan, Jan 05 '23

> I haven't found a real workaround, unfortunately. To test out whether the Nx Daemon works inside a docker container I just edited on the fly the Nx code itself.

Sorry, I misunderstood, but thanks nonetheless!

> Hi @hytromo / @lega0208
>
> To provide some context, we originally disabled the daemon in Docker because when running with Nx Cloud we offload network calls to it.
>
> With this setup, it was possible for people to get their workspaces/caches into bad states due to the docker container closing when Nx's process is completed, but before those background calls were finished.
>
> There is probably a smarter way for us to approach this than flat-out disabling the daemon.

Thanks for the reply, I figured it was something along those lines.

I'm unfamiliar with the internals so forgive me if this is a silly question, but would it be difficult or problematic to wait for the background calls to finish before exiting the process? Seems like a really straightforward solution so I'm guessing there's a good reason it isn't done that way.

If the reason is performance-related, it might still be better than no daemon at all, and it could be worth adding a way to opt in to that behavior for this particular use case as a temporary solution. I don't know a lot of the details, so I'm just throwing that out there.

lega0208, Jan 05 '23
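The suggestion of waiting for the background calls before exiting could look roughly like this minimal TypeScript sketch, written under our own assumptions. `BackgroundTasks`, `track`, and `drain` are hypothetical names, not part of Nx or Nx Cloud. The timeout exists so that a hung network call cannot keep a container alive forever.

```typescript
// Hypothetical tracker for fire-and-forget work such as cache uploads.
class BackgroundTasks {
  private pending = new Set<Promise<unknown>>();

  // Register a promise; it removes itself from the set once it settles.
  track<T>(task: Promise<T>): Promise<T> {
    this.pending.add(task);
    task
      .finally(() => this.pending.delete(task))
      .catch(() => {
        // Rejections are the caller's concern; this only silences the
        // derived promise created by finally().
      });
    return task;
  }

  get size(): number {
    return this.pending.size;
  }

  // Wait for every task tracked so far to settle, or give up after timeoutMs.
  // Resolves true if everything settled in time, false on timeout.
  async drain(timeoutMs: number): Promise<boolean> {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timedOut = new Promise<boolean>((resolve) => {
      timer = setTimeout(() => resolve(false), timeoutMs);
    });
    const settled = Promise.allSettled([...this.pending]).then(() => true);
    try {
      return await Promise.race([settled, timedOut]);
    } finally {
      clearTimeout(timer);
    }
  }
}
```

A CLI using this pattern would call something like `await tasks.drain(10_000)` just before exiting. The process, and therefore the container, would then stay up until pending uploads finish. The cost is a short delay at exit, which is likely cheaper than recomputing the project graph on every command.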

A simpler, one-line-of-code solution would be to introduce a new environment variable in Nx Cloud and check its value instead of disabling the daemon inside docker containers blindly.

hytromo, Feb 08 '23
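The environment-variable suggestion can be sketched as a small decision function. This is a hypothetical TypeScript illustration: `isDaemonEnabled` and `NX_CLOUD_DISABLE_DAEMON_IN_DOCKER` are invented names. Giving an explicit `NX_DAEMON` value precedence is our assumption, not documented Nx behavior. The real client may weigh other inputs that are omitted here.

```typescript
// NX_CLOUD_DISABLE_DAEMON_IN_DOCKER is a hypothetical variable name,
// not a real Nx or Nx Cloud setting.
type Env = Record<string, string | undefined>;

function isDaemonEnabled(env: Env, inDocker: boolean): boolean {
  // An explicit NX_DAEMON value wins in either direction (assumed precedence).
  if (env.NX_DAEMON === "false") return false;
  if (env.NX_DAEMON === "true") return true;
  // Inside Docker, disable the daemon only when the opt-out flag is set,
  // instead of disabling it for every container.
  if (inDocker && env.NX_CLOUD_DISABLE_DAEMON_IN_DOCKER === "true") return false;
  return true;
}
```

With this rule, a dev container gets the daemon by default, while a short-lived CI container using Nx Cloud can still opt out.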

> Thanks for this. I've been working on streamlining a build/CI pipeline for a couple of days now and have been wondering whether I was configuring things wrong or if it was intentional behaviour from NX. For some reason googling "docker nx cache" or similar things yields essentially no results so I was beginning to think I was the only one with this issue. Good to know that's not the case.

I've been pulling my hair out as well. I assumed I was doing something wrong and could find nothing on it.

prmichaelsen, Mar 06 '23

In the interim, I went with a different remote cache solution: https://github.com/wvanderdeijl/nx-remotecache-gcs

nx.json

{
  "$schema": "./node_modules/nx/schemas/nx-schema.json",
  "npmScope": "parm",
  "tasksRunnerOptions": {
    "default": {
      "runner": "nx-remotecache-gcs",
      "options": {
        "bucket": "gs://parm-nx-cache",
        "cacheableOperations": ["build", "lint", "test", "e2e", "server"]
      }
    }
  },
  ...
}
# docker-compose.yml
services:
    app:
        build:
            args:
                GCP_CREDENTIALS: $GCP_CREDENTIALS

# .env
GCP_CREDENTIALS=<service-account-credentials>

# Dockerfile
# for nx-remotecache-gcs
ARG GCP_CREDENTIALS=
RUN echo "${GCP_CREDENTIALS}" > credentials_path
ENV GOOGLE_APPLICATION_CREDENTIALS=credentials_path

RUN npx nx run <commands>

prmichaelsen, Mar 11 '23

You can check whether the Nx daemon is running with:

ps -aux | grep node_modules/nx/src/daemon/server/start.js

If it's not running, you can start it manually with:

nx daemon --start

To stop it manually:

nx daemon --stop

I'm also using Nx in a Linux Dev Container (the cypress/included container, actually).

In our CI pipeline we are also struggling with long-running tasks. Our unit tests take 5 hours to complete! I just added the NX_DAEMON=true environment variable and also ran nx daemon --start. Maybe it works on your side too.

I hope this helps...

GrumpyMeow, Mar 28 '23
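The `ps | grep` check from the previous comment can be scripted in TypeScript. This sketch has one pitfall fix: `ps -aux | grep X` always matches the grep process itself, so a naive check reports the daemon as running even when it is not. `listsDaemon` and `isDaemonProcessRunning` are hypothetical helper names. The script path is copied from the comment and may differ across Nx versions and package managers. `ps -eo args` assumes a Linux (procps) environment.

```typescript
import { execFileSync } from "node:child_process";

// Daemon entry script path from the comment above; may vary by Nx version.
const DAEMON_SCRIPT = "node_modules/nx/src/daemon/server/start.js";

// Pure helper: does any process command line mention the daemon script?
// Lines belonging to a grep process are skipped to avoid false positives.
function listsDaemon(psOutput: string): boolean {
  return psOutput
    .split("\n")
    .some((line) => line.includes(DAEMON_SCRIPT) && !/\bgrep\b/.test(line));
}

// Lists every process's full command line without going through a shell.
function isDaemonProcessRunning(): boolean {
  const out = execFileSync("ps", ["-eo", "args"], { encoding: "utf8" });
  return listsDaemon(out);
}
```

A running daemon process is not proof that nx commands are using it. As the next comments describe, the client can still skip the daemon inside Docker.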

My team is also running into this issue in Linux Dev Containers. Starting the daemon manually using nx daemon --start and/or adding NX_DAEMON=true to the environment does not restore the caching behavior for us even though nx daemon reports that it is "running".

@StalkAltan Is there anything in progress to resolve this? VS Code Dev Containers are widely used, especially with complex monorepo environments, and this is a pretty major barrier.

JakeDern, Jul 09 '23

Here is our workaround for dev containers: we patch the daemon client with the following script on startup, after running npm install. It's not perfect, since a variety of npm commands can undo it, but it gets the job done most of the time and works for our pipelines. Nx is such a great tool that it's unfortunate this is needed... hoping for a real fix in the future.

import { spawnSync } from 'child_process';
import { join } from 'path';
import { readFileSync, writeFileSync } from 'fs';

const node_modules_dir = get_node_modules_folder();
const nx_module_name = 'nx';
const nx_docker_cache_file = join('src', 'daemon', 'client', 'client.js');
// Line of the compiled client's "disable the daemon" condition that checks for Docker
const searchTerm = 'isDocker() ||';

// Read the compiled daemon client from node_modules
const filename = join(node_modules_dir, nx_module_name, nx_docker_cache_file);
const content = readFileSync(filename, 'utf-8');

// Remove every line containing the search term
const updatedContent = content.split('\n')
    .filter(line => !line.includes(searchTerm))
    .join('\n');

if (updatedContent === content) {
    // Already patched, or this Nx version no longer contains the check
    console.log(`'${searchTerm}' not found in ${filename}; nothing to do`);
} else {
    // Write the updated content back to the file
    writeFileSync(filename, updatedContent);
    console.log(`Removed '${searchTerm}' from ${filename}`);
}

// Reset nx, start the daemon, then print its status
spawnSync('npx nx reset', { shell: true, stdio: 'inherit' });
spawnSync('npx nx daemon --start', { shell: true, stdio: 'inherit' });
spawnSync('npx nx daemon', { shell: true, stdio: 'inherit' });

/**
 * Gets the path to the node_modules folder
 * @returns The path to the node_modules folder
 */
function get_node_modules_folder(): string {
    const output = spawnSync('npm root', { shell: true });
    return output.stdout.toString().trim();
}

JakeDern, Aug 03 '23

I discovered a workaround where this isn't an issue if you don't use Nx.

prmichaelsen, Aug 03 '23

This issue has been automatically marked as stale because it hasn't had any recent activity. It will be closed in 14 days if no further activity occurs. If we missed this issue please reply to keep it active. Thanks for being a part of the Nx community! 🙏

github-actions[bot], Jan 31 '24

Still an issue we're working around in our team. Commenting to stop it from being marked stale.

JakeDern, Feb 07 '24

@JakeDern

> Here is our solution for working around this in devcontainers - We're hacking the daemon client with the following script on startup after running npm install. It's not perfect since a variety of npm commands can undo it, but it gets the job done most of the time and works for our pipelines. Nx is such a great tool it's unfortunate that this is needed... hoping for a real fix in the future.

Take a look at https://www.npmjs.com/package/patch-package. Although it is still a hack, patch-package is made for this.

Thanks for your hack! We're up and caching.

tfrijsewijk, Jun 04 '24

I can't say whether it's the daemon or another issue, but for about a month now the performance of Nx inside Docker has made it impossible for us to use.

  • Our 2 front-end apps + 1 Node back end start only after ~1 minute, at seemingly random times.
  • We would love to provide a reproduction, but that's difficult for us to do with our private code.

LPCmedia, Jun 23 '24

I'm running into the same issue in a dev container environment. My Nest app won't reload on changes because the daemon isn't running. The earlier suggestion of a separate environment variable seems like a good alternative: it would support Nx Cloud without breaking Nx functionality inside dev containers.

Starting the daemon manually via nx daemon, or setting the NX_DAEMON environment variable, won't ever do anything, because the client code bails out as soon as it sees that it's running inside a Docker environment. Perhaps it could check whether a daemon is already running? That would still require starting the daemon manually, though, which doesn't seem like a great solution.

https://github.com/nrwl/nx/blob/a8dc251ccec87dcb8874b4ccb5132b33e74daa9e/packages/nx/src/daemon/client/client.ts#L116-L126

Unfortunately, we'll need to use patch-package to remove that line, as suggested above, but given the role that dev containers play in our organization, it would be great to have an actual solution.

Edit

I ran git blame and found that the change to the js/node package was made here and shipped in the 19.5.0 release, so I reverted the nx libs to 19.4.4 and reloading now works again.

cjam, Jul 24 '24
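cjam's idea of checking whether a daemon is already running could be sketched as a connection probe. In this TypeScript sketch, `canConnect` and `shouldUseDaemon` are hypothetical names. The socket path is a parameter because we do not assume where Nx places its daemon socket. Unix domain sockets are assumed, so Windows named pipes are not covered.

```typescript
import { createConnection } from "node:net";

// Resolves true if something accepts connections on the Unix socket at
// socketPath, and false on any error (e.g. ENOENT) or after timeoutMs.
function canConnect(socketPath: string, timeoutMs = 500): Promise<boolean> {
  return new Promise<boolean>((resolve) => {
    const socket = createConnection({ path: socketPath });
    const finish = (ok: boolean) => {
      socket.destroy();
      resolve(ok); // later calls are no-ops once the promise has resolved
    };
    socket.setTimeout(timeoutMs, () => finish(false));
    socket.once("connect", () => finish(true));
    socket.once("error", () => finish(false));
  });
}

// Hypothetical rule: inside Docker, use the daemon only if one is already
// reachable, e.g. after a manual `nx daemon --start`.
async function shouldUseDaemon(inDocker: boolean, socketPath: string): Promise<boolean> {
  if (!inDocker) return true;
  return canConnect(socketPath);
}
```

As noted in the comment, this rule still relies on someone starting the daemon by hand inside the container. It is more of a safety valve than a replacement for an explicit setting.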