Task using WorkingDirectory cache ends with an error: `Unable to execute WorkingDirectory post actions`

Open anna-geller opened this issue 1 year ago • 10 comments

Explain the bug

Is the cache only intended for use with the PROCESS runner, @loicmathieu? If so, we can close the issue.

reproducer:

id: python_cached_dependencies
namespace: dev

tasks:
  - id: working_dir
    type: io.kestra.core.tasks.flows.WorkingDirectory
    tasks:
      - id: python_script
        type: io.kestra.plugin.scripts.python.Script
        warningOnStdErr: false
        beforeCommands:
          - python -m venv venv
          - source venv/bin/activate
          - pip install pandas
        script: |
          import pandas as pd
          print(pd.__version__)
    cache:
      patterns:
        - venv/**
      ttl: PT1H

stack trace:

Command succeed with code 0
2023-10-04 12:04:29.766 Cache files changed, we update the cache
2023-10-04 12:04:29.918 Unable to execute WorkingDirectory post actions
2023-10-04 12:04:29.918 java.io.IOException: Is a directory
	at java.base/sun.nio.ch.FileDispatcherImpl.read0(Native Method)
	at java.base/sun.nio.ch.FileDispatcherImpl.read(Unknown Source)
	at java.base/sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
	at java.base/sun.nio.ch.IOUtil.read(Unknown Source)
	at java.base/sun.nio.ch.IOUtil.read(Unknown Source)
	at java.base/sun.nio.ch.FileChannelImpl.read(Unknown Source)
	at java.base/sun.nio.ch.ChannelInputStream.read(Unknown Source)
	at java.base/sun.nio.ch.ChannelInputStream.read(Unknown Source)
	at java.base/sun.nio.ch.ChannelInputStream.read(Unknown Source)
	at java.base/java.nio.file.Files.read(Unknown Source)
	at java.base/java.nio.file.Files.readAllBytes(Unknown Source)
	at io.kestra.core.tasks.flows.WorkingDirectory.postExecuteTasks(WorkingDirectory.java:326)
	at io.kestra.core.runners.Worker.handleTask(Worker.java:173)
	at io.kestra.core.runners.Worker.lambda$run$2(Worker.java:137)
	at io.micrometer.core.instrument.internal.TimedRunnable.run(TimedRunnable.java:49)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)

changing to PROCESS runner works:

image

Environment Information

  • Kestra Version: 0.13.0
  • Operating System and Java Version (if not using Kestra Docker image):

anna-geller avatar Oct 04 '23 10:10 anna-geller

We must support the Docker runner.

tchiotludo avatar Oct 04 '23 10:10 tchiotludo

Does it work? I think it has something to do with file write access on some Docker installations, so it can be ignored.

loicmathieu avatar Oct 04 '23 10:10 loicmathieu

thanks. I shared a reproducer, can you try it? I ran it here on the preview https://preview-oss.kestra.io/ui/executions/dev/python_cached_dependencies/16dIqspaJWkf1eDO3s8i0o/logs?level=TRACE&page=1

and it leads to the error shown in the issue body

anna-geller avatar Oct 04 '23 10:10 anna-geller

We don't display an error if it works for end users, so if it works, hide the log.

tchiotludo avatar Oct 04 '23 10:10 tchiotludo

I can confirm it didn't work

loicmathieu avatar Oct 04 '23 11:10 loicmathieu

I checked and this is caused by the container being run as root. I can lower the message from ERROR to WARNING. By the way, the same issue occurs for cleaning the temporary directory (which is not related to this task).

I think this is caused by the Docker daemon running as root. The official Python image runs as root, but I'm not sure changing that would make any difference.

loicmathieu avatar Oct 19 '23 16:10 loicmathieu

Interesting, so it means that the files are properly cached and the error is only due to writing those files with the root user?

if so, it seems reasonable to change to WARNING

anna-geller avatar Oct 19 '23 16:10 anna-geller

Unfortunately, the error happens when creating the cache file, so caching will not work. The file was created as root, so it cannot be accessed by the user running Kestra.

loicmathieu avatar Oct 20 '23 07:10 loicmathieu

We don't have a solution for this at the moment, as it's more a Docker issue than a Kestra issue. For now, we should extend the documentation of the WorkingDirectory task to say that the cache property can only be used with the PROCESS runner.

anna-geller avatar Nov 22 '23 11:11 anna-geller

I am getting the same issue, but I have specified the PROCESS runner for all tasks. My overall Kestra deployment runs in Docker with the user set to root. Is the solution to have Kestra use a different system user in its Docker Compose file?

walker-philips avatar Dec 21 '23 20:12 walker-philips

Hi @walker-philips, sorry for the late response. We don't know yet what the right way to resolve the issue is, which is why it's still open. We might solve it as part of the new script runner for Docker: https://github.com/kestra-io/kestra/issues/3153

anna-geller avatar Mar 19 '24 22:03 anna-geller

@anna-geller I made a small adjustment that ignores directories and files that cannot be read (like executables), and with it I can make it work. I'm not sure, but depending on the system you may need to set the Docker user to 1000, for example, to avoid permission issues.

See https://github.com/kestra-io/kestra/pull/3422

loicmathieu avatar Mar 28 '24 16:03 loicmathieu
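
The adjustment described above (skipping directories and files that cannot be read before loading cache content) can be sketched as follows. This is a minimal illustration, not the code from the linked PR: the class `CacheFileCollector` and its methods are hypothetical names, and the sketch assumes Java 11 or later. It relies on the behaviour visible in the stack trace, where `Files.readAllBytes` throws an `IOException` when it is given a directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Kestra.
public class CacheFileCollector {

    /**
     * Returns the paths under root that are safe to pass to Files.readAllBytes:
     * regular files that the current user can read. Directories and unreadable
     * entries are skipped instead of failing the whole run.
     */
    public static List<Path> readableFiles(Path root) {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk
                .filter(Files::isRegularFile) // reading a directory raises "Is a directory"
                .filter(Files::isReadable)    // e.g. files owned by root inside a container
                .sorted()
                .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a small temporary tree (one nested directory, one file) and lists what is kept. */
    public static List<String> demo() {
        try {
            Path root = Files.createTempDirectory("cache-demo");
            Files.createDirectories(root.resolve("venv").resolve("bin"));
            Files.writeString(root.resolve("venv").resolve("pyvenv.cfg"), "home = /usr/bin\n");
            return readableFiles(root).stream()
                .map(p -> root.relativize(p).toString())
                .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Running `main` prints the relative path of the single regular file; the `venv/bin` directory is skipped rather than causing an error. `Files.walk` can still fail on a directory it is not allowed to open, so a real implementation would need to handle that case as well.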

nice! let's keep the issue open to QA your change thoroughly first on the develop image

anna-geller avatar Mar 28 '24 17:03 anna-geller

@anna-geller did you have a chance to have a look at this? Can we close it?

loicmathieu avatar May 17 '24 12:05 loicmathieu

overall it works, I only don't understand the second line here "Cache files changed, we update the cache" -- I didn't update anything so the cache should be the same for 1 hour

image

anna-geller avatar May 17 '24 12:05 anna-geller

also:

image

looking at not significantly reduced execution times, something doesn't seem to be working entirely as expected

hard to judge by execution times whether dependencies were properly cached and/or retrieved from the cache:

image

Worth QAing a bit more, can you cross-check on your end?

anna-geller avatar May 17 '24 12:05 anna-geller

> I only don't understand the second line here "Cache files changed, we update the cache"

You didn't change anything, but the pip install changed it (the same goes for npm). There are usually some files that change even when the same dependencies are fetched; this can be handled by the cache regex.

> looking at not significantly reduced execution times, something doesn't seem to be working entirely as expected

There are two things here:

  • I think venv creates a random directory, so I'm not sure caching would work with venv.
  • Even if caching exists, sometimes it's not significantly faster than downloading each library; it depends.

> Worth QAing a bit more, can you cross-check on your end?

I did cross-check, and the caching functionality works; whether or not it applies to a specific use case (like Python venv) does not depend on the caching feature.

loicmathieu avatar May 17 '24 13:05 loicmathieu
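
The point above, that an install step can rewrite some files even when the same dependencies are fetched, can be illustrated with a content fingerprint. This is only a sketch under stated assumptions: how Kestra actually detects cache changes is not shown in this thread, the class `CacheFingerprint` is a hypothetical name, the file `install.log` is an invented stand-in for a file that changes on every run, and Java's glob `PathMatcher` is used here in place of whatever pattern syntax the cache property accepts. Java 11 or later is assumed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Kestra.
public class CacheFingerprint {

    /**
     * Hashes the relative path and content of every regular file under root,
     * skipping files whose relative path matches excludeGlob.
     */
    public static String fingerprint(Path root, String excludeGlob) {
        PathMatcher exclude = FileSystems.getDefault().getPathMatcher("glob:" + excludeGlob);
        try (Stream<Path> walk = Files.walk(root)) {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            List<Path> files = walk
                .filter(Files::isRegularFile)
                .map(root::relativize)
                .filter(rel -> !exclude.matches(rel))
                .sorted() // stable order so the digest is reproducible
                .collect(Collectors.toList());
            for (Path rel : files) {
                digest.update(rel.toString().getBytes(StandardCharsets.UTF_8));
                digest.update(Files.readAllBytes(root.resolve(rel)));
            }
            return Base64.getEncoder().encodeToString(digest.digest());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns {changed without exclusion, changed with *.log excluded} for a tiny temp tree. */
    public static boolean[] demo() {
        try {
            Path root = Files.createTempDirectory("fingerprint-demo");
            Files.createDirectories(root.resolve("venv"));
            Files.writeString(root.resolve("venv").resolve("lib.py"), "x = 1\n");
            Files.writeString(root.resolve("venv").resolve("install.log"), "run 1\n");
            String nothing = "**/*.nothing"; // matches no file in this tree
            String logs = "**/*.log";
            String allBefore = fingerprint(root, nothing);
            String stableBefore = fingerprint(root, logs);
            // Simulate a second install that rewrites only the volatile file.
            Files.writeString(root.resolve("venv").resolve("install.log"), "run 2\n");
            return new boolean[] {
                !allBefore.equals(fingerprint(root, nothing)),
                !stableBefore.equals(fingerprint(root, logs))
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("changed without exclusion: " + result[0]);
        System.out.println("changed with *.log excluded: " + result[1]);
    }
}
```

In this sketch the fingerprint changes when the volatile file is included and stays the same when it is excluded, which is the effect a narrower cache pattern is meant to achieve.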

gotcha, in that case let's close it. thanks so much!

anna-geller avatar May 17 '24 13:05 anna-geller