Add note about naming files in parallel jobs

Open • nokite opened this issue 1 year ago • 8 comments

Description

Added a note about naming files in persist_to_workspace during parallel jobs: the names are fixed, and you cannot use parallelism environment variables to make the file names unique and identifiable. This is demonstrated by the following error message:

Error locating workspace root directory: stat /tmp/deploy_logs_$CIRCLE_NODE_INDEX: no such file or directory
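
For context, a minimal sketch of the kind of step that can produce this error; the exact layout is hypothetical:

    - persist_to_workspace:
        # $CIRCLE_NODE_INDEX is not interpolated at config-processing time,
        # so this literal path does not exist on disk
        root: /tmp/deploy_logs_$CIRCLE_NODE_INDEX
        paths:
          - "*"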

Reasons

This behavior is not explained, so one has to figure it out by trial and error.

Furthermore, there does not seem to be a way to achieve the goal at all: saving files from multiple parallel runs of a job and giving them unique names that don't conflict with each other.

In addition, the error message could be improved to help diagnose the problem. Currently it says "The specified paths did not match any files in /tmp/deploy_logs", but it doesn't mention which paths were used: was the path resolved to deploy_logs_0, or is it literally deploy_logs_$CIRCLE_NODE_INDEX?

Content Checklist

Please follow our style when contributing to CircleCI docs. Our style guide is here: https://circleci.com/docs/style/style-guide-overview.

Please take a moment to check through the following items when submitting your PR (this is just a guide, so not all items will be relevant for every PR) 😸:

  • [x] Break up walls of text by adding paragraph breaks.
  • [x] Consider if the content could benefit from more structure, such as lists or tables, to make it easier to consume.
  • [x] Keep the title between 20 and 70 characters.
  • [x] Consider whether the content would benefit from more subsections (h2-h6 headings) to make it easier to consume.
  • [x] Check all headings h1-h6 are in sentence case (only first letter is capitalized).
  • [x] Is there a "Next steps" section at the end of the page giving the reader a clear path to what to read next?
  • [x] Include relevant backlinks to other CircleCI docs/pages.

nokite · Sep 09 '24 09:09

Hey @nokite! Thank you for the PR. It would be great to get a better understanding of what you want to get working here. Any chance you can share more?

Parallelism is generally used for splitting up work across execution environments rather than running the same work multiple times, and the parallel runs don't typically write to the same files. So, it would be useful to understand more to get this addition to the docs right, or potentially offer a different way to achieve what's needed.

rosieyohannan · Sep 09 '24 14:09

Absolutely, I'll try to explain my goals. Thanks for replying!

I use parallelism in order to build and deploy a product in multiple variants in reasonable time. By passing $CIRCLE_NODE_INDEX, I tell it which set of variants to build in each parallel run/instance. Each parallel run builds a couple of variants (just enough to stay below the 3h runtime limit). Any of these variants may occasionally fail to build or deploy. I need to know which ones failed (and with what error message) at the end of the whole workflow.
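
As an illustration, a minimal sketch of the kind of build step described here; the script name and flag are hypothetical:

    - run:
        name: Build this node's set of variants
        # CIRCLE_NODE_INDEX is available at runtime inside the step's shell
        command: ./build_variants.sh --variant-set "$CIRCLE_NODE_INDEX"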

I would be happy to achieve this in any way.

My assumption was that a good way would be to:

  • create a log (or rather a file with results) in each parallel run
  • save these logs in the workspace
  • have a dependent job at the end that merges all the logs and saves them as an artifact

So the issue I ran into is that you can't save these unique logs in the workspace: the filename has to be hardcoded, which means it can't vary between parallel runs (by using the index, for instance).

nokite · Sep 09 '24 19:09

Hey @nokite, thank you!

Sorry if I'm wrong here but it sounds like you might be able to simplify things. In CircleCI, parallelism as a feature (configuring a number of parallel execution environments and telling CircleCI how to split work across them) is generally reserved for splitting a test suite.

What you describe sounds to me like you want concurrent jobs in a workflow, so you can configure your sets of variants to build in separate jobs and then create a workflow where those jobs run concurrently: https://circleci.com/docs/concurrency/#concurrency-in-workflows. Then you would be able to see in the UI/wherever which failed/built/deployed for each job.
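
For illustration, a minimal sketch of that concurrent approach, assuming a parameterized build job and hypothetical variant-set names:

    workflows:
      build-all-variants:
        jobs:
          # jobs with no dependencies between them run concurrently
          - build:
              name: build-set-a
              variant-set: a
          - build:
              name: build-set-b
              variant-set: b
          - collect-results:
              requires:
                - build-set-a
                - build-set-b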

Basically "parallel" and "concurrent" can mean largely the same thing but in CircleCI parallelism is a specific testing-focussed feature, whereas concurrency is about running jobs, doing work at the same time across multiple execution environments.

Please let me know if I misunderstood and oversimplified things here!

rosieyohannan · Sep 09 '24 20:09

@rosieyohannan thanks, I think we're on the same page. The point where we might be thinking differently is that I believe CircleCI's parallelism has potential outside the realm of testing. It's a powerful tool, and there's no need to restrict its purpose, in my opinion.

Technically, I believe my usage of parallelism fits its purpose. I'm splitting work of the same type across parallel runners. The only thing that differs between the runs is an index that helps split the work. Each build uses the same codebase, but has slightly different configuration, and a few of the files that it builds differ. Internally we call the same command, with only a parameter differing between jobs/runs. (So I think it does pretty much the same as your example with tests, which runs different sets of tests in the same codebase, using some system for splitting the work between the runners.)

Regarding what you suggested - I agree it's totally valid, and that's how I started. Initially I had a workflow config where I called the job a large number of times - via multiple entries in the workflow section. That resulted in a really, really long and repetitive config.yml.

It bugged me that the only difference between the separate calls was an index (which I passed as a parameter). The job itself was still defined only once - as there was no reason to duplicate it.

Then I figured out that what I was doing was basically parallelism - same type of work called with an index. So I refactored the config and used parallelism, which felt right (and awesome 🙂). It reduced the number of repetitive lines dramatically, and I liked the different way it was shown in the CircleCI UI. I could see everything at a glance (fitting on a single screen), and could easily switch between runs.

It ended up being a long comment unfortunately, but I hope I managed to illustrate why I see parallelism as the right tool for the job. Let me know!

nokite · Sep 10 '24 08:09

Hi @nokite,

Thanks for reporting this. I think the confusion comes from what information is available at which points in the process. For example, environment variables aren't available for interpolation when we're processing config.

I think we could make the different phases and what's available in each phase clearer in the docs.

To get on to your root issue, there are a few different options available that might get you unblocked:

  • You could use your original approach but with a matrix parameter so you don't have to manually add the job into the workflow multiple times (a sketch follows below).
  • You can keep the parallelism approach, but instead of persisting /tmp/deploy_logs_$CIRCLE_NODE_INDEX to the workspace, save the log files to /tmp/deploy_logs/$CIRCLE_NODE_INDEX.log and then persist /tmp/deploy_logs:
    - persist_to_workspace:
        root: "/tmp/deploy_logs"
        paths:
          - "*"
    
  • You can store the logs as artifacts instead. In this case you could write each log to the same fixed filename and store that file as an artifact; artifacts get uploaded into a unique space per container in the parallel job.

(edited to fix config snippet)
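
For the first option, a minimal sketch of what the matrix could look like, assuming the build job takes a hypothetical variant-set parameter:

    workflows:
      build-all-variants:
        jobs:
          - build:
              # generates one concurrent build job per listed value
              matrix:
                parameters:
                  variant-set: ["0", "1", "2", "3"]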

gordonsyme · Sep 10 '24 13:09

@gordonsyme Thanks for your suggestions, I appreciate your time. I liked the second idea - using paths: "*". I did not know that this was possible! Excellent! I am using it now 🔥 (note for anyone else reading this - I used it as an array - with a new line: paths: \n - "*")

P.S. Matrix seems a bit too verbose for this use case, if I understand it correctly. I'd have to list all the parameters that define the array/matrix of (many) concurrent jobs. And in my case those parameters are not really meaningful - they're just indexes. It does seem like a great solution when the parameters are meaningful though (like in os: [docker, linux, macos] | node-version: ["14.17.6", "16.9.0"]).

I can't comment on artifacts, I would have to try it out with my setup.

nokite · Sep 17 '24 15:09

@nokite awesome, glad you're sorted :)

(note for anyone else reading this - I used it as an array - with a new line: paths: \n - "*")

That'll teach me to write out config off the top of my head 😅. I'll edit my first reply so the correct form is out there for anyone else who comes across it.

gordonsyme · Sep 17 '24 15:09

As for the PR, I'm OK if you close it and handle the update on your side, as you have a better understanding. I can suggest the following:

  • There are some considerations about parallelism. Environment variables like $CIRCLE_NODE_INDEX cannot be used in the persist_to_workspace step to define dynamic file names. To save files with dynamic names, you could save all files in a folder and persist that folder with paths: - "*". This way, you can give the file a unique name in each parallel run and avoid conflicts when saving it.
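
A minimal sketch of that pattern; the job split, paths, and merge command are hypothetical:

    # In the parallel job: give the log a unique name, then persist the whole folder
    - run:
        command: |
          mkdir -p /tmp/deploy_logs
          ./deploy.sh > "/tmp/deploy_logs/results_${CIRCLE_NODE_INDEX}.log" 2>&1
    - persist_to_workspace:
        root: /tmp/deploy_logs
        paths:
          - "*"

    # In a downstream job that requires the parallel one: attach the workspace and merge
    - attach_workspace:
        at: /tmp/deploy_logs
    - run:
        command: cat /tmp/deploy_logs/*.log > /tmp/merged_deploy_logs.log
    - store_artifacts:
        path: /tmp/merged_deploy_logs.log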

nokite · Sep 17 '24 15:09

@nokite I've added your suggestion to our backlog but closing this for now. Thank you!

rosieyohannan · Jun 02 '25 10:06