Compilation bakes in `prefix`, resulting binaries can't be moved

Open vchuravy opened this issue 4 years ago • 26 comments

When running flux --version, only the minimal help is embedded. Running strace on the process shows that it looks up a file under the compilation prefix.

openat(AT_FDCWD, "/workspace/destdir/share/flux/help.d", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = -1 ENOENT (No such file or directory)

flux hwloc info

lstat("/workspace", 0x7ffc8f984460)     = -1 ENOENT (No such file or directory)
lstat("/workspace", 0x7ffc8f9830d0)     = -1 ENOENT (No such file or directory)
write(2, "flux-hwloc: flux_open: No such f"..., 49flux-hwloc: flux_open: No such file or directory
) = 49
exit_group(1)                           = ?
+++ exited with 1 +++
ERROR: failed process: Process(`strace /home/vchuravy/.julia/artifacts/a165ec9f736d09a2fd95aa360a79248e41b3ecbb/bin/flux hwloc info`, ProcessExited(1)) [1]

Am I missing a configure option? https://github.com/JuliaPackaging/Yggdrasil/blob/2ce2be1361c134fad0079b32ed059477a81befd4/F/flux_core/build_tarballs.jl#L37

Ideally these lookups would be binary relative.
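As a complement to strace, one way to see which paths were baked in at compile time is to scan the binary for the build prefix. The sketch below is only an illustration: `find_baked_paths` is a hypothetical helper, not part of flux-core, and the byte string stands in for the contents of a real executable.

```python
import re


def find_baked_paths(data: bytes, build_prefix: str) -> list[str]:
    """Return the distinct printable strings in `data` that start with `build_prefix`."""
    # Match the prefix followed by any run of printable, non-space ASCII bytes.
    pattern = re.compile(re.escape(build_prefix.encode()) + rb"[\x21-\x7e]*")
    seen = []
    for match in pattern.finditer(data):
        path = match.group().decode("utf-8", errors="replace")
        if path not in seen:
            seen.append(path)
    return seen


if __name__ == "__main__":
    # Synthetic stand-in for a compiled binary; a real check would read the file's bytes.
    blob = (
        b"\x00/workspace/destdir/share/flux/help.d\x00junk"
        b"\x00/workspace/destdir/libexec/flux/cmd\x00"
    )
    for path in find_baked_paths(blob, "/workspace/destdir"):
        print(path)
```

Every path this reports is a lookup that will fail once the install tree is moved, unless something overrides it at run time.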

vchuravy avatar Feb 10 '21 03:02 vchuravy

Yes, the extended help is provided by an installed JSON file, which allows other Flux subprojects to optionally include their commands in flux help output.

Since sysconfdir can be separately defined from prefix I'm not sure how to look up this file relative to the binary, but maybe I'm missing something? How will flux find its other config files in this environment?

I'm also confused by the flux hwloc info strace output you posted above. That error is from flux_open() since flux isn't running, or were you referring to the stat() of /workspace?

grondo avatar Feb 10 '21 03:02 grondo

Ah okay the hwloc case makes sense, but sadly this is also true for flux start:

lstat("/workspace", 0x7ffc96995c90)     = -1 ENOENT (No such file or directory)
brk(0x97c000)                           = 0x97c000
access("/workspace/destdir/libexec/flux/cmd/flux-start.py", R_OK|X_OK) = -1 ENOENT (No such file or directory)
execve("/workspace/destdir/libexec/flux/cmd/flux-start", ["start", "-n", "4"], 0x95cde0 /* 64 vars */) = -1 ENOENT (No such file or directory)
write(2, "flux: `start' is not a flux comm"..., 56flux: `start' is not a flux command.  See 'flux --help'
) = 56
exit_group(1)                           = ?
+++ exited with 1 +++
ERROR: failed process: Process(`strace /home/vchuravy/.julia/artifacts/a165ec9f736d09a2fd95aa360a79248e41b3ecbb/bin/flux start -n 4`, ProcessExited(1)) [1]

vchuravy avatar Feb 10 '21 03:02 vchuravy

Hm, either an environment variable or a configure option that forces everything to be relative would be fine with me.

From https://www.gnu.org/prep/standards/html_node/Directory-Variables.html it seems that all options should be relative to $prefix.

vchuravy avatar Feb 10 '21 03:02 vchuravy

I think I'm still confused, that has nothing to do with the help.d directory.

Everything should be relative to /prefix with any autotools project, so I'm not sure what exactly is wrong here. It looks like you compiled for a prefix in /workspace/destdir but are running out of a temporary directory? You'll have to configure quite a few environment variables to get that to work (check the output of flux env for some that must be set properly).
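A rough sketch of what "configuring the environment variables" could look like in practice: take the variables that still point into the build prefix and rewrite them to the directory the files were actually unpacked into. The helper `rebase_env` and the variable names in the usage example are hypothetical placeholders; the real names would have to come from the output of flux env.

```python
import os


def rebase_env(env: dict[str, str], build_prefix: str, install_prefix: str) -> dict[str, str]:
    """Return only the variables that mention `build_prefix`, rewritten to `install_prefix`.

    Values are treated as os.pathsep-separated lists so that search paths
    are handled entry by entry.
    """
    rebased = {}
    for name, value in env.items():
        new_entries = []
        changed = False
        for entry in value.split(os.pathsep):
            # Only rewrite whole-path matches, not e.g. "/workspace/destdir2".
            if entry == build_prefix or entry.startswith(build_prefix + "/"):
                entry = install_prefix + entry[len(build_prefix):]
                changed = True
            new_entries.append(entry)
        if changed:
            rebased[name] = os.pathsep.join(new_entries)
    return rebased


if __name__ == "__main__":
    # Placeholder variable names, for illustration only.
    example = {
        "EXAMPLE_EXEC_PATH": "/workspace/destdir/libexec/flux/cmd",
        "HOME": "/home/user",
    }
    print(rebase_env(example, "/workspace/destdir", "/opt/flux"))
```

The result can be merged into the environment of the process that launches flux. This only helps for paths that flux actually reads from the environment; paths compiled into the binaries are unaffected.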

grondo avatar Feb 10 '21 03:02 grondo

It looks like you compiled for a prefix in /workspace/destdir but are running out of a temporary directory?

Right the cross compilation environment I am using compiles in a jail, and then I want to distribute the produced binaries and libraries.

You'll have to configure quite a few environment variables to get that to work (check output of flux env for some that must be set properly)

Thanks for the hint, that might be what I am looking for.

vchuravy avatar Feb 10 '21 03:02 vchuravy

BTW, if you compile with --without-python, many things will not work, including flux start, which will try to run at least a couple of Python scripts during startup via the rc scripts.

grondo avatar Feb 10 '21 03:02 grondo

Right the cross compilation environment I am using compiles in a jail, and then I want to distribute the produced binaries and libraries.

Compilation in a jail + distribution of the result is kind of how RPMs are built so I'm surprised that doesn't "just work" using --prefix and make install with DESTDIR. Are there any other autotools projects in Yggdrasil to which we can look as an example?

grondo avatar Feb 10 '21 03:02 grondo

Are there any other autotools projects in Yggdrasil to which we can look as an example?

Quite a few. Random example https://github.com/JuliaPackaging/Yggdrasil/blob/359c2d951539ea53fc6fbd913870551dee696cef/G/Giflib/build_tarballs.jl

vchuravy avatar Feb 10 '21 03:02 vchuravy

Thanks, so we just have to figure out why flux-core doesn't "just work" since I didn't see anything tricky going on there. I'll try to help more tomorrow.

grondo avatar Feb 10 '21 04:02 grondo

Thanks! Happy to jump on a call as well.

Re flux start after fixing the environment variables

2021-02-10T03:55:41.955681Z broker.err[0]: rc1.0: fish: Unknown command: /workspace/destdir/etc/flux/rc1
2021-02-10T03:55:41.955706Z broker.err[0]: rc1.0: fish: 
2021-02-10T03:55:41.955717Z broker.err[0]: rc1.0: /workspace/destdir/etc/flux/rc1

vchuravy avatar Feb 10 '21 04:02 vchuravy

Yes, your prefix was set to /workspace/destdir, but flux isn't installed there. You can get it to work by setting FLUX_RC_PATH, but like I said it will fail soon after without Python.

grondo avatar Feb 10 '21 04:02 grondo

You can get it to work by setting FLUX_RC_PATH,

I think the broker ignores that environment variable https://github.com/flux-framework/flux-core/blob/8d3ad84d3775c53bc5e24c73ef2e0af90179da17/src/broker/broker.c#L604-L659

but like I said it will fail soon after without Python.

Yeah I will think about how to best fix that for my scenario. Might mean that I either have to provide my own python (meh), use the system python (unreliable), or just use the library and expect the user to install flux themselves. Right now my goal was to enable a scenario where we have Julia -> starts Flux -> starts Julia instead of requiring a running external flux environment.

vchuravy avatar Feb 10 '21 04:02 vchuravy

Ah, yeah, sorry, I wasn't thinking clearly last night. The rc1 and rc3 paths can be set as broker attributes, e.g. to disable them completely:

$ flux start  -o,-Sbroker.rc1_path=,-Sbroker.rc3_path=

You could also set explicit paths to the rc scripts under the new prefix to which you've moved the flux-core install. I checked the default rc1 and rc3, and they don't appear to use any of our Python-based commands.
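For a driver that launches flux itself, the attribute override above could be assembled programmatically. This is a minimal sketch under two assumptions: `flux_start_args` is a hypothetical helper, and rc3 is assumed to live next to rc1 in `<prefix>/etc/flux` (only the rc1 location appears in the log earlier in this thread).

```python
def flux_start_args(install_prefix=None):
    """Build an argument list for `flux start` with rc1/rc3 overridden.

    With install_prefix=None both rc scripts are disabled (empty attribute
    values), matching the command shown above. Otherwise both attributes
    point at scripts under install_prefix.
    """
    if install_prefix is None:
        rc1 = rc3 = ""
    else:
        # Assumption: rc1 and rc3 sit side by side in <prefix>/etc/flux.
        rc1 = f"{install_prefix}/etc/flux/rc1"
        rc3 = f"{install_prefix}/etc/flux/rc3"
    broker_opts = f"-o,-Sbroker.rc1_path={rc1},-Sbroker.rc3_path={rc3}"
    return ["flux", "start", broker_opts]


if __name__ == "__main__":
    print(flux_start_args())
    print(flux_start_args("/opt/flux"))
```

Because the broker options are comma-separated, this sketch would not work for a prefix that itself contains a comma.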

Right now my goal was to enable a scenario where we have Julia -> starts Flux -> starts Julia instead of requiring a running external flux environment.

That should be simple to accomplish, but is made more difficult in your environment by the following:

  • flux-core package is not easily relocatable (i.e. you can't install to a prefix then move it to another prefix)
  • You can't use Python in your environment, and many important flux-core commands are developed in Python

Did I summarize correctly? Both of these seem like non-trivial issues.

grondo avatar Feb 10 '21 15:02 grondo

BTW, what is the use case for using and starting Flux in this manner? Is it only to run automated tests of the Julia bindings in the build system, or is there some broader application?

grondo avatar Feb 10 '21 15:02 grondo

I can add Python into the mix, but I wanted to avoid it since it also looks like we need Python packages.

Testing is one use-case, but for CESMIX I want to try and provide a Julia driver that talks to Flux and if necessary creates its own Flux instance to start subtasks in.

I am a bit fuzzy on the details since I am trying to learn Flux and its capabilities while simultaneously writing the bindings.

vchuravy avatar Feb 10 '21 16:02 vchuravy

Testing is one use-case, but for CESMIX I want to try and provide a Julia driver that talks to Flux and if necessary creates its own Flux instance to start subtasks in.

Yes, this would be easy if Flux were to be installed to its configured prefix, and all its dependencies were included on the system.

It is very easy to relocate packages that are a single library (e.g. Giflib referenced above), but trying to relocate a package with lots of components (e.g. modules, plugins, and lots of internal subcommands) may cause you a lot of pain...

grondo avatar Feb 10 '21 16:02 grondo

It is very easy to relocate packages that are a single library (e.g. Giflib referenced above), but trying to relocate a package with lots of components (e.g. modules, plugins, and lots of internal subcommands) may cause you a lot of pain...

Yeah, I fully understand that. We had to solve that for Julia, where we use a relative lookup based off the binary, plus environment variables if necessary.

IIUC there is special support for intree versus installed https://github.com/flux-framework/flux-core/blob/8d3ad84d3775c53bc5e24c73ef2e0af90179da17/src/common/libflux/Makefile.am#L18-L39 and https://github.com/flux-framework/flux-core/blob/8d3ad84d3775c53bc5e24c73ef2e0af90179da17/src/common/libflux/conf.c#L45-L105

Maybe it makes sense to rewire the intree support to be relative?

vchuravy avatar Feb 10 '21 17:02 vchuravy

Maybe it makes sense to rewire the intree support to be relative?

That may work for some paths, but with autotools it is often the case that sysconfdir is defined outside of any prefix (e.g. --prefix=/usr --sysconfdir=/etc). I'm not sure how files under sysconfdir can ever be relocatable.

Perhaps if an executable discovers it is not intree, and it is not in prefix, it can fall back to relative paths, making an assumption that it has been relocated and sysconfdir, etc will always be under prefix.

I have no idea how much work that might entail.
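To make the idea concrete, here is a rough sketch of that fallback logic, written in Python for brevity rather than as a proposal for the actual C code. It skips the in-tree check, assumes the executable lives at `<prefix>/bin/<name>`, and uses a hypothetical function name, `resolve_dirs`. Directories configured outside the prefix are simply left unchanged, which is one possible choice and not necessarily the right one.

```python
from pathlib import PurePosixPath


def resolve_dirs(exe_path, configured_prefix, configured_dirs):
    """Map directory names to the paths a possibly relocated executable should use.

    exe_path          -- absolute path of the running executable (<prefix>/bin/<name>)
    configured_prefix -- the --prefix given at configure time
    configured_dirs   -- directories baked in at configure time, e.g. {"sysconfdir": ...}
    """
    exe = PurePosixPath(exe_path)
    prefix = PurePosixPath(configured_prefix)
    if prefix in exe.parents:
        # Running from the configured prefix: trust the compiled-in paths.
        return {name: str(PurePosixPath(d)) for name, d in configured_dirs.items()}

    # Otherwise assume relocation: the real prefix is the parent of bin/.
    actual_prefix = exe.parent.parent
    resolved = {}
    for name, d in configured_dirs.items():
        d = PurePosixPath(d)
        try:
            # Directories under the configured prefix move along with it.
            resolved[name] = str(actual_prefix / d.relative_to(prefix))
        except ValueError:
            # Outside the prefix (e.g. --sysconfdir=/etc): nothing to rebase.
            resolved[name] = str(d)
    return resolved


if __name__ == "__main__":
    dirs = {"sysconfdir": "/workspace/destdir/etc", "libexecdir": "/workspace/destdir/libexec"}
    print(resolve_dirs("/home/user/artifacts/abc/bin/flux", "/workspace/destdir", dirs))
```

The `except ValueError` branch is exactly the sysconfdir problem described above: once a directory is configured outside the prefix, the executable's location gives no information about where it ended up.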

grondo avatar Feb 10 '21 17:02 grondo

It might help us to understand why Julia needs to run in this way. I still feel a bit lost.

If you are making a Julia bindings package, my assumption is that package would be installed alongside a more typically installed flux-core (and flux-sched) packages (i.e. a full copy of flux-core would not be installed as part of your bindings package).

I may not understand how Julia works at all, but I assumed it behaved like other scripting languages.

grondo avatar Feb 10 '21 17:02 grondo

Ah, is it because of: https://julialang.github.io/Pkg.jl/v1/environments/

Are there Julia Slurm bindings? Do they also install a copy of Slurm and run it from the side-installed path?

How will this work when users want to use the Julia bindings with a system instance of Flux, or a copy of Flux that a workflow has started under an existing RM?

grondo avatar Feb 10 '21 17:02 grondo

Right one of my goals is to enable a workflow like:

Pkg.add("FluxRM")
using FluxRM
using Distributed

addprocs(FluxManager(), 4)

where we use Flux to start 4 sub-processes for a primary process that uses Distributed to communicate with them. In this case the user would ideally not be required to install Flux in their system.

Are there Julia Slurm bindings? Do they also install a copy of Slurm and run it from the side-installed path?

I started on that before we had BinaryBuilder/Yggdrasil, but stopped development of that partly because it was hard to have composability with the non-HPC centric workflow, e.g. just install Julia on your machine and get going.

How will this work when users want to use the Julia bindings with a system instance of Flux, or a copy of Flux that a workflow has started under an existing RM?

Similarly to MPI.jl, you will be able to set an environment flag to use the system binaries or pick up the Flux RM from the environment.
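The selection logic could look roughly like the following, sketched in Python rather than Julia. Both the function `select_flux` and the variable name `EXAMPLE_USE_SYSTEM_FLUX` are made up for illustration; the actual flag in the bindings may be named and behave differently.

```python
import os
import shutil


def select_flux(bundled_path, env=None):
    """Return the flux executable to use.

    If the (hypothetical) variable EXAMPLE_USE_SYSTEM_FLUX is set to "1",
    look flux up on the PATH of the given environment; otherwise use the
    bundled binary.
    """
    env = os.environ if env is None else env
    if env.get("EXAMPLE_USE_SYSTEM_FLUX") == "1":
        system_flux = shutil.which("flux", path=env.get("PATH"))
        if system_flux is None:
            raise FileNotFoundError("system flux requested but not found on PATH")
        return system_flux
    return bundled_path


if __name__ == "__main__":
    # Without the flag set, the bundled binary is returned unchanged.
    print(select_flux("/path/to/artifact/bin/flux", env={}))
```

Failing loudly when the system binary is requested but missing avoids silently falling back to the bundled copy on a cluster where the site installation is the one that matters.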

vchuravy avatar Feb 10 '21 18:02 vchuravy

Also to be clear, I am shooting for the moon here. I won't let this be a blocker for the Julia bindings and I can focus on using Flux for an HPC centric environment where we can assume it being installed system wide. In the long-term I would like to be able to provide a more Julian story.

We had a positive experience with MPI.jl, where adding BinaryBuilder-provided binaries lowered the friction: developers of other Julia packages no longer had to worry about whether the user had installed MPI, and at the very least you could always get a working "single process" setup.

vchuravy avatar Feb 10 '21 18:02 vchuravy

As predicted:

2021-02-10T18:41:24.383235Z broker.err[0]: rc1.0:   File "/home/vchuravy/.julia/artifacts/a165ec9f736d09a2fd95aa360a79248e41b3ecbb/libexec/flux/cmd/flux-admin.py", line 15, in <module>
2021-02-10T18:41:24.383256Z broker.err[0]: rc1.0:     import flux
2021-02-10T18:41:24.383272Z broker.err[0]: rc1.0: ModuleNotFoundError: No module named 'flux'

vchuravy avatar Feb 10 '21 18:02 vchuravy

Ah, I missed that one. You can try removing that command in rc1 when you are running the Julia-only version of flux-core. It pushes some cleanup commands into the broker to be run at instance exit.

grondo avatar Feb 10 '21 18:02 grondo

@dongahn it would be great to address this. Having out-of-the-box binaries available for Julia through https://github.com/JuliaPackaging/Yggdrasil would make FluxRM.jl a lot easier to use for newcomers.

vchuravy avatar Nov 17 '21 21:11 vchuravy

@vchuravy: yes, I will discuss this with the team. It appears that addressing this will allow you to ship Flux with Julia, which is a good vehicle for us to reach a large Julia community. This will be a huge win-win, as Flux will be used by more people and be battle tested. I will get back to you.

dongahn avatar Nov 18 '21 03:11 dongahn