scalapack icon indicating copy to clipboard operation
scalapack copied to clipboard

`make -j all` (build using multiple jobs) may not work

Open weslleyspereira opened this issue 2 years ago • 7 comments

PR #53 fixed the dependencies of the ScaLAPACK libraries on BLACS. It should now be safe to use make -j lib, i.e., to use multiple jobs to build. However, make -j all may still fail as related in PR #53.

weslleyspereira avatar Mar 07 '22 17:03 weslleyspereira

Upon further inspection, make -j lib is still buggy, probably because of this:

$ cat SRC/Makefile
[...]
single: $(SLASRC) $(SCLAUX) $(ALLAUX)
	$(ARCH) $(ARCHFLAGS) ../$(SCALAPACKLIB) $(SLASRC) $(SCLAUX) \
	$(ALLAUX)
	$(RANLIB) ../$(SCALAPACKLIB)

complex: $(CLASRC) $(SCLAUX) $(ALLAUX)
	$(ARCH) $(ARCHFLAGS) ../$(SCALAPACKLIB) $(CLASRC) $(SCLAUX) \
	$(ALLAUX)
	$(RANLIB) ../$(SCALAPACKLIB)

double: $(DLASRC) $(DZLAUX) $(ALLAUX)
	$(ARCH) $(ARCHFLAGS) ../$(SCALAPACKLIB) $(DLASRC) $(DZLAUX) \
	$(ALLAUX)
	$(RANLIB) ../$(SCALAPACKLIB)

complex16: $(ZLASRC) $(DZLAUX) $(ALLAUX)
	$(ARCH) $(ARCHFLAGS) ../$(SCALAPACKLIB) $(ZLASRC) $(DZLAUX) \
	$(ALLAUX)
	$(RANLIB) ../$(SCALAPACKLIB)
[...]

The target files are being overwritten by the different rules, and one may end up with errors such as:

/usr/bin/ar: ../libscalapack.a: error reading blacs_grid_.oo: file truncated
/usr/bin/ar: ../libscalapack.a: error reading free_handle_.o: file truncated

prj- avatar Mar 08 '22 13:03 prj-

Just FYI, one functioning approach (at least that's what's being done when building ScaLAPACK in PETSc) is available in this commit: https://bitbucket.org/petsc/pkg-scalapack/commits/bf5290f5d002fe43a6fc59f83730ff5141860380. But I don't know if you'd be willing to merge that in @langou?

prj- avatar Mar 08 '22 13:03 prj-

Hi @prj-. I agree, there is a problem there. Thanks for taking your time to explain this!

I am sorry but I am not sure I understood your suggestion for the use of https://bitbucket.org/petsc/pkg-scalapack/commits/bf5290f5d002fe43a6fc59f83730ff5141860380. Are you suggesting we call $(ARCH) $(ARCHFLAGS) $(SCALAPACKLIB) ... only once for all precisions? Or is it something else?

Using a single call of $(ARCH) $(ARCHFLAGS) $(SCALAPACKLIB) ... is a good way to go in my opinion. That is also working on LAPACK (https://github.com/Reference-LAPACK/lapack/blob/a7c0392a190998f550cbf064f711b9575c8410cf/SRC/Makefile)

ALLOBJ = $(SLASRC) $(DLASRC) $(DSLASRC) $(CLASRC) $(ZLASRC) $(ZCLASRC) \
   $(SCLAUX) $(DZLAUX) $(ALLAUX)

ifdef USEXBLAS
ALLXOBJ = $(SXLASRC) $(DXLASRC) $(CXLASRC) $(ZXLASRC)
endif

ifdef BUILD_DEPRECATED
DEPRECATED = $(DEPRECSRC)
endif

.PHONY: all
all: $(LAPACKLIB)

LAPACKLIB_DEPS := $(ALLOBJ)

ifdef USEXBLAS
  LAPACKLIB_DEPS += $(ALLXOBJ)
endif

ifdef BUILD_DEPRECATED
  LAPACKLIB_DEPS += $(DEPRECATED)
endif

$(LAPACKLIB): $(LAPACKLIB_DEPS)
	$(AR) $(ARFLAGS) $@ $^
	$(RANLIB) $@

With this kind of approach, I think we would still be able to build for a single precision type.

weslleyspereira avatar Mar 11 '22 20:03 weslleyspereira

Are you suggesting we call $(ARCH) $(ARCHFLAGS) $(SCALAPACKLIB) ... only once for all precisions? Or is it something else?

I'm now realizing that the trick there is indeed to have a single rule for all precisions, which may not be acceptable to you.

With this kind of approach, I think we would still be able to build for a single precision type.

~~Right, I think this file https://bitbucket.org/petsc/pkg-scalapack/src/bf5290f5d002fe43a6fc59f83730ff5141860380/Makefile.objs could be parsed to split objects among the different precisions.~~ (not sure whether this is really needed) Then, the Makefile would match the one of LAPACK indeed.

prj- avatar Mar 12 '22 07:03 prj-

Using a single ar cr is the right thing to do, rather than multiple ar cr updating the same file. In terms of correctness and speed, I advocate moving to a non-recursive Makefile, where the top-level Makefile can see the whole dependency DAG. See "Recursive Make Considered Harmful". A non-recursive Makefile could be either a single Makefile for the entire project (SLATE uses this approach), or a top-level Makefile that includes Makefiles for each sub-directory (MAGMA uses this approach), which has the same logical effect. With this non-recursive structure, linking a shared library (.so) is easy, too.

Here's an example non-recursive Makefile structure. I'm sure there are many more subtleties in ScaLAPACK's Makefiles.

There are ways to write it such that running make in, say, the BLACS directory compiles only the BLACS, but I suspect this is overkill for ScaLAPACK.

The use of make single double complex complex16 seems inherently problematic for parallelism. It defines 4 targets that all write via ar cr to the same library. Better would be to define 4 variables that select the appropriate files to go into $(lib_obj). E.g., from the example repo:

single ?= 1
...
ifeq ($(single),1)
    lib_obj += $(dir1_s_obj)
endif

mgates3 avatar Mar 13 '22 17:03 mgates3

Here's a take on implementing this in ScaLAPACK, so far just the BLACS library. If this seems like a good direction, can @weslleyspereira or someone do the other directories? Or I can, but may not have time immediately. I use this myself and have seen several people burned by make -j failing, so it seems useful to fix. https://github.com/mgates3/scalapack/tree/non-recursive-make https://github.com/mgates3/scalapack/commit/d51c7090cf08375b3f485a2692ec36c63ab7d4cd?diff=split

mgates3 avatar Mar 14 '22 18:03 mgates3

Thanks for the complete answers, @mgates3 !! I also don't know if I have time to work on it now, but I will keep your solution in my radar.

weslleyspereira avatar Mar 14 '22 20:03 weslleyspereira