vcpkg icon indicating copy to clipboard operation
vcpkg copied to clipboard

[blas] Make all the Android platforms consistent in ci.baseline.txt

Open BillyONeal opened this issue 1 year ago • 7 comments

Extended from that originally authored by @Cheney-W in https://github.com/microsoft/vcpkg/pull/38097

The reason arm-neon-android behaved differently is that there were these skips applied to arm64 and x64 android.

BillyONeal avatar Apr 29 '24 21:04 BillyONeal

/azp run

BillyONeal avatar Apr 30 '24 00:04 BillyONeal

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] avatar Apr 30 '24 00:04 azure-pipelines[bot]

So this is my analysis:

# Required fails due to broken

openblas:x64-android=fail # broken! ######### Why is this broken but arm64-android works ?
lapack-reference:arm-neon-android=fail # broken!


clapack:x64-android=skip
lapack-reference:x64-android=fail # broken!
clapack:arm64-android=skip
lapack-reference:arm64-android=fail # broken!

######## Maybe clapack works on android if openblas works?

# Test -> Should always pass if blas/lapack is supported
blas-test:x86-windows=pass # openblas
lapack-test:x86-windows=pass # lapack-reference[noblas]
blas-test:x64-windows-static=pass # lapack-reference[blas]
lapack-test:x64-windows-static=pass # lapack-reference[blas]
blas-test:x64-windows=pass # openblas
lapack-test:x64-windows=pass # lapack-reference[noblas]
blas-test:x64-windows-static-md=pass # lapack-reference[blas]
lapack-test:x64-windows-static-md=pass # lapack-reference[blas]
blas-test:arm64-windows=pass # openblas
lapack-test:arm64-windows=pass # clapack
blas-test:x64-uwp=pass # openblas
lapack-test:x64-uwp=pass # clapack
blas-test:arm64-uwp=pass # openblas
lapack-test:arm64-uwp=pass # clapack
blas-test:x64-linux=pass # openblas
lapack-test:x64-linux=pass # lapack-reference[noblas]
blas-test:arm64-osx=pass # accelerate framework
lapack-test:arm64-osx=pass # accelerate framework
blas-test:x64-osx=pass # accelerate framework
lapack-test:x64-osx=pass # accelerate framework

blas-test:arm64-android=pass # openblas ############## why does this work but x64-android not ?

#Failing tests (might be unnecessary if the dep already is broken.):
lapack-test:arm64-android=fail # lapack-reference[noblas] broken!
# arm-neon-android is broken!
blas-test:arm-neon-android=fail # openblas broken!
lapack-test:arm-neon-android=fail # lapack-reference[noblas] broken!
# x64-android is broken!

lapack-test:x64-android=fail # lapack-reference[noblas]

# unnecessary fails:
blas-test:x64-android=fail ###### openblas -> openblas is already failing

# Unnecessary skips/no collision only good to avoid pulling them accidently in, 
# however having an error due to them would be good since it would indicate a 
# location in need of a fix:
lapack-reference:arm64-osx=skip # Either this or the one before
openblas:arm64-osx=skip
lapack-reference:x64-osx=skip # Either this or the one before
openblas:x64-osx=skip

openblas:x64-windows-static=skip
openblas:x64-windows-static-md=skip

# Required skips due to possible collisions between clapack/lapack-reference
clapack:x86-windows=skip
clapack:x64-windows=skip
clapack:x64-windows-static-md=skip
clapack:x64-windows-static=skip
clapack:x64-linux=skip
clapack:arm64-osx=skip
clapack:x64-osx=skip
lapack-reference:x64-uwp=skip
lapack-reference:arm64-windows=skip
lapack-reference:arm64-uwp=skip

clapack:arm-neon-android=skip #### clapack should probably be used if openblas works. 

# Unnecessary pass (will be enforce by blas/lapack-test):
lapack-reference:x86-windows=pass
openblas:x86-windows=pass
lapack-reference:x64-windows=pass
openblas:x64-windows=pass
lapack-reference:x64-windows-static=pass
lapack-reference:x64-windows-static-md=pass
openblas:x64-uwp=pass
clapack:x64-uwp=pass
clapack:arm64-windows=pass
clapack:arm64-uwp=pass
openblas:arm64-windows=pass
openblas:arm64-uwp=pass
openblas:arm64-android=pass

lapack-reference:x64-linux=pass
openblas:x64-linux=pass

# Questionable pass since seemingly not useable:
openblas:arm-neon-android=pass


a) Please don't add unnecessary pass/fail, switching blas/lapack implementations is otherwise a pain. Having the test ports pass is enough. b) Please don't mark stuff as skip if there is no collision between the ports e.g. only clapack/lapack-reference should be marked with skip c) Only mark root ports like openblas/lapack-reference/clapack as fail if the corresponding blas/lapack-test port is failing. The test ports do a compile check if the found blas/lapack implementation has an expected symbol and only fails if that does not compile. If that is the case the root ports could also be marked with "supports": "!<what-failed>" d) the android situation needs some more debugging. arm64 works but x64 does not for openblas? That seems questionable. Also forcing clapack here could be a way forward (currently using lapack-reference?). Android cannot use lapack-reference due to the gfortran libs not being compiled for android??

So from that analyzes the following should be applied:

blas-test:x86-windows=pass # openblas
lapack-test:x86-windows=pass # lapack-reference[noblas]
blas-test:x64-windows-static=pass # lapack-reference[blas]
lapack-test:x64-windows-static=pass # lapack-reference[blas]
blas-test:x64-windows=pass # openblas
lapack-test:x64-windows=pass # lapack-reference[noblas]
blas-test:x64-windows-static-md=pass # lapack-reference[blas]
lapack-test:x64-windows-static-md=pass # lapack-reference[blas]
blas-test:arm64-windows=pass # openblas
lapack-test:arm64-windows=pass # clapack
blas-test:x64-uwp=pass # openblas
lapack-test:x64-uwp=pass # clapack
blas-test:arm64-uwp=pass # openblas
lapack-test:arm64-uwp=pass # clapack
blas-test:x64-linux=pass # openblas
lapack-test:x64-linux=pass # lapack-reference[noblas]
blas-test:arm64-osx=pass # accelerate framework
lapack-test:arm64-osx=pass # accelerate framework
blas-test:x64-osx=pass # accelerate framework
lapack-test:x64-osx=pass # accelerate framework

blas-test:arm64-android=pass # openblas ############## why does this work but x64-android not ?

# Required skips due to possible collisions between clapack/lapack-reference
clapack:x86-windows=skip
clapack:x64-windows=skip
clapack:x64-windows-static-md=skip
clapack:x64-windows-static=skip
clapack:x64-linux=skip
clapack:arm64-osx=skip
clapack:x64-osx=skip
lapack-reference:x64-uwp=skip
lapack-reference:arm64-windows=skip
lapack-reference:arm64-uwp=skip

<+ more skips for the android situation probably 
lapack-reference:x64-android=skip
lapack-reference:arm64-android=skip 
# and force clapack
>

#### Should be marked via supports:
# Questionable pass since seemingly not useable:
openblas:arm-neon-android=pass

openblas:x64-android=fail # broken! ######### Why is this broken but arm64-android works ?
# clapack:x64-android=fail # openblas already fails
clapack:arm64-android=????? # openblas works so this could also work?

Neumann-A avatar Apr 30 '24 06:04 Neumann-A

Furthermore: If Android uses GCC then the Fortran flags for Android are probably not being setup by the Android toolchain? So switching to clapack could solve that issue maybe.

Neumann-A avatar Apr 30 '24 07:04 Neumann-A

openblas:x64-android=fail # broken! ######### Why is this broken but arm64-android works ? lapack-reference:arm-neon-android=fail # broken!

I don't know. I am making ci.baseline.txt consistent with the status quo so that people editing other PRs don't experience baseline issues, I'm not trying to improve Android support right now.

The status quo I'm looking at is Friday's build: https://dev.azure.com/vcpkg/public/_build/results?buildId=102383&view=results

Installing 207/2154 lapack-reference[blas-select,core,noblas]:[email protected]#5...
Building lapack-reference[blas-select,core,noblas]:[email protected]#5...
-- Downloading https://github.com/Reference-LAPACK/lapack/archive/v3.11.0.tar.gz -> Reference-LAPACK-lapack-v3.11.0.tar.gz...
-- Extracting source /vcpkg/downloads/Reference-LAPACK-lapack-v3.11.0.tar.gz
-- Applying patch cmake-config.patch
-- Applying patch lapacke.patch
-- Applying patch fix_prefix.patch
-- Using source at /mnt/vcpkg-ci/b/lapack-reference/src/v3.11.0-6ae738f586.clean
-- The Fortran compiler identification is unknown
CMake Error at scripts/cmake/vcpkg_find_fortran.cmake:62 (message):
  Unable to find a Fortran compiler using 'CMakeDetermineFortranCompiler'.
  Please install one (e.g.  gfortran) and make it available on the PATH!
Call Stack (most recent call first):
  ports/lapack-reference/portfile.cmake:47 (vcpkg_find_fortran)
  scripts/ports.cmake:175 (include)


error: building lapack-reference:arm-neon-android failed with: BUILD_FAILED
Elapsed time to handle lapack-reference:arm-neon-android: 1.1 s
Installing 208/2154 lapack:arm-neon-android@2023-06-10...
Building lapack:arm-neon-android@2023-06-10...
error: building lapack:arm-neon-android failed with: CASCADED_DUE_TO_MISSING_DEPENDENCIES
  due to the following missing dependencies:
    lapack-reference[blas-select]:arm-neon-android
    lapack-reference[core]:arm-neon-android
Elapsed time to handle lapack:arm-neon-android: 47.7 us
Installing 206/2253 openblas:[email protected]...
Building openblas:[email protected]...
-- Using cached OpenMathLib-OpenBLAS-v0.3.27.tar.gz.
-- Extracting source /vcpkg/downloads/OpenMathLib-OpenBLAS-v0.3.27.tar.gz
-- Applying patch uwp.patch
-- Applying patch fix-redefinition-function.patch
-- Applying patch install-tools.patch
-- Using source at /mnt/vcpkg-ci/b/openblas/src/v0.3.27-b066c33329.clean
-- Configuring x64-android
CMake Error at scripts/cmake/vcpkg_execute_required_process.cmake:112 (message):
    Command failed: /vcpkg/downloads/tools/ninja/1.10.2-linux/ninja -v
    Working Directory: /mnt/vcpkg-ci/b/openblas/x64-android-rel/vcpkg-parallel-configure
    Error code: 1
    See logs for more information:
      /mnt/vcpkg-ci/b/openblas/config-x64-android-dbg-CMakeCache.txt.log
      /mnt/vcpkg-ci/b/openblas/config-x64-android-rel-CMakeCache.txt.log
      /mnt/vcpkg-ci/b/openblas/config-x64-android-out.log

Call Stack (most recent call first):
  /mnt/vcpkg-ci/installed/x64-linux/share/vcpkg-cmake/vcpkg_cmake_configure.cmake:252 (vcpkg_execute_required_process)
  ports/openblas/portfile.cmake:63 (vcpkg_cmake_configure)
  scripts/ports.cmake:175 (include)


error: building openblas:x64-android failed with: BUILD_FAILED
Elapsed time to handle openblas:x64-android: 1.6 s

# unnecessary fails: blas-test:x64-android=fail ###### openblas -> openblas is already failing

I want someone who fixes openblas to need to change this to =pass.

a) Please don't add unnecessary pass/fail, switching blas/lapack implementations is otherwise a pain. Having the test ports pass is enough.

I think anything that changes anything about which blas/lapack backend is selected needs to evaluate this list; that this was not done in https://github.com/microsoft/vcpkg/pull/38097 is why we are having this baseline issue. (I didn't know about this separate handling in ci.baseline.txt when I reviewed over there)

b) Please don't mark stuff as skip if there is no collision between the ports e.g. only clapack/lapack-reference should be marked with skip

I marked them skip because it is important that blas-test and lapack-test pass without them.

c) Only mark root ports like openblas/lapack-reference/clapack as fail if the corresponding blas/lapack-test port is failing.

That is not possible. People editing PRs unrelated to blas and lapack should not be getting baseline errors. ci.baseline.txt is a reflection of the status quo, not what we want the world to be.

d) the android situation needs some more debugging.

I agree! But I don't myself have time to do that right now, and blocking unrelated PRs while that is debugged is an unacceptable outcome.

BillyONeal avatar Apr 30 '24 19:04 BillyONeal

I would like the other maintainers to consider Neumann-A's feedback above

BillyONeal avatar Apr 30 '24 19:04 BillyONeal

I marked them skip because it is important that blas-test and lapack-test pass without them.

No the important part is that they pass selecting the correct blas/lapack implementation. Since BLA_VENDOR is fixed that shouldn't be an issue. You could probably inspect the cmakecache of the test to make sure the correct implementation was selected. (Before BLA_VENDOR was fixed it would have been an issue.) Marking them as skipped in the baseline just means reduced ci test coverage no longer building the ports.

That is not possible. People editing PRs unrelated to blas and lapack should not be getting baseline errors. ci.baseline.txt is a reflection of the status quo, not what we want the world to be.

I agree with that but the only observed CI error seems to be in android. So marking:

openblas:x64-android=fail/skip
blas:arm-neon-android=fail/skip

should be enough or not?

I think anything that changes anything about which blas/lapack backend is selected needs to evaluate this list;

No. This only needs to be evaluated for lapack-test:=pass (lapack should always depend on blas, so marking this pass means blas-test also passes) (or blas-test:=pass # if lapack-test is a known failure)

The idea is to mark the final downstream dep as passing since that means all upstream deps pass. Marking everything as pass is just micromanagement.

also changing blas/lapack backend should have no visible impact on the test. They still should pass. The only impact it can have is conflicting libraries being installed which would require a selection in the baseline.

Neumann-A avatar Apr 30 '24 19:04 Neumann-A

@ras0219-msft @AugP @data-queue @BillyONeal @JavierMatosD

Before the meeting @JavierMatosD indicated that the supports expression outcome seemed reasonable.

(discussion for) We are happy with skipping more than absolutely necessary: (Neumann-A part B)

@ras0219-msft : This is morally what we have to do whenever there are alternatives so it isn't that much of a disservice if we do that for the BLAS alternatives.

(discussion for) We want supports expressions for blas and lapack if the corresponding tests are broken: (Neumann-A part C)

@ras0219-msft: In this case, supports expressions make sense however if for whatever reason supports did not make sense we want ci.baseline.txt to reflect the reality of the baseline, even if that combination might not make sense.

(discussion for) We want extra pass for the backends: (Neumann-A part A)

@BillyONeal I'm OK with this on the grounds that they don't do anything; they're more of a comment than an actually meaningful directive. I'm not sold on the "but that means if the backend changes I have to edit ci.baseline.txt" because I think any time backend selection changes the list needs to be re-audited.

We did not discuss Neumann-A's part D because we agree it should be fixed but we can't block baseline issues on that being fixed.

Result action items:

  • Billy to remove the extra =pass on each of the backends
  • Billy to change the supports expression on blas and lapack to reflect which tests work
  • The extra skips in ci.baseline.txt for supported platforms for BLAS not made irrelevant by the supports expressions remain

BillyONeal avatar May 16 '24 21:05 BillyONeal