
help debugging HiGHS crash

Open rdzman opened this issue 6 months ago • 24 comments

I have a rather large MILP that consistently causes HiGHS to crash. Initially, I experienced it while calling HiGHS 1.11.0 from MATLAB, via HiGHSMEX, where it crashed with a "bad allocation" error. So I tried exporting the model (which, unfortunately, I am not authorized to share) and running it from the command line.

It still crashes, but no error message is shown (see full output below).

This is on a machine that appears to have plenty of memory, 128 GB RAM total, with ~55GB free during the vast majority of the run. Right before crashing the memory usage goes up temporarily another 10 GB or so, but at least the Task Manager never shows the memory getting close to fully used.

I have Visual Studio installed and I built my own HiGHS binary from source, but am not a C developer. I was wondering if someone would be willing to help walk me through debugging this crash to find the cause.

Console output:

>highs.exe highs_milp_crash.mps
Running HiGHS 1.11.0 (git hash: 364c83a51): Copyright (c) 2025 HiGHS under MIT licence terms
Set option log_file to "HiGHS.log"
Number of BV entries in BOUNDS section is 65104
MIP  highs_milp_crash has 758520 rows; 1429008 cols; 387428763 nonzeros; 73416 integer variables (65104 binary)
Coefficient ranges:
  Matrix [1e-03, 3e+04]
  Cost   [3e-04, 4e+05]
  Bound  [6e-05, 1e+08]
  RHS    [1e+00, 2e+03]
Presolving model
711903 rows, 1330735 cols, 328598695 nonzeros  100s
677746 rows, 1247711 cols, 250547442 nonzeros  358s

Solving MIP model with:
   677746 rows
   1247711 cols (65004 binary, 0 integer, 1076 implied int., 1181631 continuous, 0 domain fixed)
   250547442 nonzeros

Src: B => Branching; C => Central rounding; F => Feasibility pump; J => Feasibility jump;
     H => Heuristic; L => Sub-MIP; P => Empty MIP; R => Randomized rounding; Z => ZI Round;
     I => Shifting; S => Solve LP; T => Evaluate node; U => Unbounded; X => User solution;
     z => Trivial zero; l => Trivial lower; u => Trivial upper; p => Trivial point

        Nodes      |    B&B Tree     |            Objective Bounds              |  Dynamic Constraints |       Work     
Src  Proc. InQueue |  Leaves   Expl. | BestBound       BestSol              Gap |   Cuts   InLp Confl. | LpIters     Time

         0       0         0   0.00%   -971516.964608  inf                  inf        0      0      0         0  2184.1s

>
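For a sense of scale, here is a rough back-of-the-envelope sketch of how much memory a single copy of the constraint matrix might take. It assumes compressed-column storage with 4-byte indices and 8-byte values; that layout is an assumption for illustration, not something confirmed in this thread, and both function names are hypothetical.

```cpp
#include <cstdint>
#include <limits>

// Rough size in bytes of one compressed-column copy of a sparse matrix.
// Assumes 4-byte indices and 8-byte values (an assumption about the build,
// not something confirmed in this thread).
std::int64_t sparseMatrixBytes(std::int64_t num_nz, std::int64_t num_col) {
  const std::int64_t index_bytes = 4;  // assumed 32-bit index type
  const std::int64_t value_bytes = 8;  // double-precision value
  // One index and one value per nonzero, plus one start pointer per column
  // and a final end pointer.
  return num_nz * (index_bytes + value_bytes) + (num_col + 1) * index_bytes;
}

// True if a count can be held in a signed 32-bit integer.
bool fitsInInt32(std::int64_t count) {
  return count <= std::numeric_limits<std::int32_t>::max();
}
```

With the 387428763 nonzeros and 1429008 columns reported in the log, `sparseMatrixBytes` gives about 4.65 GB for one copy under these assumptions, and `fitsInInt32` confirms the nonzero count itself fits in a 32-bit integer. A solver may hold several copies or derived structures at once, so the peak could be a multiple of this figure.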

rdzman avatar Jun 18 '25 00:06 rdzman

Thank you for getting in touch with us about this issue, let's try to get to the bottom of it! I have a few questions which may help us get started.

How did you build HiGHS? Did you use cmake commands from PowerShell, or the CMake extension or integration that they have in Visual Studio?

Could you possibly share the CMake output log that you get when you build the executable, before the compilation step? From powershell, it is what you get when you run

cmake -S . -B build

from the root directory of HiGHS.

Can you step through the code line by line when you debug the executable you've built?

galabovaa avatar Jun 18 '25 00:06 galabovaa

Here's how I built HiGHS:

  1. Cloned the repo from GitHub.
  2. In CMakeLists.txt, changed set(CMAKE_CXX_STANDARD 11) to set(CMAKE_CXX_STANDARD 11) (recommended by HiGHSMEX author for building HiGHSMEX)
  3. Using "x64 Native Tools Command Prompt for VS 2022"
cmake -S. -B build -DCMAKE_INSTALL_PREFIX=<my-HiGHS-install-dir>
cmake --build build --config Release
cmake --install build

The output of the first cmake command is:

cmake -S. -B build -DCMAKE_INSTALL_PREFIX=<my-HiGHS-install-dir>
-- Building for: Visual Studio 17 2022
-- The CXX compiler identification is MSVC 19.44.35211.0
-- The C compiler identification is MSVC 19.44.35211.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files/Microsoft Visual Studio/2022/Professional/VC/Tools/MSVC/14.44.35207/bin/Hostx64/x64/cl.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: C:/Program Files/Microsoft Visual Studio/2022/Professional/VC/Tools/MSVC/14.44.35207/bin/Hostx64/x64/cl.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- HIGHS version: 1.11.0
-- Git hash: 364c83a51
-- Build C++ library: ON
-- Build C++ library: ON
-- Build Fortran: OFF
-- Build CSharp: OFF
-- Build Python: OFF
-- Build all tests: OFF
-- ZLIB: ON
-- Build pdlp with GPU: OFF
-- Use FindCUDAConf: OFF
-- Configuration types: Debug;Release;MinSizeRel;RelWithDebInfo
-- Looking for C++ include sys/types.h
-- Looking for C++ include sys/types.h - found
-- Looking for C++ include stdint.h
-- Looking for C++ include stdint.h - found
-- Looking for C++ include stddef.h
-- Looking for C++ include stddef.h - found
-- Check size of long
-- Check size of long - done
-- Found long size: 4
-- Check size of long long
-- Check size of long long - done
-- Found long long size: 8
-- Check size of int64_t
-- Check size of int64_t - done
-- Found int64_t size: 8
-- Check size of unsigned long
-- Check size of unsigned long - done
-- Found unsigned long size: 4
-- Check size of unsigned long long
-- Check size of unsigned long long - done
-- Found unsigned long long size: 8
-- Check size of uint64_t
-- Check size of uint64_t - done
-- Found uint64_t size: 8
-- Check size of int *
-- Check size of int * - done
-- Found int * size: 8
-- IPO / LTO supported by compiler: YES
-- IPO / LTO: disabled by default when building a static library; set CMAKE_INTERPROCEDURAL_OPTIMIZATION=ON to enable
-- Performing Test HIGHS_HAVE_MM_PAUSE
-- Performing Test HIGHS_HAVE_MM_PAUSE - Success
-- Performing Test HIGHS_HAVE_BITSCAN_REVERSE
-- Performing Test HIGHS_HAVE_BITSCAN_REVERSE - Success
-- Performing Test COMPILER_SUPPORTS_POPCNT
-- Performing Test COMPILER_SUPPORTS_POPCNT - Failed
-- Could NOT find ZLIB (missing: ZLIB_LIBRARY ZLIB_INCLUDE_DIR) (Required is at least version "1.2.3")
-- FAST_BUILD set to on.
-- Build examples: ON
-- Build C++ example: ON
-- Build CSharp example: OFF
-- Build dotnet package: OFF
-- Performing Test COMPILER_SUPPORTS_INVALID_OFFSET
-- Performing Test COMPILER_SUPPORTS_INVALID_OFFSET - Failed
-- No CSharp support
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - not found
-- Found Threads: TRUE
-- Performing Test COMPILER_HAS_DEPRECATED_ATTR
-- Performing Test COMPILER_HAS_DEPRECATED_ATTR - Failed
-- Performing Test COMPILER_HAS_DEPRECATED
-- Performing Test COMPILER_HAS_DEPRECATED - Success
-- Configuring test <my-HiGHS-src-dir>/examples/call_highs_from_cpp.cpp: ...
-- Configuring test <my-HiGHS-src-dir>/examples/call_highs_from_cpp.cpp: ...DONE
<my-HiGHS-src-dir>/examples/call_highs_from_c_minimal.c
-- Configuring test <my-HiGHS-src-dir>/examples/call_highs_from_c_minimal.c: ...
-- Configuring test <my-HiGHS-src-dir>/examples/call_highs_from_c_minimal.c: ...DONE
-- .Net: Use .Net Framework 2.1 support: ON
-- Configuring done (21.2s)
-- Generating done (0.2s)
-- Build files have been written to: <my-HiGHS-src-dir>/build

I have also removed the --config Release from the second line to build a debug version, but haven't yet tried running the model with that executable. As far as stepping through lines of code ... I'm embarrassed to say, that's the part I don't know how to get to. How do I "try to debug the executable"? What are the steps for launching the exe in the debugger?

rdzman avatar Jun 18 '25 16:06 rdzman

Great so far, nothing to be embarrassed about!

We ourselves do not use Visual Studio much, we use Visual Studio Code because it is lightweight and platform independent. Still, VS is a very powerful tool and I, too, am still learning about it.

Reading up on how to debug in VS, as I do every time, I see that there are several options. Let’s try the one with VS’s native CMake integration first.

Please note, that for this you do not need to change the cxx standard like for HiGHSMEX, and you also do not need to specify an install directory. For debugging, the executable in the generated build directory is easier to use.

Before you start, please run the debug executable you generated, and check if you get any error messages.

Then, run

cmake --build build --parallel --config Debug

to make sure Debug is explicitly specified, and check whether that compiles anything further. The executable is located in

HiGHS/build/Debug/bin/highs.exe

or

HiGHS/build/Release/bin/highs.exe

for the Release config. So, you don’t need to call the cmake install step. Please let me know if running this on your problem prints out any new information.

Then delete the build/ subfolder of HiGHS. Possibly from the Recycle Bin as well.

Open CMake Project Directly in VS

  1. Open VS and select "Open a local folder"

    • Navigate to the HiGHS folder
    • VS should automatically configure CMake settings.
  2. Configure build settings

    • Ensure you're building Debug configuration (-DCMAKE_BUILD_TYPE=Debug)
    • In VS, check the configuration dropdown (next to the green debug arrow)
  3. Set startup item

    • Right-click on your executable target in Solution Explorer
    • Select "Set as Startup Item"
  4. Debug

    • Set breakpoints in source files, for instance in the run(..) method of HiGHS/highs/Highs.cpp
    • Press F5 to start debugging

Please let me know how it goes, and also if you are not sure about how to do any of the steps above!

galabovaa avatar Jun 18 '25 19:06 galabovaa

Thank you. Before you responded, I was eventually able to figure out how to get into the VS debugger with the debug version I'd already built above.

Any particular place I should set a breakpoint or single-step? At the moment, I'm in the debugger sitting at line 708 in HighsMipSolverData.cpp, ready to call presolve.run().

I'm happy to go back and follow your step-by-step directions exactly if that's important, but I'd like to make use of this current run to get something useful, if possible, since it takes forever (like about an hour) to load the model from the .mps file when debugging.

rdzman avatar Jun 18 '25 21:06 rdzman

Btw, I also use VS Code more than VS, so I'm happy to run it from there if that's easier.

rdzman avatar Jun 18 '25 21:06 rdzman

No, the steps above were only to help you get started with the debugger. VS Code is good, but the debugging should be identical, I imagine.

Going from presolve.run() is fine. Since your presolve finished OK, you should step over this step, and keep stepping over until, hopefully, it pauses on the bit where it fails with some error message.

You could also break at

HighsMipSolver::run(), before the restart, at mipdata_->runSetup(), then single-step from there and keep an eye on the output for anything new from the debug version.

galabovaa avatar Jun 19 '25 00:06 galabovaa

Thanks again. Is it normal for C++ code to run an order of magnitude (or more) slower in the debugger, or is something wrong? The non-debug run took just over 1/2 an hour to do the presolve. In the debugger it's been running for many hours and still hasn't finished. And as I mentioned above, the reading of the .mps file took a lot longer with the debugger.

rdzman avatar Jun 19 '25 19:06 rdzman

Yes, not only is the code not optimized, but additional checking code is run in debug, sorry
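As a generic illustration of what "additional checking code" can mean (this is not HiGHS code, and the function names are hypothetical), the sketch below uses `assert`, which runs in a debug build and is compiled out when `NDEBUG` is defined, as it typically is in a Release configuration. MSVC Debug builds also turn off optimization and enable iterator debugging in standard library containers by default, which can add to the slowdown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums a vector that is expected to be sorted in non-decreasing order.
// The assert runs on every element in a debug build and is compiled out
// entirely when NDEBUG is defined (as in a typical Release configuration).
double sumSorted(const std::vector<double>& values) {
  double total = 0.0;
  for (std::size_t i = 0; i < values.size(); ++i) {
    assert(i == 0 || values[i - 1] <= values[i]);  // debug-only check
    total += values[i];
  }
  return total;
}

// Reports whether assertions are active in this translation unit.
bool assertionsEnabled() {
#ifdef NDEBUG
  return false;
#else
  return true;
#endif
}
```

On a model with hundreds of millions of nonzeros, even a cheap per-element check like this is executed an enormous number of times, which is one plausible reason a debug build can be far slower than a release build.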

jajhall avatar Jun 19 '25 20:06 jajhall

Just curious ... is the slowness because of running in the debugger itself ... or simply because it's the debug build? Just wondering whether running a debug build normally (outside the debugger) might produce a useful error message in much less time.

rdzman avatar Jun 19 '25 22:06 rdzman

Indeed, I think it's significantly faster run from the command line in debug.

jajhall avatar Jun 19 '25 22:06 jajhall

What happens if you use the MATLAB integer programming solver - which (since 2024a) is HiGHS?

jajhall avatar Jun 20 '25 10:06 jajhall

I had already tried that and thought I'd reported the result, but apparently I never mentioned it after all.

While the output before the crash is different, it does look to me like HiGHS is probably crashing and MATLAB isn't quite prepared to handle it gracefully. Here's the output ...

Running HiGHS 1.7.1: Copyright (c) 2024 HiGHS under MIT licence terms
Coefficient ranges:
  Matrix [1e-03, 3e+04]
  Cost   [3e-04, 4e+05]
  Bound  [6e-05, 1e+08]
  RHS    [1e+00, 2e+03]
Presolving model
711928 rows, 1330735 cols, 328598749 nonzeros  78s
677710 rows, 1247699 cols, 250547370 nonzeros  292s

Solving MIP model with:
   677710 rows
   1247699 cols (65004 binary, 0 integer, 1076 implied int., 1181619 continuous)
   250547370 nonzeros

        Nodes      |    B&B Tree     |            Objective Bounds              |  Dynamic Constraints |       Work      
     Proc. InQueue |  Leaves   Expl. | BestBound       BestSol              Gap |   Cuts   InLp Confl. | LpIters     Time

         0       0         0   0.00%   -971516.964608  inf                  inf        0      0      0         0   363.8s
         0       0         0   0.00%   163264410.5179  inf                  inf        0      0      2    321826   773.1s
         0       0         0   0.00%   163268912.0735  inf                  inf    16739    999      5    324633  1036.0s
         0       0         0   0.00%   170319878.854   inf                  inf    17668   9331      8    410050  2179.9s
         0       0         0   0.00%   173701240.5982  inf                  inf    24794  14883     10    451828  3523.1s
         0       0         0   0.00%   174568417.0171  inf                  inf    29934  18007     12    477488  4717.1s
Error using intlinprog (line 174)
Unrecognized field name "optimstatus".

I was previously encountering what appeared to be the same issue on the LP relaxation of this problem, which I reported to The MathWorks. More recently, I was not able to reproduce the problem with the LP relaxation.

Btw, still waiting for the debugger to finish the presolve step. While I wait, I think I'll try to run the debug version from the command line on my other machine and see if it gives something useful. It's only got 64 GB of RAM (vs 128 GB) so I'm afraid it might encounter unrelated issues due to memory being tight. I'll let you know what I find.

rdzman avatar Jun 20 '25 16:06 rdzman

> Indeed, I think it's significantly faster run from the command line in debug.

Hmmm. I'm not convinced 😜. Based on what I'm seeing so far, it appears running the Debug build from the command line without the debugger attached is nearly as slow as running it in the debugger. It's at least multiple times slower than the Release build. I plan to leave both running over the weekend. Hopefully, we'll have some helpful info from at least one of the runs by Monday.

rdzman avatar Jun 20 '25 23:06 rdzman

Quick update. Both runs are still going. The one running in the debugger stepped into heuristics.randomizedRounding(...) inside evaluateRootNode() yesterday sometime and hasn't yet completed. Based on the pattern of memory usage seen in Task Manager, I'm guessing the other run is doing the same thing. I'll keep you posted.

FWIW, getting to the point where it prints the first progress line after presolve takes ~2 orders of magnitude longer with the debug build than the release build (on this problem).

rdzman avatar Jun 24 '25 16:06 rdzman

Just encountered the following in the run from the command line (without the debugger)

Image

Should I click "Retry"? Are there logs somewhere that would be useful?

rdzman avatar Jun 24 '25 18:06 rdzman

Now that it has paused, see whether VS shows it paused on the line where the application crashed.

See if you can find the call stack anywhere

I am not sure whether clicking Retry would get you back to where you were or restart the debugging.

galabovaa avatar Jun 24 '25 18:06 galabovaa

sorry, you did mention it is from the command line. I don't think there is any output log there.

probably your debugger will make it to the same point relatively soon

galabovaa avatar Jun 24 '25 18:06 galabovaa

Yeah, clicking Retry just exited with no new info. Hopefully the run in the debugger on my other machine will prove more useful.

rdzman avatar Jun 24 '25 18:06 rdzman

Well, the run where I'm stepping in the debugger is still somewhere in heuristics.randomizedRounding(...) after 2 weeks and I don't know how much longer I can let it run (machine needed for other tasks).

Any other suggestions for how to debug the case using a release build? It's looking like the debug build, and especially running in the debugger, is simply too slow to be useful. I'd be happy to build and run a version that has strategically placed print or logging statements included if one of you would care to supply the code for it.

rdzman avatar Jul 07 '25 14:07 rdzman

FWIW, if I pause the debugger, it is consistently stopped here ...

Image

rdzman avatar Jul 07 '25 20:07 rdzman

It is hard to tell from this whether the problem really is in randomizedRounding.

You could possibly add print statements at some stages eg. start of the root node and before heuristics, to see at which stage the release version would crash.

You could try commenting out heuristics.randomizedRounding(...) to see if this would work. Any code with commented out bits in release mode, you could run on a small problem just as a sanity check.

At the moment, there is no easy way to switch some bits off, but with others like heuristics, simply commenting out the code would work.
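The print-statement idea could look something like the sketch below. `markStage` is a hypothetical helper, not part of HiGHS; it writes a marker with the elapsed time and flushes immediately, so the last line is not lost in an output buffer if the process dies right afterwards.

```cpp
#include <chrono>
#include <cstdio>

// Hypothetical helper (not part of HiGHS): prints a stage marker with the
// elapsed wall-clock time since the first call and flushes at once, so the
// line is not lost in a buffer if the process dies shortly afterwards.
// Returns the number of characters written, or a negative value on failure.
int markStage(const char* stage, std::FILE* out = stderr) {
  static const auto start = std::chrono::steady_clock::now();
  const double seconds = std::chrono::duration<double>(
      std::chrono::steady_clock::now() - start).count();
  const int written = std::fprintf(out, "[stage] %s (%.1fs)\n", stage, seconds);
  std::fflush(out);
  return written;
}
```

Placing calls such as `markStage("before randomizedRounding")` and `markStage("after randomizedRounding")` around suspect calls brackets the crash: the last marker printed shows the stage that was entered but never finished.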

You could also try to run the release version with an increased logging setting. You would need to make an options file (.txt should work, say option_file_name.txt) with the line

log_dev_level=1

then pass it to highs with

highs.exe --options_file=../option_file_name.txt model.mps

The possible values of log_dev_level are 0, 1, 2, 3, where 0 is the default minimum logging and 3 gives the most output.

galabovaa avatar Jul 10 '25 13:07 galabovaa

You could also try compiling with

cmake -S. -B build -DHIGHS_NO_DEFAULT_THREADS=ON

to eliminate a possible bug related to multithreading on Windows.

galabovaa avatar Jul 10 '25 13:07 galabovaa

I tried with log_dev_level set to 3, and with a bunch of print lines added to Highs.cpp to attempt to find where the crash happens. I've attached the Highs.cpp I used and a log of what was printed. Let me know if there is anything further I can do to help debug this.

Highs.cpp.txt output-log-1.txt

I'll try again with a version built with the -DHIGHS_NO_DEFAULT_THREADS=ON flag to see if that results in something different.

rdzman avatar Jul 30 '25 16:07 rdzman

The -DHIGHS_NO_DEFAULT_THREADS=ON version failed at the same place.

rdzman avatar Jul 31 '25 16:07 rdzman