HiGHS icon indicating copy to clipboard operation
HiGHS copied to clipboard

Overhead using highspy

Open jajhall opened this issue 1 year ago • 8 comments

I'm just calling your attention to a potential pitfall for users of highspy, which is that accessing solution.col_value[icol] is a quite slow operation, I guess because of some overhead in pybind11. Doing over 100k independent accesses to retrieve a solution takes a few minutes, but can be reduced to milliseconds by first converting to a Python list and accessing that instead.

I made the relevant change in PuLP recently since whoever wrote the interface had made that mistake which was obviously a massive and avoidable overhead. https://github.com/coin-or/pulp/pull/641

Perhaps it should remain the responsibility of consumers of highspy to be aware of and manage, but you could consider implementing something like a solution.col_values() function that returns a python list of values. Wouldn't even need to be cached as that introduces a risk of invalidation. I'd be happy to implement it and open a PR, but I wasn't sure what your policy would be on unsolicited feature PRs(?). In general I hope to be a contributor to HiGHS in the near future 🙂.

jajhall avatar Apr 28 '23 11:04 jajhall

I would love to use highspy in favor of the open source solver we use now, but solution.col_value[icol] is way too slow for me. I found that the access time of solution.col_value[icol] scales with the number of variables in the program. I have in the order of 1^10 variables in my program which I can build and solve reasonably fast in highspy, but accessing solution.col_value[icol] takes around 0.16s (per variable!). It would take many days for me to retrieve all variable values. I hope this issue will get some attention!

@jajhall Thanks for the tip regarding the Python list. I will try that out if it can temporarily solve my performance issues.

matt035343 avatar Sep 18 '23 07:09 matt035343

This is very odd. Addressing this is on our ToDo list, and we've mentioned it in discussions, but that's as far as it's got. I Your comment raises the priority!

jajhall avatar Sep 18 '23 07:09 jajhall

So, I've reproduced the behaviour for highspy_speed_test.py and now see what "converting to a Python list" means.

Maybe this is a known pybind11 issue, for which there is a solution - perhaps to be implemented where py::class_<HighsSolution>(m, "HighsSolution") is defined in highspy/highs_bindings.cpp, but I'm loathed to invest time in it myself since there is a solution that can be implemented in the Python call, that can be flagged up via a warning in our documentation.

The original comment in this issues is one that I pasted from an email that I no longer have. Could it have come from @aphi?

jajhall avatar Sep 18 '23 17:09 jajhall

Yes that comment was from an email I sent and referring to the change we made in PuLP to avoid this overhead. Hope the solution works for you @matt035343 .

aphi avatar Sep 18 '23 18:09 aphi

@aphi It did work, thanks. I now use it as a temporary workaround. For anyone experiencing this issue, you can use following snippet:

values = list(solution.col_value)
values[icol]

However, I do still believe it should be fixed permanently somewhere upstream

matt035343 avatar Sep 19 '23 07:09 matt035343

Anyone have an idea how to fix it upstream?

I'd have thought it was a known pybind11 issue

jajhall avatar Sep 19 '23 07:09 jajhall

I've submitted a query to pybind11

jajhall avatar Sep 19 '23 10:09 jajhall

Only solution at the moment is a warning in the documentation

jajhall avatar Oct 31 '23 13:10 jajhall