pysmps icon indicating copy to clipboard operation
pysmps copied to clipboard

Possibility of taking out numpy dependency?

Open pchtsp opened this issue 4 years ago • 47 comments

Hello!

Thanks for this library. We'd like to add it as dependency in PuLP (https://github.com/coin-or/pulp/) but I do not want to have to add numpy as dependency. In fact, we maintain an numpy-free version of your mps reader: https://github.com/coin-or/pulp/blob/9a71ae08e9416a6dc37c702dab1dd752e6bb0d00/pulp/mps_lp.py#L31

I'd have no issue in helping with a PR if needed.

in any case, thanks again!

pchtsp avatar Jul 09 '21 17:07 pchtsp

Hello!

Is there any particular reason for the need of a project to be numpy-free? I actually never encountered any problem with numpy.

Anyway; i can look into it and try to modify it towards being numpy-free if that really helps you out.

Best Regards!

jmaerte avatar Jul 09 '21 17:07 jmaerte

I do like numpy, it's a great tool. It's just that it's too big (in size, in number of dependencies, also, sometimes hard to install in windows) and I wouldn't want to add a dependency to numpy unless it's really necessary (i.e. unless the efficiency gains are considerable or replicate the needed numpy functionality is too much work).

As I said, in our repo we have a numpy-free version you can check and I can definitely help in any way.

I'd prefer to have the mps logic in your project and then import your library as dependency than what we have now which is not really elegant.

Thanks again!

On Fri, Jul 9, 2021, 19:56 Julian Märte @.***> wrote:

Hello!

Is there any particular reason for the need of a project to be numpy-free? I actually never encountered any problem with numpy.

Anyway; i can look into it and try to modify it towards being numpy-free if that really helps you out.

Best Regards!

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/jmaerte/pysmps/issues/6#issuecomment-877358843, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABJUZ42P6WMPZ7T5YHJGJA3TW4Z5HANCNFSM5ADFKKNA .

pchtsp avatar Jul 09 '21 18:07 pchtsp

I see.

No problem. From my side numpy is not really needed as it is only used in the return-types. I can definitely factor it out.

I just used it since what I'm returning are matrices and it is the most common representation of a matrix in python imo.

Nonetheless - I will probably refactor the code in the next couple of days.

Best regards!

jmaerte avatar Jul 09 '21 18:07 jmaerte

Hello!

To this topic: I have created a new branch called "numpy_free", see https://github.com/jmaerte/pysmps/tree/numpy_free.

Since I do not have a test case on my PC atm I would need you to test this version if you don't mind (I assume you might have plenty of examples).

Can you also test the stochastic part of the reader? That would be great.

If you can confirm, I will merge this into the master branch and will commit it as the latest version in pypi.

Thanks in advance!

jmaerte avatar Jul 10 '21 18:07 jmaerte

Cool! Thanks for the work. This week I will give it a try and let you know how it goes.

On Sat, Jul 10, 2021, 20:56 Julian Märte @.***> wrote:

Hello!

To this topic: I have created a new branch called "numpy_free", see https://github.com/jmaerte/pysmps/tree/numpy_free.

Since I do not have a test case on my PC atm I would need you to test this version if you don't mind (I assume you might have plenty of examples).

Can you also test the stochastic part of the reader? That would be great.

If you can confirm, I will merge this into the master branch and will commit it as the latest version in pypi.

Thanks in advance!

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/jmaerte/pysmps/issues/6#issuecomment-877686812, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABJUZ44KRFQKUVP3BC2BHK3TXCJVVANCNFSM5ADFKKNA .

pchtsp avatar Jul 11 '21 17:07 pchtsp

Has everything worked out so far?

Best regards!

jmaerte avatar Jul 14 '21 08:07 jmaerte

Hello! In fact I was just testing it out before you wrote. I'm trying, first of all, to load the same mps file in pulp and pysmps and see if they load the same information. But for some reason pysmps "blocks" with the first file I tried:

dance_pairing_-pulp.mps.txt

I did the following:

import pysmps.smps_loader as mps
file = 'dance_pairing_-pulp.mps.txt'
pysmps_data = mps.load_mps(file)
print(pysmps_data)

Once I manage to run and successfully checked manually some mps files, I will adapt pulp to integrate directly with pysmps and then I will be able to run all pulp unittests using pysmps and see if anything breaks.

pchtsp avatar Jul 14 '21 09:07 pchtsp

So I have loaded this file myself now and the problem with your code is not the loading (even tho it takes quite long too - round about half a minute; maybe I should implement a loading bar) but the printing of the returned matrices and vectors... so the "blocking" you are describing lays in print(pysmps_data) on my side..

I would rather test it with a smaller problem to check it on plausibility by the eye.

Note aswell: The current pysmps version on pypi is still using numpy as i do not want to update it to a version that might not be running properly.

Best regards!

jmaerte avatar Jul 14 '21 09:07 jmaerte

Just so you can check if the loading is blocking or if something else causes this: I have just commited a version to the numpy_free branch of this repository that implements a progress bar for loading the file.

Best regards.

jmaerte avatar Jul 14 '21 09:07 jmaerte

I would suggest cloning this branch and just executing the python code in the console so you get a local function called load_mps instead of one imported by pypi.

jmaerte avatar Jul 14 '21 09:07 jmaerte

Yes, I actually cloned the branch and ran the source code when I tested it to be sure it's not using numpy. Curiously, the pulp version I have took less than a second to run with that same file. I'm not sure why though.

On Wed, Jul 14, 2021, 11:52 Julian Märte @.***> wrote:

I would suggest cloning this branch and just executing the python code in the console so you get a local function called load_mps instead of one imported by pypi.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/jmaerte/pysmps/issues/6#issuecomment-879756653, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABJUZ4YI22TAORPYQ7V4IX3TXVM5RANCNFSM5ADFKKNA .

pchtsp avatar Jul 14 '21 20:07 pchtsp

I will check out your pulp version for differences to what I did.

However: The Result returned should be the same and should take an eternity to print to the console even with your pulp version, right?

jmaerte avatar Jul 14 '21 20:07 jmaerte

The complete code I ran was:

import pulp as pl
import pysmps.smps_loader as mps

file = 'dance_pairing_-pulp.mps.txt'

# this runs very fast
pul_data = pl.mpslp.readMPS(file, pl.LpMinimize)
print(pul_data)

# this takes X seconds to run
pysmps_data = mps.load_mps(file)
# this blocks for some reason
print(pysmps_data)

Unfortunately, the returned information is no quite the same for my pulp function and yours (I modified the code to match pulp's own internal format). But the print(pul_data) is also instantaneous (it prints a dictionary). You're right in that the mps.load_mps takes just some seconds and block during the print(pysmps_data). Knowing this, I'll continue to validate the output data and probably integrate it in pulp to test it.

Still, the current execution time for the load_mps method is too large. Specially compared with the current pulp version (I'm not sure if I took out parts, I don't think so). I think we can also work in getting it to run faster. If you check the pulp implementation maybe it will give some hints.

Thanks!

pchtsp avatar Jul 15 '21 07:07 pchtsp

So I see some problems with my code for this size of linear programs. I will try to fix this asap.

Though I can't guarantee that the result is exactly what your MPS function returns.

jmaerte avatar Jul 15 '21 08:07 jmaerte

As far as I can judge it from looking at it your code can't handle multiple different RHS? Also it only parses the first RHS given in the respective line. Compare to this linear program given in the wikipedia article on MPS file formats:

https://en.wikipedia.org/wiki/MPS_(format)

First line of RHS1 has 2 identifiers. One for LIM1 and one for LIM2. You would only catch LIM1 as far as I can see?

Best regards!

jmaerte avatar Jul 15 '21 08:07 jmaerte

It's entirely possible that I have an old version of your library. It's also possible that I made some (many) mistakes while porting it.

All the more reasons to use yours instead of our copy : )

On Thu, Jul 15, 2021, 10:36 Julian Märte @.***> wrote:

As far as I can judge it from looking at it your code can't handle multiple different RHS? Also it only parses the first RHS given in the respective line. Compare to this linear program given in the wikipedia article on MPS file formats:

https://en.wikipedia.org/wiki/MPS_(format)

First line of RHS1 has 2 identifiers. One for LIM1 and one for LIM2. You would only catch LIM1 as far as I can see?

Best regards!

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/jmaerte/pysmps/issues/6#issuecomment-880509060, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABJUZ45P2KDS7PSJF4PNZLDTX2M2FANCNFSM5ADFKKNA .

pchtsp avatar Jul 15 '21 09:07 pchtsp

I have just uploaded a new commit to the numpy_free branch. Feel free to test it.

You now need to import the code from mps_reader.py.

The reason for this is that i want to keep compatability with old code using the library. So the old mps reader for now stays inside the smps_loader.py file so I do not need to change the whole smps reader aswell.

As a stand-alone MPS file reader i would offer the one in the mps_loader.py file since the output is now completely different, not returning the constraint matrix as a 2d array anymore (this was by the way the computational bottleneck of the code earlier since it needed to initialize an all-0-matrix of the size of the problem because it was not programmed for sparse or large problems).

I'm excited for your feedback on the new MPS reader.

P.S.: The output of the new MPS function comes closer to what you are returning i guess.

jmaerte avatar Jul 15 '21 16:07 jmaerte

Haha I think maybe we're taking this initial issue too far. Thanks for all the work.

I haven't had the time to test it. On the other hand I'm not too convinced of the idea of having to mps readers... One for pulp, the other for the rest of the world. Because we would almost be in the same place as before (with my ported code inside pulp).

Would there be a way to share most of the code in one function and then return one format or the other depending on the desired format? That way new changes and fixes would be shared by both functions.

On Thu, Jul 15, 2021, 18:40 Julian Märte @.***> wrote:

I have just uploaded a new commit to the numpy_free branch. Feel free to test it.

You now need to import the code from mps_reader.py.

The reason for this is that i want to keep compatability with old code using the library. So the old mps reader for now stays inside the smps_loader.py file so I do not need to change the whole smps reader aswell.

As a stand-alone MPS file reader i would offer the one in the mps_loader.py file since the output is now completely different, not returning the constraint matrix as a 2d array anymore (this was by the way the computational bottleneck of the code earlier since it needed to initialize an all-0-matrix of the size of the problem because it was not programmed for sparse or large problems).

I'm excited for your feedback on the new MPS reader.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/jmaerte/pysmps/issues/6#issuecomment-880849545, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABJUZ4YEECQP4OHBV6RR6FTTX4FQPANCNFSM5ADFKKNA .

pchtsp avatar Jul 17 '21 10:07 pchtsp

Well i do not plan to keep it like this. I want to make my smps reader reference back to the improved mps reader. So it is not written just for pulp but rather as a new stand-alone mps reader. But to keep compatability i need the old output format.

So my plan is to use the improved reader in the old mps reader function and just reparse the output.

jmaerte avatar Jul 17 '21 11:07 jmaerte

Is there a need for an mps reader still?

jmaerte avatar Jul 20 '21 11:07 jmaerte

Hello Julian,

There is definitely a need for an mps reader, yes. I haven't yet tested the new mps_reader.py file. As soon as I do that I'll let you know.

Thanks,

Franco

On Tue, Jul 20, 2021 at 1:43 PM Julian Märte @.***> wrote:

Is there a need for an mps reader still?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/jmaerte/pysmps/issues/6#issuecomment-883325805, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABJUZ455YA6ITTSUS7KM6OLTYVOO5ANCNFSM5ADFKKNA .

pchtsp avatar Jul 20 '21 15:07 pchtsp

I have just uploaded a new commit to the numpy_free branch. Feel free to test it.

-            matrix = np.zeros((len(self.places),1))
+           self.matrix = [[0]] * len(self.places)

This way of creating a "matrix" in Python is not recommended because lists are mutable objects that are shallow copied, which means that matrix[0][0] is the same variable as matrix[1][0]. The following testcase shows the problem:

vector = [0] * 5
vector[0] = 5
print(vector)

bad_matrix = [[0]] * 5
bad_matrix[0].append(0) 
print(bad_matrix)
bad_matrix[0][0] = 5
print(bad_matrix)

matrix = [[0] for _ in range(5) ]
matrix[0].append(0) 
print(matrix)
matrix[0][0] = 5
print(matrix)

gives the following output:

[5, 0, 0, 0, 0]
[[0, 0], [0, 0], [0, 0], [0, 0], [0, 0]]
[[5, 0], [5, 0], [5, 0], [5, 0], [5, 0]]
[[0, 0], [0], [0], [0], [0]]
[[5, 0], [0], [0], [0], [0]]

MLopez-Ibanez avatar Aug 27 '21 09:08 MLopez-Ibanez

Of course. I see.

Nonetheless it seems like this inofficial branch is dead anyway due to no interest in a numpy-free version anymore.

Thanks anyway; I will fix this.

Best regards!

jmaerte avatar Aug 27 '21 09:08 jmaerte

I think there is interest, just that @pchtsp has lots on his plate and may take some time to get to this, but I trust he will eventually do. (It is good to see more collaboration between Pulp and other projects rather than wheel re-invention).

MLopez-Ibanez avatar Aug 27 '21 10:08 MLopez-Ibanez

By the way, I should have said that this is not the only instance where this happens. There is at least one more (strange that no test failed because of this):

                       A = [[0]] * len(row_names)

MLopez-Ibanez avatar Aug 27 '21 10:08 MLopez-Ibanez

Thanks again - now all occurrences of this should be resolved.

jmaerte avatar Aug 27 '21 10:08 jmaerte

Hello again. Thanks both @jmaerte @MLopez-Ibanez . Indeed, as @MLopez-Ibanez said, my plate is quite full (and then I added holidays). I've pulled the new version of the branch and did some tests:

import pulp as pl
import pysmps.smps_loader as mps
try:
    from time import process_time as clock
except ImportError:
    from time import clock

file = 'dance_pairing_-pulp.mps.txt'

time = - clock()
pul_data = pl.mpslp.readMPS(file, pl.LpMinimize)
time += clock()
print(time)

time = - clock()
pysmps_data = mps.load_mps(file)
time += clock()
print(time)

First of all, minor comment: there is a import sys missing in the smps_loader.py that I had to add to avoid an import error. Also, pycharm shows there is a reference to a function named __load_mps that is not imported here.

Regarding time: the time difference is still too large unfortunately. For this file (which is not a large problem I think): the pulp readMPS function takes 0.53s while the mps.load_mps function takes 21s

pchtsp avatar Aug 31 '21 09:08 pchtsp

Sorry, I just realized I was using the incorrect import. If I change my second line to import pysmps.mps_loader as mps then the speed is the same. I will continue with the tests.

pchtsp avatar Aug 31 '21 09:08 pchtsp

I thought so - the smps_loader in this package should have nothing changed as far as i remember (except that i swapped numpy for built-in lists).

The speed boost by using sparse matrices for the boundary conditions is only implemented in the mps_loader file.

jmaerte avatar Aug 31 '21 09:08 jmaerte

Hello again,

I've adapted the code in the following pulp branch: https://github.com/coin-or/pulp/tree/pysmps Tests still fail to run. If you want to try the tests you can clone the repository and then create a virtualenv installing pysmps locally.

# clone repo
# branch to pysmps
python3 -m venv venv
source venv/bin/activate
pip install -U -e /LOCAL/PATH/TO/pysmps
python3 -m unittest pulp/tests/test_pulp.py

I haven't yet added pysmps to the package requirements so it will fail if not installed.

Some issues to iron out:

  1. Is there a reason why load_mps returns a dictionary with keys constraints and variable (one in singular, one in plural)?
  2. Binary variables are getting an upper bound of inf instead of 1. In the following example:
*SENSE:Maximize
NAME          test_importMPS_binary
ROWS
 N  OBJ
 E  _C1
 L  _C2
COLUMNS
    MARK      'MARKER'                 'INTORG'
    c1        _C1        1.000000000000e+00
    c1        _C2        1.000000000000e+00
    MARK      'MARKER'                 'INTEND'
    MARK      'MARKER'                 'INTORG'
    c2        _C1        1.000000000000e+00
    MARK      'MARKER'                 'INTEND'
    dummy     OBJ        1.000000000000e+00
RHS
    RHS       _C1        2.000000000000e+00
    RHS       _C2        0.000000000000e+00
BOUNDS
 BV BND       c1      
 BV BND       c2      
 FR BND       dummy   
ENDATA

I get the following:

pysmps_data = mps.load_mps(file)
print(pysmps_data['variable']['c1'])
# {'type': 'Integer', 'name': 'c1', 'bnd_lower': 0, 'bnd_upper': inf}
  1. Some lower bounds are not being read correctly by load_mps, I think. In the following example:
*SENSE:Minimize
NAME          test_importMPS_integer
ROWS
 N  obj
 L  c1
 G  c2
 E  c3
COLUMNS
    x         c1         1.000000000000e+00
    x         c2         1.000000000000e+00
    x         obj        1.100000000000e+00
    y         c1         1.000000000000e+00
    y         c3        -1.000000000000e+00
    y         obj        4.100000000000e+00
    MARK      'MARKER'                 'INTORG'
    z         c2         1.000000000000e+00
    z         c3         1.000000000000e+00
    z         obj        9.100000000000e+00
    MARK      'MARKER'                 'INTEND'
RHS
    RHS       c1         5.000000000000e+00
    RHS       c2         1.000000000000e+01
    RHS       c3         7.500000000000e+00
BOUNDS
 UP BND       x          4.000000000000e+00
 LO BND       y         -1.000000000000e+00
 UP BND       y          1.000000000000e+00
 LO BND       z          0.000000000000e+00
ENDATA

I get the following:

pysmps_data = mps.load_mps(file)
print(pysmps_data['variable']['y'])
# {'type': 'Continuous', 'name': 'y', 'bnd_lower': 0, 'bnd_upper': 1.0}

But I'd say y should have bnd_lower=-1. Right?

pchtsp avatar Aug 31 '21 12:08 pchtsp