idaes-pse
More efficient method for check_parallel_jacobian
Depends on #1429
Summary/Motivation:
Presently in `check_parallel_jacobian`, many arithmetic operations are being carried out in Python code, featuring a triple `for` loop for some reason. This new method is simpler and takes advantage of more vectorized operations. However, it doesn't knock out "negligible coefficients" like the old method did, and so some of the answers it produces are different given the current criterion for parallel constraints:

```
diff <= tolerance or diff <= tolerance * max(unorm, vnorm)
```

We should revisit this criterion---just using a relative tolerance might be enough.
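For illustration, here is a minimal sketch of the vectorized idea using only a relative (cosine) criterion, as the last sentence suggests. The function name and tolerance handling are illustrative, not this PR's actual code:

```python
import numpy as np
import scipy.sparse as sps

def nearly_parallel_rows(jac, tolerance=1e-8):
    """Return pairs (i, j) of rows of `jac` whose cosine similarity is
    within `tolerance` of +/-1."""
    jac = sps.csr_matrix(jac)
    # Row 2-norms, computed in one vectorized pass.
    norms = np.sqrt(np.asarray(jac.multiply(jac).sum(axis=1)).ravel())
    # Scale each nonzero row to unit norm; zero rows stay zero.
    inv_norms = np.divide(1.0, norms, out=np.zeros_like(norms), where=norms > 0)
    unit = sps.diags(inv_norms) @ jac
    # Gram matrix of the unit rows: entry (i, j) is the cosine of the
    # angle between rows i and j.
    gram = (unit @ unit.T).tocoo()
    return [
        (i, j)
        for i, j, dot in zip(gram.row, gram.col, gram.data)
        if i < j and abs(abs(dot) - 1.0) <= tolerance
    ]
```

Since the Gram matrix is symmetric, only the upper triangle is checked, and the sparse matmul keeps the pairwise dot products cheap when the Jacobian is sparse.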
Legal Acknowledgement
By contributing to this software project, I agree to the following terms and conditions for my contribution:
- I agree my contributions are submitted under the license terms described in the LICENSE.txt file at the top level of this directory.
- I represent I am authorized to make the contributions and grant the license. If my employer has rights to intellectual property that includes these contributions, I represent that I have received permission to make contributions and grant the required license on behalf of that employer.
I just profiled this with the following script, which uses my branch: https://github.com/Robbybp/idaes-pse/tree/parallel_constraints-timing. This relies on the `distill_DAE.py` and `distill.dat` files from the Pyomo examples directory.
```python
from pyomo.environ import *
from pyomo.dae import *

from distill_DAE import model

instance = model.create_instance('distill.dat')

# Discretize using Finite Difference Approach
discretizer = TransformationFactory('dae.finite_difference')
discretizer.apply_to(instance, nfe=1000, scheme='BACKWARD')

from idaes.core.util.model_diagnostics import (
    check_parallel_jacobian,
    check_parallel_jacobian_old,
)
from pyomo.common.timing import TicTocTimer, HierarchicalTimer

timer = TicTocTimer()
htimer = HierarchicalTimer()

timer.tic()
htimer.start("old")
check_parallel_jacobian_old(instance, timer=htimer)
htimer.stop("old")
timer.toc("old")

htimer.start("new")
check_parallel_jacobian(instance, timer=htimer)
htimer.stop("new")
timer.toc("new")

print()
print(htimer)
```
I see a profile that looks like this:
```
[    0.00] Resetting the tic/toc delta timer
[+  33.05] old
[+  31.19] new

Identifier           ncalls   cumtime   percall      %
-------------------------------------------------------
new                       1    31.189    31.189   48.6
     --------------------------------------------------
     check-dotprods       1     0.273     0.273    0.9
     get-jacobian         1    28.324    28.324   90.8
     matmul               1     0.012     0.012    0.0
     norms                1     2.508     2.508    8.0
     other              n/a     0.073       n/a    0.2
     ==================================================
old                       1    33.048    33.048   51.4
     --------------------------------------------------
     get-jacobian         1    28.638    28.638   86.7
     norm            100068     0.343     0.000    1.0
     sort-by-nz           1     0.227     0.227    0.7
     vectors              1     3.682     3.682   11.1
     other              n/a     0.158       n/a    0.5
     ==================================================
=======================================================
```
The new implementation is indeed slightly faster, although both are dominated by the cost of getting the Jacobian.
I can make some small performance improvements to the old implementation, giving the following profile:
```
[    0.00] Resetting the tic/toc delta timer
[+  31.44] old
[+  30.82] new

Identifier           ncalls   cumtime   percall      %
-------------------------------------------------------
new                       1    30.817    30.817   49.5
     --------------------------------------------------
     check-dotprods       1     0.268     0.268    0.9
     get-jacobian         1    27.966    27.966   90.7
     matmul               1     0.011     0.011    0.0
     norms                1     2.501     2.501    8.1
     other              n/a     0.071       n/a    0.2
     ==================================================
old                       1    31.443    31.443   50.5
     --------------------------------------------------
     get-jacobian         1    28.255    28.255   89.9
     norm            100068     0.330     0.000    1.1
     sort-by-nz           1     2.667     2.667    8.5
     vectors              1     0.029     0.029    0.1
     other              n/a     0.162       n/a    0.5
     ==================================================
=======================================================
```
Nothing too major. Both implementations seem to be dominated by the cost of extracting vectors from the Jacobian. I have no real problem with the new implementation. Do you have any performance benchmarks that motivated it?
Thank you for profiling it, @Robbybp. The main motivation was trying to read the old code, finding it unnecessarily complex, seeing a triple `for` loop in Python, and deciding to do an implementation myself. For a while, I did work in Octave, where there was a huge performance difference between using built-ins and writing for-loops. I guess there isn't as big a difference in Python, at least in this instance.
I'm curious about the time spent in `norms`, though. Is that dominated by lookup times, or by list creation? I just pushed a version that preallocates a Numpy array then fills it with norms---could you see how that performs?
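For context, the difference being discussed is roughly the following (an illustrative sketch, not the committed code):

```python
import numpy as np

def norms_via_list(rows):
    # Builds an intermediate Python list, then copies it into an array.
    return np.array([np.linalg.norm(r) for r in rows])

def norms_preallocated(rows):
    # Allocates the output array once and fills it in place.
    out = np.empty(len(rows))
    for i, r in enumerate(rows):
        out[i] = np.linalg.norm(r)
    return out
```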
Edit: Just realized you posted the script you used to profile it. Will check myself then.
On a different note, there's the failing test. I am also having problems with `test_display_near_parallel_variables`, which I know is changed in #1380.

The original test expects only `(v1, v2)`, `(v1, v4)`, and `(v2, v4)` as parallel. The new method additionally returns `(v1, v3)`, `(v4, v3)`, and `(v2, v3)`. Here's the transposed Jacobian matrix, which has these vectors as rows:
```
matrix([[ 1.00000e+00,  1.00000e+00,  1.00000e+08],
        [-1.00000e+00,  0.00000e+00,  1.00000e+10],
        [ 9.99990e-01,  1.00001e+00,  1.00000e+08],
        [ 0.00000e+00, -1.00000e-08, -1.00000e-06]])
```
Frankly, I'm shocked that the `(v1, v3)` pair was missed by the old method (and that seems to have become undone with minor changes).
> Frankly, I'm shocked that the (v1, v3) pair was missed by the old method (and that seems to have become undone with minor changes).

This is because, with a tolerance of 1e-4, the v3 vector was considered a zero-vector, and was not compared with other vectors. I think you could reasonably argue for considering a zero vector as parallel with everything or parallel with nothing.
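For concreteness, checking the displayed rows directly (assuming they are v1 through v4 in the displayed order) shows why a purely relative cosine criterion flags every pair at the 1e-4 tolerance mentioned above:

```python
import numpy as np

# Rows of the transposed Jacobian shown above, assumed to be v1..v4 in order.
rows = np.array([
    [1.00000e+00, 1.00000e+00, 1.00000e+08],    # v1
    [-1.00000e+00, 0.00000e+00, 1.00000e+10],   # v2
    [9.99990e-01, 1.00001e+00, 1.00000e+08],    # v3
    [0.00000e+00, -1.00000e-08, -1.00000e-06],  # v4
])
unit = rows / np.linalg.norm(rows, axis=1, keepdims=True)
cos = unit @ unit.T
for i in range(4):
    for j in range(i + 1, 4):
        # 1 - |cos| is at most ~5e-5 for every pair, below 1e-4.
        print(f"(v{i+1}, v{j+1}): 1 - |cos| = {1 - abs(cos[i, j]):.2e}")
```

All six off-diagonal values come out below 1e-4, consistent with the new method reporting the three extra pairs involving v3.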
This is what I get with my latest version. Indeed, creating `norms` was the bottleneck, probably because it wasn't preallocated.
```
Identifier           ncalls   cumtime   percall      %
-------------------------------------------------------
new                       1    58.482    58.482   50.3
     --------------------------------------------------
     check-dotprods       1     4.389     4.389    7.5
     get-jacobian         1    53.934    53.934   92.2
     matmul               1     0.022     0.022    0.0
     norms                1     0.002     0.002    0.0
     other              n/a     0.135       n/a    0.2
     ==================================================
old                       1    57.727    57.727   49.7
     --------------------------------------------------
     get-jacobian         1    51.437    51.437   89.1
     norm            100068     0.623     0.000    1.1
     sort-by-nz           1     5.356     5.356    9.3
     vectors              1     0.048     0.048    0.1
     other              n/a     0.263       n/a    0.5
     ==================================================
=======================================================
```
We have 4.548 s for the new method after subtracting `get-jacobian`, and 6.290 s for the old method.
> Frankly, I'm shocked that the (v1, v3) pair was missed by the old method (and that seems to have become undone with minor changes).
>
> This is because, with a tolerance of 1e-4, the v3 vector was considered a zero-vector, and was not compared with other vectors. I think you could reasonably argue for considering a zero vector as parallel with everything or parallel with nothing.
I ended up just screening out elements based on a `norm_tolerance` argument in my latest version. When it's called by the diagnostics toolbox, it's going to be given the same tolerance we're using for `jacobian_rows` and `jacobian_columns`, so it gets flagged there.
I did some (probably excessive) optimization, and now we have:
```
Identifier           ncalls   cumtime   percall      %
-------------------------------------------------------
new                       1     0.211     0.211    3.6
     --------------------------------------------------
     check-dotprods       1     0.145     0.145   68.5
     get-jacobian         1     0.000     0.000    0.0
     matmul               1     0.016     0.016    7.4
     norms                1     0.002     0.002    0.8
     other              n/a     0.049       n/a   23.3
     ==================================================
old                       1     5.593     5.593   96.4
     --------------------------------------------------
     get-jacobian         1     0.000     0.000    0.0
     norm            100068     0.576     0.000   10.3
     sort-by-nz           1     4.773     4.773   85.3
     vectors              1     0.052     0.052    0.9
     other              n/a     0.192       n/a    3.4
     ==================================================
=======================================================
```
Note that I precalculated the Jacobian and passed it in so that it didn't take so long to run and we didn't have to subtract it out.
The test failures in Python 3.8 are because some element of Numpy or Scipy decided to change the order it reports elements. The set of parallel vectors is the same, it's just being presented in a different order.
> The test failures in Python 3.8 are because some element of Numpy or Scipy decided to change the order it reports elements. The set of parallel vectors is the same, it's just being presented in a different order.
Something to keep in mind that might or might not be relevant is that the versions of NumPy (and possibly SciPy as well) running on Python 3.8 will be older than for Python 3.9+, as NumPy dropped support for 3.8 a while ago. So this difference in sorting (or lack thereof) might be due to the NumPy version rather than the Python version.
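If the set of pairs really is the same and only the ordering differs, one option is to make the comparison order-insensitive (an illustrative sketch, not necessarily how the tests are written):

```python
def as_pair_set(pairs):
    """Normalize a list of (a, b) pairs into an order-insensitive form."""
    return {frozenset(p) for p in pairs}

# Neither the order of the pairs nor the order within a pair matters:
assert as_pair_set([("c1", "c4"), ("c2", "c3")]) == as_pair_set(
    [("c3", "c2"), ("c4", "c1")]
)
```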
A related note: Numpy is in the course of releasing v2.0 which may have impacts both here and in our code in general. The Pyomo team is tracking things on their end, but we should be aware of this coming change.
Codecov Report
Attention: Patch coverage is 94.44444% with 1 line in your changes missing coverage. Please review.

Project coverage is 77.62%. Comparing base (3a1d54a) to head (6d53c8d).
| Files | Patch % | Lines |
|---|---|---|
| idaes/core/util/model_diagnostics.py | 94.44% | 0 Missing and 1 partial :warning: |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main    #1395      +/-   ##
==========================================
- Coverage   77.63%   77.62%   -0.01%
==========================================
  Files         391      391
  Lines       64391    64383       -8
  Branches    14264    14256       -8
==========================================
- Hits        49989    49977      -12
- Misses      11830    11834       +4
  Partials     2572     2572
```
@Robbybp , any additional comments about this new method before I bug @andrewlee94 about it?
@dallan-keylogic I'll review this later, but I see no reason not to request another review.
I have a diagnostics example I'm working on that relies on `check_parallel_jacobian`. When I run with the current implementation, I get 11 pairs of parallel constraints, but when I run with this branch's implementation, I get 1012 pairs of parallel constraints. I will try to make this example public soon.
I propose we move ahead with #1385, then come back to this implementation when my diagnostics example is public and try to fix the regression.
> I have a diagnostics example I'm working on that relies on `check_parallel_jacobian`. When I run with the current implementation, I get 11 pairs of parallel constraints, but when I run with this branch's implementation, I get 1012 pairs of parallel constraints. I will try to make this example public soon.
>
> I propose we move ahead with #1385, then come back to this implementation when my diagnostics example is public and try to fix the regression.
We can defer this until after we do #1385, but are you running this method (and `check_numerical_issue`) on the scaled model (i.e., after a Pyomo scaling transformation) or the unscaled model?
At the moment, this PR reveals many of the unit models to be poorly scaled, because the numerical checks are carried out on the model before any Pyomo scaling transformation is applied. For unit models tested on the BTX property package, the provided default scaling factors are sufficient to make the warnings disappear. However, the Saponification package has no default scaling factors.
I took a look at adding scaling factors to Saponification, but there are several things I'd like to change about its structure first (like changing the rate constant from a `Constraint`-`Var` pair to an `Expression`; see the sketch below). Also, @andrewlee94 is working on a new scaling framework and would probably rather just let the old one die. What happens next is up for discussion.
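For reference, the structural change being suggested looks roughly like this. This is a hypothetical sketch with made-up Arrhenius numbers, not the Saponification package's actual code:

```python
from pyomo.environ import ConcreteModel, Constraint, Expression, Var, exp

m = ConcreteModel()
m.temperature = Var(initialize=300.0)

# Current structure: an extra Var plus a Constraint defining it, which adds
# a row and column to the Jacobian that each need their own scaling factor.
m.k_rxn = Var(initialize=1.0)
m.arrhenius_eqn = Constraint(
    expr=m.k_rxn == 1.0e6 * exp(-4.0e4 / (8.314 * m.temperature))
)

# Proposed structure: an Expression, substituted wherever it is used, with
# no additional variable or constraint to scale.
m.k_rxn_expr = Expression(
    expr=1.0e6 * exp(-4.0e4 / (8.314 * m.temperature))
)
```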
@dallan-keylogic For now, I have been doing hard-coded scaling where it is necessary (e.g. the Gibbs reactor) until the new scaling tools are ready. We will need to switch all the models to the new workflow at that time, so we can clean up the tests then.
@Robbybp @bpaul4 could one of you review this PR now that it's ready to be merged?
This PR still displays way more parallel constraints than I expect on my example, which has been merged into the examples repo. Here is code to reproduce:
```python
from idaes_examples.mod.diagnostics.gas_solid_contactors.example import (
    create_square_model_with_new_variable_and_constraint
)
from idaes.core.util.model_diagnostics import DiagnosticsToolbox

m = create_square_model_with_new_variable_and_constraint()
dt = DiagnosticsToolbox(m)
dt.report_numerical_issues()
dt.display_near_parallel_constraints()
```
On main, this gives 11 parallel constraints. Here is the relevant output:
```
====================================================================================
Model Statistics

    Jacobian Condition Number: Undefined (Exactly Singular)

------------------------------------------------------------------------------------
4 WARNINGS

    WARNING: 110 Constraints with large residuals (>1.0E-05)
    WARNING: 77 Variables with extreme Jacobian values (<1.0E-08 or >1.0E+08)
    WARNING: 77 Constraints with extreme Jacobian values (<1.0E-08 or >1.0E+08)
    WARNING: 11 pairs of constraints are parallel (to tolerance 1.0E-08)

------------------------------------------------------------------------------------
6 Cautions

    Caution: 1254 Variables with value close to zero (tol=1.0E-08)
    Caution: 1058 Variables with extreme value (<1.0E-04 or >1.0E+04)
    Caution: 99 Variables with None value
    Caution: 1619 Variables with extreme Jacobian values (<1.0E-04 or >1.0E+04)
    Caution: 1870 Constraints with extreme Jacobian values (<1.0E-04 or >1.0E+04)
    Caution: 3553 extreme Jacobian Entries (<1.0E-04 or >1.0E+04)

------------------------------------------------------------------------------------
Suggested next steps:

    display_constraints_with_large_residuals()
    compute_infeasibility_explanation()
    display_variables_with_extreme_jacobians()
    display_constraints_with_extreme_jacobians()
    display_near_parallel_constraints()

====================================================================================
====================================================================================
The following pairs of constraints are nearly parallel:

    fs.MB.solid_super_vel[0.0], fs.MB.density_flowrate_constraint[0.0,1.0]
    fs.MB.solid_super_vel[30.0], fs.MB.density_flowrate_constraint[30.0,1.0]
    fs.MB.solid_super_vel[60.0], fs.MB.density_flowrate_constraint[60.0,1.0]
    fs.MB.solid_super_vel[90.0], fs.MB.density_flowrate_constraint[90.0,1.0]
    fs.MB.solid_super_vel[120.0], fs.MB.density_flowrate_constraint[120.0,1.0]
    fs.MB.solid_super_vel[150.0], fs.MB.density_flowrate_constraint[150.0,1.0]
    fs.MB.solid_super_vel[180.0], fs.MB.density_flowrate_constraint[180.0,1.0]
    fs.MB.solid_super_vel[210.0], fs.MB.density_flowrate_constraint[210.0,1.0]
    fs.MB.solid_super_vel[240.0], fs.MB.density_flowrate_constraint[240.0,1.0]
    fs.MB.solid_super_vel[270.0], fs.MB.density_flowrate_constraint[270.0,1.0]
    fs.MB.solid_super_vel[300.0], fs.MB.density_flowrate_constraint[300.0,1.0]

====================================================================================
```
Here is the output of `report_numerical_issues` with this branch:
```
====================================================================================
Model Statistics

    Jacobian Condition Number: Undefined (Exactly Singular)

------------------------------------------------------------------------------------
5 WARNINGS

    WARNING: 110 Constraints with large residuals (>1.0E-05)
    WARNING: 77 Variables with extreme Jacobian values (<1.0E-08 or >1.0E+08)
    WARNING: 77 Constraints with extreme Jacobian values (<1.0E-08 or >1.0E+08)
    WARNING: 1012 pairs of constraints are parallel (to tolerance 1.0E-08)
    WARNING: 462 pairs of variables are parallel (to tolerance 1.0E-08)

------------------------------------------------------------------------------------
6 Cautions

    Caution: 1254 Variables with value close to zero (tol=1.0E-08)
    Caution: 1058 Variables with extreme value (<1.0E-04 or >1.0E+04)
    Caution: 99 Variables with None value
    Caution: 1619 Variables with extreme Jacobian values (<1.0E-04 or >1.0E+04)
    Caution: 1870 Constraints with extreme Jacobian values (<1.0E-04 or >1.0E+04)
    Caution: 3553 extreme Jacobian Entries (<1.0E-04 or >1.0E+04)

------------------------------------------------------------------------------------
Suggested next steps:

    display_constraints_with_large_residuals()
    compute_infeasibility_explanation()
    display_variables_with_extreme_jacobians()
    display_constraints_with_extreme_jacobians()
    display_near_parallel_constraints()
    display_near_parallel_variables()

====================================================================================
```
Note that 1012 pairs of constraints are considered parallel. I would like this to be fixed before merging.
It isn't clear to me that there's anything there to "fix". The Jacobian is singular to machine precision. Something is clearly wrong with the example numerically. I can go through and calculate the angle between those constraints or variables by hand, but if they're getting flagged it's less than 0.01 degrees.
Do you have scaling factors for your example? In order for the diagnostics toolbox to recognize them, you need to apply a Pyomo scaling transformation and create a toolbox with the scaled model.
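A minimal sketch of that workflow, assuming scaling factors have already been set on the model (`core.scale_model` is Pyomo's standard scaling transformation; `m` is the model from the reproduction snippet above):

```python
from pyomo.environ import TransformationFactory
from idaes.core.util.model_diagnostics import DiagnosticsToolbox

# Build a scaled copy of the model and run the diagnostics on it,
# rather than on the original (unscaled) model.
scaled_model = TransformationFactory("core.scale_model").create_using(m)
dt_scaled = DiagnosticsToolbox(scaled_model)
dt_scaled.report_numerical_issues()
```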
Okay, if we tighten `dt.config.parallel_component_tolerance` from `1e-8` to `1e-15`, we get the expected output:

```python
dt.config.parallel_component_tolerance = 1e-15
dt.display_near_parallel_constraints()
```
```
====================================================================================
The following pairs of constraints are nearly parallel:

    fs.MB.solid_super_vel[0.0], fs.MB.density_flowrate_constraint[0.0,1.0]
    fs.MB.solid_super_vel[30.0], fs.MB.density_flowrate_constraint[30.0,1.0]
    fs.MB.solid_super_vel[60.0], fs.MB.density_flowrate_constraint[60.0,1.0]
    fs.MB.solid_super_vel[90.0], fs.MB.density_flowrate_constraint[90.0,1.0]
    fs.MB.solid_super_vel[120.0], fs.MB.density_flowrate_constraint[120.0,1.0]
    fs.MB.solid_super_vel[150.0], fs.MB.density_flowrate_constraint[150.0,1.0]
    fs.MB.solid_super_vel[180.0], fs.MB.density_flowrate_constraint[180.0,1.0]
    fs.MB.solid_super_vel[210.0], fs.MB.density_flowrate_constraint[210.0,1.0]
    fs.MB.solid_super_vel[240.0], fs.MB.density_flowrate_constraint[240.0,1.0]
    fs.MB.solid_super_vel[270.0], fs.MB.density_flowrate_constraint[270.0,1.0]
    fs.MB.solid_super_vel[300.0], fs.MB.density_flowrate_constraint[300.0,1.0]

====================================================================================
```
The problem seems to be that we're presently testing `1 - <u,v>/(||u||*||v||)` against a tolerance of `1e-8`. If there's a variable in one of them that has a scaling factor off by a factor of 10^4, the scaling error is squared in the inner product, resulting in an error of 10^8, which then shows up in the parallel constraints. (This is the same issue that you run into when solving the normal equations or performing the SVD: if you form certain intermediate matrices, you lose half your digits of precision, going from 1e-16 to 1e-8.) Presently, we're just giving cautions for badly scaled vars/constraints with magnitude 10^4, so this method is much more sensitive than our other methods. I'll make the default tolerance `1e-15` to resolve the inconsistency.
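The squaring is easy to see numerically: perturbing one entry of a parallel pair by a relative `eps` moves `1 - cos` by only about `eps**2/8` (an illustrative check):

```python
import numpy as np

def one_minus_cos(u, v):
    return 1.0 - abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

for eps in (1e-2, 1e-4, 1e-6):
    u = np.array([1.0, 1.0])
    v = np.array([1.0, 1.0 + eps])
    # Prints values close to eps**2/8 (about 1.24e-05, 1.25e-09, 1.25e-13):
    # quadratic, not linear, in the perturbation size.
    print(f"eps={eps:.0e}: 1 - cos = {one_minus_cos(u, v):.3e}")
```

So with a tolerance of 1e-8, entries perturbed at the 1e-4 level already look non-parallel only marginally, which matches the sensitivity described above.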
This method really should only be used on a model that has already had scaling issues resolved, and the example, being a legacy model, has severe scaling issues. However, this change is good enough to get by now. Maybe later we can add autoscaling to the example when @andrewlee94 finishes with that.
I'm not sure how I feel about further reducing the tolerance. Then Jacobian rows `(1.0, 1.0)` and `(1.0, 1.0+1e-14)` will not be considered parallel, which doesn't seem like what we want.
> This method really should only be used on a model that has already had scaling issues resolved

I'm not sure I agree with this.

> Then Jacobian rows `(1.0, 1.0)` and `(1.0, 1.0+1e-14)` will not be considered parallel

This is not correct. I see your point that our current measure of "distance-from-parallel" scales with `epsilon**2` for two vectors `(1.0, 1.0)` and `(1.0, 1.0+epsilon)`. Maybe we should use `sqrt(unorm*vnorm - uvnorm)` as the measure? It would be nice for the tolerance to be the same order of magnitude as a perturbation that will cause two vectors to no longer be parallel.
> Maybe we should use sqrt(unorm*vnorm - uvnorm) as the measure?
Is the goal here to make the measure of colinearity extensive? Then we run into problems if we have rows/columns that are small in norm. We could also make sqrt(1 - <u,v>/(||u||*||v||) ) the measure of colinearity, which is intensive. However, I'm not sure whether using an epsilon smaller than 1e-8 would be meaningful in that case unless the vectors were parallel to machine precision due to floating point error in the calculation. There are ways of computing the angle in a more precise way (check out this StackExchange post and the PDF it links to) but they'll take longer to calculate (we'd probably revert to your version of the code, but take out the part that automatically declares two vectors "not parallel" if they're structurally different) and be overkill for our purposes.
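For comparison, the square-root variant scales linearly with the perturbation, so its tolerance is directly comparable to a coefficient-level change (same illustrative vectors as before):

```python
import numpy as np

for eps in (1e-2, 1e-4, 1e-6):
    u = np.array([1.0, 1.0])
    v = np.array([1.0, 1.0 + eps])
    cos = abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # sqrt(1 - cos) ~ eps / (2 * sqrt(2)): linear, not quadratic, in eps.
    print(f"eps={eps:.0e}: sqrt(1 - cos) = {np.sqrt(1.0 - cos):.3e}")
```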
One of the tests we're now failing is this one:
```python
@pytest.mark.component
def test_display_near_parallel_constraints(self):
    model = ConcreteModel()
    model.v1 = Var(initialize=1e-8)
    model.v2 = Var()
    model.v3 = Var()

    model.c1 = Constraint(expr=model.v1 == model.v2)
    model.c2 = Constraint(expr=model.v1 == 1e-8 * model.v3)
    model.c3 = Constraint(expr=1e8 * model.v1 + 1e10 * model.v2 == 1e-6 * model.v3)
    model.c4 = Constraint(expr=-model.v1 == -0.99999 * model.v2)

    dt = DiagnosticsToolbox(model=model)

    stream = StringIO()
    dt.display_near_parallel_constraints(stream)

    expected = """====================================================================================
The following pairs of constraints are nearly parallel:

    c1, c4

====================================================================================
"""
```
It turns out they're parallel only to a tolerance of about 1e-11:
```
(Pdb) from numpy.linalg import norm
(Pdb) v1 = np.array([1, -1])
(Pdb) v2 = np.array([-1, 0.99999])
(Pdb) norm(v1)
1.4142135623730951
(Pdb) norm(v2)
1.414206491322961
(Pdb) v1.dot(v2)
-1.99999
(Pdb) v1.dot(v2)/(norm(v1)*norm(v2))
-0.9999999999874997
(Pdb) out = 1- np.abs(v1.dot(v2)/(norm(v1)*norm(v2)))
(Pdb) out
1.2500334101162025e-11
(Pdb)
```
So the solution to @Robbybp 's example might just be to turn down the tolerance there (to 1e-15) and we can leave the default value looser. Is this solution satisfactory?
Codecov Report
Attention: Patch coverage is 90.38462% with 10 lines in your changes missing coverage. Please review.

Project coverage is 76.32%. Comparing base (10b42e4) to head (bf5f26d).
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main    #1395      +/-   ##
==========================================
- Coverage   76.36%   76.32%   -0.04%
==========================================
  Files         394      394
  Lines       64953    64884      -69
  Branches    14404    14420      +16
==========================================
- Hits        49601    49525      -76
- Misses      12793    12803      +10
+ Partials     2559     2556       -3
```
@dallan-keylogic I opened a PR into this branch, https://github.com/dallan-keylogic/idaes-pse/pull/4, that attempts to implement the original algorithm in a vectorized manner. Take a look when you have a chance.
@Robbybp @dallan-keylogic We don't seem to be making any progress towards converging on a solution to this - we have lots of ideas, but nothing stands out as a clear winner yet. Could someone summarise what ideas have been proposed, along with the pros and cons of each, so we can start to work out what the path forward should be?
Also, if the underlying issue is that these checks are inherently sensitive to scaling, then maybe we should just wait until we have the new scaling tools ready so that we can test on actual scaled models.
@andrewlee94 I propose we do the following:
- Merge scaling tools (#1429)
- I update my example to (a) use scaling tools and (b) still not solve the numerically singular optimization problem.
- Merge this PR