OptimizationProblems.jl icon indicating copy to clipboard operation
OptimizationProblems.jl copied to clipboard

Discrepancies with other test suites

Open timholy opened this issue 6 years ago • 10 comments

In #36 it was discovered that these implementations differ from CUTEst. In several cases that were checked by hand against the original paper, there were some errors in this package but also at least as many in CUTEst. Consequently one should be agnostic about how one interprets these. For the record (at the time of testing, Oct 12 2019, with 168 problems), here are the discrepancies:

  • [X] broydn7d (AMPL model at https://github.com/mpf/Optimization-Test-Problems is wrong)
  • [X] brybnd
  • [ ] fletchcr (TH: eyeball-check we agree with AMPL, couldn't access paper)
  • [ ] genrose (TH: differs from CUTEst only by a constant offset of 1.0)
  • [ ] hs46 (TH: eyeball-check we agree with AMPL)
  • [ ] hs47 (TH: eyeball-check we agree with AMPL)
  • [ ] hs55 (TH: eyeball-check we agree with AMPL)
  • [ ] hs70 (CUTEst gives NaN here)
  • [ ] hs83 (TH: eyeball-check we agree with AMPL)
  • [ ] hs93
  • [ ] hs100
  • [ ] hs104
  • [ ] hs105
  • [ ] hs113
  • [ ] hs114
  • [x] morebv (we were buggy but the remaining discrepancy seems to be a CUTEst bug, see https://www.osti.gov/servlets/purl/6650344)
  • [X] nondia (was buggy here too but was fixed in #36; remaining discrepancy is an error in CUTEst and https://github.com/mpf/Optimization-Test-Problems)
  • [X] penalty3 (TH: eyeball-check ours looks OK vs http://www.cs.cas.cz/matonoha/download/V1081.pdf)
  • [X] powellsg (ours was buggy)
  • [X] sbrybnd (there are claims of a typo in Luskan that are fixed here, that agrees with their squaring of the unscaled problem brybnd)
  • [X] scosine
  • [X] sinquad (ours had an error in component 1, fixed, remaining discrepancy is eyeball-correct)

A check mark means that someone has looked at the original paper and decided that the implementation here is correct, or that any discrepancies have been resolved. (Just checking against AMPL does not mean the paper has been checked.)

timholy avatar Oct 13 '19 09:10 timholy

  • fletchcr: the published paper doesn't use the problem given (it uses the chained Rosenbrock problem). I requested the original report from the university of Dundee to check if the problem changed during the publication process. However, the problem matches Luksan & Vlcek's problem 32. I'll update the doc in #40.

Screen Shot 2019-10-14 at 17 44 17

  • hs46 matches Hock & Schittkowski's book
  • hs47 matches Hock & Schittkowski's book
  • hs55 had an erroneous index in a constraint (fix in #40)
  • hs70 matches Hock & Schittkowski's book
  • hs83 matches Hock & Schittkowski's book
  • hs93 had a small bug in a constant in the objective (fix in #40)
  • hs100 matches Hock & Schittkowski's book
  • hs104 matches Hock & Schittkowski's book
  • hs105 has a bug in the objective scaling (fix in #40)
  • hs113 matches Hock & Schittkowski's book
  • hs114 matches Hock & Schittkowski's book

dpo avatar Oct 14 '19 23:10 dpo

Looks like you can basically close this once that merges. Nice work!

timholy avatar Oct 15 '19 01:10 timholy

Your script reports other discrepancies that we should investigate.

dpo avatar Oct 15 '19 05:10 dpo

Some of those might be roundoff, but perhaps you've checked and concluded they are not. There was some manual triage, and in a couple of cases I was puzzled by what I found when I came back to it later, so possibly I got confused at some point.

timholy avatar Oct 15 '19 06:10 timholy

The generalized Rosebrock function appears to differ from CUTEst by more than an offset of 1.0, as mentioned in the comments of genrose.jl.

My translation of Princeton's AMPL genrose.mod is

f = x -> begin
        N = lastindex(x)
        return 1.0 + 100sum((x[i] - x[i-1]^2)^2 for i = 2:N) + sum((x[i] - 1.0)^2 for i = 2:N)
end

which matches CUTEst (interfaced via CUTEst.jl) to a much higher precision than this repository's specification (while accounting for the shift of 1.0). Note, the difference in the summation indexing. Additionally, genrose.jl specifies the default dimension as 100 whereas GENROSE.SIF and AMPL have a default dimension of 500.

This has been rather concerning since most people think of the generalized Rosebrock as the chained rosenbrock from fletcher's article above. It appears that the CUTEst/AMPL specification yields an easier problem, from an empirical analysis of the critical structure of the two variants (in dimension n=2 to n=8).

The depth of my literature search was AMPL, and I have not dug into the original sources. (Which seems to be a futile pursuit by Dominque's comments in genrose.jl).

Has anyone else run across this discrepancy?

danphenderson avatar Sep 25 '21 22:09 danphenderson

To be sure I understand you, the difference is the final sum over (x[i] - 1.0)^2, which runs in this package from 1:N-1 and in AMPL from 2:N? Or am I misreading something?

timholy avatar Sep 26 '21 11:09 timholy

To be sure I understand you, the difference is the final sum over (x[i] - 1.0)^2, which runs in this package from 1:N-1 and in AMPL from 2:N? Or am I misreading something?

Correct! In the first summation it is a shift in indexing but not in the second.

danphenderson avatar Sep 26 '21 15:09 danphenderson

Good. It would appear that this package has it correctly (by the article in https://github.com/JuliaSmoothOptimizers/OptimizationProblems.jl/issues/37#issuecomment-541971088), and that the Princeton AMPL needs to be corrected?

timholy avatar Sep 26 '21 18:09 timholy

Good. It would appear that this package has it correctly (by the article in #37 (comment)) and that the Princeton AMPL needs to be corrected?

No, that is not what I was reporting. The comment in this repository's genrose.jl is wrong. Specifically, there is more of a difference from CUTEst than 1.0 offset. However, genrose.jl specifies a shifted variant of the generalized Rosebrock that everyone knows and loves. Whereas Princeton's AMPL specification of the generalized Rosebrock aligns with the CUTEst specification in the GENROSE.SIF file, which is different from what most people think it is.

So what are the goals of this repository? Is it to match CUTEst, or is it to specify the generalized Rosebrock that everyone knows? Regardless of the goals, the comment in this package's genrose.jl should be updated.

It is concerning that one of the most well known test problems in the CUTEr/st set is not what people think it is, and I believe it to be a slightly easier problem.

danphenderson avatar Sep 26 '21 19:09 danphenderson

The AMPL problems are manual translations of the problems as they appeared in a relatively old version of the CUTE collection (in the late 1990s). I've never verified that they matched accurately.

The goal of this repository is not to reproduce exactly the problems of CUTEst. Many problems in this repository come from http://www.cs.cas.cz/matonoha/download/V1081.pdf, which is entitled "modified CUTE problems ...". It's certainly not impossible that there are bugs, though we tried to be careful. The comments are difficult to get right because no two papers give the same formulation of a "known" problem. Compare for example the two references in extrosnb.jl.

I wouldn't say that "most people" expect the generalized Rosenbrock function to be the chained Rosenbrock function. Over time, variations appeared. Sometimes they had different names, and sometimes not. We can certainly add variants to this repository.

dpo avatar Sep 27 '21 16:09 dpo