lmfit-py icon indicating copy to clipboard operation
lmfit-py copied to clipboard

Loading of parameters from file fails if they were saved without using dill, but dill can be imported at load time

Open Dan-Sk opened this issue 3 years ago • 11 comments

First Time Issue Code

Yes, I read the instructions and I am sure this is a GitHub Issue.

Description

If parameters have been saved to file using Parameters.dump() while dill is not importable, and later an attempt is made to Parameter.load() while dill is importable, jsonutils.py blindly assumes that saving also was done using dill, and attempts a call to dill.loads. This results in a TypeError.

A Minimal, Complete, and Verifiable example
# somehow make dill not importable
import lmfit
p = lmfit.Parameters()
with open("somefile.params", "w") as dumpfile:
    p.dump(dumpfile)
# make dill importable, start a new python instance
import lmfit
p = lmfit.Parameters()
with open("somefile.params", "r") as dumpfile:
    p.load(dumpfile)
Error message:
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "C:\Users\...\miniconda3\envs\lmfitbug\lib\site-packages\lmfit\parameter.py", line 524, in load
    return self.loads(fp.read(), **kws)
  File "C:\Users\...\miniconda3\envs\lmfitbug\lib\site-packages\lmfit\parameter.py", line 470, in loads
    unique_symbols = {key: decode4js(tmp['unique_symbols'][key]) for key
  File "C:\Users\...\miniconda3\envs\lmfitbug\lib\site-packages\lmfit\parameter.py", line 470, in <dictcomp>
    unique_symbols = {key: decode4js(tmp['unique_symbols'][key]) for key
  File "C:\Users\...\miniconda3\envs\lmfitbug\lib\site-packages\lmfit\jsonutils.py", line 137, in decode4js
    out = dill.loads(b64decode(obj['value']))
  File "C:\Users\...\miniconda3\envs\lmfitbug\lib\base64.py", line 80, in b64decode
    s = _bytes_from_decode_data(s)
  File "C:\Users\...\miniconda3\envs\lmfitbug\lib\base64.py", line 45, in _bytes_from_decode_data
    raise TypeError("argument should be a bytes-like object or ASCII "
TypeError: argument should be a bytes-like object or ASCII string, not 'NoneType'
Version information
Python: 3.8.13 (default, Mar 28 2022, 06:59:08) [MSC v.1916 64 bit (AMD64)]

lmfit: 1.0.3, scipy: 1.8.1, numpy: 1.22.4,asteval: 0.9.27, uncertainties: 3.1.7

Dan-Sk avatar Jun 22 '22 14:06 Dan-Sk

@Dan-Sk I think this is an argument for making dill a required dependency.

newville avatar Jun 22 '22 15:06 newville

@newville I have no opinion on whether dill should be a requirement or not, but either way loading should not fail for reasons like the above. It is easy enough to test whether loading with dill works, and to fall back to the other method if it doesn't.

Dan-Sk avatar Jun 22 '22 15:06 Dan-Sk

@Dan-Sk

It is easy enough to test whether loading with dill works, and to fall back to the other method if it doesn't.

I'm not sure I agree with this. What's easy to tell is "did loading work". If that fails, we would not necessarily know why. We could guess "well, maybe this got saved without dill", but that would be a guess. It would be worse the other way: if the file got saved with dill installed, and the loading program could not import dill, there would be nothing to "fall back" on.

OTOH, if we require dill to be installed, then a failure at load time would again be, well, a failure.

newville avatar Jun 22 '22 16:06 newville

Mostly I think that you should have backwards compatibility for data that was saved to file, and therefore future versions of lmfit should, when loading with dill fails, try loading without dill.

Dan-Sk avatar Jun 23 '22 09:06 Dan-Sk

@Dan-Sk Well, I'm just not sure that is a matter of "backward compatibility".

That is, to even get to this point, you're in the section of the "unpack save file code" where it is trying to unpack a callable function. And we know that dill was not available at "save time" and is available at "load time". One could easily consider those to be "different environments" (even if the only thing that changed was that dill got installed). What gets saved for a callable function when dill is not available is the name of the callable function and a guess (but a good guess) of the name module that callable came from. At load time, the loading code will try to import that callable from that module. That is sort of a "best guess" already -- it should work fine if the environment has not changed, but it can definitely fail if that module is not available or has changed. That's all to say that "fall back to try without dill" is really not a guarantee of success.

What would be a better guarantee (but still require a consistent Python version) is to use dill.

newville avatar Jun 23 '22 14:06 newville

@Dan-Sk As I look at this more, I would say, sure, let's add a "try/except" around the "unpack with dill"

It might actually be best to reverse the order here, and try the non-dill load first: that is loading by name first will work for all of 1. the codes using the built-in modules, 2. user scripts that define their own functions and then keep them/distribute those scripts 3. most (including mine!) larger applications that will have custom but findable functions. 4. it will sometimes work in other cases.

OTOH, loading with dill is actually a bit more fragile, including being dependent on the Python version (or not 100% version independent). That is, if dill loading is going to fail, it might be good to already know that "load by name" did not work and we are sort of grasping at straws, with dill being one of those straws.

@Dan-Sk @reneeotten opinions?

newville avatar Jun 23 '22 17:06 newville

@newville I haven't had time to look at this more closely, but your proposal sounds reasonable.

I suppose another option would be to add a flag to the load function in the Parameters and Model class to specify whether dill is to be used. That would allow the end-user, who presumably knows how the file was generated, to override the automatic detection of dill being installed. That would avoid doing a trial-and-error approach of loading the data, but I'm certainly not against using that approach either.

reneeotten avatar Jun 24 '22 09:06 reneeotten

@newville That sounds good to me.

[...] the end-user, who presumably knows how the file was generated [...]

@reneeotten I have to disagree with you there. I certainly didn't know (or care) how lmfit saves data. It was only when the failure occurred, I opened up jsonutils.py, read through it and then happened to remember seeing out of the corner of my eye that installation of some package had caused dill to be installed as well just an hour before. Had any of these factors been different, I would simply have assumed that the file got corrupted somehow.

Dan-Sk avatar Jun 24 '22 10:06 Dan-Sk

@Dan-Sk I sympathize, bu just to be clear, saving a Model is sort of complicated by the fact that it is hard to reliably save a Python function - the "model function" - in a way that is portable, especially across Python versions.

The options are

a) save the name and module name that the function came from. For this, you assume the function will be available at load time, and be the same function. At save time, that is impossible for us to guarantee. But it will often work....

b) try to save a serialized compiled code. Pickle cannot do that. Dill can, though portable across Python major versions cannot br guaranteed into the future. This is "the clever" solution: really slick when it works and a disaster when it doesn't.

c) don't save the model at all, and instead save the Python code that creates and uses it. If you are looking for long-term archiving of an analysis, this is definitely the best approach. If that code starts to fail (say, if it used Python2) it will likely be able to be updated.

So to agree with @reneeotten , if you are saving a Model or a ModelResult, you should be aware of these limitations and make an informed decision.

newville avatar Jun 24 '22 14:06 newville

@newville Not disagreeing, just want to point out that my original issues relates to Parameters, not Models or ModelResults.

Dan-Sk avatar Jun 29 '22 09:06 Dan-Sk

@Dan-Sk yes, thanks we do understand that- for us it sort of all goes together in the category of "can we save functions in a way that can be reliably restored at some other time and in some other Python session?" And then, "if so, how"?

newville avatar Jun 29 '22 12:06 newville

Fixed in 8b5a2952b5ca8f5bd64abcde8e2b5eab145971f5

reneeotten avatar Nov 19 '22 04:11 reneeotten