
Pickling failure for auto-registered symbols pointing to PyTorch

Open • mattteochen opened this issue 1 year ago • 1 comment

🐛 Bug

Pickling a TraceCtx currently fails if it contains auto-registered symbols whose .module points to PyTorch. Such a symbol cannot be looked up by name in that module, so the assertion in __reduce__ (thunder/core/symbol.py) fails and pickling aborts.

To Reproduce

Code sample

import thunder, torch
import dill as pickle

def fn(x):
    return torch.positive(x)

jfn = thunder.jit(fn)
jfn(torch.randn(1))  # run once so that a trace is recorded
pickle.dumps(thunder.last_traces(jfn)[0])  # raises the AssertionError below

Traceback

Traceback (most recent call last):
  File "/workspace/workdir/examples/dev/pickling.py", line 9, in <module>
    pickle.dumps(thunder.last_traces(jfn)[0])
  File "/usr/local/lib/python3.10/dist-packages/dill/_dill.py", line 280, in dumps
    dump(obj, file, protocol, byref, fmode, recurse, **kwds)#, strictio)
  File "/usr/local/lib/python3.10/dist-packages/dill/_dill.py", line 252, in dump
    Pickler(file, protocol, **_kwds).dump(obj)
  File "/usr/local/lib/python3.10/dist-packages/dill/_dill.py", line 420, in dump
    StockPickler.dump(self, obj)
  File "/usr/lib/python3.10/pickle.py", line 487, in dump
    self.save(obj)
  File "/usr/local/lib/python3.10/dist-packages/dill/_dill.py", line 414, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib/python3.10/pickle.py", line 603, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/lib/python3.10/pickle.py", line 717, in save_reduce
    save(state)
  File "/usr/local/lib/python3.10/dist-packages/dill/_dill.py", line 414, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib/python3.10/pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "/usr/local/lib/python3.10/dist-packages/dill/_dill.py", line 1217, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/usr/lib/python3.10/pickle.py", line 972, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/lib/python3.10/pickle.py", line 998, in _batch_setitems
    save(v)
  File "/usr/local/lib/python3.10/dist-packages/dill/_dill.py", line 414, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib/python3.10/pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "/usr/lib/python3.10/pickle.py", line 932, in save_list
    self._batch_appends(obj)
  File "/usr/lib/python3.10/pickle.py", line 956, in _batch_appends
    save(x)
  File "/usr/local/lib/python3.10/dist-packages/dill/_dill.py", line 414, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib/python3.10/pickle.py", line 603, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/lib/python3.10/pickle.py", line 717, in save_reduce
    save(state)
  File "/usr/local/lib/python3.10/dist-packages/dill/_dill.py", line 414, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib/python3.10/pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "/usr/local/lib/python3.10/dist-packages/dill/_dill.py", line 1217, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/usr/lib/python3.10/pickle.py", line 972, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/lib/python3.10/pickle.py", line 998, in _batch_setitems
    save(v)
  File "/usr/local/lib/python3.10/dist-packages/dill/_dill.py", line 414, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib/python3.10/pickle.py", line 578, in save
    rv = reduce(self.proto)
  File "/workspace/workdir/thunder/core/symbol.py", line 233, in __reduce__
    assert getattr(sys.modules[self.module.__name__], self.name, None) is self
AssertionError
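
The last frame is an identity check: the symbol must be the exact object found under `self.name` in `sys.modules[self.module.__name__]`. The sketch below reproduces only that check, outside of thunder. `FakeSymbol` and `is_reachable_by_name` are hypothetical names introduced for illustration, and the stdlib `math` module stands in for `torch`. The assumption is that an auto-registered symbol behaves like the wrapper here: it claims to live in a module whose attribute of that name is the library's own function.

```python
import math
import sys


def is_reachable_by_name(obj, module, name):
    """Mirror the assertion in the traceback: does module.name resolve to obj?"""
    return getattr(sys.modules[module.__name__], name, None) is obj


class FakeSymbol:
    """Hypothetical placeholder for an auto-registered symbol."""


# A wrapper object that claims to live in `math` under the name "floor" ...
wrapper = FakeSymbol()

# ... is not what `math.floor` resolves to, so the identity check fails.
wrapper_ok = is_reachable_by_name(wrapper, math, "floor")

# The real function is reachable under its own name, so the check passes.
builtin_ok = is_reachable_by_name(math.floor, math, "floor")

print(wrapper_ok, builtin_ok)  # prints: False True
```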

Environment

  • PyTorch Version (e.g., 1.0): 2.5.0a0+gitb0fc6aa
  • OS (e.g., Linux): Linux
  • Python version: 3.10.12
  • CUDA/cuDNN version: 12.6
  • GPU models and configuration: RTX ADA 6000
  • Any other relevant information: Tested on NVIDIA internal docker containers

mattteochen, Aug 19 '24 09:08

triage review

  • we need to make the auto-registered functions available in a module (thunder.torch?)
  • follow-up with @t-vi for details

mruberry, Aug 19 '24 15:08
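
The direction proposed in the triage note, making the symbols available as attributes of a module, can be sketched with the standard library alone. Everything here is an assumption for illustration: `ToySymbol` is not thunder's symbol class, `toy_symbols` is a throwaway module created at runtime, and returning the name from `__reduce__` is one way to get by-reference pickling, not necessarily what thunder does.

```python
import pickle
import sys
import types


class ToySymbol:
    """Hypothetical stand-in for a symbol that pickles by reference."""

    def __init__(self, name, module):
        self.name = name
        self.module = module
        # pickle resolves by-name references through obj.__module__.
        self.__module__ = module.__name__

    def __reduce__(self):
        # Same identity check as the assertion in the traceback.
        assert getattr(sys.modules[self.module.__name__], self.name, None) is self
        # Returning a string asks pickle to store a global reference.
        return self.name


# Hypothetical registry module; it plays the role of a real importable module.
registry = types.ModuleType("toy_symbols")
sys.modules[registry.__name__] = registry

sym = ToySymbol("positive", registry)

# Before registration the lookup finds nothing and the check fails.
try:
    pickle.dumps(sym)
    failed_before = False
except (AssertionError, pickle.PicklingError):
    failed_before = True

# Registering the symbol as a module attribute makes the lookup succeed.
setattr(registry, sym.name, sym)
restored = pickle.loads(pickle.dumps(sym))

print(failed_before, restored is sym)  # prints: True True
```

Because the pickle stores only a module name and an attribute name, unpickling returns the registered object itself. This also means the module must be importable and populated in whichever process loads the pickle.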