lightning-thunder
Pickling failure for auto-registered symbols pointing to PyTorch
🐛 Bug
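Pickling a TraceCtx currently fails if it contains auto-registered symbols whose .module points to PyTorch. These symbols cannot be looked up by name in that module, so the assertion in __reduce__ (thunder/core/symbol.py) fails and pickling aborts with an AssertionError.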
To Reproduce
Code sample
import thunder, torch
import dill as pickle

def fn(x):
    return torch.positive(x)

jfn = thunder.jit(fn)
jfn(torch.randn(1))  # run once so that a trace is recorded
pickle.dumps(thunder.last_traces(jfn)[0])  # fails with AssertionError
Traceback
Traceback (most recent call last):
  File "/workspace/workdir/examples/dev/pickling.py", line 9, in <module>
    pickle.dumps(thunder.last_traces(jfn)[0])
  File "/usr/local/lib/python3.10/dist-packages/dill/_dill.py", line 280, in dumps
    dump(obj, file, protocol, byref, fmode, recurse, **kwds)#, strictio)
  File "/usr/local/lib/python3.10/dist-packages/dill/_dill.py", line 252, in dump
    Pickler(file, protocol, **_kwds).dump(obj)
  File "/usr/local/lib/python3.10/dist-packages/dill/_dill.py", line 420, in dump
    StockPickler.dump(self, obj)
  File "/usr/lib/python3.10/pickle.py", line 487, in dump
    self.save(obj)
  File "/usr/local/lib/python3.10/dist-packages/dill/_dill.py", line 414, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib/python3.10/pickle.py", line 603, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/lib/python3.10/pickle.py", line 717, in save_reduce
    save(state)
  File "/usr/local/lib/python3.10/dist-packages/dill/_dill.py", line 414, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib/python3.10/pickle.py", line 560, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/lib/python3.10/dist-packages/dill/_dill.py", line 1217, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/usr/lib/python3.10/pickle.py", line 972, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/lib/python3.10/pickle.py", line 998, in _batch_setitems
    save(v)
  File "/usr/local/lib/python3.10/dist-packages/dill/_dill.py", line 414, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib/python3.10/pickle.py", line 560, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python3.10/pickle.py", line 932, in save_list
    self._batch_appends(obj)
  File "/usr/lib/python3.10/pickle.py", line 956, in _batch_appends
    save(x)
  File "/usr/local/lib/python3.10/dist-packages/dill/_dill.py", line 414, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib/python3.10/pickle.py", line 603, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/lib/python3.10/pickle.py", line 717, in save_reduce
    save(state)
  File "/usr/local/lib/python3.10/dist-packages/dill/_dill.py", line 414, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib/python3.10/pickle.py", line 560, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/lib/python3.10/dist-packages/dill/_dill.py", line 1217, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/usr/lib/python3.10/pickle.py", line 972, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/lib/python3.10/pickle.py", line 998, in _batch_setitems
    save(v)
  File "/usr/local/lib/python3.10/dist-packages/dill/_dill.py", line 414, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/usr/lib/python3.10/pickle.py", line 578, in save
    rv = reduce(self.proto)
  File "/workspace/workdir/thunder/core/symbol.py", line 233, in __reduce__
    assert getattr(sys.modules[self.module.__name__], self.name, None) is self
AssertionError
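The assertion in the last frame requires that looking up self.name in self.module returns the symbol itself. For an auto-registered symbol whose module is PyTorch, that lookup returns PyTorch's own function instead, so the check fails. The following is a minimal sketch of that mechanism without thunder or torch; FakeSymbol and the fake_torch module are hypothetical stand-ins, not thunder's actual classes, and only the assert line is taken from the traceback.

```python
import pickle
import sys
import types


class FakeSymbol:
    """Hypothetical stand-in for a symbol that pickles by reference."""

    def __init__(self, name, module):
        self.name = name
        self.module = module
        # pickle resolves a by-reference object through its __module__ attribute.
        self.__module__ = module.__name__

    def __reduce__(self):
        # Same check as the assertion shown in the traceback.
        assert getattr(sys.modules[self.module.__name__], self.name, None) is self
        # Returning a string asks pickle to store only "module.name".
        return self.name


# Hypothetical module playing the role of torch.
fake_torch = types.ModuleType("fake_torch")
sys.modules[fake_torch.__name__] = fake_torch
fake_torch.positive = lambda x: +x  # the module's own function, not a symbol

# The symbol claims to live in fake_torch but was never stored there.
auto_registered = FakeSymbol("positive", fake_torch)

try:
    pickle.dumps(auto_registered)
    failed = False
except AssertionError:
    failed = True

print("pickling failed:", failed)
```

In this sketch, fake_torch.positive is the plain function rather than the symbol, so the identity check cannot hold, which is the same situation as a symbol whose .module points to PyTorch.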
Environment
- PyTorch version: 2.5.0a0+gitb0fc6aa
- OS: Linux
- Python version: 3.10.12
- CUDA/cuDNN version: 12.6
- GPU models and configuration: RTX 6000 Ada
- Any other relevant information: Tested on NVIDIA internal Docker containers
Triage review
- We need to make the auto-registered functions available in a module (thunder.torch?).
- Follow up with @t-vi for details.
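One way to read the first triage point is sketched below: if every auto-registered symbol is stored as an attribute of a module owned by the framework, and its module attribute points there, the by-reference lookup succeeds. This is only an illustration under assumptions. FakeSymbol, auto_register, and the fake_symbol_registry module are hypothetical names, not thunder APIs, and the real fix may look different.

```python
import pickle
import sys
import types


class FakeSymbol:
    """Hypothetical stand-in for a symbol that pickles by reference."""

    def __init__(self, name, module):
        self.name = name
        self.module = module
        self.__module__ = module.__name__  # where pickle looks the name up

    def __reduce__(self):
        # Same precondition as the assertion in the traceback.
        assert getattr(sys.modules[self.module.__name__], self.name, None) is self
        return self.name  # pickle by reference: store "module.name" only


# Hypothetical framework-owned module, playing the role of "thunder.torch?".
registry = types.ModuleType("fake_symbol_registry")
sys.modules[registry.__name__] = registry


def auto_register(name):
    """Create a symbol and expose it as an attribute of the registry module."""
    sym = FakeSymbol(name, registry)
    setattr(registry, name, sym)
    return sym


positive = auto_register("positive")

# The lookup now finds the symbol itself, so the round trip succeeds.
restored = pickle.loads(pickle.dumps(positive))
print(restored is positive)
```

Because the object is pickled by reference, unpickling returns the very same symbol that is registered in the module, as long as the registration has happened in the loading process too.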