Incremental Wrapper ValueError: Layer not in the HighLevelGraph's layers:

Open cwoolfo1 opened this issue 3 years ago • 5 comments

What happened: I am getting ValueErrors when I wrap my PyTorch model (built with skorch) in dask-ml's Incremental wrapper:

ValueError: Layer ('fit-cb2461b73d6a9abbf8a6eacb8d7c983d', 3) not in the HighLevelGraph's layers: ['original-array-ca0ac32624c3fb9bb23d2ca89c89e186', 'array-ca0ac32624c3fb9bb23d2ca89c89e186', 'transpose-c840e9da8930d4f38c6497326ee6e59d', 139711252509824, 'getitem-4f71595287eec8f0490287973829e6f2', 'reshape-fe8d57675329c0348f3a175d62f073d0']

What you expected to happen: I expected my neural network model to begin training on my dataset incrementally.

Minimal Complete Verifiable Example:


import dask.array as da
import torch.nn as nn
from skorch import NeuralNetRegressor
import torch.optim as optim
from dask_ml.wrappers import Incremental
from dask_ml.datasets import make_regression

class GRU(nn.Module):
    def __init__(self, inputsize, outputsize):
        super(GRU, self).__init__()
        self.inputsize = inputsize
        self.outputsize = outputsize
        self.hiddenlayers = nn.GRU(self.inputsize, self.inputsize, num_layers=2, batch_first=True)
        self.outputlayer = nn.Linear(self.inputsize, self.outputsize)
        
    def forward(self, x):
        output, hidden = self.hiddenlayers(x)
        x = hidden[-1]
        x = self.outputlayer(x)
        return x
    

niceties = {
    "callbacks": False,
    "warm_start": False,
    "train_split": None,
    "max_epochs": 1,
}

model = NeuralNetRegressor(
    module=GRU,
    module__inputsize=10,
    module__outputsize=1,
    criterion=nn.L1Loss(),
    optimizer=optim.SGD,
    optimizer__lr=0.01,
    optimizer__momentum=0.9,
    batch_size=100,
    **niceties
    )

# Trains the model incrementally using chunked data
inc = Incremental(model, scoring="r2")


# Creates data for the GitHub example
X, y = make_regression(n_samples=10000, n_features=10, n_targets=1, chunks=100)

# Target: feature 1, reshaped to 2-D (n_samples, 1)
y = X[:, 1].reshape(-1, 1)
# Reshapes the input for the sequential model: (n_samples, 3, 10)
X = da.stack([X, X, X], axis=1)


print(X.shape)
print(y.shape)


inc.fit(X, y)
print(inc.score(X, y))

Anything else we need to know?:

Environment: Anaconda 3

  • Dask-ML version: 1.7.0
  • Python version: 3.9.7
  • Operating System: Pop OS 21.10
  • Install method (conda, pip, source): conda

cwoolfo1, Apr 14 '22 16:04
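To make the shapes in the example concrete, here is a NumPy-only sketch that mirrors its data preparation and the shape arithmetic of the GRU module. Random normal data stands in for `make_regression` (an assumption that does not affect shapes), and the helper names are hypothetical. Per the PyTorch documentation, `batch_first=True` applies to the input and output of `nn.GRU` but not to its hidden state, which is why `hidden[-1]` selects the last layer's `(batch, hidden)` state.

```python
import numpy as np


def mirror_mcve_shapes(n_samples=10000, n_features=10, chunk=100, seed=0):
    """NumPy stand-in for the MCVE's data preparation.

    Random normal features replace dask_ml.datasets.make_regression;
    only the shapes matter here.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, n_features))
    # Target is feature 1, kept 2-D as (n_samples, 1).
    y = X[:, 1].reshape(-1, 1)
    # Stack three copies along axis 1 -> (batch, seq_len, features).
    X_seq = np.stack([X, X, X], axis=1)
    # Row blocks of `chunk` rows, like dask chunks along axis 0.
    blocks = [
        (X_seq[s:s + chunk].shape, y[s:s + chunk].shape)
        for s in range(0, n_samples, chunk)
    ]
    return X_seq.shape, y.shape, blocks


def gru_shapes(batch, seq_len, hidden_size, num_layers, out_size):
    """Shape arithmetic for the MCVE's GRU module (batch_first=True)."""
    output = (batch, seq_len, hidden_size)  # nn.GRU output is batch-first
    h_n = (num_layers, batch, hidden_size)  # hidden state is NOT batch-first
    last = h_n[1:]                          # hidden[-1]: last layer's state
    prediction = (last[0], out_size)        # after nn.Linear(hidden, out)
    return output, h_n, last, prediction
```

With `chunks=100` there are 100 row blocks, so a wrapper that passes blocks through unchanged would hand `partial_fit` arrays of shape `(100, 3, 10)` and `(100, 1)`.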

Can you post the full traceback?

Does NeuralNetRegressor implement partial_fit?

TomAugspurger, Apr 16 '22 13:04
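The partial_fit question can be checked by duck typing. This is a minimal sketch, assuming the only requirement is a callable `partial_fit` attribute; skorch's `NeuralNet` has one, as the `partial_fit` frame from `skorch/net.py` in the later traceback shows. `FitOnly` and `FitAndPartialFit` are hypothetical stand-ins, and `supports_partial_fit` is not a dask-ml function.

```python
def supports_partial_fit(estimator):
    """Return True if `estimator` has a callable `partial_fit` method.

    Incremental-style training calls partial_fit once per block, so an
    estimator without it cannot be trained that way.
    """
    return callable(getattr(estimator, "partial_fit", None))


class FitOnly:
    """Hypothetical estimator with fit() but no partial_fit()."""

    def fit(self, X, y):
        return self


class FitAndPartialFit(FitOnly):
    """Hypothetical estimator that also supports incremental updates."""

    def partial_fit(self, X, y):
        return self
```

Applied to the `NeuralNetRegressor` from the example, the check would be expected to return `True`.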

Traceback (most recent call last):
  File "/coolstuff.py", line 66, in <module>
    inc.fit(X, y)
  File "/home/anaconda3/envs/Torch/lib/python3.9/site-packages/dask_ml/wrappers.py", line 493, in fit
    self._fit_for_estimator(estimator, X, y, **fit_kwargs)
  File "/home/anaconda3/envs/Torch/lib/python3.9/site-packages/dask_ml/wrappers.py", line 477, in _fit_for_estimator
    result = fit(
  File "/home/anaconda3/envs/Torch/lib/python3.9/site-packages/dask_ml/_partial.py", line 136, in fit
    value = Delayed((name, nblocks - 1), new_dsk)
  File "/home/anaconda3/envs/Torch/lib/python3.9/site-packages/dask/delayed.py", line 497, in __init__
    raise ValueError(
ValueError: Layer ('fit-3a23b2bee452186733b389b1c5a56db7', 99) not in the HighLevelGraph's layers: ['stack-a184c924722ff5ca46fb6c0ef85f0d12', 'normal-862d9ba0e2bc042085f36c63710cfe0b', 140630240623552, 'getitem-0d109ff96323bf4206bd2b008436a85a', 'reshape-84d7afd23da422ab179600d3f13f97b8']

cwoolfo1, Apr 16 '22 19:04
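The first traceback ends in `Delayed.__init__` in `dask/delayed.py`, and its message prints the whole `(name, index)` key tuple as the layer. That suggests the layer defaults to the key when none is given, and the check fails because the graph built by dask-ml 1.7.0 has no layer of that name (the listed layers are `stack-...`, `normal-...`, an integer, `getitem-...` and `reshape-...`). Here is a simplified model of that check under those assumptions. `ToyDelayed` and the layer names are hypothetical; this is not dask's code.

```python
class ToyDelayed:
    """Simplified model of the layer check in dask.delayed.Delayed.__init__.

    Assumptions: the layer defaults to the whole key when none is given,
    and a HighLevelGraph is modelled as a dict {layer_name: {key: task}}.
    """

    def __init__(self, key, layers, layer=None):
        self.key = key
        # Fall back to the full key, e.g. ("fit-abc", 1), as the layer name.
        self.layer = layer or key
        if self.layer not in layers:
            raise ValueError(
                f"Layer {self.layer} not in the HighLevelGraph's layers: "
                f"{list(layers)}"
            )


# Hypothetical graph with no "fit-..." layer, as in the traceback.
graph_layers = {"stack-x": {}, "reshape-y": {}}
```

Constructing `ToyDelayed(("fit-abc", 1), graph_layers)` raises the same kind of ValueError, while a graph that does contain a `"fit-abc"` layer, combined with `layer="fit-abc"`, passes the check.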

The neural net regressors from skorch do implement partial_fit.

cwoolfo1, Apr 16 '22 19:04
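Since skorch's `NeuralNetRegressor` has `partial_fit`, block-wise training can in principle also be driven by hand. The following NumPy sketch approximates what an Incremental-style wrapper does: it calls `partial_fit` once per row block, optionally shuffling the block order. `Incremental` does have a `shuffle_blocks` parameter, but the rest of this function is an assumption, not dask-ml's implementation. `RecordingEstimator` is a hypothetical test double.

```python
import random

import numpy as np


def incremental_fit(estimator, X, y, chunk_size, shuffle_blocks=True, seed=None):
    """Call estimator.partial_fit once per row block of (X, y).

    chunk_size stands in for the dask chunks along axis 0; shuffle_blocks
    shuffles the order of the blocks, not the rows inside a block.
    """
    n_rows = np.shape(X)[0]
    if n_rows != np.shape(y)[0]:
        raise ValueError("X and y must have the same number of rows")
    starts = list(range(0, n_rows, chunk_size))
    if shuffle_blocks:
        random.Random(seed).shuffle(starts)
    for start in starts:
        stop = start + chunk_size
        estimator.partial_fit(X[start:stop], y[start:stop])
    return estimator


class RecordingEstimator:
    """Hypothetical estimator that records the block shapes it receives."""

    def __init__(self):
        self.seen = []

    def partial_fit(self, X, y):
        self.seen.append((X.shape, y.shape))
        return self
```

With skorch, the blocks would likely also need to be `float32` (for example `X.astype(np.float32)`) to match PyTorch's default parameter dtype; this is an assumption about the setup, not something tested in this thread.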

Dask-ML v1.7.0 is from Sep. 2020. Try upgrading; I think this has been fixed since then (it certainly raises an error on the main branch): https://github.com/dask/dask-ml/blob/67d28b15dfff7869e9a04def203aa129a3540b27/dask_ml/_partial.py#L96-L98

stsievert, Apr 16 '22 23:04
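A small stdlib-only helper can confirm which versions are actually installed before and after upgrading. `newer_than` compares against 1.7.0, the version reported in this issue, because the thread does not name the release that contains the fix. All helper names are hypothetical, and the version parsing is deliberately simple.

```python
import re
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def release_tuple(version):
    """Leading numeric release segment, e.g. '1.9.0rc1' -> (1, 9, 0).

    Deliberately simple; packaging.version.Version handles the full spec.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(p) for p in match.group(0).split(".")) if match else ()


def newer_than(version, baseline="1.7.0"):
    """True if `version` is a later release than `baseline`."""
    return release_tuple(version) > release_tuple(baseline)


if __name__ == "__main__":
    # Report the packages involved in this issue (None means not installed).
    for name in ("dask", "dask-ml", "skorch", "torch"):
        print(name, installed_version(name))
```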

Upgrading fixed one issue I was having. However, there is still one error with the Incremental wrapper:

Traceback (most recent call last):
  File "/home/christopherwoolford/Documents/Research/Deep learning research with Dr. Aledhari/Gene Regulatory Elements Prediction/GRU/GRUStuff/GRU/GRUwithDask.py", line 89, in <module>
    train = inc.fit(X, y)
  File "/home/christopherwoolford/anaconda3/envs/Torch/lib/python3.9/site-packages/dask_ml/wrappers.py", line 579, in fit
    self._fit_for_estimator(estimator, X, y, **fit_kwargs)
  File "/home/christopherwoolford/anaconda3/envs/Torch/lib/python3.9/site-packages/dask_ml/wrappers.py", line 563, in _fit_for_estimator
    result = fit(
  File "/home/christopherwoolford/anaconda3/envs/Torch/lib/python3.9/site-packages/dask_ml/_partial.py", line 137, in fit
    return value.compute()
  File "/home/christopherwoolford/anaconda3/envs/Torch/lib/python3.9/site-packages/dask/base.py", line 290, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/home/christopherwoolford/anaconda3/envs/Torch/lib/python3.9/site-packages/dask/base.py", line 573, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/christopherwoolford/anaconda3/envs/Torch/lib/python3.9/site-packages/dask/threaded.py", line 81, in get
    results = get_async(
  File "/home/christopherwoolford/anaconda3/envs/Torch/lib/python3.9/site-packages/dask/local.py", line 506, in get_async
    raise_exception(exc, tb)
  File "/home/christopherwoolford/anaconda3/envs/Torch/lib/python3.9/site-packages/dask/local.py", line 314, in reraise
    raise exc
  File "/home/christopherwoolford/anaconda3/envs/Torch/lib/python3.9/site-packages/dask/local.py", line 219, in execute_task
    result = _execute_task(task, data)
  File "/home/christopherwoolford/anaconda3/envs/Torch/lib/python3.9/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/christopherwoolford/anaconda3/envs/Torch/lib/python3.9/site-packages/dask_ml/_partial.py", line 20, in _partial_fit
    model.partial_fit(x, y, **kwargs)
  File "/home/christopherwoolford/anaconda3/envs/Torch/lib/python3.9/site-packages/skorch/net.py", line 1174, in partial_fit
    self.fit_loop(X, y, **fit_params)
  File "/home/christopherwoolford/anaconda3/envs/Torch/lib/python3.9/site-packages/skorch/net.py", line 1074, in fit_loop
    self.check_data(X, y)
  File "/home/christopherwoolford/anaconda3/envs/Torch/lib/python3.9/site-packages/skorch/regressor.py", line 69, in check_data
    if get_dim(y) == 1:
  File "/home/christopherwoolford/anaconda3/envs/Torch/lib/python3.9/site-packages/skorch/utils.py", line 193, in get_dim
    return y.dim()
AttributeError: 'tuple' object has no attribute 'dim'

There appears to be an error in the wrapper: the data it passes on is not being converted into a torch tensor (or any array). Per the traceback, y reaches skorch's check_data as a plain tuple. When I modified the code to use skorch alone, it ran fine. The Incremental wrapper, however, is not passing the input data to skorch in the form skorch needs.

cwoolfo1, Apr 19 '22 21:04
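One plausible explanation for the tuple, which this thread does not confirm, involves how dask names blocks. Dask names each block of an array `(name, *block_index)`, so the 2-D `y` in the example has keys like `(name, i, 0)` and the 3-D `X` has keys like `(name, i, 0, 0)`. If the fit graph refers to a block with a key shaped for a lower-dimensional array (for example `(name, i)`), the lookup misses. In the classic task-graph model, a tuple that is not a key is then passed to the function literally. skorch's `get_dim` appears to try `.ndim` and then `.dim()`, which fails exactly as in the traceback. The toy sketch models this; `block_keys`, `resolve` and the key name are simplifications, not dask or dask-ml code.

```python
from itertools import product


def block_keys(name, numblocks):
    """Dask-style block keys: (name, *block_index) for every block."""
    return [(name, *idx) for idx in product(*(range(n) for n in numblocks))]


def resolve(arg, cache):
    """Simplified classic dask argument handling: a hashable argument that
    is a key of `cache` is replaced by its value; anything else, including
    a tuple that merely looks like a key, is passed through unchanged."""
    try:
        hash(arg)
    except TypeError:
        return arg
    return cache[arg] if arg in cache else arg


def get_dim(y):
    """Mirror of skorch's dimension check: try .ndim, then .dim()."""
    try:
        return y.ndim
    except AttributeError:
        return y.dim()


# y from the example: shape (10000, 1), 100-row chunks -> 100 x 1 blocks.
y_name = "reshape-abc"  # hypothetical key name
cache = {key: f"y block {key[1:]}" for key in block_keys(y_name, (100, 1))}
```

Here `resolve((y_name, 3, 0), cache)` finds the block, while `resolve((y_name, 3), cache)` returns the tuple itself, and `get_dim` on that tuple raises `'tuple' object has no attribute 'dim'`. If this is the cause, it collides with skorch's requirement (the `check_data` frame) that regression targets be 2-D.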