
ValueError: Cannot create a tensor proto whose content is larger than 2GB.

Open · cadama opened this issue on Aug 20, 2020 · 2 comments

Hello,

I am hitting a TensorFlow size limit when feeding a larger dataset to the model.

My model looks like:

import pymc4 as pm
import tensorflow as tf

@pm.model
def model(X, clicks, conversions):

    # Priors for the intercept and the regression coefficients
    b_0 = yield pm.Normal(loc=0., scale=10, name='b_0')
    betas = yield pm.Normal(loc=0., scale=10, name='betas', batch_stack=X.shape[1])

    # Expected value: conversion probability per observation
    p = tf.math.sigmoid(b_0 + tf.tensordot(betas, tf.cast(X.T, tf.float32), axes=1))

    # Data likelihood: conversions out of clicks, each with probability p
    obs = yield pm.Binomial('obs', total_count=clicks, probs=p, observed=conversions)

This way, I believe TensorFlow is embedding the whole dataset into the graph as a constant. Is this the correct way to do linear regression? How can I avoid hitting this limit? Other examples do something similar, e.g.:

https://github.com/pymc-devs/pymc4/blob/master/notebooks/radon_hierarchical.ipynb
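
For reference, the limit is reproducible outside of PyMC4: TensorFlow serializes graph constants into a GraphDef protobuf, and protobuf messages are capped at 2 GB, so any array above that size fails when baked into a graph. A minimal sketch (the array shape is arbitrary, chosen only to exceed 2 GB):

import numpy as np
import tensorflow as tf

X = np.zeros((30_000_000, 10))  # ~2.4 GB of float64

@tf.function
def graph_fn():
    # In graph mode the array is serialized into the GraphDef as a
    # constant node, which is where the 2 GB protobuf cap applies.
    return tf.reduce_sum(tf.constant(X))

graph_fn()  # ValueError: Cannot create a tensor proto whose content is larger than 2GB.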

Thanks in advance

C

cadama · Aug 20 '20 10:08

What do you mean by "cannot create a tensor..."? Is it that the model graph doesn't fit into memory?

rrkarim · Aug 23 '20 04:08

There is a protobuf size limit, reported here:

https://stackoverflow.com/questions/34128872/google-protobuf-maximum-size/34186672

This is not a memory limit of the machine, but protobuf's 2 GB cap on a serialized message. How can one train on a dataset that exceeds this size?

cadama · Aug 25 '20 08:08
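
A common workaround, not specific to PyMC4 and in line with the Stack Overflow thread linked above, is to hold the data in a tf.Variable: variable values live outside the serialized GraphDef, so the 2 GB cap on constants does not apply to them. A minimal sketch under that assumption (how a variable-backed design matrix interacts with PyMC4's sampler is untested here):

import numpy as np
import tensorflow as tf

X = np.zeros((30_000_000, 10))  # ~2.4 GB, too large for a graph constant

# Variable values are stored outside the GraphDef, so they are not
# subject to the 2 GB protobuf cap that constants hit.
X_var = tf.Variable(X, trainable=False)

@tf.function
def graph_fn():
    # The graph references the variable's handle, not its contents.
    return tf.reduce_sum(X_var)

graph_fn()  # runs; no oversized tensor proto is created

Alternatively, streaming minibatches through tf.data sidesteps the limit entirely, at the cost of restructuring the model around minibatch updates.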