
Passing a dropout value to a model through the pipeline API

Open sbonner0 opened this issue 2 years ago • 5 comments

It would be great to be able to pass a dropout value to any model via the pipeline API, and thereby enable dropout, rather than having to create a new ERModel instance that replicates an existing model with dropout added in.

For example, say I wanted to add dropout to the model ComplEx to make use of the uncertainty code you have available. I can do the following to achieve this:


from pykeen.datasets import Nations
from pykeen.models import ERModel
from pykeen.pipeline import pipeline

dataset = Nations()
model = ERModel(
    triples_factory=dataset.training,
    interaction='complex',
    entity_representations_kwargs=dict(embedding_dim=10, dropout=0.1),
    relation_representations_kwargs=dict(embedding_dim=10, dropout=0.1),
)
pipeline_result = pipeline(
    dataset='Nations',
    model=model,
)

I thought I could overwrite the entity_representations_kwargs as follows:

from pykeen.pipeline import pipeline

pipeline_result = pipeline(
    dataset='Nations',
    model="ComplEx",
    model_kwargs=dict(
        embedding_dim=10,
        entity_representations_kwargs=dict(dropout=0.1),
    ),
)

However, this results in the following error: TypeError: pykeen.models.nbase.ERModel.__init__() got multiple values for keyword argument 'entity_representations_kwargs'
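For readers unfamiliar with this kind of TypeError, here is a minimal sketch of how it generally arises in plain Python. The classes GenericModel and FixedModel are hypothetical stand-ins, not the real PyKEEN classes, and this assumes (without having traced the PyKEEN source) that the named model already passes entity_representations_kwargs to its parent and also forwards the user's extra keyword arguments.

```python
# Hypothetical stand-ins for illustration; these are NOT the real PyKEEN classes.
class GenericModel:
    def __init__(self, *, interaction, entity_representations_kwargs=None):
        self.interaction = interaction
        self.entity_representations_kwargs = entity_representations_kwargs or {}


class FixedModel(GenericModel):
    """A subclass that builds the representation kwargs itself."""

    def __init__(self, *, embedding_dim=50, **kwargs):
        super().__init__(
            interaction="complex",
            entity_representations_kwargs=dict(shape=embedding_dim),
            # If kwargs also contains entity_representations_kwargs,
            # the same keyword is supplied twice and Python raises TypeError.
            **kwargs,
        )


message = None
try:
    FixedModel(embedding_dim=10, entity_representations_kwargs=dict(dropout=0.1))
except TypeError as error:
    message = str(error)
    print(message)
```

Under that assumption, the user-supplied dictionary is never merged with the one the subclass builds; the call fails before the parent constructor runs.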

So would it be possible to allow dropout to be passed through somehow from the pipeline? It seems that some of the model code will need to be changed to propagate a dropout value through to the entity and relation kwargs.

Thanks!

sbonner0 avatar Jun 10 '22 19:06 sbonner0

You can directly use ERModel with the pipeline, e.g.,

from pykeen.pipeline import pipeline
from pykeen.models import ERModel

pipeline_result = pipeline(
    dataset="Nations",
    model=ERModel,  # notice that this is the class, rather than its name
    model_kwargs=dict(
        interaction="complex",
        # more explicitly, we could also pass
        # entity_representations=[Embedding],
        # it defaults to a single Representation; the representation defaults to Embedding
        entity_representations_kwargs=dict(shape=10, dropout=0.1),
        relation_representations_kwargs=dict(shape=10),
    ),
)

Notice that you may need to provide more parameters than usual, as the generic ERModel does not know that the ComplEx interaction needs entity and relation representations with the same shape.

On another note, the formulation above will not use "proper" complex embeddings, but rather rely on a reshaping operation in the interaction function to view a real tensor as complex (effectively halving the dimension). In theory, you can change that by passing

entity_representations_kwargs=dict(shape=10, dropout=0.1, dtype=torch.cfloat),

However, I encountered an issue with dropout on native complex tensors:

RuntimeError: "bernoulli_scalar_cpu_" not implemented for 'ComplexFloat'
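To make the "effectively halving the dimension" remark concrete, here is a rough sketch using plain Python lists in place of tensors. The helper view_as_complex is hypothetical, and the interleaved (real, imaginary) layout is an assumption; the layout PyKEEN actually uses may differ.

```python
# Illustrative only: plain Python lists stand in for tensors.
def view_as_complex(real_values):
    """Pair consecutive reals as (real, imaginary) parts.

    Assumption: interleaved layout; the real library may arrange the parts differently.
    """
    if len(real_values) % 2 != 0:
        raise ValueError("need an even number of real values")
    return [complex(re, im) for re, im in zip(real_values[0::2], real_values[1::2])]


real_embedding = [float(i) for i in range(10)]  # shape=10, as in the example above
complex_embedding = view_as_complex(real_embedding)
print(len(real_embedding), "->", len(complex_embedding))
```

So a real representation with shape=10 would correspond to only 5 complex numbers per entity once it is viewed this way.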

mberr avatar Jun 14 '22 11:06 mberr

Despite the promise of better complex tensor support in the last few versions of PyTorch, many features are still missing, so this is not yet possible. Let's relegate that discussion to the pre-existing issue at https://github.com/pykeen/pykeen/issues/82

cthoyt avatar Jun 14 '22 14:06 cthoyt

Thanks @mberr - I'll make a custom ERModel for now then :) I do, however, think it would be nice to be able to request dropout for any model from the main pipeline API without having to rebuild the model every time, especially as the functionality already seems to be in place and it appears to be just a matter of passing the right arguments through.
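As a user-side stopgap, the ERModel approach suggested above could be wrapped in a small helper so the dropout value lives in one place. This is only a sketch: er_model_kwargs is a hypothetical name, not part of PyKEEN, and it just assembles the same model_kwargs dictionary shown earlier in the thread.

```python
def er_model_kwargs(interaction, dim, dropout=None):
    """Build model_kwargs for pipeline(model=ERModel, ...).

    Hypothetical helper, not part of PyKEEN. It applies the same dropout
    to entity and relation representations when a value is given.
    """
    entity_kwargs = dict(shape=dim)
    relation_kwargs = dict(shape=dim)
    if dropout is not None:
        entity_kwargs["dropout"] = dropout
        relation_kwargs["dropout"] = dropout
    return dict(
        interaction=interaction,
        entity_representations_kwargs=entity_kwargs,
        relation_representations_kwargs=relation_kwargs,
    )


kwargs = er_model_kwargs("complex", dim=10, dropout=0.1)
print(kwargs)

# Intended use (requires PyKEEN; not executed here):
# from pykeen.models import ERModel
# from pykeen.pipeline import pipeline
# pipeline_result = pipeline(dataset="Nations", model=ERModel, model_kwargs=kwargs)
```

This keeps the pipeline call itself unchanged, at the cost of spelling out the interaction and shapes by hand for each model.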

sbonner0 avatar Jun 14 '22 18:06 sbonner0

After a second look, I'm leaning toward rejecting this. It would add more complexity for a special use case - and ultimately make the pipeline less usable for newcomers.

cthoyt avatar Jul 04 '22 13:07 cthoyt

Sure no problem at all, thanks for considering it.

I guess I had hoped that a dropout value could just be an additional argument passed into model_kwargs, and so would not add much complexity, but I totally get your point about it being a niche use case.

sbonner0 avatar Jul 04 '22 15:07 sbonner0