pykeen
Passing a dropout value to a model through the pipeline API
It would be great to be able to pass a dropout value to all models via the pipeline API, thus enabling dropout, rather than having to create a new ERModel instance of an existing model with dropout added in.
For example, say I wanted to add dropout to the ComplEx model to make use of the uncertainty code you have available. I can do the following to achieve this:
```python
from pykeen.datasets import Nations
from pykeen.models import ERModel
from pykeen.pipeline import pipeline

dataset = Nations()
model = ERModel(
    triples_factory=dataset.training,
    interaction='complex',
    entity_representations_kwargs=dict(embedding_dim=10, dropout=0.1),
    relation_representations_kwargs=dict(embedding_dim=10, dropout=0.1),
)
pipeline_result = pipeline(
    dataset='Nations',
    model=model,
)
```
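For context on why dropout matters for uncertainty: dropout-based uncertainty estimates (Monte Carlo dropout) keep dropout active at prediction time, score the same triple many times, and read the spread of the scores as uncertainty. The pure-Python sketch below illustrates only that idea. It assumes standard "inverted" dropout (as torch.nn.Dropout uses in training mode) and uses a plain dot product as a stand-in scoring function. The helper names are hypothetical and this is not pykeen's uncertainty code.

```python
import random
import statistics


def inverted_dropout(vector, p, rng):
    """Zero each entry with probability p and rescale survivors by 1 / (1 - p)."""
    keep = 1.0 - p
    return [x / keep if rng.random() >= p else 0.0 for x in vector]


def dot(a, b):
    """Stand-in scoring function (not ComplEx): a plain dot product."""
    return sum(x * y for x, y in zip(a, b))


def mc_dropout_score(head, tail, p=0.1, num_samples=100, seed=0):
    """Hypothetical helper: score a (head, tail) pair many times with dropout
    kept active and return the mean and standard deviation of the scores."""
    rng = random.Random(seed)
    scores = [
        dot(inverted_dropout(head, p, rng), inverted_dropout(tail, p, rng))
        for _ in range(num_samples)
    ]
    return statistics.mean(scores), statistics.stdev(scores)


head = [0.5, -0.2, 0.8, 0.1]
tail = [0.4, 0.3, 0.7, -0.6]
mean, std = mc_dropout_score(head, tail)
print(f"score = {mean:.3f} +/- {std:.3f}")
```

With p=0 every pass gives the same score and the standard deviation is zero. This is why a model without dropout yields no spread to interpret.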
I thought I could override the entity_representations_kwargs as follows:
```python
from pykeen.pipeline import pipeline

pipeline_result = pipeline(
    dataset='Nations',
    model="ComplEx",
    model_kwargs=dict(
        embedding_dim=10,
        entity_representations_kwargs=dict(dropout=0.1),
    ),
)
```
However, this results in the following error:
TypeError: pykeen.models.nbase.ERModel.__init__() got multiple values for keyword argument 'entity_representations_kwargs'
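Here is a minimal pure-Python reproduction of why this error can arise. It assumes, as the traceback suggests, that the concrete model class sets entity_representations_kwargs itself when calling ERModel.__init__ and forwards the remaining model_kwargs through **kwargs. The class names are hypothetical stand-ins, not pykeen classes.

```python
class ERModelSketch:
    """Stand-in for the generic model: accepts representation kwargs."""

    def __init__(self, interaction, entity_representations_kwargs=None, **kwargs):
        self.interaction = interaction
        self.entity_representations_kwargs = entity_representations_kwargs
        # other options are ignored in this sketch


class ComplExSketch(ERModelSketch):
    """Stand-in for a concrete model that fixes its own representation kwargs
    and forwards everything else to the parent via **kwargs."""

    def __init__(self, embedding_dim=200, **kwargs):
        super().__init__(
            interaction="complex",
            entity_representations_kwargs=dict(shape=embedding_dim),
            **kwargs,  # a user-supplied entity_representations_kwargs collides here
        )


ComplExSketch(embedding_dim=10)  # works
try:
    ComplExSketch(embedding_dim=10, entity_representations_kwargs=dict(dropout=0.1))
except TypeError as error:
    print(error)  # ... got multiple values for keyword argument 'entity_representations_kwargs'
```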
So would it be possible to allow dropout to be passed through from the pipeline somehow? It seems that some of the model code would need to change to propagate a dropout value through to the entity and relation representation kwargs.
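One way to get this behaviour without changing any model code is to inject the dropout value into ERModel-style model_kwargs before calling the pipeline. The sketch below shows this approach. with_dropout is a hypothetical helper, not part of pykeen. It only targets the generic entity_representations_kwargs / relation_representations_kwargs keys and does nothing to avoid the collision that named models such as "ComplEx" raise.

```python
import copy


def with_dropout(model_kwargs, dropout):
    """Hypothetical helper: return a copy of ERModel-style model_kwargs in which
    both entity and relation representations receive the same dropout rate."""
    result = copy.deepcopy(model_kwargs)
    for key in ("entity_representations_kwargs", "relation_representations_kwargs"):
        # keep settings the user already passed (e.g. shape) and add dropout
        sub_kwargs = dict(result.get(key) or {})
        sub_kwargs["dropout"] = dropout
        result[key] = sub_kwargs
    return result


model_kwargs = with_dropout(
    dict(
        interaction="complex",
        entity_representations_kwargs=dict(shape=10),
        relation_representations_kwargs=dict(shape=10),
    ),
    dropout=0.1,
)
print(model_kwargs["entity_representations_kwargs"])  # {'shape': 10, 'dropout': 0.1}
```

The resulting dict is intended to be passed as model_kwargs together with model=ERModel. The input dict is copied, so the caller's original settings are left untouched.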
Thanks!
You can directly use ERModel with the pipeline, e.g.,
```python
from pykeen.pipeline import pipeline
from pykeen.models import ERModel

pipeline_result = pipeline(
    dataset="Nations",
    model=ERModel,  # notice that this is the class, rather than its name
    model_kwargs=dict(
        interaction="complex",
        # more explicitly, we could also pass
        # entity_representations=[Embedding],
        # it defaults to a single Representation; the representation defaults to Embedding
        entity_representations_kwargs=dict(shape=10, dropout=0.1),
        relation_representations_kwargs=dict(shape=10),
    ),
)
```
Notice that you may need to provide more parameters than usual, as the generic ERModel does not know that the ComplEx interaction needs entity and relation representations with the same shape.
On another note, the formulation above will not use "proper" complex embeddings, but rather relies on a reshaping operation in the interaction function to view a real tensor as complex (effectively halving the dimension). In theory, you can change that by passing
```python
# inside model_kwargs; requires `import torch`
entity_representations_kwargs=dict(shape=10, dropout=0.1, dtype=torch.cfloat),
```
However, I ran into an issue with dropout on native complex tensors:
RuntimeError: "bernoulli_scalar_cpu_" not implemented for 'ComplexFloat'
Despite the promise of more complex tensor support in the last few versions of PyTorch, many features are still missing, so this is not yet possible. Let's relegate that discussion to the pre-existing issue at https://github.com/pykeen/pykeen/issues/82
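The pure-Python sketch below illustrates the real-to-complex reshaping mentioned earlier in this reply and the ComplEx score it feeds into. It assumes adjacent real entries are paired as (real, imaginary), which is one common layout for viewing a tensor of shape (..., 2k) as (..., k) complex values. The function names are hypothetical and this is not pykeen's implementation.

```python
def view_as_complex(real_vector):
    """Pair adjacent real entries (re, im) into complex numbers, halving the length."""
    if len(real_vector) % 2:
        raise ValueError("need an even number of real entries")
    return [complex(re, im) for re, im in zip(real_vector[0::2], real_vector[1::2])]


def complex_score(h, r, t):
    """ComplEx score: real part of sum_i h_i * r_i * conj(t_i)."""
    return sum(hi * ri * ti.conjugate() for hi, ri, ti in zip(h, r, t)).real


# a real embedding with shape=10 becomes 5 complex dimensions
h = view_as_complex([0.1 * i for i in range(10)])
r = view_as_complex([1.0] * 10)
t = view_as_complex([0.5] * 10)
print(len(h), complex_score(h, r, t))
```

This is why shape=10 with a real dtype effectively yields a 5-dimensional complex embedding.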
Thanks @mberr, I'll make a custom ERModel for now then :) I do, however, think it would be nice to be able to request dropout for all models from the main pipeline API without having to rebuild the model every time, especially as the functionality already seems to be in place and it would appear to be just a matter of passing the right arguments through.
After a second look, I'm leaning towards rejecting this. It would add more complexity for a special use case and ultimately make the pipeline less usable for newcomers.
Sure no problem at all, thanks for considering it.
I guess I had hoped that specifying a dropout value could just be an additional argument passed into model_kwargs, and thus would not have added too much complexity, but I totally get your point about it being a niche use case.