GraKeL icon indicating copy to clipboard operation
GraKeL copied to clipboard

Missing args in kernel.fit_transform()

Open cshjin opened this issue 2 years ago • 1 comments

Describe the bug The args in kernel.fit_transform is missing y. https://github.com/ysig/GraKeL/blob/33ffff18d99c13f8afc0438a5691cb1206b119fb/grakel/kernels/kernel.py#L169 This will cause problems when send into pipeline, for some kernel methods do not have local fit_transform implantation.

To Reproduce

from grakel.datasets import fetch_dataset
from grakel.kernels.graphlet_sampling import GraphletSampling
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

dataset = fetch_dataset("PTC_MR", verbose=False)
G, y = dataset.data, dataset.target
gk = GraphletSampling(normalize=True, k=3)
G_train, G_test, y_train, y_test = train_test_split(G, y, test_size=0.2)

pipe = Pipeline([("gk", gk), ("svc", SVC(kernel="precomputed"))])
pipe.fit(G_train, y_train)
y_pred = pipe.predict(G_test)
print("accuracy ", accuracy_score(y_test, y_pred))

Expected behavior

 def fit_transform(self, X, y=None): 

should solve the problem.

cshjin avatar Feb 21 '22 00:02 cshjin

I have no objection in adding this, but please be specific on what breaks. If y is set and not used it may be deceiving for what our method does with it which is nothing. However when we made the library we had a very older version of scikit.

ysig avatar Sep 12 '22 12:09 ysig

I had the same issue with the VertexHistogram kernel. In fact, the sklearn documentation mentions

Pipeline compatibility

For an estimator to be usable together with pipeline.Pipeline in any but the last step, it needs to provide a fit or fit_transform function. To be able to evaluate the pipeline on any data but the training set, it also needs to provide a transform function. There are no special requirements for the last step in a pipeline, except that it has a fit function. All fit and fit_transform functions must take arguments X, y, even if y is not used. Similarly, for score to be usable, the last step of the pipeline needs to have a score function that accepts an optional y.

(I don't want to sound condescending, but I recently read a lot of this documentation, related to my other Issue)

pwelke avatar Jun 27 '23 17:06 pwelke

Ok I'll pass the change and add also the graph clone one in the following days.

ysig avatar Jun 27 '23 19:06 ysig

@ysig any news on this? GraKeL is a great library, but not being able to use Pipelines is quite limiting, especially for experiments involving many different kernels.

j-adamczyk avatar Aug 28 '23 12:08 j-adamczyk

@j-adamczyk Thank you for reaching out.

This update will be on grakel 1.10.0 and we will push it soon (I was waiting in case something else comes up so I can accumulate changes).

If it bottlenecks you from what you are currently doing, a simple solution is to either patch it on the source code or simply do something like:

class Patch(<KernelName>):
  def fit_transform(self, x, y=None):
    return super().fit_transform(x)

kernel = Patch(normalize=True)
kernel.fit_transform(graphs)

Although now it is already done, grakel is more than open to pull-requests and would love to be a repo evolved through the needs of the graph-ml community.

ysig avatar Aug 28 '23 17:08 ysig

@ysig how about the release? I will be doing workshops about graph classification in about a week and I wanted to showcase GraKeL, and this workaround is doable, but quite ugly nevertheless.

j-adamczyk avatar Oct 19 '23 19:10 j-adamczyk

@j-adamczyk should be ok in the wheels we pushed today. Sorry for the delay. @pwelke should also be ok now.

ysig avatar Oct 27 '23 12:10 ysig