skops icon indicating copy to clipboard operation
skops copied to clipboard

ENH - Gets SciKeras script working

Open lazarust opened this issue 1 year ago • 46 comments

WIP still need to fix one test

Reference Issues/PRs

Fixes #388

What does this implement/fix? Explain your changes.

This fixes a recursion error that was happening when dumping a scikeras model.

Any other comments?

lazarust avatar Oct 10 '23 01:10 lazarust

It looks like all the pytest tests are failing with You have exceeded our daily quotas for action: createRepo. We invite you to retry later.

lazarust avatar Oct 13 '23 13:10 lazarust

Should be addressed by #398

BenjaminBossan avatar Oct 13 '23 13:10 BenjaminBossan

@lazarust Could you please solve the merge conflict? Regarding the uncovered new line: Would that be solved by adding tests for scikeras?

BenjaminBossan avatar Oct 26 '23 15:10 BenjaminBossan

@BenjaminBossan I've fixed the merge conflict.

Yeah, I believe it would be but I was unsure if we wanted to have tests that included another library like that. Should I add one?

lazarust avatar Oct 29 '23 00:10 lazarust

Yeah, I believe it would be but I was unsure if we wanted to have tests that included another library like that. Should I add one?

Yes, it would be good, since that was the initial reason for the change. We have external library tests, see here:

https://github.com/skops-dev/skops/blob/main/skops/io/tests/test_external.py

For scikeras, it won't be possible to add a comprehensive coverage of all possible models, so a simple example should be enough. If it's possible to add a unit test independently of scikeras that explicitly tests weakrefs, that would also be good, not sure how easy it is to do.

Regarding the failing tests, at first glance, it appears to be a change in the model repr in the latest sklearn version, so not related to the changes here.

BenjaminBossan avatar Oct 31 '23 09:10 BenjaminBossan

@BenjaminBossan Sorry this took me so long to get back to. I've added a test to hit that line.

lazarust avatar Nov 14 '23 01:11 lazarust

@adrinjalali The tests for sklearn nightly are failing because the model repr was changed (not sure why). Normally, we could fix that by having the test check the sklearn verison, but this is a doctest. Any idea how it can be fixed, short of skipping it whole?

BenjaminBossan avatar Nov 17 '23 14:11 BenjaminBossan

Ugh, the list of trusted modules is giant now :D I guess it's related to the tensorflow change. Could you please explain why that was necessary? Also, we now get this error on CI:

E AttributeError: module 'scikeras' has no attribute 'wrappers'

BenjaminBossan avatar Nov 24 '23 13:11 BenjaminBossan

@BenjaminBossan Yeah, sorry I realized I had marked the test method as a @pytest.fixture 🤦🏽, so the test wasn't ever running (which is why it wasn't erroring when switching from dump to dumps). I'm hoping to get the test fixed and cleanup that list of trusted modules today.

lazarust avatar Nov 24 '23 15:11 lazarust

@BenjaminBossan After staring at this all day, I could use some help lol. For some reason, there's some infinite recursion happening when constructing the tree that I can't figure out why.

Initially, I thought it was due to the CachedNodes in the tree, but the construct method isn't getting hit for those. I've gone through the tree and didn't see anything outrageous so if you could take a look that'd be great!

lazarust avatar Nov 25 '23 01:11 lazarust

I'll have to check when I'm back. Still off till end of November. But I remember CacheNode caused some issues when I was working on it, partly due to small integers and other common objects having the same id in python.

adrinjalali avatar Nov 25 '23 11:11 adrinjalali

@adrinjalali It looks like you're right, there are some JSONNodes that have the same id as some of the booleans that are getting cached.

To me, there are a couple of solutions:

  1. Find a better way to identify objects uniquely
    • Maybe we should use uuid instead?
  2. Don't cache boolean values since these seem to have duplicate ids.
  3. Include the data type in the __id__ (so it would be something like "id/dtype")
    • This isn't the best idea since we'd have to update everywhere that uses that id.

lazarust avatar Nov 30 '23 00:11 lazarust

Find a better way to identify objects uniquely

  • Maybe we should use uuid instead?

Could you explain further how that would work?

Don't cache boolean values since these seem to have duplicate ids.

Note that booleans are not the only values that could be affected, e.g. small integers will have the same issue.

BenjaminBossan avatar Dec 04 '23 12:12 BenjaminBossan

@BenjaminBossan On that first point, I don't think that would work anymore. I had thought uuid was just a more complicated id function but looking into the documentation it seems to be doing something differently than I understood.

I also agree with your note, but I'm unsure of the best way forward. Do you have any thoughts?

lazarust avatar Dec 05 '23 01:12 lazarust

Thinking about this, WDYT of saving TF objects using TF and saving them in our zipfile, not trusting any TF object by default, and when raising a warning about untrusted code, we also point users to this page (https://github.com/tensorflow/tensorflow/blob/master/SECURITY.md)

adrinjalali avatar Dec 05 '23 14:12 adrinjalali

@adrinjalali That sounds good to me! After reading through the TensorFlow security documentation, showing the warning makes sense.

Would we show the warning on dumps or only on loads? I would think both probably but I could see an argument to just show it on loads.

lazarust avatar Dec 12 '23 01:12 lazarust

It would only be on loading time, and by default load fails anyway, we just need to make sure when it fails, we also give them the right information and the right link.

adrinjalali avatar Dec 12 '23 16:12 adrinjalali

@adrinjalali Sweet, so things that need to be changed:

  1. When dumping a TensorFlow model save the TF object using TF and store it in the .zip file.
  2. When loading a Tensorflow model raise a warning linking to the TF docs a. Since the model fails to load anyway, we're just changing the error message at this point, if I'm understanding things correctly.

Do I have that right? Thanks for your help on this solution.

lazarust avatar Dec 12 '23 17:12 lazarust

@adrinjalali I am just pinging you again to see if you have any thoughts before I get started on a solution.

lazarust avatar Jan 02 '24 02:01 lazarust

@lazarust yes, that sounds perfect to me.

adrinjalali avatar Jan 04 '24 10:01 adrinjalali

@adrinjalali @BenjaminBossan This is ready for y'all to look at again! Sorry it took a while, got a little busy around the holidays.

lazarust avatar Jan 24 '24 00:01 lazarust

@adrinjalali This is ready for you to take a look at again. The failing tests are due to this bug in scikeras https://github.com/adriangb/scikeras/issues/313, so we'll probably need to wait for that to get fixed before moving this forward

lazarust avatar Apr 01 '24 21:04 lazarust

I think you forgot a future import and old python is complaining.

adrinjalali avatar Apr 23 '24 11:04 adrinjalali

@adrinjalali I got it working for older versions of Python other than Windows on 3.8 (I think the Ubuntu 3.9 failure was just a codecov issue). Could you help me figure out how to get it working? Downgrading tensorflow to 2.11 just breaks more versions.

lazarust avatar Apr 24 '24 02:04 lazarust

One of the main reasons we support 3.8 was colab, which now supports 3.10. I think we can drop 3.8 alltogether.

Would you be kind enough to submit a PR to remove 3.8 support?

adrinjalali avatar Apr 24 '24 11:04 adrinjalali

@adrinjalali I created this PR to remove 3.8 support #418, but I'm running into an issue where pip install scikit-learn-intelex isn't finding any valid versions only on MacOS (this is happening both in the GH action and locally for me). I'm still looking into a fix

lazarust avatar Apr 28 '24 21:04 lazarust

MacOS issue could be this: https://github.com/actions/setup-python/issues/850

BenjaminBossan avatar Apr 29 '24 08:04 BenjaminBossan

Intelex is very brittle, and I don't think it's used much if at all. I'd be in favor of removing it. It was only to handle the inference side and HF inference is not that well maintained anyway.

@BenjaminBossan any objections to remove it?

adrinjalali avatar May 03 '24 14:05 adrinjalali

Works for me.

BenjaminBossan avatar May 03 '24 14:05 BenjaminBossan

@lazarust you've been very kind here. Would you remove sklearn-intelex in a separate PR too?

adrinjalali avatar May 03 '24 14:05 adrinjalali