
HFSpacePusher component proposal

Open deep-diver opened this issue 3 years ago • 16 comments

Wrote an HFSpacePusher component proposal (cc: @sayakpaul as the co-developer)

Also, along with @rcrowe-google and @casassg, @codesue is included as a possible reviewer :)

deep-diver avatar Aug 25 '22 14:08 deep-diver

Thanks for the PR! :rocket:

Instructions: Approve using /lgtm and mark for automatic merge by using /merge.

github-actions[bot] avatar Aug 25 '22 14:08 github-actions[bot]

Qq: why not merge the 2 proposals and components (aka a component that can push a model to Hugging Face, and to a Space if specified)?

casassg avatar Aug 25 '22 14:08 casassg

Since their purposes are different, I want TFX users to be able to choose both, or just HFSpacePusher, in their pipeline.

deep-diver avatar Aug 25 '22 14:08 deep-diver

Qq: why not merge the 2 proposals and components (aka a component that can push a model to Hugging Face, and to a Space if specified)?

I am somewhat inclined toward this one since this component has a dependency on the HFHubPusher component. Do we have a good idea of the sequence of steps that need to be followed if only a model is supplied to this component?

One concern is -- what happens if a particular model can't be loaded using from_pretrained() or other utilities provided by Hugging Face? What other immediate corner cases may arise?

sayakpaul avatar Aug 25 '22 14:08 sayakpaul

@sayakpaul

This component is designed to work without HFModelPusher. Optionally, it can work with the Trainer component alone.

Since HFModelPusher can potentially be complex if the model card feature is added later, I want to keep this as a separate component.

deep-diver avatar Aug 25 '22 14:08 deep-diver

Fair enough. In that case, where would the user push the model? The same Space repo or a separate model repo on the HF Hub? I think the latter is better since it clearly separates the concerns.

sayakpaul avatar Aug 25 '22 14:08 sayakpaul

@sayakpaul

I thought the former was ideal since this component pushes things to the Space. WDYT? Maybe @gante could join this discussion.

deep-diver avatar Aug 25 '22 15:08 deep-diver

Hi there 👋

@deep-diver am I right in saying that the component proposed in #174 would push the TFX model to the HF Hub, and that this component would create a space that uses the model pushed by the previous component? (i.e. this component pushes no model, only the space)

If so, it can be really powerful when paired with templates! We can team up with the Gradio team to create a few for the most common tasks so that users can copy and/or adapt them.

Regarding it being merged into a single component or two components: Personally, I'd prefer two simpler components -- I suspect that a significant number of users won't be interested in having a demo. But the answer here should be given by the TFX team: homogeneity in the library > personal opinion :D

gante avatar Aug 25 '22 16:08 gante

@gante

@deep-diver am I right in saying that the component proposed in https://github.com/tensorflow/tfx-addons/pull/174 would push the TFX model to the HF Hub, and that this component would create a space that uses the model pushed by the previous component? (i.e. this component pushes no model, only the space)

Yes, you got it right :) Thanks for jumping in!

The component proposed in this PR has two different ways to work:

  1. leverage information from the HFModelPusher proposed in #174.

    • people can write arbitrary source code under a designated directory; any placeholders related to the Model Hub (such as the repo ID and branch) in those files will be replaced by the information emitted by HFModelPusher (a minimal sketch of this replacement step follows the list).
  2. leverage a trained model from the Trainer component

    • in this case, a function (or a file containing a function) named get_model() will be auto-generated, and people can write template source code with that in mind.
    • the caveat is that, in this case, I proposed including the model and app files together in the Space repository, which is not ideal. But I think some people just want a Space app without hosting the model in a separate Model repository.
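
To make mode 1 concrete, here is a minimal sketch of the placeholder-replacement step. The placeholder tokens, directory name, and helper function are hypothetical illustrations, not part of the proposal's final design:

from pathlib import Path

def fill_space_templates(app_dir: str, repo_id: str, branch: str) -> None:
    """Replace Model Hub placeholders in every file under app_dir."""
    for path in Path(app_dir).rglob("*"):
        if not path.is_file():
            continue
        text = path.read_text()
        # Hypothetical placeholder tokens, for illustration only.
        text = text.replace("{{MODEL_REPO_ID}}", repo_id)
        text = text.replace("{{MODEL_BRANCH}}", branch)
        path.write_text(text)

# The repo id and branch would come from the artifact emitted by HFModelPusher, e.g.:
# fill_space_templates("hf_space_app", "my-org/my-model", "main")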

And yes! It would be great to have the Gradio team involved in this :)

deep-diver avatar Aug 25 '22 16:08 deep-diver

But I think some people just want a Space app without hosting the model in a separate Model repository.

On the HF side we would like to avoid this 🙏 Although both are git repos, so you can technically do that, many of our features are tailored to specific kinds of repos. For instance, if a model lives inside a Spaces repo, it will not be indexed in the model zoo or interact with our evaluation tools.

gante avatar Aug 25 '22 16:08 gante

Regarding @gante's comment, I think having HFSpacePusher work together with HFModelPusher is best.

Even though HFSpacePusher would be useless without HFModelPusher in this case, the split still makes sense in two ways:

  • like @gante said, some people just want to host their models without building apps for them
  • the results of each component are cached and reusable for other ML pipeline runs. When someone decides to add a Space app later, the Model Hub part can be skipped.

What do you guys think?

deep-diver avatar Aug 25 '22 17:08 deep-diver

Non-blocking, but I fear that having 2 components will make this more complex to use. I would rather have 1 component whose functionality grows depending on what arguments you pass it. For example, adding a space config to HFModelPusher like:

HFModelPusher(
    model: tfx.Model,
    blessing: tfx.Blessing,
    hf_model_config: HFModelConfig,
    space_config: Optional[HFSpaceConfig] = None,
    ...
)
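
For illustration, such a merged component could be used like the sketch below; HFModelConfig, HFSpaceConfig, and their fields are hypothetical and only show the shape of the idea:

pusher = HFModelPusher(
    model=trainer.outputs["model"],
    blessing=evaluator.outputs["blessing"],
    hf_model_config=HFModelConfig(repo_id="my-org/my-model"),
    # Omit space_config to push only the model; supply it to also build a Space app.
    space_config=HFSpaceConfig(repo_id="my-org/my-demo", app_dir="hf_space_app"),
)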

casassg avatar Aug 25 '22 19:08 casassg

I like what @casassg is suggesting. It allows people to optionally host the pushed model (on the HF Hub) as a Space application. I think it gently enforces the practice of hosting models on the HF Hub first before putting together the Space demo.

sayakpaul avatar Aug 26 '22 02:08 sayakpaul

Fair enough! Then, let's go with HFModelPusher optionally enabling Space app generation. If the component gets bigger and more complex, we can decide to split it later.

@gante how can the Gradio team be involved with this?

deep-diver avatar Aug 26 '22 02:08 deep-diver

@deep-diver brainstorming ideas with the Gradio team atm, will share ideas soon :)

gante avatar Aug 26 '22 15:08 gante

@deep-diver after talking with the Gradio team, it seems like a working solution is already available. All that is needed is that the model repo gets parameterized correctly!

In essence, when a model is pushed to the HF Hub, tags can be added (e.g. see here; the model repo can also be updated after the initial push). Among other things, a tag can be one of the ML tasks available on our website. Pushing a model tagged with the right task is not only good practice, but it also enables a bunch of things under the hood -- including automatic Spaces templates!
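
As an illustration (not part of the proposal), the task tag could be set from Python with huggingface_hub's metadata utility, assuming that API is available in the installed version; the repo id and tag values are placeholders:

from huggingface_hub import metadata_update

# Tag the model repo with one of the Hub's ML tasks after the initial push.
metadata_update(
    "my-org/my-model",  # hypothetical repo id
    {"pipeline_tag": "text-classification", "tags": ["keras", "tfx"]},
    overwrite=True,
)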

For tagged models, an app with these 3 lines is all you need to create a simple demo for the task. A nice thing about this workflow is that it relies on Gradio examples that are actively maintained and constantly improved 😎
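
The linked app isn't reproduced here, but for a task-tagged model the demo boils down to something like this sketch, assuming Gradio's Hub-loading API of the time (the repo id is a placeholder):

import gradio as gr

# Loads the tagged model from the Hub and builds a matching demo UI for its task.
gr.Interface.load("models/my-org/my-model").launch()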

gante avatar Aug 30 '22 14:08 gante

It seems like this proposal is still being defined, so I'll wait until it's ready for review. Please advise.

rcrowe-google avatar Aug 31 '22 22:08 rcrowe-google

@rcrowe-google

Thanks for jumping in! This PR is going to be closed since this feature will be included in the HFModelPusher (#174) component. It will stay open for further discussion just for a short while :)

deep-diver avatar Sep 01 '22 03:09 deep-diver

@deep-diver after talking with the Gradio team, it seems like a working solution is already available. All that is needed is that the model repo gets parameterized correctly!

In essence, when a model is pushed to the HF Hub, tags can be added (e.g. see here; the model repo can also be updated after the initial push). Among other things, a tag can be one of the ML tasks available on our website. Pushing a model tagged with the right task is not only good practice, but it also enables a bunch of things under the hood -- including automatic Spaces templates!

For tagged models, an app with these 3 lines is all you need to create a simple demo for the task. A nice thing about this workflow is that it relies on Gradio examples that are actively maintained and constantly improved 😎

@gante this is definitely exciting. I am just a little concerned about the types of models that are supported. The output model here is very likely going to be a SavedModel at its core (@deep-diver correct me if I am wrong). So, if the HF Hub model repo contains a SavedModel type, will it be well supported as you described above?

sayakpaul avatar Sep 01 '22 03:09 sayakpaul

@sayakpaul that is okay; we expect pure Keras models on the Hub (pushed with the aid of our Mixin) to be in the SavedModel format 👍
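
For reference, pushing a Keras model with the mixin-style helper looks roughly like the sketch below; it serializes the model in the SavedModel format under the hood. The exact signature may differ between huggingface_hub versions, and the repo id is a placeholder:

from huggingface_hub import push_to_hub_keras

# Saves the Keras model as a SavedModel and pushes it to the given Hub repo.
push_to_hub_keras(model, "my-org/my-model")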

gante avatar Sep 01 '22 09:09 gante