tfx-addons
HFSpacePusher component proposal
Wrote a HFSpacePusher component proposal (cc: @sayakpaul as the co-developer)
also along with @rcrowe-google and @casassg, @codesue is included as a possible reviewer :)
Thanks for the PR! :rocket:
Instructions: Approve using /lgtm and mark for automatic merge by using /merge.
Qq: why not merge the 2 proposals and components (i.e. a single component that can push a model to Hugging Face, and to a Space if specified)?
Since their purposes are different, I want TFX users to be able to choose both, or just HFSpacePusher, in their pipeline.
> Qq: why not merge the 2 proposals and components (i.e. a single component that can push a model to Hugging Face, and to a Space if specified)?
I am somewhat inclined toward this one, since this component has a dependency on the HFHubPusher component. Do we have a good idea of the sequence of steps that need to be followed if only a model is supplied to this component?
One concern is -- what happens if a particular model can't be loaded using from_pretrained() or other utilities provided by Hugging Face? What are other immediate corner cases that may arise?
@sayakpaul
This component is designed to work without HFModelPusher; optionally, it can work with just the Trainer.
Since HFModelPusher could become complex if the model card feature is added later, I want to keep this as a separate component.
Fair enough. In that case, where would the user push the model? The same Space repo, or a separate model repo on the HF Hub? I think the latter is better since it cleanly separates the concerns.
@sayakpaul
I thought the former was ideal since this component pushes things to the Space. WDYT? Maybe @gante could join this discussion.
Hi there 👋
@deep-diver am I right in saying that the component proposed in #174 would push the TFX model to the HF Hub, and that this component would create a space that uses the model pushed by the previous component? (i.e. this component pushes no model, only the space)
If so, it can be really powerful when paired up with templates! We can team up with the Gradio team to create a few for the most common tasks so that users can copy and/or adapt.
Regarding it being merged into a single component or two components: Personally, I'd prefer two simpler components -- I suspect that a significant number of users won't be interested in having a demo. But the answer here should be given by the TFX team: homogeneity in the library > personal opinion :D
@gante
> @deep-diver am I right in saying that the component proposed in https://github.com/tensorflow/tfx-addons/pull/174 would push the TFX model to the HF Hub, and that this component would create a space that uses the model pushed by the previous component? (i.e. this component pushes no model, only the space)
Yes you got it right :) Thanks for jumping in!
The component proposed in this PR has two different ways to work:

1. Leverage information from the HFModelPusher proposed in #174. People can write arbitrary source code under a designated directory; any placeholders related to the Model Hub (such as repo-id and branch) in every file will then be replaced by the information emitted by HFModelPusher.
2. Leverage a trained model from the Trainer component directly. In this case, a function (or a file containing a function) named get_model() will be auto-generated, and people can write their template source code with that in mind. The catch is that I proposed to include the model and app files together in the Space repository in this case, which is not ideal. But I think some people just want a Space app without hosting the model in a separate Model repository.
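The placeholder-replacement idea in the first mode could be sketched roughly as follows (a minimal illustration only: the function name, the `{{HF_REPO_ID}}`/`{{HF_BRANCH}}` token format, and the directory layout are all assumptions, not the actual component API):

```python
from pathlib import Path


def fill_hub_placeholders(app_dir: str, repo_id: str, branch: str) -> None:
    """Replace hypothetical Model Hub placeholders in every file under app_dir.

    In the proposed component, repo_id and branch would come from the
    artifact emitted by HFModelPusher; the token names here are invented
    for illustration.
    """
    for path in Path(app_dir).rglob("*"):
        if path.is_file():
            text = path.read_text()
            text = text.replace("{{HF_REPO_ID}}", repo_id)
            text = text.replace("{{HF_BRANCH}}", branch)
            path.write_text(text)
```

So an `app.py` template containing `from_pretrained("{{HF_REPO_ID}}", revision="{{HF_BRANCH}}")` would end up pointing at the repo and branch that HFModelPusher actually pushed to.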
And yes! It would be great to have the Gradio team on board for this :)
> But I think some people just want a Space app without hosting the model in a separate Model repository.
On the HF side we would like to avoid this 🙏 Although both are git repos, so you can technically do that, we have many features tailored to certain kinds of repos. For instance, if a model lives inside a Spaces repo, it would not be indexed in the model zoo, nor would it interact with our evaluation tools.
Regarding @gante's comment, I think having HFSpacePusher work together with HFModelPusher is the best option.
Even though HFSpacePusher would be useless without HFModelPusher in this design, it still makes sense for two reasons:
- As @gante said, some people just want to host their models without building apps for them.
- The results of each component are cached and reusable across ML pipeline runs. If someone decides to add a Space app later, the Model Hub part can be skipped.
What do you guys think?
Non-blocking, but I fear that having 2 components will make this more complex to use. I would rather have 1 component with more functionality depending on what arguments you pass it, for example adding a Space config to HFModelPusher like:
```python
HFModelPusher(
    model: tfx.Model,
    blessing: tfx.Blessing,
    hf_model_config: HFModelConfig,
    space_config: Optional[HFSpaceConfig] = None,
    ...
)
```
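The behavior of that single-component design can be sketched concretely as follows (a simplified illustration under assumptions: the config fields and the `push` helper are invented for this sketch, and only the `HFModelPusher`/`HFSpaceConfig` names come from the discussion above):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HFModelConfig:
    repo_id: str        # e.g. "user/my-model" on the HF Hub (illustrative)
    branch: str = "main"


@dataclass
class HFSpaceConfig:
    space_id: str       # e.g. "user/my-demo" (illustrative)
    app_dir: str = "app"  # directory of template app files


def push(model_path: str,
         hf_model_config: HFModelConfig,
         space_config: Optional[HFSpaceConfig] = None) -> List[str]:
    """Always push the model; push a Space app only when a config is given."""
    steps = [f"push model {model_path} -> {hf_model_config.repo_id}"]
    if space_config is not None:
        steps.append(f"push app {space_config.app_dir} -> {space_config.space_id}")
    return steps
```

This captures the "soft enforcement" discussed below: the model push always happens first, and the Space push is an opt-in extra step.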
I like what @casassg is suggesting. It allows people to optionally host the pushed model (HF Hub) as a Space application. I think it softly enforces the practice of hosting the model on the HF Hub first, before putting together the Space demo.
Fair enough! Then let's go with HFModelPusher optionally enabling Space app generation. If the component gets bigger and more complicated, we can decide to split them later.
@gante how can the Gradio team be involved in this?
@deep-diver brainstorming ideas with the Gradio team atm, will share ideas soon :)
@deep-diver after talking with the Gradio team, it seems like a working solution is already available. All that is needed is that the model repo gets parameterized correctly!
In essence, when a model is pushed to the HF Hub, tags can be added (e.g. see here; the model repo can also be updated after the initial push). Among other things, a tag can be one of the ML tasks available on our website. Pushing a model tagged with the right task is not only good practice, but it also enables a bunch of things under the hood -- including automatic Spaces templates!
For tagged models, an app with these 3 lines is all you need to create a simple demo for the task. A nice thing about this workflow is that it relies on gradio examples that are actively maintained and constantly improved 😎
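For context, the task tag @gante refers to lives in the YAML metadata block at the top of the model repo's README.md on the Hub. A minimal sketch (the specific values are illustrative; `pipeline_tag` is the key the Hub uses for the ML task):

```yaml
---
library_name: keras                 # how the model was built / should be loaded
pipeline_tag: text-classification   # one of the Hub's supported ML tasks
tags:
  - tfx                             # free-form tags are also allowed
---
```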
It seems like this proposal is still being defined, so I'll wait until it's ready for review. Please advise.
@rcrowe-google
Thanks for jumping in! This PR is going to be closed since this feature will be included in the HFModelPusher (#174) component. It will stay open just for further discussion for a short while :)
> @deep-diver after talking with the Gradio team, it seems like a working solution is already available. All that is needed is that the model repo gets parameterized correctly!
> In essence, when a model is pushed to the HF hub, tags can be added (e.g. see here; the model repo can also be updated after the initial push). Among other things, a tag can be one of the ML tasks available on our website. Pushing a model tagged with the right task is not only good practice, but it also enables a bunch of things under the hood -- including automatic Spaces templates!
> For tagged models, an app with these 3 lines is all you need to create a simple demo for the task. A nice thing about this workflow is that it relies on gradio examples that are actively maintained and constantly improved 😎
@gante this is definitely exciting. I am just a little concerned about the types of models that are supported. The output model here is very likely going to be a SavedModel at its core (@deep-diver correct me if I am wrong). So, if the HF Hub model repo contains a SavedModel type, will it be well supported as you described above?
@sayakpaul that is okay, we expect pure Keras models on the hub (pushed with the aid of our Mixin) to be in the SavedModel format 👍