modelcards icon indicating copy to clipboard operation
modelcards copied to clipboard

Extentions for integrations

Open adrinjalali opened this issue 2 years ago • 11 comments

How should a library integration developer extend the modelcard here to fit their usecase?

It would be nice to start with a documentation to explain how the workflow works for us to set the API. I would suggest something like this file (https://github.com/skops-dev/skops/blob/main/examples/plot_hf_hub.py) instead of a notebook, which can be converted into a notebook.

adrinjalali avatar Jun 07 '22 13:06 adrinjalali

How should a library integration developer extend the modelcard here to fit their usecase?

They would write their own template and use the ModelCard.from_template(..., template_path='path/to/their/template'). (CC @merveenoyan if you wanna give this a shot at some point)


Totally agree this workflow should be included in the examples. I would love to have both a script and a notebook, because I want it to be easy for folks to open the example in colab, as its easier to convince them to check out the code.

There's tons of tools to do this, I am not sure which is best/easiest for this use case. Do you have any suggestions @adrinjalali ? (Also CC my friend @borda who I know has done a lot of this).

nateraw avatar Jun 07 '22 17:06 nateraw

Just an update - this workflow is now in both the README and the colab walkthrough 😄

nateraw avatar Jun 07 '22 21:06 nateraw

I was a bit busy with the Keras sprint, sorry 😅 I find the UX of passing metadata like this a bit too much (I can't explain what I don't like about it though 😂) link to line in context. It's just an opinion though. This looks good to me. (the template argument)

merveenoyan avatar Jun 08 '22 14:06 merveenoyan

Nevertheless, I'll implement this. Feel free to assign to me.

merveenoyan avatar Jun 08 '22 14:06 merveenoyan

@merveenoyan let's revisit this for Keras in huggingface_hub maybe a little later down the line once we're really happy with the default card. Then you can copy paste that + just update it to have anything Keras specific you might want (i.e. model plot at end)

nateraw avatar Jun 09 '22 19:06 nateraw

I'm not familiar with those tools @nateraw . But what I wanted as a resolution to this issue is not specific to any framework, so implementing the keras modelcard would be a separate issue to me.

This is about documenting what a third party developer would need to read in order to write a new template, and fill the template (and figure out if there are changes we need to implement on this library's side)

adrinjalali avatar Jun 10 '22 09:06 adrinjalali

@adrinjalali For sure - Did you read the updates in the README/Walkthrough? I think once default model card is in good place, we suggest people copy paste it and update as needed, then pass it to template_path when creating new card.

nateraw avatar Jun 10 '22 18:06 nateraw

We should have model examination part (or have a better naming for this part 😅 @mgerchick @EziOzoani) on the model card.

  • The common things we can extract from models across all frameworks are hyperparameters and nothing more.
  • If it's a decision tree or neural network based model, we can have plot of the graph.
  • If it's a neural network model with recorded history, we can have plot of history on model card. (also it would be good to enable users to push tensorboard traces at some point)
  • If it's a statistical model and if user is willing to provide predictions and test data (or maybe cross validation results?), we have different things to be visualized:
  • AUC/ROC, confusion matrix for classification, a classification report (which can also be applied for neural network models though)
  • R squared and MSE for regression
  • Visual clusters and homogenity score for clustering
  • Decision boundaries of models
  • Visualization of pipeline if it's a scikit-learn model
  • Feature importances 😏 @adrinjalali

I say we should initially support following libraries:

  • Tensorflow/Keras (which model card is provided at the moment already with Hub mixin)
  • PyTorch
  • Transformers (which is supported through Keras and PyTorch and currently exists inside transformers library)
  • Scikit-learn
  • fast.ai (which is supported through PyTorch)

I thought of writing a couple of functions that would extract the above information programmatically and add it to the jinja template. I will write the functions and populate them in a notebook that when template is approved, we can add these functionalities.

merveenoyan avatar Jun 17 '22 12:06 merveenoyan

I would strongly advise against feature importances, they have many biases which are addressed in other methods. We can instead make it easy for users to include PDP and ICE plots: https://scikit-learn.org/stable/modules/partial_dependence.html#partial-dependence and permutation feature importance: https://scikit-learn.org/stable/modules/permutation_importance.html#permutation-importance if it's feasible for the user.

We should also make it easy for users to include metrics sliced by features which are relevant in the dataset, like here: https://fairlearn.org/v0.7.0/user_guide/assessment.html#metrics

adrinjalali avatar Jun 20 '22 07:06 adrinjalali

@nateraw would you like me to open the PRs to existing ones in HF Hub client library after merger?

merveenoyan avatar Jul 06 '22 12:07 merveenoyan

That sounds like a good idea, yea!

nateraw avatar Jul 06 '22 13:07 nateraw