optimum-intel
Add JPQD evaluation notebook
Add JPQD evaluation notebook. Since JPQD training for QA takes about 12 hours, it doesn't make sense to run it in a notebook: if the browser crashes or the computer goes to sleep, training stops. So the notebook just refers to the training example and is only used to evaluate the resulting model.
This makes the notebook similar to the PTQ QA notebook. I thought about removing the duplication, but I think duplication in examples is not so bad, at least for now: it's nice that examples are standalone.
Since JPQD starts from a plain bert-base-uncased model, I fine-tuned a bert-base-uncased model following the transformers run_qa.py example to use as a baseline for comparing performance.
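Comparing the fine-tuned baseline with the JPQD model comes down to the standard extractive-QA metrics, exact match (EM) and token-level F1. The following is a minimal stdlib sketch, assuming a SQuAD-style evaluation set with one or more gold answers per question. It follows the answer normalization of the official SQuAD v1.1 evaluation script (lowercase, strip punctuation, drop the articles a/an/the, collapse whitespace). The helper names are hypothetical and not part of optimum-intel. In the notebook itself, a library implementation such as the `squad` metric in Hugging Face `evaluate` would normally be used instead.

```python
import collections
import re
import string

_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Hypothetical helper: SQuAD v1.1-style normalization of an answer string."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)  # strip punctuation
    text = re.sub(r"\b(a|an|the)\b", " ", text)             # drop English articles
    return " ".join(text.split())                            # collapse whitespace


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    common = collections.Counter(pred_tokens) & collections.Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def squad_scores(predictions, references):
    """Return (EM, F1) in percent.

    predictions: list of predicted answer strings.
    references: list of lists of gold answers; the best-matching gold answer
    counts for each question, as in SQuAD v1.1.
    """
    assert len(predictions) == len(references)
    em_total = f1_total = 0.0
    for pred, golds in zip(predictions, references):
        em_total += max(exact_match(pred, g) for g in golds)
        f1_total += max(f1(pred, g) for g in golds)
    n = len(predictions)
    return 100.0 * em_total / n, 100.0 * f1_total / n
```

For example, `squad_scores(["the Eiffel Tower", "1889"], [["Eiffel Tower"], ["in 1889", "1889"]])` scores both predictions as exact matches, because the article is dropped during normalization and the best of the two gold answers is used.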
Instead of making this a JPQD-specific notebook, it could make more sense to make it a generic QA INT8 evaluation notebook. On the other hand, it's an example that people can easily adapt for similar purposes, and it's nice to promote JPQD.
TODO: the intro text at the top needs to explain a bit more about JPQD.
Colab link: https://colab.research.google.com/github/helena-intel/optimum-intel/blob/jpqd-notebook/notebooks/openvino/question_answering_quantization_jpqd.ipynb (performance is probably bad on Colab because there is no AVX512/VNNI).
@vuiseng9
I think it would be more useful to show the performance and accuracy trade-offs for three models:
- Original Transformer model (fp32)
- Quantized model (PTQ/QAT)
- Pruned and quantized (JPQD, distillation is an auxiliary method here)
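One way such a three-way comparison could be tabulated is sketched below. It assumes each model is wrapped in a callable `predict(x)` and has already been scored on the same evaluation set (for example with SQuAD F1). `ModelResult`, `median_latency_ms` and `tradeoff_table` are hypothetical helpers, not optimum-intel or NNCF API. The first entry is treated as the FP32 baseline, and every model is reported as an F1 difference and a latency speedup relative to it.

```python
import statistics
import time
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class ModelResult:
    """Hypothetical record for one model in the comparison."""
    name: str          # e.g. "fp32", "int8 (PTQ/QAT)", "jpqd"
    f1: float          # accuracy on the eval set, e.g. SQuAD F1 in percent
    latency_ms: float  # median per-sample latency in milliseconds


def median_latency_ms(predict: Callable[[object], object],
                      inputs: Sequence[object], warmup: int = 3) -> float:
    """Median wall-clock latency of predict(x) over inputs, after a few warm-up calls."""
    for x in inputs[:warmup]:
        predict(x)  # warm-up calls are not timed
    timings = []
    for x in inputs:
        start = time.perf_counter()
        predict(x)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


def tradeoff_table(results: Sequence[ModelResult]) -> list:
    """Compare every model with results[0], assumed to be the FP32 baseline."""
    baseline = results[0]
    rows = []
    for r in results:
        rows.append({
            "model": r.name,
            "f1": r.f1,
            "f1_delta": r.f1 - baseline.f1,                   # accuracy change vs FP32
            "speedup": baseline.latency_ms / r.latency_ms,    # >1 means faster than FP32
        })
    return rows
```

The median is used rather than the mean so that occasional slow calls do not dominate the latency figure. For a fair comparison, all three models should be timed on the same inputs, on the same machine, with the same threading settings.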
@yujiepan-work and @vuiseng9 implemented very nice lightweight tests for JPQD training. 9 epochs take just a few seconds on a single card. I'd reuse them for this notebook. https://github.com/openvinotoolkit/nncf/blob/develop/tests/torch/sparsity/movement/test_training.py#L237
If we need very good accuracy/performance results, there are longer tests to consider: https://github.com/openvinotoolkit/nncf/blob/develop/tests/torch/sparsity/movement/test_training.py#L318 If I am not mistaken, they take minutes; @yujiepan-work can probably give the exact time.