evaluate
Add missing metrics
As per @douwekiela's suggestion, we should find the blind spots that we have in terms of missing metrics, especially from domains like speech recognition and computer vision.
Suggestions are welcome below!
We should probably look into GAN metrics as well, like Kernel Inception Distance (KID), Inception Score (IS), and Fréchet Inception Distance (FID). (Maybe we should let people import them directly from a library like torch-fidelity?)
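For reference, the core of FID is just the Fréchet distance between two Gaussians fitted to feature activations. A minimal sketch (not the torch-fidelity implementation — in real FID the means/covariances would come from Inception-v3 features of real vs. generated images):

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2).

    For FID, the statistics come from Inception-v3 activations of real
    and generated images; here we only show the distance itself.
    """
    diff = mu1 - mu2
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    covmean = covmean.real  # drop tiny imaginary parts from numerics
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

# Identical statistics give a distance of ~0.
mu, sigma = np.zeros(4), np.eye(4)
print(frechet_distance(mu, sigma, mu, sigma))
```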
How about RL metrics? e.g. https://analyticsindiamag.com/metrics-for-reinforcement-learning/
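Most of the RL metrics in that article reduce to (discounted) episode returns, which would be easy to support. A quick sketch of what such a metric could compute (illustrative only, not an existing `evaluate` API):

```python
def discounted_return(rewards, gamma=0.99):
    """Discounted return G = sum_t gamma^t * r_t for one episode."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

def mean_return(episodes, gamma=1.0):
    """Average (discounted) return over a list of reward sequences."""
    return sum(discounted_return(ep, gamma) for ep in episodes) / len(episodes)

print(mean_return([[1, 1, 1], [0, 1, 2]]))  # → 3.0
```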
Computer vision metrics: SSIM, PSNR
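PSNR in particular is trivial to implement (SSIM needs a windowed computation and is usually taken from scikit-image). A minimal sketch, assuming 8-bit images:

```python
import numpy as np

def psnr(reference, distorted, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    diff = reference.astype(np.float64) - distorted.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

ref = np.full((8, 8), 100, dtype=np.uint8)
noisy = ref.copy()
noisy[0, 0] = 110  # one pixel off by 10
print(psnr(ref, noisy))  # ≈ 46.19 dB
```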
There are also various object detection metrics implemented in TensorFlow.
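The building block for most detection metrics (mAP etc.) is box IoU, which could live in the library independently of any framework. A simple sketch for axis-aligned boxes given as `(x1, y1, x2, y2)`:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # overlap 25, union 175 → ≈0.1429
```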
Also listed a few libraries in #11, e.g. NetworkX for graph metrics.
For image generation, also consider LPIPS, as well as unpaired (no-reference) metrics such as NIQE, PIQE, BRISQUE, and SR-Metric.
This is also an interesting method/library for evaluating text generation: https://github.com/neulab/BARTScore