frechet-audio-distance icon indicating copy to clipboard operation
frechet-audio-distance copied to clipboard

Improvement suggestions

Open ThibaultCastells opened this issue 1 year ago • 3 comments

Hello, Thank you for providing this tool! It's great! If you are interested in improvement suggestions, I would like to suggest the following:

  • When using vggish the model checkpoint is downloaded automatically, but it is not the case with pann. Therefore, I get the following error: FileNotFoundError: [Errno 2] No such file or directory: 'mypath/.cache/torch/hub/Cnn14_16k_mAP%3D0.438.pth' If this is intended, I would suggest to update the readme to explain how to add this model.
  • in the readme, it would be nice to have an indicator of how long it takes to generate the FAD for each method
  • in the readme, more examples of how to use the library would be great (I had to read the code to understand how I can reuse an embedding multiple times in different comparisons)
  • when computing the FAD between a data folder and itself, I do not get exactly 0 (I get a near zero value of -1.1368683772161603e-13 instead). I think it would be better to get exactly 0 in this case.

Again, thank you for your work!

ThibaultCastells avatar Aug 03 '23 09:08 ThibaultCastells

Hi @ThibaultCastells,

Thank you for all your detailed suggestions. My replies as follow:

  1. pann should be loaded if not cached, thanks for raising that. I shall address this soon.
  2. How long it takes to generate the FAD depends on, e.g. CPU / GPU, length & amount of audio. IMO there isn't much motivation to provide a benchmark as it varies across different use case. Also, so far speed is not an issue for my own use cases. But if it is an issue in your case, please feel free to raise it.
  3. I personally care more about the FAD score instead of the embeddings. From what you raised, I think I can detail more on (1) how to extract the embeddings; (2) how to calculate scores from the extracted embeddings, other than the one-liner that we have in README now.
  4. -1e-13 is pretty small to me and passable using e.g. np.allclose. To me getting 0 would be a nice-to-have, and might have a lower priority among the above.

Feel free to raise a PR if you are interested in contributing too. Thanks again!

gudgud96 avatar Aug 03 '23 13:08 gudgud96

Thank you for this quick answer!

How long it takes to generate the FAD depends on, e.g. CPU / GPU, length & amount of audio. IMO there isn't much motivation to provide a benchmark as it varies across different use case. Also, so far speed is not an issue for my own use cases. But if it is an issue in your case, please feel free to raise it.

I made this suggestion because it seemed to me that pann takes much longer, so what I meant is not to make a benchmark but to give an idea of the time difference between the 2 methods.

I personally care more about the FAD score instead of the embeddings. From what you raised, I think I can detail more on (1) how to extract the embeddings; (2) how to calculate scores from the extracted embeddings, other than the one-liner that we have in README

I made this suggestion because in my case I need to compare generative models output to a baseline model output. I am assuming people would use this tool for research purpose, so this must be a common situation.

Also, I wanted to mention that none of those are real 'issues' to me, I am able to use your tool and am really happy with it. It was simple improvement suggestions to improve its usability, because I think your work deserves to become the go-to FAD tool :)

ThibaultCastells avatar Aug 04 '23 01:08 ThibaultCastells

  1. Got your point of view, thanks for raising that. I think it is useful to showcase latency difference between models, I shall mark that down too.
  2. Sure, I also got feedback about using only the embedding extraction part. As mentioned above, I shall provide more details on that.

Thanks again for your kind words and these very insightful suggestions. I was constrained by my own view and use case, and am glad to understand other perspectives of using this tool. Again, thanks for your contribution!

gudgud96 avatar Aug 04 '23 03:08 gudgud96