Sahar
Follow [these steps](https://colab.research.google.com/github/borisdayma/dalle-mini/blob/main/dev/inference/inference_pipeline.ipynb#scrollTo=lYyOyZGH9dpb) and modify the backend accordingly.
Give the backend the ability to rank the generated images using CLIP, then send the scores back along with the images so the user can see the score each image received. This...
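Here is a minimal sketch of the ranking step. It assumes the backend uses the PyTorch CLIP classes from Hugging Face `transformers` with the `openai/clip-vit-base-patch32` checkpoint. The linked notebook may set CLIP up differently. `clip_scores`, `rank_images`, and the `image`/`score`/`rank` payload fields are hypothetical names chosen for illustration.

```python
from functools import lru_cache
from typing import Any, Dict, List, Sequence

CLIP_CHECKPOINT = "openai/clip-vit-base-patch32"  # assumption: any CLIP checkpoint on the Hub


@lru_cache(maxsize=1)
def _load_clip():
    """Load CLIP once per process; imported lazily so rank_images needs no torch."""
    from transformers import CLIPModel, CLIPProcessor

    return CLIPModel.from_pretrained(CLIP_CHECKPOINT), CLIPProcessor.from_pretrained(CLIP_CHECKPOINT)


def clip_scores(prompt: str, images: Sequence[Any]) -> List[float]:
    """Return one CLIP similarity logit per PIL image for the given text prompt."""
    import torch

    model, processor = _load_clip()
    inputs = processor(text=[prompt], images=list(images), return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # logits_per_image has shape (num_images, num_texts); there is a single prompt here.
    return outputs.logits_per_image[:, 0].tolist()


def rank_images(image_ids: Sequence[str], scores: Sequence[float]) -> List[Dict[str, Any]]:
    """Build the payload for the client: each image with its score and rank (1 = best)."""
    if len(image_ids) != len(scores):
        raise ValueError("expected exactly one score per image")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [
        {"image": image_ids[i], "score": float(scores[i]), "rank": rank}
        for rank, i in enumerate(order, start=1)
    ]
```

Sorting by the raw logit is enough to rank the images. If a 0 to 1 score would be easier to display, one option is to apply a softmax across the images' logits before sending them back.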
* Should look roughly like the attached screenshot
* Should be controlled by a boolean parameter `isDarkMode`
See https://docs.aws.amazon.com/transcribe/latest/dg/streaming.html and https://aws.amazon.com/blogs/developer/transcribe-streaming-sdk-for-python-preview/
Can be implemented either on the client side or in the backend. If in the backend, consider adding this check to the `listen_print_loop` method:

```
import re

# Inside the transcript loop: stop listening once the user says "exit" or "quit".
if re.search(r"\b(exit|quit)\b", transcript, re.I):
    print("Exiting..")
    break
```
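The following simplified stand-in for the loop shows where the `break` takes effect. It assumes transcripts arrive as plain final strings, whereas the real `listen_print_loop` iterates over streaming API responses. `EXIT_PATTERN` is a hypothetical name. Because of the word boundaries (`\b`), words such as "quitting" or "exited" do not trigger an exit.

```python
import re
from typing import Iterable, List

# Matches "exit" or "quit" as whole words, case-insensitively.
EXIT_PATTERN = re.compile(r"\b(exit|quit)\b", re.I)


def listen_print_loop(transcripts: Iterable[str]) -> List[str]:
    """Print each transcript and stop after the first one containing an exit keyword."""
    printed: List[str] = []
    for transcript in transcripts:
        print(transcript)
        printed.append(transcript)
        if EXIT_PATTERN.search(transcript):
            print("Exiting..")
            break
    return printed
```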
Consider using react-speech-recognition as well
+ Ensure it works on Windows and macOS
The logic will be to combine Whisper and pyannote.audio output by aligning their timestamps, producing something along the lines of:

```
Person A: Hi
Person B: Hello, how are you
Person A:...
```
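Here is a minimal sketch of the timestamp alignment. It makes two assumptions. First, Whisper segments are the `{"start", "end", "text"}` dicts found in `result["segments"]` from `whisper`'s `transcribe()`. Second, the pyannote.audio diarization has been flattened into `(start, end, speaker)` tuples. Each segment is assigned to the speaker whose turns overlap it the longest. `assign_speakers`, `format_dialogue`, and the `Unknown` label are hypothetical names.

```python
from typing import Dict, List, Sequence, Tuple

Turn = Tuple[float, float, str]  # (start_s, end_s, speaker) from diarization
UNKNOWN = "Unknown"


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Seconds shared by two time intervals (0.0 if they do not intersect)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: Sequence[dict], turns: Sequence[Turn]) -> List[Tuple[str, str]]:
    """Give each Whisper-style segment the speaker whose turns overlap it the longest."""
    labelled: List[Tuple[str, str]] = []
    for seg in segments:
        totals: Dict[str, float] = {}
        for t_start, t_end, speaker in turns:
            totals[speaker] = totals.get(speaker, 0.0) + overlap(seg["start"], seg["end"], t_start, t_end)
        best = max(totals, key=totals.__getitem__, default=None)
        # No diarization turn covers this segment at all: mark it as unknown.
        label = best if best is not None and totals[best] > 0 else UNKNOWN
        labelled.append((label, seg["text"].strip()))
    return labelled


def format_dialogue(labelled: Sequence[Tuple[str, str]]) -> str:
    """Rename speakers to Person A, B, ... by first appearance and merge consecutive lines."""
    names: Dict[str, str] = {}
    lines: List[List[str]] = []
    for speaker, text in labelled:
        if speaker != UNKNOWN and speaker not in names:
            names[speaker] = f"Person {chr(ord('A') + len(names))}"  # assumes <= 26 speakers
        name = names.get(speaker, UNKNOWN)
        if lines and lines[-1][0] == name:
            lines[-1][1] += " " + text  # same speaker kept talking: join the segments
        else:
            lines.append([name, text])
    return "\n".join(f"{name}: {text}" for name, text in lines)
```

With the real libraries, the turns could be collected as `[(turn.start, turn.end, speaker) for turn, _, speaker in diarization.itertracks(yield_label=True)]`. Greatest-overlap assignment is a simple heuristic. A segment that straddles a speaker change gets a single label, so word-level timestamps would be needed for finer splits.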