Preparing LLM Benchmark Table (LangTest)

Open ArshaanNazir opened this issue 1 year ago • 9 comments

ArshaanNazir avatar Dec 27 '23 17:12 ArshaanNazir

@ArshaanNazir please add yourself as an assignee to the task

JustHeroo avatar Dec 28 '23 08:12 JustHeroo

@ArshaanNazir any updates?

JustHeroo avatar Jan 03 '24 05:01 JustHeroo

We are working on it. Here is the link to the tracking sheet: https://johnsnowlabs-my.sharepoint.com/:x:/p/rakshit/ETX1Z44PipFOqm8Ue8Av3_UBycHH_9oK-oJJUpQfc_n54w?e=exe0Ja

ArshaanNazir avatar Jan 03 '24 05:01 ArshaanNazir

@ArshaanNazir did we publish any benchmark (LLM and embeddings) on the LangTest web site?

muhammetsnts avatar Jan 30 '24 10:01 muhammetsnts

We have created the streamlit apps for both of the benchmark tables. We are finalising their design and will publish them on the website by the end of this week.

ArshaanNazir avatar Jan 30 '24 11:01 ArshaanNazir

@ArshaanNazir @vkocaman We have created a new folder for the langtest demos

https://github.com/JohnSnowLabs/streamlit-demo-apps/tree/master/langtest

do you need anything else?

Cabir40 avatar Feb 05 '24 10:02 Cabir40

I am not sure if we are going ahead with the streamlit apps now. @dcecchini can you confirm?

ArshaanNazir avatar Feb 05 '24 10:02 ArshaanNazir

Hi @Cabir40 @ArshaanNazir @muhammetsnts @JustHeroo, we started creating the streamlit apps for the leaderboards, but @vkocaman suggested asking the design team to build them with web tools that look better.

They are preparing them; you can check a draft at this link.

In the meantime, we are reviewing the information to be contained on the pages, as we need to make sure that the leaderboards show all the relevant information (adding more filters, improving the visualization, creating more data with benchmark results, etc.).

dcecchini avatar Feb 05 '24 11:02 dcecchini

I understand why you would want a more attractive web app. I was hoping for a streamlit app -- simply because I am looking for an LLM leaderboard in a box that I could deploy to enterprise clients.

rajshah4 avatar Feb 05 '24 13:02 rajshah4