
Plan for including LLMs' Zero-shot performance?

Open · Leezekun opened this issue 1 year ago · 1 comment

Hi,

Thank you for the great work!

Given the current prevalence of Large Language Models (LLMs), are there any plans to include more LLM-based approaches in performance evaluations, especially focusing on zero-shot performance?

Here are a few relevant papers and approaches:

  • Hu, Yushi, et al. "In-Context Learning for Few-Shot Dialogue State Tracking." arXiv preprint arXiv:2203.08568 (2022).
  • Hudeček, Vojtěch, and Ondřej Dušek. "Are LLMs All You Need for Task-Oriented Dialogue?" arXiv preprint arXiv:2304.06556 (2023).
  • Heck, Michael, et al. "ChatGPT for Zero-Shot Dialogue State Tracking: A Solution or an Opportunity?" arXiv preprint arXiv:2306.01386 (2023).
  • Chung, Willy, et al. "InstructTODS: Large Language Models for End-to-End Task-Oriented Dialogue Systems." arXiv preprint arXiv:2310.08885 (2023).
  • Li, Zekun, et al. "Large Language Models as Zero-Shot Dialogue State Trackers through Function Calling." arXiv preprint arXiv:2402.10466 (2024).

Are there any plans to benchmark the performance of LLMs in zero-shot settings? I would be happy to assist with this if needed.
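For concreteness, here is a minimal sketch of what such a zero-shot evaluation could look like, in the spirit of the papers above. Everything here is illustrative rather than an official MultiWOZ script: the OpenAI Python client and model name are assumptions, and `dialogues` is a hypothetical list of dialogue prefixes paired with gold belief states.

```python
# Illustrative zero-shot DST sketch, not an official MultiWOZ script.
# Assumed: the `openai` Python package (v1 client); the model name is a
# placeholder; `dialogues` is a hypothetical list of (history, gold_state).
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM = (
    "You are a dialogue state tracker for the MultiWOZ domains "
    "(hotel, restaurant, train, taxi, attraction). Given a dialogue, "
    "output the cumulative belief state as a single JSON object mapping "
    '"domain-slot" to value, e.g. {"hotel-area": "north"}. Output JSON only.'
)

def predict_state(history: str, model: str = "gpt-4o-mini") -> dict:
    """Query the LLM once per dialogue prefix and parse its JSON belief state."""
    resp = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": history},
        ],
    )
    try:
        return json.loads(resp.choices[0].message.content)
    except (json.JSONDecodeError, TypeError):
        return {}  # treat unparseable or empty output as an empty state

def joint_goal_accuracy(dialogues) -> float:
    """JGA: fraction of turns whose full predicted state equals the gold state."""
    correct = sum(int(predict_state(h) == gold) for h, gold in dialogues)
    return correct / max(len(dialogues), 1)
```

A real comparison would also need the usual slot-value normalization (as done in the MultiWOZ evaluation scripts) before computing JGA, since raw LLM outputs often differ from gold values only in surface form.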

Leezekun · Jun 30 '24 22:06

Hi @Leezekun - thanks for posting this. The simple answer is: absolutely! There are a number of efforts working in a zero-shot manner. If you are happy to update the benchmarks, that would be very helpful!

budzianowski · Jul 01 '24 15:07

I've updated it! @budzianowski

https://github.com/budzianowski/multiwoz/pull/136

Please check my pull request. Thanks for the great dataset :)

dlwlgus53 · Jan 14 '25 17:01