eval-dev-quality

Follow up: Ollama Support

Open • bauersimon opened this issue 1 year ago • 0 comments

  • [x] Introduce an "ID" method to the tool interface (like we have for model and provider) so the tools can be addressed deterministically instead of using the BinaryName method, which depends on the OS
  • [ ] Allow filtering by tool IDs in the install-tools command and add the following test for all OSes (currently we only test on Linux):
    ```go
    validate(t, &testCase{
    	Name: "Filtered",

    	Arguments: []string{"symflower"},

    	ExpectedInstalledToolNames: []string{
    		"symflower" + osutil.BinaryExtension(),
    	},
    })
    ```
  • [x] Make Ollama version-dependent. We want to require a minimum version, as we do with Symflower, and there is surely a lot of code we can share (the latest version is usually also faster!)
  • [ ] How can we integration-test Ollama in CI? Is there a "dummy" model that always does the same thing?
    • https://github.com/ollama/ollama/blob/1b0e6c9c0e5d53aa6110530da0befab7c95d1755/integration/llm_test.go
    • https://github.com/ollama/ollama/issues/4196
  • [x] Use random ports for testing so tests do not have to synchronize on a single Ollama instance
  • [x] Run models that have not been pulled yet
    • [x] Query the available models: https://github.com/ollama/ollama/issues/3922
    • [x] Download the selected models before the evaluation starts
  • [ ] Better integration testing
    • [ ] We currently only check that a small model runs without errors; something deterministic would be nicer: https://github.com/ollama/ollama/issues/4196
  • [x] Allow customizing the Ollama server port (and host?) and remove the workaround that restricts us to running only one Ollama-dependent test at a time
  • [x] Add a comment explaining why we have a wait delay in the exec util
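For the open item about filtering install-tools by tool IDs, one possible shape of the filter is sketched below. `installableTool` and `filterToolsByID` are hypothetical; the behavior for an empty filter (install everything) and for unknown IDs (return an error) are assumptions about what the command should do, not something the issue specifies.

```go
package main

import (
	"fmt"
	"strings"
)

// installableTool is a hypothetical stand-in for a tool with an
// OS-independent ID.
type installableTool struct {
	id string
}

func (t installableTool) ID() string { return t.id }

// filterToolsByID keeps the tools whose IDs were requested, in request order.
// An empty filter keeps every tool, duplicate IDs are ignored, and unknown IDs
// produce an error instead of being skipped silently.
func filterToolsByID(all []installableTool, ids []string) ([]installableTool, error) {
	if len(ids) == 0 {
		return all, nil
	}
	byID := make(map[string]installableTool, len(all))
	for _, t := range all {
		byID[t.ID()] = t
	}
	seen := make(map[string]bool, len(ids))
	var selected []installableTool
	var unknown []string
	for _, id := range ids {
		if seen[id] {
			continue
		}
		seen[id] = true
		t, ok := byID[id]
		if !ok {
			unknown = append(unknown, id)
			continue
		}
		selected = append(selected, t)
	}
	if len(unknown) > 0 {
		return nil, fmt.Errorf("unknown tool IDs: %s", strings.Join(unknown, ", "))
	}
	return selected, nil
}

func main() {
	all := []installableTool{{id: "ollama"}, {id: "symflower"}}
	selected, err := filterToolsByID(all, []string{"symflower"})
	fmt.Println(len(selected), err)
}
```

Because the filter matches on IDs rather than binary names, the same arguments (for example `symflower`) work on every OS, and only the expected installed file name in the test needs the OS-specific extension.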
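For the minimum-version item, the core of such a check is a numeric comparison of `MAJOR.MINOR.PATCH` strings. The sketch below assumes the installed version has already been extracted as a string like `0.1.32`; it does not assume any particular output format of the Ollama binary, and `parseVersion` and `meetsMinimum` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion turns "v0.1.32" or "0.1.32" into numeric components.
// Pre-release and build suffixes ("-rc1", "+meta") are dropped, which is a
// simplification compared to full semantic versioning.
func parseVersion(s string) ([3]int, error) {
	var v [3]int
	s = strings.TrimPrefix(strings.TrimSpace(s), "v")
	if i := strings.IndexAny(s, "-+"); i >= 0 {
		s = s[:i]
	}
	parts := strings.Split(s, ".")
	if len(parts) != 3 {
		return v, fmt.Errorf("invalid version %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return v, fmt.Errorf("invalid version component %q: %w", p, err)
		}
		v[i] = n
	}
	return v, nil
}

// meetsMinimum reports whether the installed version is at least the minimum.
func meetsMinimum(installed, minimum string) (bool, error) {
	got, err := parseVersion(installed)
	if err != nil {
		return false, err
	}
	want, err := parseVersion(minimum)
	if err != nil {
		return false, err
	}
	// Compare component by component; the first difference decides.
	for i := range got {
		if got[i] != want[i] {
			return got[i] > want[i], nil
		}
	}
	return true, nil
}

func main() {
	ok, err := meetsMinimum("0.1.32", "0.1.30")
	fmt.Println(ok, err)
}
```

Comparing components as integers matters: a plain string comparison would wrongly rank `0.1.9` above `0.1.30`.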

bauersimon • May 08 '24 10:05