langchain-benchmarks

🦜💯 Flex those feathers!

22 langchain-benchmarks issues

Whenever we're ready with tool calling:

```python
# ("fireworks-firefunction-v1", ChatFireworks(model="accounts/fireworks/models/firefunction-v1", temperature=0)),
# ("cohere-command-light", ChatCohere(temperature=0, model="command-light")),
# ("cohere-command", ChatCohere(temperature=0, model="command")),
# ("cohere-command-r", ChatCohere(temperature=0, model="command-r")),
# ("cohere-command-r-plus", ChatCohere(temperature=0, model="command-r-plus")),
# ("mistral-large-2402", ChatMistralAI(model="mistral-large-2402", ...
```
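
Once those models support tool calling, the commented-out entries above could be enabled. Here is a minimal sketch, assuming the `langchain_fireworks`, `langchain_cohere`, and `langchain_mistralai` partner packages, of collecting them into a `(name, model)` list; the loop at the end is only illustrative, not part of the benchmark harness:

```python
# Sketch: candidate tool-calling models gathered into a list, assuming the
# partner packages named below are installed and API keys are set.
from langchain_cohere import ChatCohere
from langchain_fireworks import ChatFireworks
from langchain_mistralai import ChatMistralAI

tool_calling_models = [
    ("fireworks-firefunction-v1",
     ChatFireworks(model="accounts/fireworks/models/firefunction-v1", temperature=0)),
    ("cohere-command-r", ChatCohere(temperature=0, model="command-r")),
    ("cohere-command-r-plus", ChatCohere(temperature=0, model="command-r-plus")),
    ("mistral-large-2402", ChatMistralAI(model="mistral-large-2402", temperature=0)),
]

# Illustrative only: confirm each entry constructs a chat model.
for name, model in tool_calling_models:
    print(name, type(model).__name__)
```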

Hi, I'm trying to run [custom_agent.py](http://localhost:8888/edit/Downloads/langchain-benchmarks-main/csv-qa/custom_agent.py) on my computer. When it reaches this line of code: `chain_results = run_on_dataset( client, dataset_name="Titanic CSV Data", llm_or_chain_factory=get_chain, evaluation=eval_config, )` it generates an...
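
For context, a complete invocation along the lines of the snippet above might look like the sketch below. It assumes the `langchain.smith` evaluation API that `custom_agent.py` appears to use; the `get_chain` factory here is a hypothetical stand-in for the one defined in that script, and the `"qa"` evaluator is only an assumption:

```python
# Sketch of the surrounding setup, assuming the langchain.smith API and an
# existing LangSmith dataset named "Titanic CSV Data".
from langchain.smith import RunEvalConfig, run_on_dataset
from langchain_openai import ChatOpenAI
from langsmith import Client

client = Client()  # reads LANGCHAIN_API_KEY from the environment


def get_chain():
    # Placeholder factory; in custom_agent.py this builds the CSV-QA agent.
    return ChatOpenAI(model="gpt-3.5-turbo", temperature=0)


eval_config = RunEvalConfig(
    evaluators=["qa"],  # assumption: a simple QA correctness evaluator
)

chain_results = run_on_dataset(
    client,
    dataset_name="Titanic CSV Data",
    llm_or_chain_factory=get_chain,
    evaluation=eval_config,
)
```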

I see here that the code is using the GPT-4 model for the evaluation; since it is the most expensive model out there to run, is it possible to change...
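
One way this could be done, sketched under the assumption that the evaluation goes through `langchain.smith.RunEvalConfig` and its `eval_llm` field, is to point the evaluators at a cheaper judge model; whether the benchmark exposes this knob directly is not confirmed here:

```python
# Assumed approach: override the default GPT-4 judge with a cheaper model
# via RunEvalConfig's eval_llm field.
from langchain.smith import RunEvalConfig
from langchain_openai import ChatOpenAI

eval_config = RunEvalConfig(
    evaluators=["qa"],
    eval_llm=ChatOpenAI(model="gpt-3.5-turbo", temperature=0),
)
```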

Common question: I'm fine-tuning for an agent. What split of data should I prioritize collecting, and in what mixture?
- Fewer long trajectories?
- More short trajectories / single-step function...

WIP: implementing this using our own primitives

IMO we should use Python conventions. If we think this is confusing, we can lower-case the column names in all HTML tables.
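
A minimal sketch of what that lower-casing could look like, assuming the HTML tables are rendered from a pandas DataFrame (the DataFrame and column names below are hypothetical, not the benchmark's actual results table):

```python
import pandas as pd

# Hypothetical results table; the real benchmark columns may differ.
df = pd.DataFrame(
    {"Model": ["gpt-4", "command-r"], "Correctness": [0.93, 0.81]}
)

# Lower-case the column names before rendering to HTML so they follow
# Python naming conventions.
df.columns = [c.lower() for c in df.columns]
html = df.to_html(index=False)
print(html)
```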