evals topic
List evals repositories
agentops
3.8k Stars, 336 Forks, 42 Watchers
Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks including OpenAI Agents SDK, CrewAI, Langchain, Autogen, AG2, and CamelAI
langfuse
14.4k Stars, 1.3k Forks, 44 Watchers
🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
vivaria
53 Stars, 15 Forks
Vivaria is METR's tool for running evaluations and conducting agent elicitation research.
rag-evaluator
21 Stars, 13 Forks
A library for evaluating Retrieval-Augmented Generation (RAG) systems (The traditional ways).