evals

openai / evals

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

11.5k stargazers
70 open issues
2.2k forks