evaluator
The Evaluator provides a comprehensive framework for testing AI models against custom evaluations, analyzing failures, and iteratively refining system prompts
to improve performance. It supports parallel execution across multiple models,
automatic prompt optimization, and detailed reporting with telemetry integration.
Key features:
- Run evaluations across multiple AI models in parallel
- Automatically analyze failures and generate improved system prompts
- Export results to CSV format for further analysis
- Compare evaluation results between different runs
- Integrate with Dagger's telemetry for detailed tracing
More info: https://dagger.io/blog/evals-as-code
Example (RunMyEvals)
// Run the Dagger evals across the major model providers.
func (dev *Examples) Evaluator_RunMyEvals(
ctx context.Context,
// Run particular evals, or all evals if unspecified.
// +optional
evals []string,
// Run particular models, or all models if unspecified.
// +optional
models []string,
) error {
myEvaluator := dag.Evaluator().
WithDocsFile(dev.Source.File("core/llm_docs.md")).
WithoutDefaultSystemPrompt().
WithSystemPromptFile(dev.Source.File("core/llm_dagger_prompt.md")).
WithEvals([]*dagger.EvaluatorEval{
// FIXME: ideally this list would live closer to where the evals are
// defined, but it's not possible for a module to return an interface type
// https://github.com/dagger/dagger/issues/7582
dag.Evals().Basic().AsEvaluatorEval(),
dag.Evals().BuildMulti().AsEvaluatorEval(),
dag.Evals().BuildMultiNoVar().AsEvaluatorEval(),
dag.Evals().WorkspacePattern().AsEvaluatorEval(),
dag.Evals().ReadImplicitVars().AsEvaluatorEval(),
dag.Evals().UndoChanges().AsEvaluatorEval(),
dag.Evals().CoreAPI().AsEvaluatorEval(),
dag.Evals().ModuleDependencies().AsEvaluatorEval(),
dag.Evals().Responses().AsEvaluatorEval(),
})
return myEvaluator.
EvalsAcrossModels(dagger.EvaluatorEvalsAcrossModelsOpts{
Evals: evals,
Models: models,
}).
Check(ctx)
}
Installation
dagger install github.com/AmirulAndalib/dagger/modules/evaluator@09b87500ac9883eda96cc38470ce22a084e235ab
Entrypoint
Return Type
Evaluator !
Arguments
Name | Type | Default Value | Description |
---|---|---|---|
model | String | - | The AI model name to use for the evaluator agent (e.g., "gpt-4o", "claude-sonnet-4-0"). If not specified, uses the default model configured in the environment. |
Example
dagger -m github.com/AmirulAndalib/dagger/modules/evaluator@09b87500ac9883eda96cc38470ce22a084e235ab call
func (m *MyModule) Example() *dagger.Evaluator {
return dag.
Evaluator()
}
@function
def example() -> dagger.Evaluator:
return (
dag.evaluator()
)
@func()
example(): Evaluator {
return dag
.evaluator()
}
Types
Evaluator 🔗
docs() 🔗
The documentation that defines expected model behavior and serves as the reference for evaluations.
Return Type
File !
Example
dagger -m github.com/AmirulAndalib/dagger/modules/evaluator@09b87500ac9883eda96cc38470ce22a084e235ab call \
docs
func (m *MyModule) Example() *dagger.File {
return dag.
Evaluator().
Docs()
}
@function
def example() -> dagger.File:
return (
dag.evaluator()
.docs()
)
@func()
example(): File {
return dag
.evaluator()
.docs()
}
systemPrompt() 🔗
A system prompt file that will be applied to all evaluations to provide consistent guidance.
Return Type
File !
Example
dagger -m github.com/AmirulAndalib/dagger/modules/evaluator@09b87500ac9883eda96cc38470ce22a084e235ab call \
system-prompt
func (m *MyModule) Example() *dagger.File {
return dag.
Evaluator().
SystemPrompt()
}
@function
def example() -> dagger.File:
return (
dag.evaluator()
.system_prompt()
)
@func()
example(): File {
return dag
.evaluator()
.systemPrompt()
}
disableDefaultSystemPrompt() 🔗
Whether to disable Dagger’s built-in default system prompt (usually not recommended).
Return Type
Boolean !
Example
dagger -m github.com/AmirulAndalib/dagger/modules/evaluator@09b87500ac9883eda96cc38470ce22a084e235ab call \
disable-default-system-prompt
func (m *MyModule) Example(ctx context.Context) (bool, error) {
return dag.
Evaluator().
DisableDefaultSystemPrompt(ctx)
}
@function
async def example() -> bool:
return await (
dag.evaluator()
.disable_default_system_prompt()
)
@func()
async example(): Promise<boolean> {
return dag
.evaluator()
.disableDefaultSystemPrompt()
}
evaluatorModel() 🔗
The AI model to use for the evaluator agent that performs analysis and prompt generation.
Return Type
String !
Example
dagger -m github.com/AmirulAndalib/dagger/modules/evaluator@09b87500ac9883eda96cc38470ce22a084e235ab call \
evaluator-model
func (m *MyModule) Example(ctx context.Context) (string, error) {
return dag.
Evaluator().
EvaluatorModel(ctx)
}
@function
async def example() -> str:
return await (
dag.evaluator()
.evaluator_model()
)
@func()
async example(): Promise<string> {
return dag
.evaluator()
.evaluatorModel()
}
withSystemPrompt() 🔗
Set a system prompt to be provided to all evaluations.
The system prompt provides foundational instructions and context that will be applied to every evaluation run. This helps ensure consistent behavior across all models and evaluations.
Return Type
Evaluator !
Arguments
Name | Type | Default Value | Description |
---|---|---|---|
prompt | String ! | - | The system prompt text to use for all evaluations. |
Example
dagger -m github.com/AmirulAndalib/dagger/modules/evaluator@09b87500ac9883eda96cc38470ce22a084e235ab call \
with-system-prompt --prompt string
func (m *MyModule) Example(prompt string) *dagger.Evaluator {
return dag.
Evaluator().
WithSystemPrompt(prompt)
}
@function
def example(prompt: str) -> dagger.Evaluator:
return (
dag.evaluator()
.with_system_prompt(prompt)
)
@func()
example(prompt: string): Evaluator {
return dag
.evaluator()
.withSystemPrompt(prompt)
}
withSystemPromptFile() 🔗
Set a system prompt from a file to be provided to all evaluations.
This allows you to load a system prompt from an external file, which is useful for managing longer prompts or when the prompt content is maintained separately from your code.
Return Type
Evaluator !
Arguments
Name | Type | Default Value | Description |
---|---|---|---|
file | File ! | - | The file containing the system prompt to use for all evaluations. |
Example
dagger -m github.com/AmirulAndalib/dagger/modules/evaluator@09b87500ac9883eda96cc38470ce22a084e235ab call \
with-system-prompt-file --file file:path
func (m *MyModule) Example(file *dagger.File) *dagger.Evaluator {
return dag.
Evaluator().
WithSystemPromptFile(file)
}
@function
def example(file: dagger.File) -> dagger.Evaluator:
return (
dag.evaluator()
.with_system_prompt_file(file)
)
@func()
example(file: File): Evaluator {
return dag
.evaluator()
.withSystemPromptFile(file)
}
withoutDefaultSystemPrompt() 🔗
Disable Dagger’s built-in system prompt.
You probably don’t need to use this - Dagger’s system prompt provides the fundamentals for how the agent interacts with Dagger objects. This is primarily exposed so that we (Dagger) can iteratively test the default system prompt itself.
Return Type
Evaluator !
Example
dagger -m github.com/AmirulAndalib/dagger/modules/evaluator@09b87500ac9883eda96cc38470ce22a084e235ab call \
without-default-system-prompt
func (m *MyModule) Example() *dagger.Evaluator {
return dag.
Evaluator().
WithoutDefaultSystemPrompt()
}
@function
def example() -> dagger.Evaluator:
return (
dag.evaluator()
.without_default_system_prompt()
)
@func()
example(): Evaluator {
return dag
.evaluator()
.withoutDefaultSystemPrompt()
}
withDocs() 🔗
Set the documentation content that the system prompt should enforce.
This documentation serves as the reference material that evaluations will test against. The system prompt should guide the model to follow the principles and patterns defined in this documentation.
Return Type
Evaluator !
Arguments
Name | Type | Default Value | Description |
---|---|---|---|
prompt | String ! | - | The documentation content as a string. |
Example
dagger -m github.com/AmirulAndalib/dagger/modules/evaluator@09b87500ac9883eda96cc38470ce22a084e235ab call \
with-docs --prompt string
func (m *MyModule) Example(prompt string) *dagger.Evaluator {
return dag.
Evaluator().
WithDocs(prompt)
}
@function
def example(prompt: str) -> dagger.Evaluator:
return (
dag.evaluator()
.with_docs(prompt)
)
@func()
example(prompt: string): Evaluator {
return dag
.evaluator()
.withDocs(prompt)
}
withDocsFile() 🔗
Set the documentation file that the system prompt should enforce.
This allows you to load documentation from an external file. The documentation serves as the reference material for what behavior the evaluations should test, and the system prompt should guide the model to follow these principles.
Return Type
Evaluator !
Arguments
Name | Type | Default Value | Description |
---|---|---|---|
file | File ! | - | The file containing the documentation to reference. |
Example
dagger -m github.com/AmirulAndalib/dagger/modules/evaluator@09b87500ac9883eda96cc38470ce22a084e235ab call \
with-docs-file --file file:path
func (m *MyModule) Example(file *dagger.File) *dagger.Evaluator {
return dag.
Evaluator().
WithDocsFile(file)
}
@function
def example(file: dagger.File) -> dagger.Evaluator:
return (
dag.evaluator()
.with_docs_file(file)
)
@func()
example(file: File): Evaluator {
return dag
.evaluator()
.withDocsFile(file)
}
withEval() 🔗
WithEval adds a single evaluation to the evaluator.
Return Type
Evaluator !
Arguments
Name | Type | Default Value | Description |
---|---|---|---|
eval | Interface ! | - | The evaluation to add to the list of evals to run. |
Example
dagger -m github.com/AmirulAndalib/dagger/modules/evaluator@09b87500ac9883eda96cc38470ce22a084e235ab call \
with-eval
func (m *MyModule) Example(eval *dagger.EvaluatorEval) *dagger.Evaluator {
return dag.
Evaluator().
WithEval(eval)
}
@function
def example(eval: dagger.EvaluatorEval) -> dagger.Evaluator:
return (
dag.evaluator()
.with_eval(eval)
)
@func()
example(eval: EvaluatorEval): Evaluator {
return dag
.evaluator()
.withEval(eval)
}
withEvals() 🔗
WithEvals adds multiple evaluations to the evaluator.
Return Type
Evaluator !
Arguments
Name | Type | Default Value | Description |
---|---|---|---|
evals | [Interface ! ] ! | - | The list of evaluations to add to the evaluator. |
Example
dagger -m github.com/AmirulAndalib/dagger/modules/evaluator@09b87500ac9883eda96cc38470ce22a084e235ab call \
with-evals
func (m *MyModule) Example(evals []*dagger.EvaluatorEval) *dagger.Evaluator {
return dag.
Evaluator().
WithEvals(evals)
}
@function
def example(evals: List[dagger.EvaluatorEval]) -> dagger.Evaluator:
return (
dag.evaluator()
.with_evals(evals)
)
@func()
example(evals: EvaluatorEval[]): Evaluator {
return dag
.evaluator()
.withEvals(evals)
}
evalsAcrossModels() 🔗
Run evals across models.
Models run in parallel. For each model, evals run in series, and all attempts of an eval run in parallel.
Return Type
EvalsAcrossModels !
Arguments
Name | Type | Default Value | Description |
---|---|---|---|
evals | [String ! ] | - | Evals to run. Defaults to all. |
models | [String ! ] | - | Models to run evals across. Defaults to all. |
attempts | Integer | - | Attempts to run each eval. Defaults to a per-provider value. |
Example
dagger -m github.com/AmirulAndalib/dagger/modules/evaluator@09b87500ac9883eda96cc38470ce22a084e235ab call \
evals-across-models
func (m *MyModule) Example() *dagger.EvaluatorEvalsAcrossModels {
return dag.
Evaluator().
EvalsAcrossModels()
}
@function
def example() -> dagger.EvaluatorEvalsAcrossModels:
return (
dag.evaluator()
.evals_across_models()
)
@func()
example(): EvaluatorEvalsAcrossModels {
return dag
.evaluator()
.evalsAcrossModels()
}
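The scheduling described for evalsAcrossModels (models in parallel, evals in series, attempts in parallel) can be pictured with plain goroutines. This is a minimal standalone sketch, not the module's implementation: `attemptFunc`, `runAcrossModels`, and the "model/eval" key format are hypothetical names chosen for illustration, and the fake attempt function simply passes on even-numbered attempts.

```go
package main

import (
	"fmt"
	"sync"
)

// attemptFunc is a hypothetical stand-in for one attempt of one eval against
// one model; it reports whether the attempt succeeded.
type attemptFunc func(model, eval string, attempt int) bool

// runAcrossModels returns the success rate for each "model/eval" pair.
func runAcrossModels(models, evals []string, attempts int, run attemptFunc) map[string]float64 {
	if attempts < 1 {
		attempts = 1 // assumption: always make at least one attempt
	}
	rates := make(map[string]float64)
	var mu sync.Mutex // guards rates
	var modelsWG sync.WaitGroup

	for _, model := range models {
		modelsWG.Add(1)
		go func(model string) { // one goroutine per model: models run in parallel
			defer modelsWG.Done()
			for _, eval := range evals { // plain loop: evals run in series
				results := make([]bool, attempts)
				var attemptsWG sync.WaitGroup
				for i := 0; i < attempts; i++ {
					attemptsWG.Add(1)
					go func(i int) { // one goroutine per attempt: attempts run in parallel
						defer attemptsWG.Done()
						results[i] = run(model, eval, i)
					}(i)
				}
				attemptsWG.Wait() // finish this eval before starting the next

				passed := 0
				for _, ok := range results {
					if ok {
						passed++
					}
				}
				mu.Lock()
				rates[model+"/"+eval] = float64(passed) / float64(attempts)
				mu.Unlock()
			}
		}(model)
	}
	modelsWG.Wait()
	return rates
}

func main() {
	// Fake attempt function: even-numbered attempts pass.
	fake := func(model, eval string, attempt int) bool { return attempt%2 == 0 }
	rates := runAcrossModels([]string{"model-a", "model-b"}, []string{"basic", "build"}, 4, fake)
	fmt.Println(rates["model-a/basic"], len(rates))
}
```

Waiting on the inner group before moving to the next eval is what keeps evals serial per model while still letting every attempt of the current eval overlap.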
explore() 🔗
Explore evaluations across models to identify patterns and issues.
This function uses an LLM agent to act as a quality assurance engineer, automatically running evaluations across different models and identifying interesting patterns. It focuses on finding evaluations that work on some models but fail on others, helping to identify model-specific weaknesses or strengths.
The agent will avoid re-running evaluations that fail consistently across all models, but will retry evaluations that show partial success to gather more insights.
Return Type
[String ! ] !
Example
dagger -m github.com/AmirulAndalib/dagger/modules/evaluator@09b87500ac9883eda96cc38470ce22a084e235ab call \
explore
func (m *MyModule) Example(ctx context.Context) ([]string, error) {
return dag.
Evaluator().
Explore(ctx)
}
@function
async def example() -> List[str]:
return await (
dag.evaluator()
.explore()
)
@func()
async example(): Promise<string[]> {
return dag
.evaluator()
.explore()
}
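The selection rule described for explore (skip evals that fail on every model, retry those with partial success) can be illustrated without an LLM. The sketch below is only an assumption about how such a rule might be expressed: `classify` and its input shape (eval name to model name to success rate) are hypothetical and are not part of the module's API.

```go
package main

import (
	"fmt"
	"sort"
)

// classify groups evals by how their success rate varies across models.
// rates maps eval name -> model name -> success rate (0.0 to 1.0).
// retry holds evals with partial success; skip holds evals that fail on
// every model. Evals that pass everywhere appear in neither list.
func classify(rates map[string]map[string]float64) (retry, skip []string) {
	for eval, byModel := range rates {
		if len(byModel) == 0 {
			continue // no data for this eval
		}
		lowest, highest := 1.0, 0.0
		for _, rate := range byModel {
			if rate < lowest {
				lowest = rate
			}
			if rate > highest {
				highest = rate
			}
		}
		switch {
		case highest == 0:
			skip = append(skip, eval) // fails consistently on all models
		case lowest < 1:
			retry = append(retry, eval) // works on some models, not on others
		}
	}
	sort.Strings(retry) // map iteration order is random; sort for stable output
	sort.Strings(skip)
	return retry, skip
}

func main() {
	retry, skip := classify(map[string]map[string]float64{
		"basic": {"model-a": 1, "model-b": 1},
		"build": {"model-a": 1, "model-b": 0},
		"undo":  {"model-a": 0, "model-b": 0},
	})
	fmt.Println("retry:", retry)
	fmt.Println("skip:", skip)
}
```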
generateSystemPrompt() 🔗
Generate a new system prompt based on the provided documentation.
This function uses an LLM to analyze the documentation and generate a system prompt that captures the key rules and principles. The process involves first interpreting the documentation to extract all inferable rules, then crafting a focused system prompt that provides proper framing without being overly verbose or turning into meaningless word salad.
The generated prompt aims to establish foundation and context while allowing the model flexibility to apply the guidelines appropriately.
Return Type
String !
Example
dagger -m github.com/AmirulAndalib/dagger/modules/evaluator@09b87500ac9883eda96cc38470ce22a084e235ab call \
generate-system-prompt
func (m *MyModule) Example(ctx context.Context) (string, error) {
return dag.
Evaluator().
GenerateSystemPrompt(ctx)
}
@function
async def example() -> str:
return await (
dag.evaluator()
.generate_system_prompt()
)
@func()
async example(): Promise<string> {
return dag
.evaluator()
.generateSystemPrompt()
}
iterate() 🔗
Iterate runs all evals across all models in a loop until every eval succeeds. After each failed round, it analyzes the failures and generates a new system prompt to course-correct.
Return Type
String !
Example
dagger -m github.com/AmirulAndalib/dagger/modules/evaluator@09b87500ac9883eda96cc38470ce22a084e235ab call \
iterate
func (m *MyModule) Example(ctx context.Context) (string, error) {
return dag.
Evaluator().
Iterate(ctx)
}
@function
async def example() -> str:
return await (
dag.evaluator()
.iterate()
)
@func()
async example(): Promise<string> {
return dag
.evaluator()
.iterate()
}
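The loop that iterate describes (run, analyze failures, regenerate the prompt, repeat) has a simple shape, sketched below with stand-in functions. Everything here is hypothetical: `runEvals` and `revisePrompt` replace the real eval runs and LLM analysis, and the `maxRounds` bound is an assumption added so the sketch always terminates; the description above does not mention a limit.

```go
package main

import (
	"fmt"
	"strings"
)

// runEvals is a hypothetical stand-in: it runs every eval with the given
// system prompt and returns the names of the evals that failed.
type runEvals func(prompt string) []string

// revisePrompt is a hypothetical stand-in for the failure analysis step: it
// returns a new system prompt given the current one and the failures.
type revisePrompt func(prompt string, failures []string) string

// iterate loops until no eval fails or maxRounds runs have been made.
// It returns the last prompt and whether that prompt passed every eval.
func iterate(prompt string, maxRounds int, run runEvals, revise revisePrompt) (string, bool) {
	for round := 0; round < maxRounds; round++ {
		failures := run(prompt)
		if len(failures) == 0 {
			return prompt, true
		}
		prompt = revise(prompt, failures)
	}
	// The final revision was produced but never evaluated.
	return prompt, false
}

func main() {
	// Fake eval run: the "basic" eval fails until the prompt mentions it.
	run := func(prompt string) []string {
		if strings.Contains(prompt, "basic") {
			return nil
		}
		return []string{"basic"}
	}
	// Fake revision: append a note about the failing evals.
	revise := func(prompt string, failures []string) string {
		return prompt + "\nAddress failures in: " + strings.Join(failures, ", ")
	}
	final, ok := iterate("You are a helpful agent.", 5, run, revise)
	fmt.Println(ok)
	fmt.Println(final)
}
```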
compare() 🔗
Compare two CSV evaluation reports and generate an analysis.
This function takes two CSV files containing evaluation results (typically from different runs or with different system prompts) and generates a detailed comparison report. The comparison includes success rate changes, token usage differences, and trace links for debugging.
The generated report is analyzed by an LLM to provide insights into the differences and their potential causes.
Return Type
String !
Arguments
Name | Type | Default Value | Description |
---|---|---|---|
before | File ! | - | The CSV file containing the baseline evaluation results. |
after | File ! | - | The CSV file containing the new evaluation results to compare against. |
Example
dagger -m github.com/AmirulAndalib/dagger/modules/evaluator@09b87500ac9883eda96cc38470ce22a084e235ab call \
compare --before file:path --after file:path
func (m *MyModule) Example(ctx context.Context, before *dagger.File, after *dagger.File) (string, error) {
return dag.
Evaluator().
Compare(ctx, before, after)
}
@function
async def example(before: dagger.File, after: dagger.File) -> str:
return await (
dag.evaluator()
.compare(before, after)
)
@func()
async example(before: File, after: File): Promise<string> {
return dag
.evaluator()
.compare(before, after)
}
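To make the "success rate changes" part of the comparison concrete, here is a minimal standalone sketch that diffs two reports by hand. It assumes both files carry a header row with the `model`, `eval`, and `success_rate` columns listed for csv(); `successRates`, `compareReports`, and the sample data are hypothetical and say nothing about how compare itself is implemented.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// successRates reads a CSV report and returns success_rate keyed by "model/eval".
func successRates(report string) (map[string]float64, error) {
	records, err := csv.NewReader(strings.NewReader(report)).ReadAll()
	if err != nil {
		return nil, err
	}
	if len(records) == 0 {
		return nil, fmt.Errorf("empty report")
	}
	col := make(map[string]int) // header name -> column index
	for i, name := range records[0] {
		col[name] = i
	}
	for _, name := range []string{"model", "eval", "success_rate"} {
		if _, ok := col[name]; !ok {
			return nil, fmt.Errorf("missing column %q", name)
		}
	}
	rates := make(map[string]float64)
	for _, rec := range records[1:] {
		rate, err := strconv.ParseFloat(rec[col["success_rate"]], 64)
		if err != nil {
			return nil, err
		}
		rates[rec[col["model"]]+"/"+rec[col["eval"]]] = rate
	}
	return rates, nil
}

// delta describes how one model/eval pair changed between two reports.
type delta struct {
	Key    string
	Before float64
	After  float64
}

// Change is positive when the success rate improved.
func (d delta) Change() float64 { return d.After - d.Before }

// compareReports returns deltas for pairs present in both reports, sorted by key.
func compareReports(before, after string) ([]delta, error) {
	b, err := successRates(before)
	if err != nil {
		return nil, err
	}
	a, err := successRates(after)
	if err != nil {
		return nil, err
	}
	var out []delta
	for key, rateBefore := range b {
		if rateAfter, ok := a[key]; ok {
			out = append(out, delta{Key: key, Before: rateBefore, After: rateAfter})
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Key < out[j].Key })
	return out, nil
}

func main() {
	// Hypothetical sample data.
	before := "model,eval,success_rate\nmodel-a,basic,0.5\nmodel-a,build,1\n"
	after := "model,eval,success_rate\nmodel-a,basic,0.75\nmodel-a,build,1\n"
	deltas, err := compareReports(before, after)
	if err != nil {
		panic(err)
	}
	for _, d := range deltas {
		fmt.Printf("%s: %.2f -> %.2f (%+.2f)\n", d.Key, d.Before, d.After, d.Change())
	}
}
```

Looking columns up by header name rather than position keeps the sketch independent of column order, which the description above does not guarantee.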
EvalsAcrossModels 🔗
EvalsAcrossModels represents the results of running evaluations across multiple models.
traceId() 🔗
Return Type
String !
Example
dagger -m github.com/AmirulAndalib/dagger/modules/evaluator@09b87500ac9883eda96cc38470ce22a084e235ab call \
evals-across-models \
trace-id
func (m *MyModule) Example(ctx context.Context) (string, error) {
return dag.
Evaluator().
EvalsAcrossModels().
TraceId(ctx)
}
@function
async def example() -> str:
return await (
dag.evaluator()
.evals_across_models()
.trace_id()
)
@func()
async example(): Promise<string> {
return dag
.evaluator()
.evalsAcrossModels()
.traceId()
}
modelResults() 🔗
Return Type
[ModelResult ! ] !
Example
dagger -m github.com/AmirulAndalib/dagger/modules/evaluator@09b87500ac9883eda96cc38470ce22a084e235ab call \
evals-across-models \
model-results
func (m *MyModule) Example() []*dagger.EvaluatorModelResult {
return dag.
Evaluator().
EvalsAcrossModels().
ModelResults()
}
@function
def example() -> List[dagger.EvaluatorModelResult]:
return (
dag.evaluator()
.evals_across_models()
.model_results()
)
@func()
example(): EvaluatorModelResult[] {
return dag
.evaluator()
.evalsAcrossModels()
.modelResults()
}
check() 🔗
Return Type
Void !
Example
dagger -m github.com/AmirulAndalib/dagger/modules/evaluator@09b87500ac9883eda96cc38470ce22a084e235ab call \
evals-across-models \
check
func (m *MyModule) Example(ctx context.Context) error {
return dag.
Evaluator().
EvalsAcrossModels().
Check(ctx)
}
@function
async def example() -> None:
return await (
dag.evaluator()
.evals_across_models()
.check()
)
@func()
async example(): Promise<void> {
return dag
.evaluator()
.evalsAcrossModels()
.check()
}
analyzeAndGenerateSystemPrompt() 🔗
AnalyzeAndGenerateSystemPrompt performs comprehensive failure analysis and generates an improved system prompt.
This function implements a sophisticated multi-stage analysis process:
1. Report Generation: Collects all evaluation reports from different models and organizes them for analysis, providing a comprehensive view of successes and failures.
2. Initial Analysis: Generates a summary of current understanding, grading overall results and focusing on failure patterns. Uses specific examples from reports to support the analysis.
3. Cross-Reference Analysis: Compares the analysis against the original documentation and system prompt, suggesting improvements without over-specializing for specific evaluations. Focuses on deeper, systemic issues rather than superficial fixes.
4. Success Pattern Analysis: Compares successful results with failed ones to identify what made the successful cases work. Extracts generalizable principles from the documentation and prompts that led to success.
5. Prompt Generation: Creates a new system prompt incorporating all insights, focusing on incremental improvements rather than complete rewrites unless absolutely necessary.
The process emphasizes finding general, root-cause issues over specific evaluation failures, ensuring that improvements help broadly rather than just fixing individual test cases.
Return Type
String !
Example
dagger -m github.com/AmirulAndalib/dagger/modules/evaluator@09b87500ac9883eda96cc38470ce22a084e235ab call \
evals-across-models \
analyze-and-generate-system-prompt
func (m *MyModule) Example(ctx context.Context) (string, error) {
return dag.
Evaluator().
EvalsAcrossModels().
AnalyzeAndGenerateSystemPrompt(ctx)
}
@function
async def example() -> str:
return await (
dag.evaluator()
.evals_across_models()
.analyze_and_generate_system_prompt()
)
@func()
async example(): Promise<string> {
return dag
.evaluator()
.evalsAcrossModels()
.analyzeAndGenerateSystemPrompt()
}
csv() 🔗
CSV exports evaluation results to CSV format for analysis and comparison.
This function generates a CSV representation of all evaluation results across models, including performance metrics, token usage, and trace information for debugging. The CSV includes the following columns:
- model: The name of the AI model tested
- eval: The name of the evaluation that was run
- input_tokens: Number of input tokens used
- output_tokens: Number of output tokens generated
- total_attempts: Total number of evaluation attempts made
- success_rate: Success rate as a decimal (0.0 to 1.0)
- trace_id: Unique identifier for the trace
- model_span_id: Span ID for the model execution
- eval_span_id: Span ID for the specific evaluation
The CSV format makes it easy to import results into spreadsheet applications, databases, or data analysis tools for further processing.
Return Type
String !
Arguments
Name | Type | Default Value | Description |
---|---|---|---|
noHeader | Boolean ! | false | Don't include a header row in the CSV output. |
Example
dagger -m github.com/AmirulAndalib/dagger/modules/evaluator@09b87500ac9883eda96cc38470ce22a084e235ab call \
evals-across-models \
csv --no-header boolean
func (m *MyModule) Example(ctx context.Context, noHeader bool) (string, error) {
return dag.
Evaluator().
EvalsAcrossModels().
Csv(ctx, noHeader)
}
@function
async def example(no_header: bool) -> str:
return await (
dag.evaluator()
.evals_across_models()
.csv(no_header)
)
@func()
async example(noHeader: boolean): Promise<string> {
return dag
.evaluator()
.evalsAcrossModels()
.csv(noHeader)
}
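For downstream processing, the CSV string can be loaded with Go's standard `encoding/csv` package. The sketch below assumes the header row is present (that is, `noHeader` is false) and that the column names match the list above; the `row` type, `parseRows`, and the sample line are hypothetical.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// columns lists the header names documented for the csv() output.
var columns = []string{
	"model", "eval", "input_tokens", "output_tokens", "total_attempts",
	"success_rate", "trace_id", "model_span_id", "eval_span_id",
}

// row is a hypothetical typed view of one CSV record.
type row struct {
	Model, Eval                              string
	InputTokens, OutputTokens, TotalAttempts int
	SuccessRate                              float64
	TraceID, ModelSpanID, EvalSpanID         string
}

// parseRows converts a CSV report (with header row) into typed rows.
func parseRows(report string) ([]row, error) {
	records, err := csv.NewReader(strings.NewReader(report)).ReadAll()
	if err != nil {
		return nil, err
	}
	if len(records) == 0 {
		return nil, fmt.Errorf("empty report")
	}
	col := make(map[string]int) // header name -> column index
	for i, name := range records[0] {
		col[name] = i
	}
	for _, name := range columns {
		if _, ok := col[name]; !ok {
			return nil, fmt.Errorf("missing column %q", name)
		}
	}
	rows := make([]row, 0, len(records)-1)
	for _, rec := range records[1:] {
		get := func(name string) string { return rec[col[name]] }
		in, err := strconv.Atoi(get("input_tokens"))
		if err != nil {
			return nil, err
		}
		out, err := strconv.Atoi(get("output_tokens"))
		if err != nil {
			return nil, err
		}
		attempts, err := strconv.Atoi(get("total_attempts"))
		if err != nil {
			return nil, err
		}
		rate, err := strconv.ParseFloat(get("success_rate"), 64)
		if err != nil {
			return nil, err
		}
		rows = append(rows, row{
			Model: get("model"), Eval: get("eval"),
			InputTokens: in, OutputTokens: out, TotalAttempts: attempts,
			SuccessRate: rate,
			TraceID:     get("trace_id"), ModelSpanID: get("model_span_id"),
			EvalSpanID: get("eval_span_id"),
		})
	}
	return rows, nil
}

func main() {
	// Hypothetical sample report with a single record.
	report := strings.Join(columns, ",") + "\n" +
		"model-a,basic,1200,300,4,0.75,trace-1,span-1,span-2\n"
	rows, err := parseRows(report)
	if err != nil {
		panic(err)
	}
	for _, r := range rows {
		fmt.Printf("%s/%s: %.0f%% over %d attempts\n", r.Model, r.Eval, r.SuccessRate*100, r.TotalAttempts)
	}
}
```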
ModelResult 🔗
ModelResult represents the evaluation results for a single model.
modelName() 🔗
Return Type
String !
Example
Function EvaluatorModelResult.modelName is not accessible from the evaluator module
spanId() 🔗
Return Type
String !
Example
Function EvaluatorModelResult.spanId is not accessible from the evaluator module
evalReports() 🔗
Return Type
[EvalResult ! ] !
Example
Function EvaluatorModelResult.evalReports is not accessible from the evaluator module
check() 🔗
Return Type
Void !
Example
Function EvaluatorModelResult.check is not accessible from the evaluator module
EvalResult 🔗
EvalResult represents the results of a single evaluation.
name() 🔗
Return Type
String !
Example
Function EvaluatorEvalResult.name is not accessible from the evaluator module
spanId() 🔗
Return Type
String !
Example
Function EvaluatorEvalResult.spanId is not accessible from the evaluator module
error() 🔗
Return Type
String !
Example
Function EvaluatorEvalResult.error is not accessible from the evaluator module
report() 🔗
Return Type
String !
Example
Function EvaluatorEvalResult.report is not accessible from the evaluator module
successRate() 🔗
Return Type
Float !
Example
Function EvaluatorEvalResult.successRate is not accessible from the evaluator module
totalAttempts() 🔗
Return Type
Integer !
Example
Function EvaluatorEvalResult.totalAttempts is not accessible from the evaluator module
inputTokens() 🔗
Return Type
Integer !
Example
Function EvaluatorEvalResult.inputTokens is not accessible from the evaluator module
outputTokens() 🔗
Return Type
Integer !
Example
Function EvaluatorEvalResult.outputTokens is not accessible from the evaluator module
check() 🔗
Return Type
Void !
Example
Function EvaluatorEvalResult.check is not accessible from the evaluator module