harbor-check
No long description provided.
Installation
dagger install github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0
Entrypoint
Return Type
HarborCheck !
Arguments
| Name | Type | Default Value | Description |
|---|---|---|---|
| ws | Workspace | - | No description provided |
| sourcePath | String | - | No description provided |
| sourceDir | Directory | - | No description provided |
| harborPackage | String | - | No description provided |
| harborExtras | [String ! ] | - | No description provided |
| pythonVersion | String | - | No description provided |
| container | Container | - | No description provided |
| claudeCodeOauthToken | Secret | - | No description provided |
| openrouterApiKey | Secret | - | No description provided |
| codexAccessToken | Secret | - | No description provided |
| miniSweConfig | String | - | No description provided |
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call
func (m *MyModule) Example() *dagger.HarborCheck {
	return dag.
		HarborCheck()
}
@function
def example() -> dagger.HarborCheck:
    return (
        dag.harbor_check()
    )
@func()
example(): HarborCheck {
  return dag
    .harborCheck()
}
Types
HarborCheck 🔗
source() 🔗
The source directory (a task tree or repo), taken from the workspace at sourcePath in the constructor. Harbor commands run against it; the argument-free checks validate it directly.
Return Type
Directory !
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
 source
func (m *MyModule) Example() *dagger.Directory {
	return dag.
		HarborCheck().
		Source()
}
@function
def example() -> dagger.Directory:
    return (
        dag.harbor_check()
        .source()
    )
@func()
example(): Directory {
  return dag
    .harborCheck()
    .source()
}
harborPackage() 🔗
Pip/uv install spec for Harbor itself (pin to a commit for reproducibility).
Return Type
String !
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
 harbor-package
func (m *MyModule) Example(ctx context.Context) (string, error) {
	return dag.
		HarborCheck().
		HarborPackage(ctx)
}
@function
async def example() -> str:
    return await (
        dag.harbor_check()
        .harbor_package()
    )
@func()
async example(): Promise<string> {
  return dag
    .harborCheck()
    .harborPackage()
}
harborExtras() 🔗
Extra packages installed alongside Harbor via uv pip install.
The default mirrors the live pipeline: the Claude Agent SDK for agent execution plus the Modal SDK for the live executorRun --executor=modal path. RewardKit ships inside Harbor as a package (selected via the REWARDKIT_JUDGE=claude-code env var), and mini-swe-agent is an agent installed in Harbor (invoked with -a mini-swe-agent); neither is a separate install, so do not add them here. “Kimi” is just a model string plus the mini-swe-agent config YAML in the task repo.
Return Type
[String ! ] !
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
 harbor-extras
func (m *MyModule) Example(ctx context.Context) ([]string, error) {
	return dag.
		HarborCheck().
		HarborExtras(ctx)
}
@function
async def example() -> List[str]:
    return await (
        dag.harbor_check()
        .harbor_extras()
    )
@func()
async example(): Promise<string[]> {
  return dag
    .harborCheck()
    .harborExtras()
}
pythonVersion() 🔗
Python version for the default Alpine + uv base.
Return Type
String !
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
 python-version
func (m *MyModule) Example(ctx context.Context) (string, error) {
	return dag.
		HarborCheck().
		PythonVersion(ctx)
}
@function
async def example() -> str:
    return await (
        dag.harbor_check()
        .python_version()
    )
@func()
async example(): Promise<string> {
  return dag
    .harborCheck()
    .pythonVersion()
}
container() 🔗
Optional custom base image. When set, it replaces the default Alpine + uv base; Harbor + extras + harbor_runner are still installed into it.
Return Type
Container
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
 container
func (m *MyModule) Example() *dagger.Container {
	return dag.
		HarborCheck().
		Container()
}
@function
def example() -> dagger.Container:
    return (
        dag.harbor_check()
        .container()
    )
@func()
example(): Container {
  return dag
    .harborCheck()
    .container()
}
claudeCodeOauthToken() 🔗
Claude Code OAuth token, forwarded to Harbor as a secret env var. Never printed; Dagger scrubs its value from logs.
Return Type
Secret
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
 claude-code-oauth-token
func (m *MyModule) Example() *dagger.Secret {
	return dag.
		HarborCheck().
		ClaudeCodeOauthToken()
}
@function
def example() -> dagger.Secret:
    return (
        dag.harbor_check()
        .claude_code_oauth_token()
    )
@func()
example(): Secret {
  return dag
    .harborCheck()
    .claudeCodeOauthToken()
}
openrouterApiKey() 🔗
OpenRouter API key (for Kimi trials), forwarded as a secret env var.
Return Type
Secret
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
 openrouter-api-key
func (m *MyModule) Example() *dagger.Secret {
	return dag.
		HarborCheck().
		OpenrouterApiKey()
}
@function
def example() -> dagger.Secret:
    return (
        dag.harbor_check()
        .openrouter_api_key()
    )
@func()
example(): Secret {
  return dag
    .harborCheck()
    .openrouterApiKey()
}
codexAccessToken() 🔗
Codex (ChatGPT) access token for the RewardKit codex agent judge, forwarded as a secret env var. With REWARDKIT_FORCE_OAUTH=1, RewardKit prefers this over OPENAI_API_KEY (rewardkit/judges.py); this is the codex analogue of the Anthropic subscription-token path.
Return Type
Secret
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
 codex-access-token
func (m *MyModule) Example() *dagger.Secret {
	return dag.
		HarborCheck().
		CodexAccessToken()
}
@function
def example() -> dagger.Secret:
    return (
        dag.harbor_check()
        .codex_access_token()
    )
@func()
example(): Secret {
  return dag
    .harborCheck()
    .codexAccessToken()
}
miniSweConfig() 🔗
Default mini-swe-agent config YAML (a path relative to the source, e.g. the task repo’s mini-swe-agent.yaml), forwarded as harbor run -c <config>.
Constructor-defaulted for the common case and also accepted per call by evidence-emitting primitives, so task-type registries can route each task shape to its own agent config without rebuilding the toolchain accessor.
Return Type
String !
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
 mini-swe-config
func (m *MyModule) Example(ctx context.Context) (string, error) {
	return dag.
		HarborCheck().
		MiniSweConfig(ctx)
}
@function
async def example() -> str:
    return await (
        dag.harbor_check()
        .mini_swe_config()
    )
@func()
async example(): Promise<string> {
  return dag
    .harborCheck()
    .miniSweConfig()
}
validateTask() 🔗
Validate the source as a Harbor task directory (Task / TaskConfig models). Fails the check when Harbor reports the task invalid.
Return Type
Void !
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
 validate-task
func (m *MyModule) Example(ctx context.Context) error {
	return dag.
		HarborCheck().
		ValidateTask(ctx)
}
@function
async def example() -> None:
    return await (
        dag.harbor_check()
        .validate_task()
    )
@func()
async example(): Promise<void> {
  return dag
    .harborCheck()
    .validateTask()
}
checkTrajectory() 🔗
Validate the source’s trajectory with Harbor’s TrajectoryValidator.
Return Type
Void !
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
 check-trajectory
func (m *MyModule) Example(ctx context.Context) error {
	return dag.
		HarborCheck().
		CheckTrajectory(ctx)
}
@function
async def example() -> None:
    return await (
        dag.harbor_check()
        .check_trajectory()
    )
@func()
async example(): Promise<void> {
  return dag
    .harborCheck()
    .checkTrajectory()
}
environmentBuilds() 🔗
Build the mounted source’s task environment as a check: fails when
harbor task start-env cannot build the environment/ (Dockerfile /
docker_image / compose). Confirms the environment builds before trials run.
Return Type
Void !
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
 environment-builds
func (m *MyModule) Example(ctx context.Context) error {
	return dag.
		HarborCheck().
		EnvironmentBuilds(ctx)
}
@function
async def example() -> None:
    return await (
        dag.harbor_check()
        .environment_builds()
    )
@func()
async example(): Promise<void> {
  return dag
    .harborCheck()
    .environmentBuilds()
}
proofHash() 🔗
Deterministic proof-freshness content hash of the source task tree
(generated/cache files excluded). Returns sha256: — byte-identical to the
digest harbor sync writes into a dataset manifest (reconciled to Harbor’s
native Packager.compute_content_hash, with a marked local fallback).
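A rough sketch of how a deterministic tree hash of this kind can be computed, assuming files are visited in sorted relative-path order and cache directories are skipped. The function name `tree_hash`, the `EXCLUDED_DIRS` set, and the length-prefixed framing are illustrative assumptions; Harbor's Packager.compute_content_hash may frame the bytes differently, so this sketch is not expected to reproduce its digest.

```python
import hashlib
from pathlib import Path

# Hypothetical exclusion set; the module's real list of generated/cache paths is not shown here.
EXCLUDED_DIRS = {"__pycache__", ".git", ".dagger-output"}


def tree_hash(root: str) -> str:
    """Return 'sha256:<hex>' over every non-excluded file under root."""
    base = Path(root)
    relative_paths = sorted(
        path.relative_to(base).as_posix()
        for path in base.rglob("*")
        if path.is_file()
        and not EXCLUDED_DIRS.intersection(path.relative_to(base).parts)
    )
    digest = hashlib.sha256()
    for rel in relative_paths:
        data = (base / rel).read_bytes()
        # Length-prefix path and content so file boundaries cannot be confused.
        digest.update(f"{len(rel)}:{rel}:{len(data)}:".encode("utf-8"))
        digest.update(data)
    return "sha256:" + digest.hexdigest()
```

Sorting the relative paths before hashing is what makes the result independent of filesystem iteration order.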
Return Type
String !
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
 proof-hash
func (m *MyModule) Example(ctx context.Context) (string, error) {
	return dag.
		HarborCheck().
		ProofHash(ctx)
}
@function
async def example() -> str:
    return await (
        dag.harbor_check()
        .proof_hash()
    )
@func()
async example(): Promise<string> {
  return dag
    .harborCheck()
    .proofHash()
}
proofDigest() 🔗
Content digest of the source task tree as ContentDigest JSON (algorithm, digest, source [“harbor”|“fallback”], files).
Return Type
String !
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
 proof-digest
func (m *MyModule) Example(ctx context.Context) (string, error) {
	return dag.
		HarborCheck().
		ProofDigest(ctx)
}
@function
async def example() -> str:
    return await (
        dag.harbor_check()
        .proof_digest()
    )
@func()
async example(): Promise<string> {
  return dag
    .harborCheck()
    .proofDigest()
}
rewardDetails() 🔗
Parse every reward-details.json under the source into a stable JSON array of criterion observations.
Return Type
String !
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
 reward-details
func (m *MyModule) Example(ctx context.Context) (string, error) {
	return dag.
		HarborCheck().
		RewardDetails(ctx)
}
@function
async def example() -> str:
    return await (
        dag.harbor_check()
        .reward_details()
    )
@func()
async example(): Promise<string> {
  return dag
    .harborCheck()
    .rewardDetails()
}
solveRates() 🔗
Compute per-criterion solve rates (passed / total) under the source.
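As an illustration of the passed / total arithmetic, here is a minimal sketch. It assumes each criterion observation is a dict with the hypothetical keys `criterion` and `passed`; the real reward-details.json schema is not shown on this page and may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable


def solve_rates(observations: Iterable[dict]) -> Dict[str, float]:
    """Return passed / total per criterion, keyed in sorted criterion order."""
    totals: Dict[str, int] = defaultdict(int)
    passes: Dict[str, int] = defaultdict(int)
    for obs in observations:
        name = obs["criterion"]  # hypothetical key name
        totals[name] += 1
        if obs["passed"]:  # hypothetical key name
            passes[name] += 1
    return {name: passes[name] / totals[name] for name in sorted(totals)}
```

A criterion observed twice with one pass would come out at 0.5.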
Return Type
String !
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
 solve-rates
func (m *MyModule) Example(ctx context.Context) (string, error) {
	return dag.
		HarborCheck().
		SolveRates(ctx)
}
@function
async def example() -> str:
    return await (
        dag.harbor_check()
        .solve_rates()
    )
@func()
async example(): Promise<string> {
  return dag
    .harborCheck()
    .solveRates()
}
normalize() 🔗
Normalize a Harbor job directory (relative to the source) into stable JSON.
Return Type
String !
Arguments
| Name | Type | Default Value | Description |
|---|---|---|---|
| jobDir | String | - | No description provided |
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
 normalize
func (m *MyModule) Example(ctx context.Context) (string, error) {
	return dag.
		HarborCheck().
		Normalize(ctx)
}
@function
async def example() -> str:
    return await (
        dag.harbor_check()
        .normalize()
    )
@func()
async example(): Promise<string> {
  return dag
    .harborCheck()
    .normalize()
}
manifest() 🔗
Build an artifact manifest (path, size, sha256) of the source as JSON.
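A minimal sketch of a manifest with the three fields named above (path, size, sha256), assuming one entry per regular file and entries sorted by path. The function name `build_manifest` and the exact JSON layout are assumptions, not the module's actual output format.

```python
import hashlib
import json
from pathlib import Path


def build_manifest(root: str) -> str:
    """Return a JSON array of {path, size, sha256} for every file under root."""
    base = Path(root)
    entries = []
    for path in base.rglob("*"):
        if not path.is_file():
            continue
        data = path.read_bytes()
        entries.append(
            {
                "path": path.relative_to(base).as_posix(),
                "size": len(data),
                "sha256": hashlib.sha256(data).hexdigest(),
            }
        )
    # Sort by path so the output does not depend on directory iteration order.
    entries.sort(key=lambda entry: entry["path"])
    return json.dumps(entries, sort_keys=True)
```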
Return Type
String !
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
 manifest
func (m *MyModule) Example(ctx context.Context) (string, error) {
	return dag.
		HarborCheck().
		Manifest(ctx)
}
@function
async def example() -> str:
    return await (
        dag.harbor_check()
        .manifest()
    )
@func()
async example(): Promise<string> {
  return dag
    .harborCheck()
    .manifest()
}
redact() 🔗
Redaction scan: scrub secret-shaped strings from text.
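To make "secret-shaped" concrete, here is a small sketch of pattern-based scrubbing. The two regular expressions and the `[REDACTED]` placeholder are hypothetical examples chosen for illustration; the patterns this module actually applies are not listed on this page.

```python
import re

# Hypothetical patterns; real scanners usually carry a longer, tuned list.
SECRET_PATTERNS = [
    re.compile(r"\bsk-[A-Za-z0-9_-]{16,}"),  # API-key-like tokens
    re.compile(r"(?<=Bearer )[A-Za-z0-9._~+/=-]{16,}"),  # bearer tokens
]


def redact(text: str) -> str:
    """Replace every match of a secret-shaped pattern with a placeholder."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```

The `\b` anchor keeps ordinary words such as `task-...` from being mistaken for a key prefix.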
Return Type
String !
Arguments
| Name | Type | Default Value | Description |
|---|---|---|---|
| text | String ! | - | No description provided |
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
 redact --text string
func (m *MyModule) Example(ctx context.Context, text string) (string, error) {
	return dag.
		HarborCheck().
		Redact(ctx, text)
}
@function
async def example(text: str) -> str:
    return await (
        dag.harbor_check()
        .redact(text)
    )
@func()
async example(text: string): Promise<string> {
  return dag
    .harborCheck()
    .redact(text)
}
check() 🔗
Run harbor check against tasks/<slug> from the working directory kit, using
the rubric passed as rubric and the model passed as model, and write the report under .dagger-output.
Returns the post-run container; pull the output directory from it.
Return Type
Container !
Arguments
| Name | Type | Default Value | Description |
|---|---|---|---|
| slug | String ! | - | No description provided |
| rubric | String ! | - | No description provided |
| model | String | - | No description provided |
| kit | String | - | No description provided |
| outputPath | String | - | No description provided |
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
 check --slug string --rubric string
func (m *MyModule) Example(slug string, rubric string) *dagger.Container {
	return dag.
		HarborCheck().
		Check(slug, rubric)
}
@function
def example(slug: str, rubric: str) -> dagger.Container:
    return (
        dag.harbor_check()
        .check(slug, rubric)
    )
@func()
example(slug: string, rubric: string): Container {
  return dag
    .harborCheck()
    .check(slug, rubric)
}
run() 🔗
Generic harbor run. Covers oracle/nop/SKU controls and Kimi difficulty
trials — the orchestration decides which agent/model/env/controls to use.
Token-bearing forwarding flags (--verifier-env, --agent-env) are expanded
in-shell from the secret env vars set by withSecrets, mirroring how the
legacy pipeline forwarded them.
Return Type
Container !
Arguments
| Name | Type | Default Value | Description |
|---|---|---|---|
| agent | String ! | - | No description provided |
| taskPath | String ! | - | No description provided |
| model | String ! | - | No description provided |
| jobName | String ! | - | No description provided |
| env | String | - | No description provided |
| miniSweConfig | String | - | No description provided |
| outputDir | String | - | No description provided |
| reasoningEffort | String | - | Optional |
| forwardOpenrouter | Boolean | - | Forward OPENROUTER_API_KEY to the agent (Kimi trials need it). |
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
 run --agent string --task-path string --model string --job-name string
func (m *MyModule) Example(agent string, taskPath string, model string, jobName string) *dagger.Container {
	return dag.
		HarborCheck().
		Run(agent, taskPath, model, jobName)
}
@function
def example(agent: str, task_path: str, model: str, job_name: str) -> dagger.Container:
    return (
        dag.harbor_check()
        .run(agent, task_path, model, job_name)
    )
@func()
example(agent: string, taskPath: string, model: string, jobName: string): Container {
  return dag
    .harborCheck()
    .run(agent, taskPath, model, jobName)
}
runOracle() 🔗
Convenience: harbor run -a oracle (golden control).
Return Type
Container !
Arguments
| Name | Type | Default Value | Description |
|---|---|---|---|
| taskPath | String ! | - | No description provided |
| model | String | - | No description provided |
| miniSweConfig | String | - | No description provided |
| env | String | - | No description provided |
| jobName | String | - | No description provided |
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
 run-oracle --task-path string
func (m *MyModule) Example(taskPath string) *dagger.Container {
	return dag.
		HarborCheck().
		RunOracle(taskPath)
}
@function
def example(task_path: str) -> dagger.Container:
    return (
        dag.harbor_check()
        .run_oracle(task_path)
    )
@func()
example(taskPath: string): Container {
  return dag
    .harborCheck()
    .runOracle(taskPath)
}
runNop() 🔗
Convenience: harbor run -a nop (no-op control).
Return Type
Container !
Arguments
| Name | Type | Default Value | Description |
|---|---|---|---|
| taskPath | String ! | - | No description provided |
| model | String | - | No description provided |
| miniSweConfig | String | - | No description provided |
| env | String | - | No description provided |
| jobName | String | - | No description provided |
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
 run-nop --task-path string
func (m *MyModule) Example(taskPath string) *dagger.Container {
	return dag.
		HarborCheck().
		RunNop(taskPath)
}
@function
def example(task_path: str) -> dagger.Container:
    return (
        dag.harbor_check()
        .run_nop(task_path)
    )
@func()
example(taskPath: string): Container {
  return dag
    .harborCheck()
    .runNop(taskPath)
}
kimiTrial() 🔗
Convenience: a Kimi difficulty trial via mini-swe-agent with high reasoning effort and OpenRouter forwarding.
Return Type
Container !
Arguments
| Name | Type | Default Value | Description |
|---|---|---|---|
| taskPath | String ! | - | No description provided |
| trial | Integer | - | No description provided |
| model | String | - | No description provided |
| miniSweConfig | String | - | No description provided |
| env | String | - | No description provided |
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
 kimi-trial --task-path string
func (m *MyModule) Example(taskPath string) *dagger.Container {
	return dag.
		HarborCheck().
		KimiTrial(taskPath)
}
@function
def example(task_path: str) -> dagger.Container:
    return (
        dag.harbor_check()
        .kimi_trial(task_path)
    )
@func()
example(taskPath: string): Container {
  return dag
    .harborCheck()
    .kimiTrial(taskPath)
}
analyze() 🔗
Run harbor analyze over a jobs directory with the given rubric/model.
Return Type
Container !
Arguments
| Name | Type | Default Value | Description |
|---|---|---|---|
| jobsDir | String ! | - | No description provided |
| rubric | String ! | - | No description provided |
| model | String | - | No description provided |
| outputPath | String | - | No description provided |
| concurrency | String | - | No description provided |
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
 analyze --jobs-dir string --rubric string
func (m *MyModule) Example(jobsDir string, rubric string) *dagger.Container {
	return dag.
		HarborCheck().
		Analyze(jobsDir, rubric)
}
@function
def example(jobs_dir: str, rubric: str) -> dagger.Container:
    return (
        dag.harbor_check()
        .analyze(jobs_dir, rubric)
    )
@func()
example(jobsDir: string, rubric: string): Container {
  return dag
    .harborCheck()
    .analyze(jobsDir, rubric)
}
startEnv() 🔗
Build (start) the source task’s environment/ via harbor task start-env,
exercising its Dockerfile / docker_image / compose without running an agent.
Returns the post-build container (pull logs/artifacts from it).
Return Type
Container !
Arguments
| Name | Type | Default Value | Description |
|---|---|---|---|
| taskPath | String ! | - | No description provided |
| env | String | - | No description provided |
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
 start-env --task-path string
func (m *MyModule) Example(taskPath string) *dagger.Container {
	return dag.
		HarborCheck().
		StartEnv(taskPath)
}
@function
def example(task_path: str) -> dagger.Container:
    return (
        dag.harbor_check()
        .start_env(task_path)
    )
@func()
example(taskPath: string): Container {
  return dag
    .harborCheck()
    .startEnv(taskPath)
}
trial() 🔗
Run harbor trial start (a single trial). Mirrors run exactly but invokes
the trial subcommand and forwards --trial-name in place of --job-name.
The Python source-of-truth for this exact argv/flag shape is
harbor_runner/commands.py (render_harbor_command), which is unit-tested.
Token-bearing forwarding flags (--verifier-env, --agent-env) are expanded
in-shell from the secret env vars set by withSecrets; plaintext is never
interpolated.
Return Type
Container !
Arguments
| Name | Type | Default Value | Description |
|---|---|---|---|
| agent | String ! | - | No description provided |
| taskPath | String ! | - | No description provided |
| model | String ! | - | No description provided |
| trialName | String ! | - | No description provided |
| miniSweConfig | String | - | No description provided |
| env | String | - | No description provided |
| outputDir | String | - | No description provided |
| reasoningEffort | String | - | Optional |
| forwardOpenrouter | Boolean | - | Forward OPENROUTER_API_KEY to the agent (Kimi trials need it). |
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
 trial --agent string --task-path string --model string --trial-name string
func (m *MyModule) Example(agent string, taskPath string, model string, trialName string) *dagger.Container {
	return dag.
		HarborCheck().
		Trial(agent, taskPath, model, trialName)
}
@function
def example(agent: str, task_path: str, model: str, trial_name: str) -> dagger.Container:
    return (
        dag.harbor_check()
        .trial(agent, task_path, model, trial_name)
    )
@func()
example(agent: string, taskPath: string, model: string, trialName: string): Container {
  return dag
    .harborCheck()
    .trial(agent, taskPath, model, trialName)
}
exportTraces() 🔗
Export Harbor agent trajectories under path to a local SFT dataset
(parquet by default, or ShareGPT JSON when sharegpt is set), written under
outputDir. Returns that directory.
episodes selects which episodes to include (“all” or “last”). Pushing to
the Hugging Face Hub (hf://) is opt-in upstream and is intentionally NOT
implemented in this primitive — only local parquet/JSON output is produced.
Return Type
Directory !
Arguments
| Name | Type | Default Value | Description |
|---|---|---|---|
| path | String ! | - | No description provided |
| episodes | String | - | No description provided |
| sharegpt | Boolean | - | No description provided |
| outputDir | String | - | No description provided |
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
 export-traces --path string
func (m *MyModule) Example(path string) *dagger.Directory {
	return dag.
		HarborCheck().
		ExportTraces(path)
}
@function
def example(path: str) -> dagger.Directory:
    return (
        dag.harbor_check()
        .export_traces(path)
    )
@func()
example(path: string): Directory {
  return dag
    .harborCheck()
    .exportTraces(path)
}
download() 🔗
Download a shared Harbor task or dataset (harbor download) and reconcile the
fetched bytes against a content digest. allowNetwork defaults to false: with
no network/registry-auth egress the function REFUSES without fetching. Pass
allowNetwork=true to actually reach the registry; the verified digest reuses
the same reconciled native hash as proof-freshness.
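The refuse-by-default behaviour and the digest reconciliation can be sketched as follows. This is an illustrative outline only: `verified_download`, the injected `fetch` callable, and the plain SHA-256 over the fetched bytes are assumptions standing in for harbor download and the reconciled native hash.

```python
import hashlib
from typing import Callable


def verified_download(
    ref: str,
    expected_digest: str,
    fetch: Callable[[str], bytes],
    allow_network: bool = False,
) -> bytes:
    """Fetch ref only when explicitly allowed, then check its content digest."""
    if not allow_network:
        # Refuse before any fetch is attempted.
        raise PermissionError(f"network egress not allowed; refusing to fetch {ref}")
    data = fetch(ref)
    actual = "sha256:" + hashlib.sha256(data).hexdigest()
    if actual != expected_digest:
        raise ValueError(f"digest mismatch for {ref}: expected {expected_digest}, got {actual}")
    return data
```

Checking the flag before calling `fetch` is the point of the design: with the default, no request leaves the sandbox at all.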
Return Type
Container !
Arguments
| Name | Type | Default Value | Description |
|---|---|---|---|
| ref | String ! | - | No description provided |
| outputDir | String | - | No description provided |
| allowNetwork | Boolean | - | No description provided |
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
 download --ref string
func (m *MyModule) Example(ref string) *dagger.Container {
	return dag.
		HarborCheck().
		Download(ref)
}
@function
def example(ref: str) -> dagger.Container:
    return (
        dag.harbor_check()
        .download(ref)
    )
@func()
example(ref: string): Container {
  return dag
    .harborCheck()
    .download(ref)
}
screenPool() 🔗
Pool the native reward-details.json grades under gradesDir (relative to the
source /work) via screen-pool. Reads every */reward-details.json, folds
through screen.pool.pool (errored EXCLUDED, UNWEIGHTED bar, boundary band
surfaced, INSUFFICIENT_DATA on < k non-errored) and returns the PoolResult
JSON on stdout.
k/threshold/mode are configurable knobs (house-rule 6); the admission
verdict is decided inside pool.py, not here (HC5). Int args are stringified
via "${n}" string interpolation — dang has no Int.toString, and String! +
Int! is a type error; interpolation is the conversion idiom.
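The pooling rules described above (errored grades excluded, an unweighted mean, INSUFFICIENT_DATA below k usable grades) can be sketched roughly like this. The grade keys `score` and `errored`, the `POOLED` status label, and the default `k` are hypothetical; the real screen.pool.pool also applies the threshold, the boundary band, and the mode, which this sketch leaves out.

```python
from typing import Dict, List


def pool_grades(grades: List[Dict], k: int = 3) -> Dict:
    """Pool 0-100 scores with an unweighted mean, excluding errored grades."""
    usable = [g["score"] for g in grades if not g.get("errored", False)]
    errored_excluded = len(grades) - len(usable)
    if len(usable) < k:
        # Too few non-errored grades to say anything.
        return {
            "status": "INSUFFICIENT_DATA",
            "n": len(usable),
            "errored_excluded": errored_excluded,
        }
    return {
        "status": "POOLED",  # hypothetical label
        "n": len(usable),
        "errored_excluded": errored_excluded,
        "pooled_pct": sum(usable) / len(usable),
    }
```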
Return Type
String !
Arguments
| Name | Type | Default Value | Description |
|---|---|---|---|
| gradesDir | String ! | - | No description provided |
| k | Integer | - | No description provided |
| threshold | Integer | - | No description provided |
| mode | String | - | No description provided |
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
 screen-pool --grades-dir string
func (m *MyModule) Example(ctx context.Context, gradesDir string) (string, error) {
	return dag.
		HarborCheck().
		ScreenPool(ctx, gradesDir)
}
@function
async def example(grades_dir: str) -> str:
    return await (
        dag.harbor_check()
        .screen_pool(grades_dir)
    )
@func()
async example(gradesDir: string): Promise<string> {
  return dag
    .harborCheck()
    .screenPool(gradesDir)
}
screenClassify() 🔗
Classify a finished trial set under trialsDir (relative to /work) via
screen-classify: apply the failure taxonomy and env-quality gates and return
the disposition/decision JSON on stdout. Gates flag, never accommodate
(house-rule 7) — a slow startup surfaces as SKIPPED_SLOW_STARTUP, a slow trial
as TRIAL_TOO_SLOW; the measured seconds are a fact, the flag is the verdict
(decided in classify.py, not here).
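One way to read "the measured seconds are a fact, the flag is the verdict" is sketched below: the measurements are reported unchanged and a disposition is attached next to them. The limit values, the `OK` label, and the function name are hypothetical; the real gates live in classify.py and are not reproduced here.

```python
from typing import Dict, Optional


def classify_timing(
    startup_seconds: float,
    trial_seconds: Optional[float],
    startup_limit: float = 120.0,  # hypothetical limit
    trial_limit: float = 1800.0,  # hypothetical limit
) -> Dict:
    """Attach a disposition flag to the measured timings without altering them."""
    if startup_seconds > startup_limit:
        disposition = "SKIPPED_SLOW_STARTUP"
    elif trial_seconds is not None and trial_seconds > trial_limit:
        disposition = "TRIAL_TOO_SLOW"
    else:
        disposition = "OK"  # hypothetical label for the unflagged case
    return {
        "startup_seconds": startup_seconds,
        "trial_seconds": trial_seconds,
        "disposition": disposition,
    }
```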
Return Type
String !
Arguments
| Name | Type | Default Value | Description |
|---|---|---|---|
| trialsDir | String ! | - | No description provided |
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
 screen-classify --trials-dir string
func (m *MyModule) Example(ctx context.Context, trialsDir string) (string, error) {
	return dag.
		HarborCheck().
		ScreenClassify(ctx, trialsDir)
}
@function
async def example(trials_dir: str) -> str:
    return await (
        dag.harbor_check()
        .screen_classify(trials_dir)
    )
@func()
async example(trialsDir: string): Promise<string> {
  return dag
    .harborCheck()
    .screenClassify(trialsDir)
}
screenReport() 🔗
Fold per-task result dicts (tasksJson, a path relative to /work) through
screen-report and return the deterministic, sorted report JSON on stdout.
Output is byte-identical for identical input (no clock/PID/uuid; the
inputs_digest is content-derived in report.py).
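Byte-identical output for identical input usually comes down to sorting everything and deriving any identifier from content rather than from a clock or a random source. The sketch below shows that idea; the `task` sort key, the report layout, and the function name are assumptions, not the format report.py emits.

```python
import hashlib
import json
from typing import Dict, List


def build_report(tasks: List[Dict]) -> str:
    """Serialize per-task results deterministically with a content-derived digest."""
    ordered = sorted(tasks, key=lambda item: item["task"])  # hypothetical key
    canonical = json.dumps(ordered, sort_keys=True, separators=(",", ":"))
    report = {
        # Derived only from the input content: no timestamps, PIDs, or UUIDs.
        "inputs_digest": "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
        "tasks": ordered,
    }
    return json.dumps(report, sort_keys=True, separators=(",", ":"))
```

Feeding the same task dicts in a different order yields the same bytes.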
Return Type
String !
Arguments
| Name | Type | Default Value | Description |
|---|---|---|---|
| tasksJson | String ! | - | No description provided |
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
 screen-report --tasks-json string
func (m *MyModule) Example(ctx context.Context, tasksJson string) (string, error) {
	return dag.
		HarborCheck().
		ScreenReport(ctx, tasksJson)
}
@function
async def example(tasks_json: str) -> str:
    return await (
        dag.harbor_check()
        .screen_report(tasks_json)
    )
@func()
async example(tasksJson: string): Promise<string> {
  return dag
    .harborCheck()
    .screenReport(tasksJson)
}
screen() 🔗
Difficulty / admission screener for a single task. Runs the EXISTING trial
(mini-swe-agent + the task’s own rewardkit verifier) maxTrials times,
collecting each trial’s output directory; then classifies and pools those
native reward-details.json grades and folds the result through
screen-report, returning a Directory holding report.json + JUnit XML
(matching exportTraces, which also returns a Directory).
Screening is difficulty-only and mini-swe-agent only: there is no llm agent
path here, and controls=true is rejected (golden/no-op controls stay in
runOracle/runNop, not the screener — HC5 boundary). The admission verdict is
decided in pool.py from threshold/k; this dang function holds no threshold
and embeds no decision. Secrets are forwarded only through the reused trial
(withSecrets) — no new secret surface is introduced.
NOTE (lead to confirm the dang loop idiom): the dang module has no observed
loop/iteration construct, so the trial fan-out is a FIXED unrolled sequence of
trial calls (up to the maxTrials ceiling) collected conditionally — mirroring
the existing let x = if (...) {...} idiom. Each attempt past maxTrials is
skipped. When the dang loop idiom is confirmed, this unroll collapses to a loop
over 1..maxTrials.
Return Type
Directory !
Arguments
| Name | Type | Default Value | Description |
|---|---|---|---|
| taskPath | String ! | - | No description provided |
| model | String | - | No description provided |
| miniSweConfig | String ! | - | No description provided |
| k | Integer | - | No description provided |
| threshold | Integer | - | No description provided |
| maxTrials | Integer | - | No description provided |
| env | String | - | No description provided |
| controls | Boolean | - | No description provided |
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
 screen --task-path string --mini-swe-config string
func (m *MyModule) Example(taskPath string, miniSweConfig string) *dagger.Directory {
	return dag.
		HarborCheck().
		Screen(taskPath, miniSweConfig)
}
@function
def example(task_path: str, mini_swe_config: str) -> dagger.Directory:
    return (
        dag.harbor_check()
        .screen(task_path, mini_swe_config)
    )
@func()
example(taskPath: string, miniSweConfig: string): Directory {
  return dag
    .harborCheck()
    .screen(taskPath, miniSweConfig)
}
checkTask() 🔗
Run harbor check against task (a task path relative to the source) and emit
CheckEvidence JSON ({task, status, exitCode, summaryRef}) on stdout. The
check’s exit code is captured (the wrapper itself always exits 0) and folded through
check-evidence; status is pass or fail, derived from the exit code (the admission verdict
is pipeline policy, not encoded here).
HC3: withExec is the only exec surface. HC4: task rides in an env var,
never interpolated into the command text.
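The two ideas in this description (the task travels in an environment variable, and the exit code is recorded as a fact rather than propagated) can be sketched as below. The variable name `TASK`, the `check_evidence` function, and the `None` placeholder for summaryRef are assumptions for illustration; the real emitter is the check-evidence tool.

```python
import json
import os
import subprocess
from typing import List


def check_evidence(task: str, command: List[str]) -> str:
    """Run command with the task in an env var and report its exit code as JSON."""
    # The task value is never spliced into the command text, so shell
    # metacharacters inside it cannot change what gets executed.
    env = dict(os.environ, TASK=task)  # hypothetical variable name
    completed = subprocess.run(command, env=env, check=False)
    status = "pass" if completed.returncode == 0 else "fail"
    evidence = {
        "task": task,
        "status": status,
        "exitCode": completed.returncode,
        "summaryRef": None,  # placeholder; the real reference format is not shown here
    }
    return json.dumps(evidence, sort_keys=True)
```

Because `check=False` is used, a failing check still produces evidence instead of raising, which mirrors the "always exits 0 at the wrapper" behaviour.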
Return Type
String !
Arguments
| Name | Type | Default Value | Description |
|---|---|---|---|
| task | String ! | - | No description provided |
| rubric | String | - | No description provided |
| model | String | - | No description provided |
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
 check-task --task string
func (m *MyModule) Example(ctx context.Context, task string) (string, error) {
	return dag.
		HarborCheck().
		CheckTask(ctx, task)
}
@function
async def example(task: str) -> str:
    return await (
        dag.harbor_check()
        .check_task(task)
    )
@func()
async example(task: string): Promise<string> {
  return dag
    .harborCheck()
    .checkTask(task)
}
runControl() 🔗
Dispatch a control run for task by agent (oracle / nop / any SKU control)
and emit ControlEvidence JSON ({task, agent, status, exitCode, reward}). The
control run uses the same command builder as run, but captures the Harbor
exit code inside the shell that invokes it; the reward is read from the run’s reward.json via the
control-evidence emitter (the reward is a FACT; oracle≈1 / nop≈0 being correct
is the pipeline’s verdict).
HC3: all execution via withExec. HC4: task/agent ride in env vars.
Return Type
String !
Arguments
| Name | Type | Default Value | Description |
|---|---|---|---|
| task | String ! | - | No description provided |
| agent | String ! | - | No description provided |
| model | String | - | No description provided |
| miniSweConfig | String | - | No description provided |
| env | String | - | No description provided |
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
 run-control --task string --agent string
func (m *MyModule) Example(ctx context.Context, task string, agent string) (string, error) {
	return dag.
		HarborCheck().
		RunControl(ctx, task, agent)
}
@function
async def example(task: str, agent: str) -> str:
    return await (
        dag.harbor_check()
        .run_control(task, agent)
    )
@func()
async example(task: string, agent: string): Promise<string> {
  return dag
    .harborCheck()
    .runControl(task, agent)
}
runKimiTrials() 🔗
Run Kimi difficulty trials for task (the number of trials is set by the trials argument) and emit canonical 0–1
DifficultyEvidence JSON ({task, model, pooled_pct, k, errored_excluded,
criteria, meanReward, costUsd, rewardDetailsPresent}, with NO verdict field).
Orchestrates the EXISTING screen path: fan out the requested difficulty trials via
screenTrialDir, pool the native grades via screen-pool into pooled.json,
then fold through difficulty-evidence (the SOLE 0-100→0-1 converter, HC2).
Facts only; the admission verdict is pipeline policy.
HC3: all execution via withExec. The trial fan-out is a FIXED unroll up to 10
(the dang loop idiom is not yet confirmed — see the note in screen).
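The scale conversion mentioned above can be sketched in a few lines. This is an illustration under assumptions, not the difficulty-evidence implementation: the function name pct_to_fraction and the range check are hypothetical.

```python
def pct_to_fraction(pct: float) -> float:
    """Convert a pooled grade on a 0-100 scale to a 0-1 fraction."""
    if not 0 <= pct <= 100:
        # Reject out-of-range input rather than silently clamping it.
        raise ValueError(f"expected a value in [0, 100], got {pct!r}")
    return pct / 100.0


print(pct_to_fraction(75))
```

Routing every conversion through a single function is what makes it possible to treat pooled_pct as a 0–1 fraction everywhere downstream.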
Return Type
String !
Arguments
| Name | Type | Default Value | Description |
|---|---|---|---|
| task | String ! | - | No description provided |
| trials | Integer | - | No description provided |
| model | String | - | No description provided |
| miniSweConfig | String | - | No description provided |
| k | Integer | - | No description provided |
| threshold | Integer | - | No description provided |
| env | String | - | No description provided |
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
    run-kimi-trials --task string

func (m *MyModule) Example(ctx context.Context, task string) string {
    return dag.
        HarborCheck().
        RunKimiTrials(ctx, task)
}

@function
async def example(task: str) -> str:
    return await (
        dag.harbor_check()
        .run_kimi_trials(task)
    )

@func()
async example(task: string): Promise<string> {
  return dag
    .harborCheck()
    .runKimiTrials(task)
}

analyzeJobs()
Run harbor analyze over the task’s jobs directory and emit AnalyzeEvidence
JSON ({task, status, rewardHackingFindings[]}). The analyze exit code is
captured inside the wrapper, which itself always exits 0, and folded through
analyze-evidence, which harvests any reward-hacking findings from the written
analyze report. Surfaced findings are facts, never an auto-verdict.
HC3: withExec is the only exec surface. HC4: task and jobsDir are passed in env vars.
Return Type
String !
Arguments
| Name | Type | Default Value | Description |
|---|---|---|---|
| task | String ! | - | No description provided |
| jobsDir | String | - | No description provided |
| rubric | String | - | No description provided |
| model | String | - | No description provided |
| concurrency | String | - | No description provided |
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
    analyze-jobs --task string

func (m *MyModule) Example(ctx context.Context, task string) string {
    return dag.
        HarborCheck().
        AnalyzeJobs(ctx, task)
}

@function
async def example(task: str) -> str:
    return await (
        dag.harbor_check()
        .analyze_jobs(task)
    )

@func()
async example(task: string): Promise<string> {
  return dag
    .harborCheck()
    .analyzeJobs(task)
}

validateJobs()
Inspect the task’s jobs/ subtree and emit JobsEvidence JSON
({task, present, finalJobs[], staleRetries[]}). Pure filesystem inspection in
the container via jobs-evidence — no Harbor invocation, no secrets, no
mutation (incomplete trials are surfaced as staleRetries, never quarantined).
HC3: withExec is the only exec surface; jobs_evidence.py is pure pathlib+json.
Return Type
String !
Arguments
| Name | Type | Default Value | Description |
|---|---|---|---|
| task | String ! | - | No description provided |
| taskRoot | String | - | No description provided |
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
    validate-jobs --task string

func (m *MyModule) Example(ctx context.Context, task string) string {
    return dag.
        HarborCheck().
        ValidateJobs(ctx, task)
}

@function
async def example(task: str) -> str:
    return await (
        dag.harbor_check()
        .validate_jobs(task)
    )

@func()
async example(task: string): Promise<string> {
  return dag
    .harborCheck()
    .validateJobs(task)
}

platformCompat()
Check whether the task’s environment/ Dockerfile builds on the given OCI
platform. Uses harbor_runner.platform.buildx.buildx_argv (pure, HC3-clean)
to produce the docker buildx build --platform argv, then runs it via
withExec (the toolchain’s only exec surface) and parses the result through
platform-parse into PlatformEvidence JSON (returned as String!).
HC3: withExec is the only exec surface; the buildx argv builder and parse
modules contain no host-exec calls.
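A pure argv builder of the kind described above might look like the sketch below. The function name buildx_argv comes from the text, but its signature, the default platform, and the example paths are assumptions; the real builder may pass additional flags.

```python
def buildx_argv(context_dir: str, platform: str = "linux/amd64") -> list[str]:
    """Build the argv for a single-platform buildx build without executing it."""
    # Returning a list (rather than a shell string) keeps the builder free of
    # any exec call and avoids shell quoting issues in the caller.
    return ["docker", "buildx", "build", "--platform", platform, context_dir]


argv = buildx_argv("tasks/demo/environment", "linux/arm64")
print(argv)
```

Because the builder only returns data, it can be unit-tested without Docker, and the single place that executes it stays the withExec call.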
Return Type
String !
Arguments
| Name | Type | Default Value | Description |
|---|---|---|---|
| taskPath | String ! | - | No description provided |
| platform | String | - | No description provided |
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
    platform-compat --task-path string

func (m *MyModule) Example(ctx context.Context, taskPath string) string {
    return dag.
        HarborCheck().
        PlatformCompat(ctx, taskPath)
}

@function
async def example(task_path: str) -> str:
    return await (
        dag.harbor_check()
        .platform_compat(task_path)
    )

@func()
async example(taskPath: string): Promise<string> {
  return dag
    .harborCheck()
    .platformCompat(taskPath)
}

executorRun()
Run a single Harbor trial via the local Docker or Modal executor and emit
ExecutorEvidence JSON ({task, executor, status}) in EVERY case (A6 fix).
For executor=local: actually run the trial via the existing trial path
(mini-swe-agent + the task’s verifier), capture its exit code with a post-exec
probe (sh -c writing $? then exiting 0 so withExec does not fail), and shape
the evidence from that code through executor-run.
For executor=modal: run the same Harbor trial path with -e modal, then pass
the captured exit code to executor-run --executor modal --exit-code ... so the
evidence contract stays identical. Modal token creds are forwarded as container
env vars via withSecretVariable (never on argv).
Secrets are forwarded via the existing withSecrets pattern plus the optional
Modal token pair — no plaintext on argv (HC4). HC3: withExec is the only exec
surface; every reached module is host-exec-free.
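The post-exec probe described above can be illustrated with a small argv builder. This is a sketch under assumptions: the helper name exit_code_probe and the exit-code file path are hypothetical, and the wrapped command here is the standard false utility rather than a real Harbor invocation.

```python
import shlex


def exit_code_probe(command: list[str], code_file: str) -> list[str]:
    """Wrap a command so its exit code is written to a file and the shell exits 0."""
    inner = " ".join(shlex.quote(part) for part in command)
    # "$?" expands to the exit status of the command that just ran; the final
    # "exit 0" keeps the surrounding exec step from failing on a non-zero code.
    script = f"{inner}; echo $? > {shlex.quote(code_file)}; exit 0"
    return ["sh", "-c", script]


argv = exit_code_probe(["false"], "/tmp/exit_code")
print(argv)
```

A later step can then read the file and shape the evidence from the recorded code, which is how a failed trial can still yield evidence in every case.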
Return Type
String !
Arguments
| Name | Type | Default Value | Description |
|---|---|---|---|
| taskPath | String ! | - | No description provided |
| executor | String | - | No description provided |
| platform | String | - | No description provided |
| jobName | String | - | No description provided |
| model | String | - | No description provided |
| miniSweConfig | String | - | No description provided |
| env | String | - | No description provided |
| modalTokenId | Secret | - | No description provided |
| modalTokenSecret | Secret | - | No description provided |
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
    executor-run --task-path string

func (m *MyModule) Example(ctx context.Context, taskPath string) string {
    return dag.
        HarborCheck().
        ExecutorRun(ctx, taskPath)
}

@function
async def example(task_path: str) -> str:
    return await (
        dag.harbor_check()
        .executor_run(task_path)
    )

@func()
async example(taskPath: string): Promise<string> {
  return dag
    .harborCheck()
    .executorRun(taskPath)
}

selfReviewReport()
Emit a SelfReviewReport JSON fact and optionally write it to outputPath.
requiredLanesJson is a JSON string array. checksJson is a JSON array of
{id,state,evidence} objects. The wrapper passes both as scalar argv values so
the Dagger surface stays typed and the Python emitter owns report shaping.
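The two JSON-valued arguments can be prepared on the caller side as shown in this sketch. The lane names, check id, state value, and evidence path are placeholders, and validate_checks is a hypothetical helper, not something the module provides.

```python
import json

# Placeholder lane names; requiredLanesJson is a JSON array of strings.
required_lanes_json = json.dumps(["lane-a", "lane-b"])

# checksJson is a JSON array of {id, state, evidence} objects.
checks_json = json.dumps(
    [{"id": "check-1", "state": "pass", "evidence": "jobs/demo/result.json"}]
)

REQUIRED_KEYS = {"id", "state", "evidence"}


def validate_checks(raw: str) -> list[dict]:
    """Parse checksJson and confirm every entry has the three expected keys."""
    checks = json.loads(raw)
    if not isinstance(checks, list):
        raise ValueError("checksJson must be a JSON array")
    for check in checks:
        if not isinstance(check, dict) or REQUIRED_KEYS - set(check):
            raise ValueError(f"malformed check entry: {check!r}")
    return checks


print(validate_checks(checks_json))
```

Validating before the call gives an early, local error instead of a failure inside the container.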
Return Type
String !
Arguments
| Name | Type | Default Value | Description |
|---|---|---|---|
| task | String ! | - | No description provided |
| taskType | String ! | - | No description provided |
| sourceKind | String ! | - | No description provided |
| reviewAgent | String ! | - | No description provided |
| reportPath | String ! | - | No description provided |
| reviewState | String | - | No description provided |
| requiredLanesJson | String | - | No description provided |
| checksJson | String | - | No description provided |
| outputPath | String | - | No description provided |
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
    self-review-report --task string --task-type string --source-kind string --review-agent string --report-path string

func (m *MyModule) Example(ctx context.Context, task string, taskType string, sourceKind string, reviewAgent string, reportPath string) string {
    return dag.
        HarborCheck().
        SelfReviewReport(ctx, task, taskType, sourceKind, reviewAgent, reportPath)
}

@function
async def example(task: str, task_type: str, source_kind: str, review_agent: str, report_path: str) -> str:
    return await (
        dag.harbor_check()
        .self_review_report(task, task_type, source_kind, review_agent, report_path)
    )

@func()
async example(task: string, taskType: string, sourceKind: string, reviewAgent: string, reportPath: string): Promise<string> {
  return dag
    .harborCheck()
    .selfReviewReport(task, taskType, sourceKind, reviewAgent, reportPath)
}

runSweepTrials()
Fan out the requested number of trials (the trials argument) per (agent, model)
cell across the full matrix and emit SweepDifficultyEvidence JSON ({task,
cells[{agent, model, pooled_pct, k, errored_excluded}]}). Facts only, with no
verdict field (HC5: the dang layer holds no thresholds and embeds no decision;
pooled_pct is a 0–1 fraction, HC2).
The matrix is built by sweep-matrix (A7 fix): agents/models are passed as
base64-encoded JSON via env vars, so a model string containing commas/brackets
can never corrupt the argv (no broken ["a,b"] string-concat). Each cell runs
its OWN trials into cell-<i>/trial-<n> (A8 fix: one row per matrix cell, not a
single hardcoded cell), with the per-cell trial fan-out unrolled to 10 (A9 fix:
cap raised from 4).
The matrix fan-out is a FIXED unroll over a bounded 2×2 (agent, model) grid (the
dang loop idiom is not confirmed — see the note in screen). Cells beyond the
first are guarded by agentCount/modelCount, which default to the 1×1 default
matrix so the pipeline’s source+task call runs exactly the default cell; a
caller widening agents/models MUST pass matching counts and MUST provide
lists whose lengths are at least agentCount / modelCount. The current dang
surface does not expose a list-length primitive, so this is an explicit caller
contract: agentCount >= 2 authorizes use of agents[1]!, and
modelCount >= 2 authorizes use of models[1]!. The Python sweep-matrix
step then cross-checks the encoded list source against the expected indexed
values before emitting evidence. sweep-collect reports any matrix slot whose
trials were not run as INSUFFICIENT_DATA / pooled_pct=null — a fact, never a
silent drop.
HC3: all execution is via withExec; the sweep/matrix/collect modules are pure.
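The base64-encoded JSON transport described above can be sketched as follows. The helper names encode_list, decode_list, and build_cells, the cell dictionary layout, and the sample agent and model strings are assumptions for illustration; they are not the sweep-matrix implementation.

```python
import base64
import json
from itertools import product


def encode_list(values: list[str]) -> str:
    """Serialize a list of strings to base64-encoded JSON (safe in env vars)."""
    return base64.b64encode(json.dumps(values).encode("utf-8")).decode("ascii")


def decode_list(encoded: str) -> list[str]:
    """Inverse of encode_list."""
    return json.loads(base64.b64decode(encoded).decode("utf-8"))


def build_cells(agents_b64: str, models_b64: str) -> list[dict]:
    """Expand the full agent x model matrix, one indexed cell per pair."""
    agents = decode_list(agents_b64)
    models = decode_list(models_b64)
    return [
        {"index": i, "agent": agent, "model": model}
        for i, (agent, model) in enumerate(product(agents, models))
    ]


agents_b64 = encode_list(["agent-a", "agent-b"])
models_b64 = encode_list(["vendor/model[a,b]"])  # commas and brackets survive
cells = build_cells(agents_b64, models_b64)
print(cells)
```

The standard base64 alphabet contains no commas or brackets, so the encoded value cannot be split or misparsed on its way to the matrix step.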
Return Type
String !
Arguments
| Name | Type | Default Value | Description |
|---|---|---|---|
| taskPath | String ! | - | No description provided |
| agents | [String ! ] | - | No description provided |
| models | [String ! ] | - | No description provided |
| agentsB64 | String | - | No description provided |
| modelsB64 | String | - | No description provided |
| agentCount | Integer | - | No description provided |
| modelCount | Integer | - | No description provided |
| trials | Integer | - | No description provided |
| miniSweConfig | String | - | No description provided |
| env | String | - | No description provided |
| k | Integer | - | No description provided |
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
    run-sweep-trials --task-path string

func (m *MyModule) Example(ctx context.Context, taskPath string) string {
    return dag.
        HarborCheck().
        RunSweepTrials(ctx, taskPath)
}

@function
async def example(task_path: str) -> str:
    return await (
        dag.harbor_check()
        .run_sweep_trials(task_path)
    )

@func()
async example(taskPath: string): Promise<string> {
  return dag
    .harborCheck()
    .runSweepTrials(taskPath)
}

screenRetry()
Retry-resume: remove completed trial dirs whose result.json carries an
exception_info.exception_type matching any entry in errorTypes (a
comma-separated list), via
harbor_runner.screen.retry_resume.select_trials_to_retry. Returns the pruned
trials dir as a Directory!: the matching dirs are removed and the surviving
trials remain. No secrets are needed, since this is a pure filesystem
operation inside the container.
HC3: retry_resume.py is pure pathlib+json; withExec is the only exec surface.
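The selection rule can be sketched with pathlib and json alone. The function name select_trials_to_retry comes from the text above, but this signature, the handling of unreadable files, and the exception type names in the demo are assumptions.

```python
import json
import tempfile
from pathlib import Path


def select_trials_to_retry(trials_dir: Path, error_types: str) -> list[Path]:
    """Return trial dirs whose result.json records one of the given error types."""
    wanted = {name.strip() for name in error_types.split(",") if name.strip()}
    selected = []
    for trial in sorted(p for p in trials_dir.iterdir() if p.is_dir()):
        try:
            data = json.loads((trial / "result.json").read_text())
        except (OSError, ValueError):
            continue  # missing or unparsable result: not a completed trial
        info = data.get("exception_info") if isinstance(data, dict) else None
        if isinstance(info, dict) and info.get("exception_type") in wanted:
            selected.append(trial)
    return selected


# Demo on a throwaway directory; the error type names are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    results = {
        "trial-0": {"exception_info": None},
        "trial-1": {"exception_info": {"exception_type": "AgentTimeoutError"}},
        "trial-2": {"exception_info": {"exception_type": "ValueError"}},
    }
    for name, result in results.items():
        (root / name).mkdir()
        (root / name / "result.json").write_text(json.dumps(result))
    retry_names = [
        p.name for p in select_trials_to_retry(root, "AgentTimeoutError, RateLimitError")
    ]
print(retry_names)
```

Only the selection is shown here; removing the selected directories is a separate step.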
Return Type
Directory !
Arguments
| Name | Type | Default Value | Description |
|---|---|---|---|
| trialsDir | String ! | - | No description provided |
| errorTypes | String | - | No description provided |
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
    screen-retry --trials-dir string

func (m *MyModule) Example(trialsDir string) *dagger.Directory {
    return dag.
        HarborCheck().
        ScreenRetry(trialsDir)
}

@function
def example(trials_dir: str) -> dagger.Directory:
    return (
        dag.harbor_check()
        .screen_retry(trials_dir)
    )

@func()
example(trialsDir: string): Directory {
  return dag
    .harborCheck()
    .screenRetry(trialsDir)
}

screenCleanup()
Cleanup: quarantine trial dirs that are incomplete (missing/invalid
config.json or result.json), via
harbor_runner.screen.retry_resume.clean_incomplete_trials. Returns the pruned
Directory! of the trials dir. Pure filesystem operation — no secrets needed.
HC3: retry_resume.py is pure pathlib+json; withExec is the only exec surface.
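A completeness check and quarantine step along these lines could look like the sketch below. The function name clean_incomplete_trials comes from the text, while this signature, the is_complete helper, and the choice to move incomplete trials into a sibling quarantine directory are assumptions.

```python
import json
import shutil
import tempfile
from pathlib import Path

REQUIRED_FILES = ("config.json", "result.json")


def is_complete(trial: Path) -> bool:
    """A trial is complete when both required files exist and parse as JSON."""
    for name in REQUIRED_FILES:
        try:
            json.loads((trial / name).read_text())
        except (OSError, ValueError):
            return False
    return True


def clean_incomplete_trials(trials_dir: Path, quarantine: Path) -> list[str]:
    """Move incomplete trial dirs into quarantine and return their names."""
    quarantine.mkdir(parents=True, exist_ok=True)
    moved = []
    for trial in sorted(p for p in trials_dir.iterdir() if p.is_dir()):
        if not is_complete(trial):
            shutil.move(str(trial), str(quarantine / trial.name))
            moved.append(trial.name)
    return moved


# Demo: trial-0 is complete, trial-1 lacks result.json.
with tempfile.TemporaryDirectory() as tmp:
    trials = Path(tmp) / "trials"
    (trials / "trial-0").mkdir(parents=True)
    (trials / "trial-0" / "config.json").write_text("{}")
    (trials / "trial-0" / "result.json").write_text("{}")
    (trials / "trial-1").mkdir()
    (trials / "trial-1" / "config.json").write_text("{}")
    moved = clean_incomplete_trials(trials, Path(tmp) / "quarantine")
    remaining = sorted(p.name for p in trials.iterdir())
print(moved, remaining)
```

Moving rather than deleting keeps the incomplete trial available for inspection while leaving the trials dir with complete entries only.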
Return Type
Directory !
Arguments
| Name | Type | Default Value | Description |
|---|---|---|---|
| trialsDir | String ! | - | No description provided |
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
    screen-cleanup --trials-dir string

func (m *MyModule) Example(trialsDir string) *dagger.Directory {
    return dag.
        HarborCheck().
        ScreenCleanup(trialsDir)
}

@function
def example(trials_dir: str) -> dagger.Directory:
    return (
        dag.harbor_check()
        .screen_cleanup(trials_dir)
    )

@func()
example(trialsDir: string): Directory {
  return dag
    .harborCheck()
    .screenCleanup(trialsDir)
}

syncTrials()
Upsert a Harbor job directory’s job + trial rows to Supabase via
harbor_runner.sync.trials.sync_trials. No-op (graceful skip) when
supabaseUrl/supabaseKey are not supplied — the call always succeeds, it
just writes nothing. Returns {written, skipped} JSON (String!).
HC4: the Supabase URL/key are attached as Dagger secrets via
withSecretVariable and read by the CLI from os.environ — they NEVER appear
on argv (A11 fix). The command is a structured withExec argv (no sh -c and no
command substitution that would echo the plaintext URL/key into the final
command); the only input on the command line is the non-secret --job-dir.
HC3: sync/trials.py contains no host-exec calls; withExec is the sole exec surface.
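Reading the credentials from the environment, with a graceful skip when they are absent, can be sketched as below. The environment variable names SUPABASE_URL and SUPABASE_KEY and the helper load_supabase_credentials are assumptions; the text above does not name the variables the CLI reads.

```python
import os
from typing import Mapping, Optional, Tuple


def load_supabase_credentials(
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[Tuple[str, str]]:
    """Return (url, key) from the environment, or None when either is missing."""
    env = os.environ if environ is None else environ
    url = env.get("SUPABASE_URL")  # hypothetical variable name
    key = env.get("SUPABASE_KEY")  # hypothetical variable name
    if not url or not key:
        # Caller treats None as "skip the sync", not as an error.
        return None
    return url, key


print(load_supabase_credentials({}))
```

Because the values are read inside the process, they never need to appear on the command line, which is the property HC4 asks for.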
Return Type
String !
Arguments
| Name | Type | Default Value | Description |
|---|---|---|---|
| jobDir | String ! | - | No description provided |
| supabaseUrl | Secret | - | No description provided |
| supabaseKey | Secret | - | No description provided |
Example
dagger -m github.com/Kurry/harbor-check@34fb3409d0c79a92daf5345daddeb30f16737fe0 call \
    sync-trials --job-dir string

func (m *MyModule) Example(ctx context.Context, jobDir string) string {
    return dag.
        HarborCheck().
        SyncTrials(ctx, jobDir)
}

@function
async def example(job_dir: str) -> str:
    return await (
        dag.harbor_check()
        .sync_trials(job_dir)
    )

@func()
async example(jobDir: string): Promise<string> {
  return dag
    .harborCheck()
    .syncTrials(jobDir)
}