Workflows¶
Canonical reference for the blackbox workflow engine - the durable execution loop that lets you stitch together rule-packet decisions, hook side-effects, sub-arc composition, and external webhook signals into a deterministic state machine driven by mermaid graphs.
Why a separate doc. README is for the project. This is for the engine. If you're authoring a workflow, this is your map. If you just want to run an example, head to
examples/keystone/.
Why it exists¶
An LLM maintaining workflow state across turns drifts. It forgets phases, re-litigates settled decisions, invents new steps to paper over mistakes, and dies on context compaction. A daemon-driven loop doesn't - it has no context to forget. LLMs become stateless function calls dispatched into the loop rather than the loop's substrate.
This is the same lesson Temporal, Restate, Sayiir, and Wolverine learned for deterministic handlers. The shape adapted here: checkpoint-based (LLMs are non-deterministic so replay is meaningless), JSON-spec-driven (specs are LLM-author-friendly), packet-policy-mechanized (decisions are data, not code), MCP-surfaced (operator commands the engine through the same protocol agents use).
TL;DR shape¶
A workflow is one JSON file:
{
"name": "issue-to-merged-pr",
"version": 1,
"vars_schema": { "issue_number": {"kind": "int", "required": true}, ... },
"actors": { "implementer": { "kind": "executor", "brofile": "...", "durable": true } },
"policy_packet": "domain:workflow-policy/arc-budget",
"on_arc_exit": [ ... cleanup hooks ... ],
"on_arc_cancel": [ ... compensating hooks ... ],
"start": "Setup",
"nodes": {
"Setup": { "actor": "", "on_enter": [ ... ],
"next": { "type": "goto", "to": "Implement" } },
"Implement": { "subworkflow_ref": "implementer-arc",
"imports": ["issue_number","owner","repo"],
"exports": ["pr_number","branch"],
"next": { "type": "goto", "to": "Wait" } },
"Wait": { "wait": { "any_of": [{"signal":"pr-merged",
"correlate":{"pr":{"kind":"json_path","path":"vars.pr_number"}}}],
"timeout": "24h" },
"gate": "domain:workflow-gate/merge-or-review",
"next": { "type": "branch",
"cases": { "merged": "Done", "ready": "Done" } } },
"Done": { "actor": "", "next": { "type": "terminal" } }
}
}
The metadata declares actors, optional atom_bindings, nodes,
vars_schema, policy_packet, and arc-level hooks. Control flow lives
on each node as a typed next clause (goto / branch / fork /
terminal); top-level start names the entry node. The daemon validates
every transition target, actor reference, atom binding, late_inject source,
and node reachability from start before any dispatch fires. The canonical
JSON Schema is at schema/workflow.schema.json.
ArcContext - the universal state container¶
Every node, hook, gate, policy packet, and webhook routing decision sees the same JSON-shaped state, called the ArcContext:
{
"vars": { ... user-set, hook-writable, schema-validated ... },
"outputs": { "NodeName": <value>, ... immutable per node ... },
"meta": { "arc_id", "workflow_name", "workflow_version",
"started_at", "project_dir", "worktree",
"arc_outcome", "parent_arc_id", "composition_depth" },
"last_signal": { "name", "payload", "correlation", "received_at" }?,
"signal_history": [ ... append-only ... ]
}
Templating¶
Anywhere a string is interpreted as a template (prompt, hook args,
correlation values), ${path.into.context} resolves against this shape:
| Template form | Resolves to |
|---|---|
${vars.x} |
vars["x"] |
${vars.list.0} |
vars["list"][0] (array indexing) |
${outputs.NodeName} |
output of NodeName (raw) |
${outputs.NodeName.field.subfield} |
nested JSON traversal |
${meta.arc_id} / ${meta.worktree} |
arc-intrinsic metadata |
${last_signal.name} |
which Wait signal just resolved |
${last_signal.payload.x} |
resolved-signal payload field |
${NodeName.output} (legacy) |
same as ${outputs.NodeName} |
Whole-string templates resolve as their typed value (a ${vars.n}
where vars.n == 42 becomes the integer 42 in a hook arg). Mixed
strings stringify ("issue-${vars.n}" → "issue-42").
Unresolved expressions are left in place verbatim - ${vars.does_not_exist}
shows up as the literal string in the dispatched prompt rather than
silently becoming the empty string. Misspellings are loud.
Vars schema¶
Optional vars_schema declares per-key kind and required-ness:
"vars_schema": {
"issue_number": { "kind": "int", "required": true },
"branch": { "kind": "string", "required": false },
"labels": { "kind": "array" },
"metadata": { "kind": "object" }
}
Kinds: bool / int / float / string / array / object /
any. Hook writes (SetVar, IncVar, MergeVar, ParseJson,
HttpJson with into_var) are kind-validated against the schema -
mismatches fail at write time, not later. required: true keys are
enforced at arc start (initial vars), at sub-arc start (imports), and
at terminal state (exports). Unknown keys are accepted (open schema
by design - explicit required fields are how you lock down the
surface).
Packet entity flatten¶
When a packet evaluates against the ArcContext (gates, hook when,
policy packets, webhook routing), the engine flattens to:
{
"vars": { ... },
"outputs": { ... },
"meta": { ... },
"last_signal": { ... }?,
"node_output": "<just-completed node's output as string>",
"node_output_json": <raw>,
"node_id": "<just-completed node id>"
}
Predicate paths support dotted resolution - vars.issue_number,
outputs.Plan.branch, last_signal.name. Array elements via numeric
indices: vars.labels.0. Quantified predicates work too:
Exists{path:"vars.labels[*]", pred:Eq{field:"$", value:"bug"}}.
Actors¶
Declared in the workflow's actors map and referenced by node specs.
Two kinds.
| Kind | What it does |
|---|---|
executor |
Single bro dispatched via bro_exec / bro_resume. Used for every single-bro role the workflow declares - implementer, fixer, triager, planner, facilitator, advisor, aggregator, synthesizer, reviewer, etc. The brofile + prompt carry the persona. |
ensemble |
Team broadcast: every member runs the same prompt; output is the labeled concatenation. When the node has an associated whiteboard (see §Whiteboards), each member's STRICT-JSON output is also auto-postable via whiteboard_post from inside the dispatched turn. |
Why only two kinds. Earlier iterations had advisor, planner,
triager, user as marker types. They didn't pull engine weight -
mechanically each was identical to executor, and the marker was
never enforced. Persona / role / contract is a workflow-author concern
carried by the brofile lens + prompt + on_exit parse_json
validation. Engine vocabulary describes dispatch shape, not roles.
Human-in-the-loop without a user actor. When a workflow needs
human input - operator approval, external Claude weighing in,
ntfy-routed acknowledgement - the pattern is: open a whiteboard,
register the human as an operator role, suspend the arc on a
board-transitioned signal correlated to (board_id, target_phase).
The human (or their Claude session, or a slack/ntfy adapter) calls
whiteboard_post / whiteboard_vote / whiteboard_transition
through the same MCP surface in-workflow specialists use. No special
"user" type. See §Whiteboards.
For hook-only / pure-routing nodes (Setup / Done patterns where the
work is on_enter / on_exit hooks and there's no LLM dispatch), leave
the node's actor field empty (""). The validator only complains
when a non-empty actor name fails to resolve. Hook-only nodes still
fire hooks, capture their rendered prompt as the node output (so
${NodeName.output} references stay legal), and follow next.
Per-actor fields:
brofile- single-bro target (executor / advisor)team- ensemble targetdurable: true- reuse the same session across every node that invokes this actor; the engine threads the session id throughbro_resumeautomatically. Critical for the implementer that has to address feedback after initial PR.compaction_anchor: true- the actor contributes a rolling summary to the arc's work_item thread at every boundary, so an orchestrator swap-out doesn't lose strategic memory.requires: [Capability, ...]- see Capability tags below.
Atom Bindings¶
A workflow uses atom_bindings when a node should call a reusable capability
instead of dispatching an actor directly.
The binding name is local. The atom ref is the contract. That split matters: workflows can cap a powerful atom for one arc without changing the atom itself.
"atom_bindings": {
"echo": {
"atom_ref": "atom:echo@v1",
"limits": { "dispatches_runs": 0 }
}
},
"nodes": {
"Echo": {
"atom": "echo",
"atom_args": { "message": "hello" },
"next": { "type": "terminal" }
}
}
Use this when the workflow should say "run the test-island extractor" rather than know which brofile, workflow, runner, or adapter currently implements it.
Binding fields:
| Field | Meaning |
|---|---|
id |
Optional self-identifying id. If present, it must match the map key. |
atom_ref |
Required atom ref: atom:name@vN or atom:name@latest. |
durable |
Reuse the binding's previous invocation id on repeated visits. Profile-backed atoms resume; non-resumable handles are re-invoked. |
requires |
Provider capabilities required when the atom resolves to a profile-backed brofile. |
limits |
Optional tighter writes_files, dispatches_runs, max_depth, uses_network bounds. |
supervision_override / trace_override / portal |
Reserved policy metadata carried on the binding. |
Validation happens before dispatch. The compiler checks binding shape and node references. Capability validation resolves installed atoms, rejects missing or inactive refs, verifies binding limits do not exceed the atom's declared effects, and checks profile-backed provider capabilities.
At runtime the engine calls atom_invoke, records the invocation id, reads
atom_status, and stores that status JSON as the node output. Deterministic and
adapter atoms usually finish immediately. Profile and workflow atoms may still be
backed by a running bro task or child arc, so downstream nodes should branch on
the trace fields they actually require.
Atom nodes are mutually exclusive with actor, subworkflow*, wait,
foreach, and matrix, and cannot use mode: "fire_and_forget" in v1.
If atom_args is absent and prompt is present, the rendered prompt is passed
as { "prompt": "..." }; otherwise args default to {}.
Capability tags¶
Mirrors daystrom's CapabilityTag shape. Hard-fail at compile rather
than silent route-around.
| Capability | Provider availability |
|---|---|
structured_output |
Claude (extension), Codex (--output-schema); NOT Gemini CLI |
vision |
Claude, Gemini |
long_context |
Claude (1M); promote to model-keyed when smaller variants matter |
tool_use |
All except Vibe |
resume |
All except Vibe and Gemini |
Declare on the actor:
"actors": {
"structured_review": {
"kind": "ensemble",
"team": "review-team",
"requires": ["structured_output"]
}
}
bro_orchestrate_run (and the bro_workflow_install + webhook-routing
start_arc paths) walk every actor's brofile / team-member brofiles,
resolve to provider, check capability coverage. First missing
capability is a compile error with the offending actor + provider
named. Sub-workflows are recursively validated.
Nodes¶
A node declares the unit of work. Fields:
| Field | Meaning |
|---|---|
actor |
Which actor runs this node (mutually exclusive with atom, subworkflow*, and wait). |
atom |
Which workflow-local atom binding runs this node (mutually exclusive with actor, subworkflow*, wait, foreach, and matrix). |
atom_args |
Structured atom input arguments, resolved through ArcContext templating before atom_invoke. Whole-string templates preserve JSON type. |
prompt |
Template (rendered against ArcContext). |
gate |
Packet id; verdict drives next-node selection at choice nodes. |
gate_mode |
first (one verdict, default) or all (multi-finding aggregate). |
mode |
sync (default) or fire_and_forget. |
retry |
{max_generations: N} - visit-count ceiling. |
late_inject |
{from: NodeId, policy: "resume_on_return"} - fold an async source into this node at its entry. |
subworkflow |
Inline Workflow spec (mutually exclusive with subworkflow_ref). |
subworkflow_ref |
Workflow id resolved from the registry at dispatch time. |
imports |
Vars to copy from parent into the sub's fresh ArcContext (subworkflow nodes only). |
exports |
Vars to promote back to parent on sub completion (missing exports = runtime error). |
import_renames |
{ local_name: "parent.path.expression" } - Extractor-style projection from parent context into sub. |
on_enter |
Hook ops fired BEFORE actor dispatch / sub descent / Wait registration. |
on_exit |
Hook ops fired AFTER body returns, BEFORE the gate evaluates. Lets ParseJson normalize output the gate sees. |
wait |
WaitSpec - suspend the arc on a signal. Mutually exclusive with actor + subworkflow*. |
Hooks and the op catalog¶
Hooks are guarded side effects that fire at lifecycle boundaries:
| Lifecycle slot | When fires |
|---|---|
node.on_enter |
Before actor dispatch / Wait registration / subworkflow descent. |
node.on_exit |
After actor body returns, BEFORE the gate. |
workflow.on_arc_exit |
At terminal state (success, fail, cancel, timeout). |
workflow.on_arc_cancel |
ONLY when arc is cancelled. Runs BEFORE on_arc_exit. |
Each hook is one declaration:
{
"op": "set_var",
"args": { "key": "branch", "value": "fix/issue-${vars.issue_number}" },
"when": "domain:workflow-cleanup/keep-on-fail", // optional packet gate
"on_failure": "halt", // halt | warn | ignore
"into_var": "result_key" // for ops that produce a value
}
Hook gating¶
A hook's when references a packet. The packet evaluates against the
flattened ArcContext. The hook fires only if the packet's verdict is in
the operator-blessed allow set: allow, pass, proceed, fire,
ok, delete, keep, yes, true. Unknown verdicts (e.g. flag,
manual) do NOT permit firing - conservative on purpose.
Op catalog (current)¶
| Op | Purpose |
|---|---|
set_var |
Write vars[key] = value (schema-validated). |
inc_var |
Increment integer var (default delta 1). |
append_var |
Append to array var. |
merge_var |
Merge object into object var. |
parse_json |
Parse a string-shaped value into JSON; strips ```json fences. |
shell |
Sandboxed shell command (gated via packet policy in v-next; today reads cwd from meta.worktree). |
worktree_create |
git worktree add with smart branch reuse (see below). |
worktree_remove |
git worktree remove; idempotent on missing path. No rm -rf fallback - surfaces git's error if the path is not a git-tracked worktree. |
set_meta |
Update meta.worktree directly (only mutable meta field today). |
http_json |
Generic HTTP request → response into_var. Workflow author composes URL/headers/body using ${env.X} + ${vars.X} templates to express any code-host integration without baking platform-specific ops into the engine. response_kind: json (default - parse, error on non-JSON), text (capture body as-is, e.g. for .diff URLs), or auto (try JSON, fall back to text). expect_status: optional override of the default 2xx success window. |
find_first |
Find the first element of an array variable whose nested fields all equal the targets in where, write into_var. Writes Value::Null (not Err) when no match - downstream IsNull / IsNonNull packets branch cleanly. Composable primitive for "find existing PR for this branch" / "find label by name" without baking platform-specific search ops or relying on upstream API filters that may be broken. where keys use dotted paths (e.g. head.ref). |
mcp_call |
Outbound MCP tool call - sibling of http_json but speaking MCP JSON-RPC. Resolves args.server against the existing bro_mcp registry (global ~/.bro/mcp.json + project overlay), opens a transient client (stdio child-process or streamable HTTP), invokes args.tool with args.arguments, captures the result into vars[into_var]. Tool errors (is_error: true) become op failures so on_failure fires. Result normalization preserves typing when structured_content is present, falls back to JSON-parsed text content otherwise. Args: {server, tool, arguments?, timeout_secs?} (default 300s). Use for engine-level grounding calls - sast_run / sast_findings against biofilter, bbox_thread / bbox_note against blackbox-self - instead of dispatching a bro just to make a tool call. |
worktree_create semantics¶
Three real cases the op handles:
- Branch absent →
git worktree add -b <branch> <path> <base>(fresh). - Branch present, no worktree references it →
git worktree add <path> <branch>(REUSE - common after a prior arc died withcleanup-policy=keep-on-fail). - Branch present AND another worktree has it checked out → fail loudly with the occupant path. Suggests including
${meta.arc_id}in the branch name for concurrent-arc-safe naming.
Failure semantics¶
on_failure per hook decides what an op error means:
halt(default) - abort the arc with the op's error.warn- log + write asurprisenote on the arc thread, continue.ignore- log only, continue.
Hooks cannot change which next node runs - that authority stays with the gate packet. Hooks are guarded side effects, not control flow. If you need conditional branching, that's a node + gate.
Wait nodes¶
WaitSpec suspends the arc until one of any_of signals arrives or
the timeout fires.
"AwaitFeedbackOrMerge": {
"wait": {
"any_of": [
{ "signal": "pr-feedback", "correlate": { "pr": "${vars.pr_number}" } },
{ "signal": "pr-merged", "correlate": { "pr": "${vars.pr_number}" } }
],
"timeout": "7d"
},
"gate": "domain:workflow-gate/loop-or-exit"
}
Correlation¶
Each WaitSignal.correlate is a tuple of key → Extractor-Selector
expressions evaluated against the current ArcContext at Wait
registration. The canonical form (sorted JSON keys) is the second
component of the (signal_name, correlation) → (arc_id, wait_id)
index in the daemon's WaitStore. When a signal arrives, the router
walks pending waits whose correlate is a subset-match against the
signal's payload-correlation map. Empty correlation = broadcast:
matches anything that signals the same name.
Resolution¶
When a matching signal arrives:
- The router pops the first-matching wait from the store.
- Writes a
SignalRef { name, payload, correlation, received_at }into the wait's resolved-slot. - Wakes the arc via the wait's
Notify. - Sibling waits in the same
any_ofare removed from the store (race-loser cleanup - only the first signal wins).
The arc resumes, populates ctx.last_signal + appends to
signal_history, and proceeds to the gate (which can branch on
last_signal.name, last_signal.payload.*, etc.).
Timeout¶
If timeout fires before any signal:
- All sibling waits are removed.
- A synthetic signal is recorded:
SignalRef { name: "__timeout__", payload: { expired: ["sig1", "sig2"] } }. - Gate evaluates as normal (route via
Eq{field: "last_signal.name", value: "__timeout__"}).
timeout accepts 30s, 5m, 1h, 7d - any decimal followed by a
unit suffix. Absent timeout = wait indefinitely (suitable for
never-firing-but-cancellable arcs).
Subworkflows¶
Two ways to compose a subworkflow into a node:
Inline¶
"Implement": {
"subworkflow": { "name": "...", "version": 1, "actors": { ... }, ... },
"imports": ["issue_number"],
"exports": ["pr_number"]
}
The full sub-spec is embedded. Compiled at parent compile time (errors surface immediately).
By id¶
"Implement": {
"subworkflow_ref": "implementer-arc",
"imports": ["issue_number"],
"exports": ["pr_number"]
}
Resolved at dispatch time against the daemon's workflow registry.
Lets the same sub be referenced multiple times (e.g. keystone calls
implementer-arc once for initial impl, calls a peer
implementer-feedback-arc later for revisions) without inlining the
spec twice. Install the referenced workflow with bro_workflow_install
or POST /admin/workflow/install.
Imports / exports / renames¶
The sub gets its own fresh ArcContext. imports copies parent vars
in by name. import_renames is for the cases where the parent's
shape doesn't match the sub's expected names:
"RunInner": {
"subworkflow_ref": "issue-to-merged-pr",
"imports": ["repo", "owner"],
"import_renames": { "issue_number": "next_issue.0.number" }
}
Right-hand side is an Extractor path (dotted). Evaluated against the parent's ArcContext flatten.
exports declares which sub vars get promoted back into the parent
on sub completion. Missing exports are a runtime error - if you
declare exports: ["pr_number"] and the sub never sets vars.pr_number,
the parent fails. This is the contract.
meta.parent_arc_id propagates so audit walks back up the call tree.
Composition depth is capped at MAX_COMPOSITION_DEPTH = 5; exceeding
errors out at the offending nest.
Gates and branch transitions¶
A gate packet runs after the node body, against the flattened
ArcContext (with node_output exposed at the top level for legacy
rules). The classification becomes the verdict; the same node's
next.branch.cases selects the target whose key matches.
{
"Decide": {
"actor": "worker",
"prompt": "...",
"gate": "packet-decide",
"next": {
"type": "branch",
"cases": { "yes": "Yes", "no": "No" },
"default": "No" // optional fallback for unmatched verdicts
}
},
"Yes": { "actor": "worker", "next": { "type": "terminal" } },
"No": { "actor": "worker", "next": { "type": "terminal" } }
}
Two modes:
gate_mode: "first"(default) - first-matching rule wins, single verdict.gate_mode: "all"- every matching rule fires, aggregate verdict is the lattice-highest-priority classification across all findings. Findings are surfaced as alearnednote on the arc thread and exposed viabro orchestrate status.
Back-edges in the graph become natural retry loops. Each visit bumps
visit_counts[node]; exceeding retry.max_generations halts the
arc. Retried prompts get [retry - attempt N, prior gate verdict: X]
prepended automatically.
Workflow-level policy packets¶
policy_packet: <id> runs at every node boundary against the
flattened ArcContext (with arc-shape additions: step, just_ran,
next, completed, in_flight, visit_counts). Classifications
are arc-level verdicts:
halt- stop the arc immediately (error exit, reason posted to thread).escalate- write ablockednote, continue.warn- write asurprisenote, continue.- anything else - no-op.
This is the mechanization of the advisor loop: instead of dispatching an LLM at every boundary to read the checkpoint and judge, compile those rules into a packet once and let them evaluate deterministically. Useful for runaway-visit detectors, time / step ceilings, arc-shape invariants.
Domain-shaped packet refs¶
Anywhere a workflow names a packet (gate, policy_packet,
hook.when, webhook.routing_packet), accept either:
packet-XXXXXXXX- pinned exact compile.domain:<name>- resolves to the most-recently-compiled packet matching<name>.
Domain refs let workflow + webhook specs survive packet recompiles
without re-edit. Audit events name the resolved packet-id so you
always know which compile actually fired.
Inlets - webhook, poller, cron¶
Three inlet primitives, all converging on the same dispatch pipeline:
<inlet emits a flat entity>
→ extractor.project(payload) [JSON → flat entity]
→ routing_packet.apply(entity) [verdict]
→ routing::resolve_entity_template [substitute ${entity.X} with typed values]
→ RoutingVerdict::parse [typed verdict]
→ dispatch_verdict(...) [start_arc | signal_arc | cancel_arc | ignore | dead_letter]
The shared dispatch path is crate::dispatch_routed_event. Routing
rules don't know which inlet fed them - they see a flat entity and a
routing-packet id, the rest is the engine's problem. Pick the inlet
that matches your trigger source:
| Inlet | Trigger source | Spec lives in | When to use |
|---|---|---|---|
| Webhook | External HTTP POST + signature | webhooks/<name>.json |
Upstream pushes to you (Forgejo / GitHub / Stripe / generic JSON). Lower latency than polling, but needs ingress. |
| Poller | Scheduled HTTP fetch (data rides on the tick) | pollers/<name>.json |
Upstream is poll-only OR you have no public ingress. Carries an HttpFetchSpec + optional iterate for array responses + dedup ring. |
| Cron | Calendar / clock (no fetch - entity is operator-supplied payload) |
crons/<name>.json |
Trigger is time-based, not event-based. Nightly maintenance, hourly sweeps, scheduled SAST squashing. The dispatched arc does its own data acquisition (typically via mcp_call hooks). Concurrency cap (default 1) skips ticks while a prior arc is still in flight. |
Webhook¶
Webhooks are operator-installed (signature scheme + Extractor + routing packet). The pipeline:
HTTP POST /webhook/<name>
→ verify_signature(secret) [HmacSha256 (configurable header
+ optional `prefix=`) | None
(loopback-bind only)]
→ idempotency dedup [delivery-id ring per webhook]
→ extractor.project(payload) [JSON → flat entity]
→ routing_packet.apply(entity) [verdict]
→ dispatch_verdict(...) [start_arc | signal_arc | cancel_arc | ignore | dead_letter]
Extractor¶
Tiny AST that projects a webhook payload (plus _headers map for
header-driven routing - operator names the header in their extractor;
the engine doesn't know the sender) into a flat entity:
"extractor": {
"outputs": {
"event": { "kind": "json_path", "path": "$._headers.x-gitea-event" },
"action": { "kind": "json_path", "path": "$.action" },
"issue_number": {
"kind": "default",
"inner": { "kind": "json_path", "path": "$.issue.number" },
"fallback": null
},
"owner": { "kind": "json_path", "path": "$.repository.owner.login" },
"repo": { "kind": "json_path", "path": "$.repository.name" }
}
}
Selector kinds: json_path, const, default { inner, fallback },
concat { parts: [...] }, coalesce { sources: [...] } (first
non-null wins; lets one extractor field cover divergent payload
shapes - e.g. Forgejo's pull_request_review puts the comment text
at .review.body while pull_request_review_comment uses
.comment.body). Deliberately small - no transformations (regex,
case folding, math). Those belong in packet predicates or downstream
nodes.
Routing verdicts¶
The routing packet is a normal rule packet whose consequent is
either a JSON-encoded string (since Rule.consequent is the typed
packets::Value enum, structured shapes go through string-encoding):
{
"id": "start_arc_on_issue_opened",
"classification": "start_arc",
"antecedent": { "op": "All", "args": [
{ "op": "Eq", "field": "event", "value": "issues" },
{ "op": "Eq", "field": "action", "value": "opened" }
]},
"consequent": "{\"route\":\"start_arc\",\"workflow\":\"issue-to-merged-pr\",\"initial_vars\":{}}"
}
Or a string shorthand for ignore / dead_letter.
Verdict shapes:
{ "route": "start_arc", "workflow": "<id>", "initial_vars": { ... } }
{ "route": "signal_arc", "signal": "pr-merged", "correlate": { "pr": 42 } }
{ "route": "cancel_arc", "correlate": { "pr": 42 } } // not yet implemented
{ "route": "ignore" }
{ "route": "dead_letter", "reason": "..." }
start_arc semantics¶
When a webhook resolves to start_arc, the engine:
- Resolves the workflow by id from the registry.
- Compiles + capability-validates.
- Builds
merged_vars: extracted entity (excluding_headersandnullvalues) overlaid by the verdict's explicitinitial_vars(last-writer wins). - Resolves
project_dir: ${WEBHOOK_NAME_UPPER}_PROJECT_DIRenv overrideWebhookSpec.default_project_dirNone(worktree hooks fail loudly - better than silent fallback to cwd).- Spawns the arc in a background task. Webhook returns immediately
with
{status: "arc_started", workflow: <id>}.
Default-deny¶
No-match on the routing packet is dead-lettered, not ignored. The extracted entity is included in the response for forensics.
Replay¶
POST /webhook/<name>/replay with the same payload runs extractor →
routing packet → verdict without dispatching. Use to debug routing
rules without firing arcs.
Poller¶
Scheduled HTTP-source inlet - operationally "a webhook whose source is a tokio interval pulling from a URL instead of an inbound POST." Spec shape:
{
"name": "forgejo-open-issues",
"every_seconds": 120,
"source": {
"method": "GET",
"url": "${env.FORGEJO_BASE_URL}/api/v1/repos/owner/repo/issues?state=open",
"headers": { "Accept": "application/json" }
},
"iterate": { "kind": "json_path", "path": "$" },
"extractor": { "outputs": { ... } },
"dedup_id_path": { "kind": "json_path", "path": "$.id" },
"routing_packet": "domain:webhook-routing/forgejo",
"default_project_dir": "/path/to/local/clone"
}
every_secondsis clamped aboveBBOX_POLLER_MIN_INTERVAL_SECS(default 5s) - operators can't accidentally hammer an upstream.sourceis the sameHttpFetchSpecshape the workflowhttp_jsonop consumes.${env.X}is resolved at spec-load time (no per-tick ArcContext to resolve${vars.X}against - credentials live in env).iterate(optional) - array path; each element becomes its own event (extractor runs per-element). Absent → whole response is one event.dedup_id_path(optional) - stable id per item; in-memory recent- seen ring (1024 cap, per-poller, resets on daemon restart). Repeated items are dropped before dispatch.
Manage with bro_poller_install / bro_poller_list MCP tools, or the
plain-HTTP /admin/poller/install endpoint. Persisted under
store_dir/pollers/<name>.json; restored on daemon startup.
Cron¶
Calendar-driven inlet - sibling of webhook + poller. Distinction from
poller: poller fetches HTTP per tick (data rides on the tick), cron
carries no fetch (entity is operator-supplied payload plus synthetic
cron_name + tick_at fields). Use cron when the trigger is time-
based and the dispatched arc itself does the data acquisition.
{
"name": "sastquatch-daily",
"schedule": "0 0 9 * * *", // 6-field: sec min hour dom mon dow
"payload": {
"owner": "sastquatch-admin",
"repo": "quat"
},
"concurrency": 1, // 0 disables the cap
"routing_packet": "domain:cron-routing/sastquatch",
"default_project_dir": "/path/to/local/clone"
}
scheduleuses thecroncrate's 6- or 7-field form (seconds-first; prefix a classic 5-field cron with0to run-at-second-0). Validated at install time.bro_cron_upcomingis a pure helper that returns the next N scheduled times for a candidate expression - use to sanity- check before installing.payloadis operator-supplied entity fields. Syntheticcron_nameandtick_at(RFC3339 UTC) are merged in at tick time so routing rules can discriminate without operator boilerplate. Operator-supplied keys win on collision.concurrencycaps in-flight arcs spawned by this cron. Default 1 (skip ticks while a prior arc is still running - the most common case for daily sweeps). Set 0 to lift the cap. The counter decrements when the dispatched arc terminates; failed dispatches refund immediately.- The routing-packet entity sees
{cron_name, tick_at, ...payload}. Most cron arcs route tostart_arcunconditionally, but you can branch oncron_nameif multiple crons share a routing packet.
Manage with bro_cron_install / bro_cron_list / bro_cron_upcoming
MCP tools, or the plain-HTTP /admin/cron/install endpoint. Persisted
under store_dir/crons/<name>.json; restored on daemon startup.
Inlet selection cheat sheet¶
- Event arrives at known time, low frequency, you control the source → webhook.
- Event arrives at known time but you can't push (closed network, polling-only API) → poller.
- Trigger is time-based, no upstream event to react to → cron.
- Resilience layering → webhook + poller against the same upstream, accept the duplicate-dispatch cost (workflow-level idempotency catches it).
- Event-driven AND time-bounded (e.g. "open a PR but auto-close after 7 days") → webhook for the open, cron sweeping for the close.
Whiteboards¶
Multi-agent deliberation surface, first-class in the engine. Posts (proposals / claims / concerns / informational), annotations (challenge / corroborate / resolve / validation), and votes accumulate on a board, advanced through phases (blind → read → validate → debate → resolve → archived) by a facilitator-or-operator role.
Where webhooks and crons are inlet primitives (events arriving into the engine), whiteboards are a deliberation primitive - a shared structured log that any audience can read and write through the same MCP surface:
- In-workflow ensemble specialists. Their dispatched bro
prompts instruct them to call
whiteboard_post(and laterwhiteboard_annotate/whiteboard_vote) from inside their turn. Each member's brofile lens carries itsagent_nameso the bro knows which name to use without prompt-side per-member interpolation. - In-workflow facilitators (single bro, executor mechanics).
Drive phase transitions via
whiteboard_transition. Phase transitions emit aboard-transitionedsignal correlated to(board, target_phase)through the shareddispatch_routed_eventpipeline - same machinery webhooks use. - External agents - operator's Claude session, dispatched help,
eventually humans through slack / ntfy adapters. Read board
state via
whiteboard_state, act via the same write tools. The board IS the human-in-the-loop surface; no separate escalation registry, nouseractor type.
Spec shape¶
Whiteboards aren't operator-installed like webhooks/pollers/crons -
they're created on demand by workflows (via mcp_call → whiteboard_open)
or by external clients (via whiteboard_open MCP tool directly). The
board id is the operator's choice; convention is
<workflow-name>-<arc_id> or a deliberation-specific slug.
// In a workflow node's on_enter:
{
"op": "mcp_call",
"args": {
"server": "blackbox",
"tool": "whiteboard_open",
"arguments": {
"board_id": "${vars.board_id}",
"topic": "ADR #${vars.issue_number}: ${vars.issue_title}",
"project": "${vars.worktree_path}",
"arc_thread_id": "${meta.arc_thread_id}",
"opened_by": "facilitator"
}
}
}
Phase protocol¶
| Phase | Allowed actions |
|---|---|
blind |
post (specialists post without seeing others) |
read |
(none - observation phase, advance when ready) |
validate |
annotate (validation only, with required result) |
debate |
annotate (challenge / corroborate / resolve), vote |
resolve |
(none - frozen, facilitator synthesizes outside) |
archived |
(terminal - board moved to archive/ directory) |
Skip rule: read → debate is legal (skip validate). All other
transitions follow the canonical order.
Conflict detection¶
whiteboard_conflicts evaluates the board state and returns three
kinds of conflicts:
- direct_overlap - two posts target the same
target_file+target_location - cascade_collision - post A's
cascade_targetsincludes post B's direct target - severity_disagreement - two posts share a
finding_refbut disagree onseverity
This generalizes phaser's domain-specific shapes; the operator's posts decide whether the conflict types apply by setting (or not setting) the relevant fields.
Resume on phase transition¶
A workflow can suspend on a board's phase advance using the existing
wait primitive - no new variant needed:
"AwaitResolve": {
"actor": "",
"wait": {
"any_of": [
{
"signal": "board-transitioned",
"correlate": {
"board": { "kind": "json_path", "path": "vars.board_id" },
"phase": { "kind": "const", "value": "resolve" }
}
}
],
"timeout": "1h"
},
"next": { "type": "goto", "to": "Synthesize" }
}
The transition tool fires the signal; the wait correlates on
(board, phase); routing-pipeline machinery resolves the wait and
the arc resumes. Same shape webhook ingress uses to resume on PR
events.
Persistence¶
Boards are JSON files under $store_dir/whiteboards/<id>.json,
atomically written via tempfile + rename, restored on daemon
startup. Archived boards move to $store_dir/whiteboards/archive/
and are not loaded back into memory - they exist for audit only.
Where to dispatch what¶
| Surface | Who calls | Why |
|---|---|---|
whiteboard_open from on_enter hook (mcp_call) |
Engine, on Setup | Predictable id, deterministic registration |
whiteboard_register from on_enter hook (mcp_call) |
Engine, on Setup | All specialists registered up-front so role checks work |
whiteboard_post / whiteboard_annotate / whiteboard_vote from inside dispatched bro |
Specialist (LLM) | The structured deliberation IS the agent's output |
whiteboard_transition from on_enter hook (mcp_call) |
Engine, on phase-advance node | Deterministic - facilitator's "decide to advance" is part of arc shape, not dispatched |
whiteboard_state / whiteboard_summarize |
Anyone (specialist mid-turn, facilitator at synthesis, external Claude joining) | Read-only inspection |
whiteboard_archive from on_enter of Done |
Engine, terminal | Strip live state after the artifact ships |
Comparison with phaser¶
The engine's whiteboard machinery absorbs phaser's mk0 protocol (blind → read → debate → resolve, posts / annotations / votes, conflict detection, auto-transition advisory). Phaser stays peer software in the daystrom-claude-plugins monorepo; bridgecrew / isolinear / ARS keep using the stdio MCP until they migrate. The engine version is a same-shape superset accessible in-process.
Operator-blessed registries¶
Three registries, all persisted to disk and restored on daemon startup. Workflow specs reference packets by domain and webhooks by name; specs themselves never inline secrets or routing logic.
| Registry | On-disk path | Install MCP tool | Plain-HTTP admin endpoint |
|---|---|---|---|
| Workflows | ${BRO_HOME}/workflows/<id>.json |
bro_workflow_install |
POST /admin/workflow/install |
| Webhooks | ${BRO_HOME}/webhooks/<name>.json |
bro_webhook_install |
POST /admin/webhook/install |
| Packets | ${PACKETS_PATH}/<scope>/<packet-id>.json |
bbox_compile (MCP) |
POST /admin/packet/compile |
Brofiles + teams have their own MCP surface (bro_brofile, bro_team)
plus admin shortcuts (/admin/brofile/upsert, /admin/team/upsert)
for install scripts that can't easily speak rmcp's streamable-HTTP
transport.
The admin endpoints are loopback-only by default (BBOX_BIND=127.0.0.1).
Set BBOX_BIND=0.0.0.0 if you need a Docker container on the same
host (Forgejo/Gitea webhook → http://172.17.0.1:7264/webhook/...).
Closed-network assumption applies in 0.0.0.0 mode. The dev daemon
template ships at 0.0.0.0; prod stays at loopback.
Audit and observability¶
Every bro orchestrate run opens a bbox_thread(kind=work_item). The
returned arc_thread_id is the audit handle.
For direct task supervision without workflow state, use Bro Runtime. For inbox and note hygiene, use Knowledge Store.
| Surface | What it shows |
|---|---|
bro orchestrate peek [<thread-id>] |
Live in-flight state from the running_arcs registry. No id = all. |
bro orchestrate status <thread-id> |
Note trail + latest compaction anchor for one arc. |
bro orchestrate list [--limit N] |
Recent arcs catalog with final status + anchor. |
bbox_notes(thread_id=<arc>) |
Every structured note the engine wrote (kind: done / learned / surprise / blocked). |
bbox_inbox |
Arcs currently flagged for attention (failed, blocked, etc.). |
bro_arc_status(arc_id=<id>) |
Snapshot + pending-wait registrations for one arc. |
bro_signals(signal=, since=, outcome=) |
Recent signal-dispatch events as a bounded ring buffer. Each entry: (timestamp, signal, correlation, outcome, matched_arc_id, idle_pending). On outcome=no_matching_wait the idle_pending snapshot shows what waits were registered with the same signal name but didn't match - the diff between what arrived and what was waiting is one read away. |
bro_webhook_deliveries(name=, since=, verdict_classification=) |
Recent webhook deliveries (live + replay). Each entry: extracted entity, routing verdict classification, response. Filter by name to focus on one inlet. |
bro_webhook_replay(name, body, headers) |
Replay a synthetic payload through an installed webhook's extractor + routing packet WITHOUT dispatching. Returns the extracted entity + verdict consequent (after ${entity.X} substitution). Recorded into the same delivery buffer with source: replay. |
bro_arc_cancel(arc_id=<id>) |
Trip an arc's cancellation token. Runner observes between node iterations and inside Wait suspensions, exits with status cancelled, runs on_arc_cancel + on_arc_exit. |
HTTP GET /tail (SSE) |
Live event stream from the daemon - every node_dispatch / hook_ok / wait_registered / gate_applied / ... |
Compaction anchors: at every node boundary the engine writes a
rolling ANCHOR [step N, just-ran='X', next='Y']: completed=[...]
in_flight=[...] verdict=Z visits=[...] note. Lets observers
reconstruct arc state without reading every event.
Lifecycle¶
spawn arc
→ seed initial_vars (schema-validated)
→ open arc thread (bbox_thread kind=work_item)
→ walk graph from [*]
→ for each activity node:
on_enter hooks (gated)
dispatch (actor / wait / subworkflow)
on_exit hooks (gated) ← runs BEFORE gate
gate packet (if any) ← chooses next-node label
→ policy_packet at every boundary (halt/escalate/warn/continue)
→ terminal state (success / failed / cancelled / timeout)
→ set meta.arc_outcome
→ on_arc_cancel hooks (only if cancelled)
→ on_arc_exit hooks (always)
→ return WorkflowRunResult { status, vars, outputs, events, arc_thread_id }
Currently implemented¶
- ArcContext (vars / typed outputs / meta / last_signal / history) with full templating including
${env.X}for credentials - Routing verdicts can carry typed correlation tuples + payload via
${entity.X}template substitution (resolved against the extracted entity before verdict parse) - same${X}shape the workflow templater uses, applied to a different scope - Three convergent event inlets: webhook (signed inbound POST), poller
(scheduled HTTP fetch - data rides on the tick), cron (calendar-
driven, no fetch - entity is operator-supplied
payloadplus syntheticcron_name+tick_at). All three extract → route → dispatch through one shareddispatch_routed_eventso a workflow doesn't care how it was triggered. Pollers reuseHttpFetchSpec(same primitive thehttp_jsonop consumes), explode array responses viaiterate, dedup by operator-named id field, and resolve${env.X}in source URL/headers per-tick (no secrets persisted to disk). Crons use thecroncrate's 6-field expression form, support aconcurrencycap (default 1, skip ticks while a prior arc is in flight; set 0 to disable), and decrement the in-flight counter when the dispatched arc terminates. - Hook lifecycle: on_enter, on_exit, on_arc_exit, on_arc_cancel
- Op catalog: SetVar, IncVar, AppendVar, MergeVar, ParseJson, Shell,
WorktreeCreate (with smart branch reuse), WorktreeRemove, SetMeta,
HttpJson (generic HTTP;
response_kind: json|text|auto), FindFirst (array search by dotted-path equality - composable primitive for idempotent lookups), McpCall (outbound MCP tool call - JSON-RPC over stdio child-process or streamable HTTP; resolves server name through the existingbro_mcpregistry; lets hooks inject deterministic tool results without dispatching a bro). Engine carries no platform knowledge; workflow author composes generic ops +${env.X}+${vars.X}to express any code-host integration. - Hook gating via
when: domain:...+on_failure: halt|warn|ignore(gate-packet errors route throughon_failuretoo - no silent skips) - Terminal hook
haltfailures rewritemeta.arc_outcometofailed: ...so cleanup failures are visible in the arc summary - Wait nodes with
any_ofrace + correlation tuples + timeout (synthetic__timeout__signal) - Subworkflows: inline + by-id registry refs + imports/exports/renames; capability validation runs at dispatch, not just install
- Capability tags: provider catalog + actor
requires+ workflow-compile validation - Domain-shaped packet refs (
domain:...) - Hook-only nodes (empty
actorfield) for Setup / Done patterns - Webhook ingress: HmacSha256 (configurable header + optional
prefix=forsha256=style senders), None for loopback-only testing, idempotency dedup, replay endpoint, default-deny on no-match - Operator-blessed registries (workflows / webhooks) + admin HTTP
- Actor kinds: just
executorandensemble. Persona / role / contract (advisor, planner, triager, facilitator, specialist, reviewer, aggregator, …) is brofile-lens + prompt + on_exitparse_jsonvalidation, never an engine type. Emptyactor: ""for hook-only / pure-routing nodes. - Whiteboards: in-engine multi-agent deliberation primitive (port of
phaser's mk0 protocol). 10
whiteboard_*MCP tools, three audiences (in-workflow ensemble specialists / facilitator, external Claude operators, eventual humans through slack/ntfy adapters) sharing one surface. Phase transitions emitboard-transitionedsignals through the same dispatch_routed_event pipeline webhooks use, so workflows resume on transitions via the existingwaitprimitive correlated to(board, target_phase)- no new wait variant needed. Replaces the deprecateduseractor kind: humans-in-the-loop join boards as agents, no special engine type for them. - Transition shapes:
goto(forward edges + back-edges),branch(verdict-routed multi-way with optionaldefault),fork(fire-and-forget side-dispatch +continue_to) paired withwait_foron the join node for fan-in,terminal(arc end) - Gate modes:
first(single verdict),all(multi-finding aggregate) - Workflow-level
policy_packet(advisor-as-packet) late_injectfor fire-and-forget → join-on-next-turn- Per-actor
durable: truesession continuity +compaction_anchor: true - Arc threads + structured notes + rolling compaction anchors
bro_arc_signal/bro_arc_statusMCP tools--dry-runvalidates + summarizes without dispatchingbro orchestrate run --streamSSE event firehose during execution
Phase-next¶
bro orchestrate resume <thread-id>- genuine re-entry at the last recorded step for arcs that paused or errored. Needs persistent full-output snapshots to survive daemon restarts.- Disk-backed
WaitStore(currently in-memory) so suspended arcs survive daemon restart. - Disk-backed
running_arcsregistry sopeekworks after a restart. - YAML workflow loader (JSON-only today; one-line add once
serde_yamlis introduced). cancel_arcrouting verdict actually wired through.- Push-storm debouncing (real deployments want a
vars.review_in_progressflag the routing packet checks). Pattern documented; not yet wired in the keystone example. - Sandboxed
Shellop gated via packet policy (today: shell runs withcwdfrommeta.worktreebut no allowlist enforcement).
Examples¶
examples/whiteboard/- end-to-end: ADR-tagged issue webhook → 3-specialist ensemble posts blind to whiteboard → debate (annotate + vote) → facilitator synthesizes ADR markdown → PR opens → auto-merge. The reference arc for the whiteboard primitive + multi-round durable ensemble.examples/sastquatch/- end-to-end: cron tick → analyzer (mcp_call → biofilter sast_*; executor picks a finding cluster) → fixer subworkflow → wait → ensemble reviewer → loop on feedback → auto-merge. The reference arc for cron + mcp_call.examples/keystone/- end-to-end: Forgejo webhook → arc → implementer subworkflow → wait → reviewer ensemble → wait-loop until merged → cleanup hooks. Real LLM dispatch. Full layout + adaptation guide in the example's README.examples/workflows/- smaller shape catalog: linear, gated, ensemble, fork-join, blind-convergence, optimistic-review, self-audit.
Authoring loops¶
If you don't want to hand-write specs:
bro_orchestrate_author(charter, brofile, hint?)- prose-charter to validated spec via an authoring LLM. Auto-retries on compile failure with the error appended.- Always pair with
bro_orchestrate_run(workflow, dry_run=true)to validate before dispatching.
For runtime debugging - the canonical "an arc is stuck, why?" loop:
bro_arc_status- confirm the arc is parked, see which node, see the registered wait correlations.bro_signals(signal=<name>)- did the signal the arc is waiting on actually arrive?outcome=matchedmeans the wait resolved (arc should have advanced; if it didn't, look at the gate).outcome=no_matching_waitmeans the signal arrived but its correlation didn't match any pending wait -idle_pendingshows what was waiting at dispatch time, the diff between that and the signal's correlation IS the bug (typedpr: 24vs stringpr: "24"is the classic).bro_webhook_deliveries(name=<webhook>)- if the signal never arrived, walk back one step. Did the webhook actually arrive? What did the routing packet classify it as?verdict_classificationofignore/no_matchfor an event you expected to route reveals a missing or mis-shaped routing rule.extracted_entityshows what the extractor projected - useful when the routing rule isn't matching because the event field's actual value differs from what the rule expects (Forgejo sendsaction: synchronizednotsynchronize, etc.).bro_webhook_replay(name, body, headers)- once you suspect a routing-rule fix, replay a synthetic payload through the same path the live webhook would take, see the verdict, iterate without needing the upstream to fire a real event.bbox_notes(thread_id=<arc>)- the arc's audit trail with structured notes (done / learned / surprise / blocked) and rollingANCHORcompaction summaries.
Control:
bro_arc_cancel(arc_id)- manually stop a runaway / mis-dispatched / no-longer-relevant arc. Cleanup hooks fire automatically.cancel_arcrouting verdict - emit from a routing packet to cancel arcs by correlation tuple (e.g. an upstream "PR closed without merge" event cancelling the arc that was waiting on its merge).