Speculative Execution for Code¶
Status: Shipped — all 8 tools implemented and CI-tested across 8 languages (TestSpeculativeSessions in test/speculative_test.go: Go, TypeScript, Python, Rust, C++, C#, Dart, Java)
Tools: create_simulation_session, simulate_edit, evaluate_session, simulate_chain, commit_session, discard_session, destroy_session, simulate_edit_atomic
Prerequisites¶
Call start_lsp with root_dir set to your workspace before using any simulation tools. The language server must be initialized and pointing at the correct workspace root for diagnostics to be meaningful.
Simulation tools create sessions on the currently-running language server. If start_lsp has not been called (or was called with a different workspace), session results will be empty or incorrect.
Position convention¶
All start_line, start_column, end_line, and end_column parameters are 1-indexed — the same as line numbers shown by cat -n and most editors. The extractRange helper in the codebase converts these to the 0-indexed values the LSP protocol requires. Do not subtract 1 before passing positions to simulation tools.
Quick start¶
The simplest path: use simulate_edit_atomic for a single speculative edit. It handles the full session lifecycle internally — no session ID to track, file on disk is never modified.
start_lsp(root_dir="/your/workspace")
simulate_edit_atomic(
workspace_root="/your/workspace",
language="go",
file_path="/your/workspace/pkg/handler.go",
start_line=42, start_column=1,
end_line=42, end_column=20,
new_text="replacement text",
scope="file",
timeout_ms=5000
)
→ {"errors_introduced": null, "errors_resolved": null, "net_delta": 0, "confidence": "high"}
net_delta: 0 means no new errors were introduced — safe to apply. A positive net_delta means the edit would break things; inspect errors_introduced for details.
The Idea¶
LSP today is a query engine — agents ask what exists and react to what they find. This makes AI-assisted editing inherently trial-and-error: edit, discover breakage, fix, repeat.
Speculative code sessions turn LSP into a simulation engine. Create an isolated semantic workspace, apply hypothetical changes, evaluate the resulting diagnostic state, then commit or discard — without ever touching disk.
Current workflow: edit → discover breakage → fix → repeat
With sessions: create session → mutate → evaluate → decide → commit once (or discard)
The valuable primitive is not "preview one edit." It is:
create an isolated semantic future of the codebase
This is the first agent-native primitive in agent-lsp. Everything else (navigation, diagnostics, hover) is LSP exposed. This is new capability: isolated state → mutation → evaluation → commit or discard.
Session Lifecycle¶
A speculative code session is an isolated semantic workspace rooted in a baseline code state.
create_simulation_session(workspace_root, language)
→ session_id
simulate_edit(session_id, file_path, range, new_text)
→ edit_result
evaluate_session(session_id)
→ evaluation_result
[optional: additional simulate_edit calls]
commit_session(session_id, target?) OR discard_session(session_id)
→ commit_result → ok
destroy_session(session_id)
→ ok
Session state persists across operations. A session accumulates speculative edits and maintains its own diagnostic snapshot. Multiple sessions may exist in parallel, each with independent state.
Session Phases¶
| Phase | Entered by | Exits to |
|---|---|---|
created |
create_simulation_session |
mutated, evaluating |
mutated |
simulate_edit |
mutated, evaluating |
evaluating |
evaluate_session |
evaluated (timeout sets confidence: "partial", not a separate state) |
evaluated |
evaluation completes | mutated, committed, discarded |
committed |
commit_session |
destroyed |
discarded |
discard_session |
destroyed |
dirty |
revert failure or version mismatch | — (terminal, requires destroy) |
A dirty session must not be committed. Call destroy_session to clean up.
Session State Model¶
A session holds:
- baseline_ref — the workspace state at session creation (read-only within session)
- isolated LSP semantic state — in-memory document buffers managed by the session
- document versions — per-document version counter, monotonically increasing
- accumulated speculative edits — ordered list of edits applied within the session
- diagnostics snapshot — latest diagnostic state after most recent evaluation
- session status — one of:
created,mutated,evaluating,evaluated,committed,discarded,dirty,destroyed - session_id — UUID assigned at creation, used for tracing
The baseline is the code state at the moment create_simulation_session is called. It is immutable from the session's perspective — the session can only mutate its own overlay.
Isolation Model¶
Session isolation is per-session, not per-call.
- One session must not observe another session's speculative state
- The baseline is conceptually shared (read-only); speculative overlays are session-local
- Commit materializes one session's state; discard removes it without side effects
- No cross-session visibility at any point
Logical isolation vs physical isolation¶
This is the primary unresolved architectural tension.
Logical isolation (current design): a single LSP server instance handles all sessions. Concurrent sessions on the same server are serialized — only one session may hold mutated in-memory state at a time. The mutex enforces ordering; sessions do not run truly in parallel against the same server.
session_a and session_b on same server:
→ session_a acquires lock, mutates, evaluates, reverts, releases
→ session_b acquires lock (was blocked), mutates, evaluates, reverts, releases
This provides correct results and no state leakage, but sessions are sequential, not parallel.
Physical isolation (future path): each session gets its own LSP server instance. Sessions run truly in parallel with no serialization. Cost: LSP startup per session (~1-3s, memory per server), which makes it impractical for short-lived sessions.
Current choice: logical isolation.
Reasoning: for the primary use cases (single-agent planning, sequential comparison), serialization is not a bottleneck. The ~500ms per simulation is fast enough that the queue rarely matters. Physical isolation is the right upgrade path if parallel multi-agent simulation becomes a real workload.
This is explicitly documented, not hidden:
Speculative code sessions use serialized access to a shared language server to guarantee isolation. This provides deterministic behavior without the overhead of per-session LSP instances. True parallel execution with per-session language servers may be introduced in future versions if workload characteristics justify it.
Concurrent Session Semantics¶
Multiple sessions may exist simultaneously:
- Each session has independent semantic state
- Evaluation results are comparable across sessions (different strategies, same baseline)
- No cross-session visibility
- Sessions on different language servers may evaluate concurrently
- Sessions on the same language server are serialized within that server's scope (V1)
This enables consumers (like Scout-and-Wave) to run strategy comparison:
session_a = create_simulation_session(...) # strategy A
session_b = create_simulation_session(...) # strategy B
simulate_edit(session_a, edit_1a)
simulate_edit(session_b, edit_1b)
result_a = evaluate_session(session_a)
result_b = evaluate_session(session_b)
# compare result_a.net_delta vs result_b.net_delta
# pick the winner, commit that session
Evaluation Model¶
Mutation and observation are separate operations.
simulate_edit(session_id, edit) → edit_result
Mutates session state. Pushes textDocument/didChange to the language server's in-memory buffer. Does not evaluate diagnostics — returns only whether the edit was applied.
evaluate_session(session_id) → evaluation_result
Observes current session state. Calls WaitForDiagnostics, diffs against baseline, returns impact summary. Does not mutate state.
{
"session_id": "a3f2-...",
"errors_introduced": [{ "line": 42, "col": 5, "message": "cannot use string as int", "severity": "error" }],
"errors_resolved": [],
"net_delta": 1,
"scope": "file",
"confidence": "high",
"timeout": false,
"duration_ms": 412
}
A caller may call simulate_edit multiple times before calling evaluate_session. The evaluation reflects the cumulative state.
Atomic convenience wrapper: simulate_edit_atomic supports two modes. Standalone (session_id omitted): creates a temporary session, applies the edit, evaluates, then destroys — pass workspace_root + language. Existing session (session_id provided): applies the edit into an existing session and evaluates without destroying it. Returns an EvaluationResult directly. Useful for single-edit what-if checks without managing session IDs.
Commit Semantics¶
commit_session(session_id, target?) materializes the accumulated speculative state.
Functional vs imperative commit¶
Two models, both supported:
Functional (default): commit_session returns a WorkspaceEdit-compatible patch. The caller decides whether and how to apply it. No disk writes. Safe for CI, multi-agent orchestration, and any caller that wants to inspect the patch before applying.
Imperative (opt-in): pass apply: true (or a target path) to write files to disk directly. Equivalent to calling apply_edit on the returned patch, but in one step.
Default is functional — return patch only, no side effects. Callers opt into disk writes explicitly.
commit_session(session_id) # → WorkspaceEdit patch, no disk write
commit_session(session_id, apply: true) # → writes to disk + returns patch
commit_session(session_id, target: "/path") # → writes to target path + returns patch
This matters for: - CI — inspect patch, validate, then decide whether to apply - Multi-agent — one agent commits a patch, orchestrator applies after comparing - Safety — patch-only commit cannot corrupt workspace state
Commit constraints:
- Commit is only allowed from a session in
evaluatedormutatedstate - Commit is prohibited on
dirtysessions — the state may be corrupt - Commit is prohibited on
createdsessions — no edits have been applied - A timed-out evaluation does not block commit, but the session carries
confidence: "partial"
After commit:
- Session transitions to
committed - Session may not be mutated further
- Call
destroy_sessionto release resources
Discard:
discard_session(session_id) reverts all accumulated in-memory state and releases the session. Nothing is written to disk. Equivalent to rolling back a transaction.
Failure and Corruption Semantics¶
Per-operation failure behavior¶
| Operation | Failure | Behavior |
|---|---|---|
create_simulation_session |
Server unavailable | Return error; no session created |
simulate_edit |
Server rejects didChange |
Abort; session state unchanged; return error |
evaluate_session timeout |
Diagnostics did not settle | Return snapshot with confidence: "partial", timeout: true; session remains usable |
evaluate_session connection failure |
After mutation | Attempt internal revert; mark session dirty if revert fails |
commit_session |
Write failure | Return error; session state preserved; retry allowed |
discard_session |
Revert failure | Mark session dirty; error returned; call destroy_session to force cleanup |
| Concurrent mutation detected | Another didChange arrived during evaluation |
Mark result confidence: "partial"; session remains usable; do not retry automatically |
Session dirty state¶
A session becomes dirty when:
- An internal revert fails during
discard_session - A connection failure occurs while the session holds mutated state
- Document version tracking detects a gap (concurrent external mutation)
A dirty session:
- Must not be committed — state may not reflect intended mutations
- Must be destroyed via
destroy_session(forced cleanup) - Reports
session_dirty: trueon all subsequent operation calls
Guarantee: the system will not silently continue in a corrupted state. Any unrecoverable failure surfaces immediately.
Session Invariants¶
These must hold for every session, for every operation:
- Isolation — no other session may read or mutate this session's speculative state
- Baseline immutability — the baseline is read-only from the session's perspective; only the session's overlay is mutable
- Monotonic versioning — document versions are strictly increasing within a session;
N → N+1 → N+2 → ...; version never rolls back - No silent corruption — a session either holds valid state or is marked
dirty; there is no in-between - Evaluation reflects session state only —
evaluate_sessionreturns diagnostics caused by edits in this session, not external mutations - Commit requires valid state —
dirtysessions must not be committed under any circumstances
Implementation Scope¶
Core API¶
create_simulation_session(workspace_root, language) → session_id
simulate_edit(session_id, file_path, range, new_text) → edit_result
evaluate_session(session_id, scope?, timeout_ms?) → evaluation_result
simulate_chain(session_id, edits[]) → chain_result
commit_session(session_id, target?) → commit_result
discard_session(session_id) → ok
destroy_session(session_id) → ok
Convenience alias¶
simulate_edit_atomic is a thin wrapper — not a separate API, just a helper for callers that don't need session persistence:
func SimulateEditAtomic(ctx, mgr, args) (ToolResult, error) {
sid := mgr.Create(ctx, ...)
defer mgr.Destroy(ctx, sid)
mgr.ApplyEdit(ctx, sid, ...)
return mgr.Evaluate(ctx, sid, ...)
}
Exposed as an MCP tool for single-edit use cases. Backed by the same session infrastructure — no separate code path.
Scope support¶
Both single-file (scope: "file") and workspace (scope: "workspace") are implemented together. Workspace scope carries confidence: "eventual" to be honest about cross-file propagation timing.
Cross-file diagnostic propagation behavior by server: | Server | Cross-file reliability | Typical propagation time | |--------|----------------------|-----------------------------| | gopls | High (re-typechecks importing packages) | 2-5s | | tsserver | Good (project-wide) | 1-3s | | rust-analyzer | High | 2-4s | | pyright-langserver | High (project-wide type graph) | 1-3s | | clangd | Partial (single TU; cross-TU via rebuild) | 2-5s | | csharp-ls | Good (Roslyn workspace model) | 1-4s | | dart (analysis server) | High (full program analysis) | 1-3s | | jdtls | Good (Eclipse project model) | 3-8s |
Chained mutations¶
simulate_chain applies a sequence of edits within a session and evaluates after each step:
simulate_chain(session_id, [edit_1, edit_2, edit_3]) → {
steps: [
{ step: 1, net_delta: 0, errors_introduced: [] },
{ step: 2, net_delta: 3, errors_introduced: [...] },
{ step: 3, net_delta: 0, errors_introduced: [] }
],
safe_to_apply_through_step: 1,
cumulative_delta: 0
}
Each step builds on the previous in-memory state. safe_to_apply_through_step is the last step with net_delta: 0.
Document Versioning¶
Version numbers are per-session, per-document:
session created, document opened: version N (baseline)
simulate_edit call 1: N+1
simulate_edit call 2: N+2
discard / revert: N+3 (revert is itself a new version, not a rollback)
Versions never roll back. The revert didChange sends the original content with the next monotonically increasing version number.
Version is tracked per open document on the session's LSPClient. Mismatch between expected and tracked version invalidates the session (marks dirty).
Diagnostic Diffing¶
Two diagnostics are considered identical if all of the following match:
range.start(line + character)range.end(line + character)message(exact string)severity(error / warning / info / hint)source(optional — ignored if absent in either)
The diff is computed as:
- introduced: present in post-simulation diagnostics, not in baseline
- resolved: present in baseline, not in post-simulation diagnostics
- unchanged: present in both (not returned — reduces noise)
Position matching uses post-edit coordinates. Baseline diagnostics reflect pre-edit positions by design — the delta communicates what changed, not where things moved to.
Evaluation Response Contract¶
{
"session_id": "a3f2-...",
"errors_introduced": [
{ "line": 42, "col": 5, "message": "cannot use string as int", "severity": "error" }
],
"errors_resolved": [],
"net_delta": 1,
"scope": "file",
"confidence": "high",
"timeout": false,
"duration_ms": 412
}
confidence values (defined in internal/session/types.go):
- "high" — single-file, diagnostics settled within timeout
- "partial" — timed out, returned snapshot may be incomplete
- "eventual" — workspace scope, cross-file propagation may be incomplete
Note: affected_symbols and edit_risk_score were planned fields that were not implemented. The shipped EvaluationResult type contains only the fields shown above.
Baseline Stability¶
The diagnostic diff is only as trustworthy as the baseline. If the baseline is incomplete, errors that already exist appear in errors_introduced — false positives that corrupt the diff.
The problem: LSP diagnostic publication is asynchronous. After a document opens, the language server processes it and publishes via textDocument/publishDiagnostics over a window of milliseconds to seconds. Snapshotting before this window closes produces an incomplete baseline.
Strategy: lazy per-file settle¶
On first simulate_edit for a given file, wait for that file's diagnostics to settle before recording its per-file baseline. Do not pay for files the session never touches.
simulate_edit(session_id, file_path, edit)
→ if file not in session.baselines:
WaitForDiagnostics(file_path)
session.baselines[file_path] = snapshot
→ apply edit
→ return edit_result
This is the correct strategy for all cases: pay per touched file, not per session. A session that touches one file in a large workspace does not pay settle cost for the rest.
What "settled" means¶
Diagnostics are considered settled when no new textDocument/publishDiagnostics notification has arrived for the target file within a quiet window (default: 500ms). The existing WaitForDiagnostics implementation handles this.
A settle timeout (default: 3000ms) caps the wait. If the server has not published anything within the timeout, use whatever is cached — and mark the baseline with baseline_confidence: "partial" to flag that the diff may contain false positives.
Timeout Behavior¶
If diagnostics do not settle within the timeout window:
- Return the current diagnostic snapshot (whatever the server has published so far)
- Set
confidence: "partial"andtimeout: truein the response - Internal revert still executes — timeout applies only to diagnostic collection, not session cleanup
- No automatic retry
Default timeout: 3000ms (single-file), 8000ms (workspace scope). Configurable via timeout_ms argument.
Revert Guarantee (Internal)¶
Revert is unconditional within the session. When a session is discarded or an evaluation produces partial results, the in-memory state is always restored via defer:
defer func() {
if err := session.Revert(ctx); err != nil {
session.MarkDirty(fmt.Errorf("revert failed: %w", err))
}
}()
If revert fails:
- Session is marked dirty
- Error is returned to the caller (not silenced)
- No further operations on this session will succeed
This is an internal implementation detail, not a user-visible contract. Users see session state (dirty or clean); they do not manage revert explicitly.
Observability¶
Emit structured log events at each phase:
| Event | Fields |
|---|---|
session.created |
session_id, workspace_root, language |
session.edit_applied |
session_id, file, range, version_after |
session.evaluation_start |
session_id, edit_count, scope |
session.evaluation_complete |
session_id, duration_ms, net_delta, confidence |
session.committed |
session_id, files_written, duration_ms |
session.discarded |
session_id, edit_count |
session.dirty |
session_id, step, error |
session.destroyed |
session_id |
These events flow through the existing logging package at LevelDebug (lifecycle events) and LevelError (dirty/failure). No new infrastructure required.
Cross-Language Limits¶
In multi-server mode, a session operates on one language server at a time. A TypeScript change that breaks a Go caller (via a shared JSON contract) will not surface in the session — the Go server has no knowledge of the TypeScript edit.
This is an honest constraint, not a flaw. Single-language impact is the right scope.
Positioning¶
When shipping:
"Simulate code changes before applying them. See exactly what breaks — without touching your files."
This is the correct message. It describes the behavior precisely and makes the agent-native value immediate.
Do not frame it as a testing tool or a linting tool. Frame it as planning infrastructure.
Implementation Notes¶
V1 tool handler (atomic wrapper):
func HandleSimulateEditAtomic(ctx context.Context, mgr *SessionManager, args map[string]interface{}) (types.ToolResult, error) {
// 1. Validate args (file_path, range, new_text)
// 2. ValidateFilePath
// 3. session = mgr.Create(ctx, workspaceRoot, language)
// 4. defer session.Destroy(ctx)
// 5. baseline = session.GetDiagnostics(ctx, uri)
// 6. session.ApplyEdit(ctx, uri, range, newText)
// 7. result = session.Evaluate(ctx, timeout)
// 8. return SimulateEditResult from result.Diff(baseline)
}
Session executor interface (pluggable isolation):
// SessionExecutor abstracts how a session acquires and releases LSP access.
// V1 serializes; future versions may provide per-session LSP instances.
type SessionExecutor interface {
Acquire(ctx context.Context, session *SimulationSession) error
Release(session *SimulationSession)
}
// SerializedExecutor serializes operations per-session using a per-session
// channel semaphore map. Independent sessions do not block each other.
type SerializedExecutor struct {
mu sync.Mutex
sessionLocks map[string]chan struct{}
}
func (e *SerializedExecutor) Acquire(ctx context.Context, s *SimulationSession) error {
ch := e.lockFor(s)
select {
case ch <- struct{}{}:
return nil
case <-ctx.Done():
return ctx.Err()
}
}
func (e *SerializedExecutor) Release(s *SimulationSession) {
// drain the per-session semaphore channel
<-e.sessionLocks[s.ID]
}
Session IDs and the full API remain unchanged regardless of executor. Swapping SerializedExecutor for an IsolatedExecutor (per-session LSP) requires no API changes.
Session manager structure:
type SessionManager struct {
sessions map[string]*SimulationSession
executor SessionExecutor
mu sync.RWMutex
}
type SimulationSession struct {
ID string
Status SessionStatus
Client *lsp.LSPClient
Edits []AppliedEdit
Baselines map[string]DiagnosticsSnapshot // per-file, populated lazily on first simulate_edit
Versions map[string]int // per-file document version counter
Contents map[string]string // per-file current in-memory content
OriginalContents map[string]string // per-file content at baseline (for Discard)
Workspace string
Language string
DirtyErr error
mu sync.Mutex
}
textDocument/didChange format:
{
"textDocument": { "uri": "file:///path", "version": N },
"contentChanges": [
{
"range": { "start": {"line": L, "character": C}, "end": {...} },
"text": "new content"
}
]
}
Version must increment on each change. Tracked per open document on SimulationSession.