ACL Specs Review Feedback
Status: Draft / Review Complete
Date: 2025-12-10
Reviewer: Gemini (Agent Control Layer Team)
Overall
The Agent Control Layer (ACL) specification suite (AIP-1, ADP-1, PVS-1, CTX-1) represents a mature and cohesive "defense-in-depth" approach to agent security and observability. The separation of concerns is excellent: Identity (AIP) is decoupled from Execution Logging (ADP), which is decoupled from Policy Enforcement (PVS), yet they interlock cleanly via the Capability-Set and standard formats.
The decision to leverage existing primitives (X.509, mTLS, JSON) rather than inventing new cryptographic protocols is a significant strength, ensuring compatibility with standard infrastructure (cloud LBs, existing PKI). The specs are largely "implementation-ready," and the reference code in lib/ matches the stated goals well.
The primary areas for improvement revolve around Operational Reality (dealing with clock skew, LB TLS termination, and debugging distributed agents) and Granularity (adding structured data for policy violations and environment isolation).
1) AIP-1 – Agent Identity Protocol
Strengths
- Standard Standards: Using X.509 v3 + OIDs is robust and future-proof.
- Blast Radius Reduction: Short-lived certificates (max 15m) effectively mitigate the "revoke problem" without complex CRL infrastructure.
- Reference Code: ca.ts correctly implements the profile, including Math.min clamping on validity.
Gaps / Risks
- Clock Skew: A 5-minute validity window is tight. If a verifying server's clock runs 2 minutes behind, a freshly issued certificate appears "not yet valid"; if it runs ahead, the certificate appears to expire early.
- TLS Termination: In many production setups (AWS ALB, Cloudflare), mTLS is terminated at the edge, and the backend receives the cert in an HTTP header (e.g., X-Client-Cert). The spec implies direct mTLS, which might not fit all topologies.
- Missing Context: The current OIDs don't specify the environment (Prod vs Staging). An agent with a valid cert might accidentally hit a Prod endpoint if network routing allows it.
Concrete Recommendations
- Add Environment Extension:
  - OID: 1.3.6.1.4.1.59999.1.6 (next available slot).
  - Value: production, staging, dev.
  - Why: Prevents cross-environment accidents at the identity layer.
- Mitigate Clock Skew:
  - Update ca.ts to backdate notBefore by 1-2 minutes (e.g., now - 2m) to account for verifying servers with lagging clocks.
- Document LB Passthrough:
  - Explicitly mention in the spec that if mTLS is terminated at a load balancer, the Validator logic must verify the X-Client-Cert-Hash or headers forwarded by the trusted LB.
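The clock-skew recommendation could look roughly like the sketch below. It is not the contents of ca.ts: the constant names, the computeValidity and isWithinWindow helpers, and the 2-minute backdate are assumptions chosen to match the figures in this review.

```typescript
// Hypothetical constants; the real values belong to the AIP-1 profile and ca.ts.
const MAX_VALIDITY_MS = 15 * 60 * 1000; // 15-minute cap cited in this review
const SKEW_BACKDATE_MS = 2 * 60 * 1000; // backdate notBefore by 2 minutes

interface ValidityWindow {
  notBefore: Date;
  notAfter: Date;
}

// Clamp the requested lifetime, then backdate notBefore to tolerate
// verifiers whose clocks lag behind the issuing CA.
function computeValidity(now: Date, requestedTtlMs: number): ValidityWindow {
  const ttlMs = Math.min(requestedTtlMs, MAX_VALIDITY_MS);
  return {
    notBefore: new Date(now.getTime() - SKEW_BACKDATE_MS),
    notAfter: new Date(now.getTime() + ttlMs),
  };
}

// What a verifier effectively checks against its own clock.
function isWithinWindow(win: ValidityWindow, verifierNow: Date): boolean {
  const t = verifierNow.getTime();
  return t >= win.notBefore.getTime() && t <= win.notAfter.getTime();
}

const issuedAt = new Date("2025-12-10T12:00:00Z");
const validity = computeValidity(issuedAt, 60 * 60 * 1000); // asks for 1h, gets 15m

// A verifier whose clock is 90 seconds behind still accepts the cert.
const laggingVerifier = new Date(issuedAt.getTime() - 90 * 1000);
console.log(isWithinWindow(validity, laggingVerifier)); // true
```

Note that backdating widens the total span between notBefore and notAfter (17 minutes in this sketch). Whether the 15-minute cap applies to that total span or only to the forward-looking lifetime is a decision for the spec.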
2) ADP-1 – Agent Data Protocol
Strengths
- Framework Agnostic: The Action/Observation/Reflection loop fits almost every agent paradigm (LangChain tools, AutoGen functions, etc.).
- Immutability: The reference to audit-log-immutability.ts and hash chaining is a standout feature for enterprise trust.
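For readers unfamiliar with the technique, hash chaining can be pictured roughly as follows. This is a generic sketch using Node's built-in crypto module, not the contents of audit-log-immutability.ts; the ChainedEntry shape, the GENESIS value, and the helper names are assumptions.

```typescript
import { createHash } from "node:crypto";

interface ChainedEntry {
  payload: string;  // serialized log record
  prevHash: string; // hash of the previous entry, or GENESIS for the first
  hash: string;     // SHA-256 over prevHash followed by payload
}

// Hypothetical starting value for the first entry in a chain.
const GENESIS = "0".repeat(64);

function entryHash(prevHash: string, payload: string): string {
  return createHash("sha256").update(prevHash).update(payload).digest("hex");
}

// Returns a new chain with the payload appended and linked to its predecessor.
function appendEntry(chain: ChainedEntry[], payload: string): ChainedEntry[] {
  const prevHash = chain.length > 0 ? chain[chain.length - 1].hash : GENESIS;
  return [...chain, { payload, prevHash, hash: entryHash(prevHash, payload) }];
}

// Recomputes every link; editing any earlier payload breaks the chain.
function verifyChain(chain: ChainedEntry[]): boolean {
  let prevHash = GENESIS;
  for (const entry of chain) {
    if (entry.prevHash !== prevHash) return false;
    if (entry.hash !== entryHash(prevHash, entry.payload)) return false;
    prevHash = entry.hash;
  }
  return true;
}
```

Each entry commits to everything before it, so tampering is detectable by anyone who holds the latest hash, without a signature per step.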
Gaps / Risks
- Observability Silos: There is no standard field for Distributed Tracing. If Agent A calls Agent B, there is no trace_id to link the two runs in tracing and logging tools (Datadog, Jaeger).
- Sub-steps: Complex agents often have recursive structures (a step that spawns a sub-agent). The flat steps array doesn't capture hierarchy well.
Concrete Recommendations
- Add Trace Context:
  - Add trace_id and span_id to the top-level metadata, or create a dedicated trace block.
  - Example: "trace": { "trace_id": "0af7651916cd43dd8448eb211c80319c", "span_id": "b7ad6b7169203331" }
- Support Hierarchy:
  - Add optional parent_step_id to AgentStep to allow reconstructing trees from the flat list.
- Standardize "Stop" Reasons:
  - Status failed is broad. Add cancellation_reason or a failure_code enum (e.g., rate_limited, policy_violation, timeout) to the top-level run object.
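Taken together, the three ADP-1 recommendations might be typed as in the sketch below. The field names trace_id, span_id, parent_step_id, and failure_code come from this review; step_id, the AgentRun shape, and buildStepTree are assumptions for illustration, not the current ADP-1 schema.

```typescript
type FailureCode = "rate_limited" | "policy_violation" | "timeout";

interface TraceContext {
  trace_id: string; // 32 hex characters in the example value
  span_id: string;  // 16 hex characters in the example value
}

interface AgentStep {
  step_id: string;         // assumed unique within a run
  parent_step_id?: string; // absent for top-level steps
}

interface AgentRun {
  status: string;
  failure_code?: FailureCode;
  trace?: TraceContext;
  steps: AgentStep[];
}

interface StepNode {
  step: AgentStep;
  children: StepNode[];
}

// Rebuilds the hierarchy from the flat steps array.
// Assumes step IDs are unique and parent links contain no cycles.
function buildStepTree(steps: AgentStep[]): StepNode[] {
  const nodes = new Map<string, StepNode>();
  for (const step of steps) {
    nodes.set(step.step_id, { step, children: [] });
  }
  const roots: StepNode[] = [];
  for (const step of steps) {
    const node = nodes.get(step.step_id)!;
    const parent = step.parent_step_id ? nodes.get(step.parent_step_id) : undefined;
    if (parent) parent.children.push(node);
    else roots.push(node); // top-level step, or parent missing from the log
  }
  return roots;
}

const run: AgentRun = {
  status: "failed",
  failure_code: "timeout",
  trace: { trace_id: "0af7651916cd43dd8448eb211c80319c", span_id: "b7ad6b7169203331" },
  steps: [
    { step_id: "s1" },
    { step_id: "s2", parent_step_id: "s1" },
    { step_id: "s3", parent_step_id: "s1" },
    { step_id: "s4" },
  ],
};

const tree = buildStepTree(run.steps);
console.log(tree.length); // 2
console.log(tree[0].children.map((n) => n.step.step_id)); // ["s2", "s3"]
```

Keeping steps flat and adding only parent_step_id means existing consumers that iterate the array keep working, while tools that need the tree can rebuild it in one pass.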
3) PVS-1 – Policy Verdict Schema
Strengths
- Clean Interface: The approved boolean makes enforcement trivial.
- Auditability: reasoning and confidence_score provide the "why," which is crucial for compliance.
Gaps / Risks
- Unstructured Violations: policy_violations is an array of strings. It's hard for a UI to highlight which part of the text caused the violation, or to show how severe it is (INFO vs CRITICAL).
- Missing Remediation: The agent doesn't know how to fix the flagged problem. "No PII" is vague; "Redact the SSN" is actionable.
Concrete Recommendations
- Structured Violations:
  - Deprecate string array (or keep for backward compat) and add violations object array:
    "violations": [
      {
        "policy_name": "No PII",
        "severity": "critical", // critical, warning, info
        "location": { "start": 10, "end": 20 }, // Optional char range
        "remediation_hint": "Mask the detected SSN with ***"
      }
    ]
- Versioning:
  - Ensure version: "pvs-1" is actually emitted by The Gavel (currently missing in code).
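A typed version of the structured violation, plus the kind of UI highlighting it enables, might look like this. The PolicyVerdict interface is deliberately partial (reasoning and confidence_score are omitted), and buildVerdict and highlight are hypothetical helpers, not functions from The Gavel.

```typescript
type Severity = "info" | "warning" | "critical";

interface Violation {
  policy_name: string;
  severity: Severity;
  location?: { start: number; end: number }; // optional char range, end exclusive (assumed)
  remediation_hint?: string;
}

// Partial view of a PVS-1 verdict, limited to the fields discussed here.
interface PolicyVerdict {
  version: "pvs-1";
  approved: boolean;
  violations: Violation[];
  policy_violations: string[]; // legacy string array kept for backward compat
}

// Always stamps the version and derives the legacy array from the new one.
function buildVerdict(approved: boolean, violations: Violation[]): PolicyVerdict {
  return {
    version: "pvs-1",
    approved,
    violations,
    policy_violations: violations.map((v) => v.policy_name),
  };
}

// Wraps the offending range in brackets so a UI can show what triggered the policy.
function highlight(text: string, v: Violation): string {
  if (!v.location) return text;
  const { start, end } = v.location;
  return text.slice(0, start) + "[" + text.slice(start, end) + "]" + text.slice(end);
}

const ssnViolation: Violation = {
  policy_name: "No PII",
  severity: "critical",
  location: { start: 10, end: 21 },
  remediation_hint: "Mask the detected SSN with ***",
};

console.log(highlight("My SSN is 123-45-6789 ok", ssnViolation));
// My SSN is [123-45-6789] ok
```

Deriving policy_violations from violations in one place keeps the two fields from drifting apart during the deprecation window.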
4) CTX-1 – Capability & Trust Context
Strengths
- Simple & Readable: namespace:name is intuitive and debuggable.
- RBAC Mapping: The perm:resource:action pattern maps 1:1 with standard RBAC tables, making adoption easy.
Gaps / Risks
- Foot-gun: Static Permissions: If an admin revokes a permission in the DB, an agent with a valid 15-minute cert still has access until it expires. This "async revocation" needs distinct warning labels.
- Missing "Limit" capabilities: budget: is defined, but rate limits (quota:) are missing.
Concrete Recommendations
- Add quota namespace:
  - quota:daily_requests:1000 or quota:concurrency:5.
- Document the "Revocation Lag":
  - Add a "Security Considerations" section explicitly stating that CTX-1 capabilities are valid for the lifetime of the cert and cannot be instantly revoked without blocking the cert fingerprint itself.
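One way an enforcement point could read the proposed quota capabilities is sketched below. The parseCapability and quotaLimit helpers and the perm:invoices:read example are hypothetical; the quota:name:limit layout is inferred from the two examples in this review, not from the CTX-1 text.

```typescript
interface Capability {
  namespace: string;
  parts: string[]; // everything after the namespace, split on ":"
}

// Splits namespace:part[:part...] and rejects empty segments.
function parseCapability(raw: string): Capability {
  const [namespace, ...parts] = raw.split(":");
  if (!namespace || parts.length === 0 || parts.some((p) => p === "")) {
    throw new Error(`Malformed capability: ${raw}`);
  }
  return { namespace, parts };
}

// Returns the numeric limit for quota:<name>:<limit>, or undefined if not granted.
function quotaLimit(capabilities: string[], name: string): number | undefined {
  for (const raw of capabilities) {
    const cap = parseCapability(raw);
    if (cap.namespace === "quota" && cap.parts.length === 2 && cap.parts[0] === name) {
      const limit = Number(cap.parts[1]);
      if (!Number.isInteger(limit) || limit < 0) {
        throw new Error(`Invalid quota limit: ${raw}`);
      }
      return limit;
    }
  }
  return undefined;
}

const caps = ["perm:invoices:read", "quota:daily_requests:1000", "quota:concurrency:5"];
console.log(quotaLimit(caps, "concurrency"));    // 5
console.log(quotaLimit(caps, "daily_requests")); // 1000
console.log(quotaLimit(caps, "monthly_spend"));  // undefined
```

Because these values are baked into the cert, a changed quota only takes effect on the next issuance, which is the same revocation lag described above.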
Cross-Cutting / Big Picture
Coherence
The specs align very well. The progression of Identity (AIP) -> Permission (CAP) -> Work (ADP) -> Review (PVS) mimics a human organization's workflow.
High-Leverage Additions (Pre-Launch)
- Environment in AIP: Prevents catastrophic cross-env errors. (Low effort, High impact).
- Trace ID in ADP: Essential for debugging multi-agent systems. (Low effort, High value).
- Backdate Certs in AIP: Fixes the clock skew operational headache before it starts. (One line of code).
Explicit Deferrals (Post-Launch)
- Signed ADP Streams: While cool for "tamper-proof logs," audit-log-immutability.ts (hash chaining) covers 80% of the value for 10% of the cost. Do not implement per-step crypto signatures for v1.
- Complex CAP Logic: Don't try to encode "Access file X if owned by Y" into capabilities. Keep CAP static and simple; handle complex authZ in the application logic (ABAC) or PVS.