SORA Approval Document Generator — Design Plan
Status: Draft v1.2 — plan-before-implementation
Author: Sentinel (research assistant)
Date: 23 April 2026
Audience: Hackathon build team; reviewers from MAA RPAS Regulatory Branch and CAA Innovation Hub
Parent story: USER_STORY_CAA_MAA_SWARM.md — CAA/MAA Joint Approval Workbench for Live BVLOS Autonomous Swarm Flight Trials
v1.1 revision history:
- §14 added — Requirements catalogue REQ001–REQ020 (from requirements.csv) folded in as a third axis below AC1–AC8
- §15 added — DTFEA stress-test pass (Data / Trust / Failure / Evidence / Authority) from STRESS-Report.md adopted as a second-axis check on every submission
- §16 added — Eight keystone failure-mode detectors hard-coded from the supplied critical-findings list
- §4.2, §6, §7, §8, §10, §11 updated to reference the new sections
v1.2 revision history — triggered by the supplied user journey.txt service blueprint:
- §20 added — Service blueprint; the generator is now embedded in a 10-step, 3-phase user journey
- §21 added — Dual-mode engine: one codebase, two UX postures (operator-facing pre-submission vs. regulator-facing decision)
- §22 added — Submission Readiness Check as a formal pre-submission gate
- §23 added — Post-decision export, archive, and calibrated-improvement loop (no self-learning in safety decisions)
- §24 added — Revised phasing to include operator-journey phases
- §25 added — Plan-v1.2 definition of done
- Architectural consequence: the operator journey (Phase 1) demands private, non-punitive, iterative UX — the engine's output posture changes, not the engine itself
▌1. Purpose of this document
Design the approach — not yet the implementation — for a generator that ingests a SORA / Operating Safety Case submission pack and produces a structured decision record (approval OR refusal) aligned to acceptance criteria AC1–AC8 of the parent user story.
The plan is anchored on two reference submissions supplied:
| Sample | Outcome | Why |
|---|---|---|
| scotland-sample-case.md (Edinburgh Napier University Operating Safety Case v2.2) | Would be approved (under UKPDRA-01, Specific Category, VLOS) | Complete CAP 722A-compliant safety case: named Accountable Manager, qualified remote pilots, SMS aligned to CAP 1059, full emergency procedures library, logbooks, version-controlled change log, traceable references |
| sora-safety-case.txt (CTRL-ALT-DEFEAT — 5x RigiTech Eiger BVLOS swarm, GNSS/C2-denied trial at Spadeadam) | Would be refused (as-submitted) | Swarm/BVLOS/autonomy-denied operation declared at SAIL I on SORA v2.0; only 6 of 24 OSOs addressed; no DAA evidence, no swarm-behaviour envelope, no AMLAS/SACE case, no named Duty Holder / RAISO, no C2 link budget |
The gap between these two submissions — one comprehensively evidenced for a low-risk scenario, one skeletally evidenced for a high-risk scenario — defines the detection surface the generator must cover.
▌2. Scope of the generator
2.1 In scope
- Ingest of a submission pack (markdown, text, PDF, docx) containing any subset of: Operating Safety Case, JARUS SORA package, CONOPS, DAA report, C2 link analysis, swarm-behaviour envelope, AMLAS/SACE assurance case, CAP 1616 airspace-change evidence, Range Safety Case.
- Classify the operation (category, airspace, BVLOS/VLOS, autonomy posture, swarm/single, segregated/non-segregated, payload risk including munitions).
- Coverage-map each artefact to the relevant regulatory anchor corpus (CAP 722/722A/722B, DEFSTAN 00-970 Part 9, RA 1000/1600/2300 series, JSP 936 Part 1, JARUS SORA v2.5, CAP 1616, AMLAS, SACE, ASTM F3442 / EUROCAE ED-267).
- Flag gaps, ambiguity, unresolved hazards, extrapolation beyond the tested envelope, and weak claim-to-evidence traceability.
- Draft a structured decision record (approve-with-conditions, refuse-with-reasons, or return-to-applicant-for-clarification) complete with regulatory citations, evidence pointers, ALARP rationale, conditions, and review triggers.
- (v1.2) Support the 10-step service blueprint in §20 end-to-end — operator pre-submission, readiness gate, regulator review, post-decision archive and improvement loop.
- (v1.2) Operate in two UX postures — Applicant mode (private, iterative, non-punitive) and Regulator mode (logged, immutable, decision-bearing) — over the same core engine.
2.2 Explicitly out of scope
- Issuing the Military Permit to Fly (MPTF) itself — the signature remains a human regulator act.
- Autonomous clearance of a live flight.
- Displacing Duty Holder accountability or RAISO sign-off.
- Cross-border approvals outside UK sovereign airspace.
- Automated redaction of classified safety evidence.
▌3. What we learned from the two samples
3.1 Scotland (approve) — structural markers present
The Edinburgh Napier submission exhibits the shape of an approvable CAP 722A Operating Safety Case:
- Clear authorisation scope — UKPDRA-01, Specific Category, VLOS only.
- Document control — version 2.2 (14 Jun 2024), named author, change log back to v0.1.
- Safety policy and SMS — 12-point policy; SMS aligned to CAP 1059; safety target "No Accidents".
- Named organisation — Accountable Manager (Brian Davison), three qualified remote pilots with Flyer IDs and GVC qualifications.
- Roles & responsibilities — Remote Pilot, Observer, Payload Operator — each with a discrete responsibility list.
- Occurrence-reporting flowchart — definitions drawn from EU 376/2014 and 996/2010; ECCAIRS route to AAIB/CAA.
- Operating procedures — viability study → site evaluation → risk analysis → call sheet → loading list → site checklist → crew briefing → pre-flight → post-flight, each with a template form in the appendix.
- Emergency procedures library — 13 distinct failure modes covered (pilot incapacitation, airspace incursion, ground incursion, loss of link, rogue RPA, loss of power (air), loss of power (ground), unexpected behaviour, LiPo fault, RPA fire, loss of GNSS, compass error, abnormal environmental conditions).
- Fleet documented — three aircraft each with MTOM, frequency, serial, assembly checklist.
- References & legislation — full reference list with issue/date (CAP 722 v9, CAP 722A v2, ANO 2016/765, Assimilated Reg EU 2019/947).
- IMSAFE self-assessment, Skywise subscription, currency requirement (2 hrs / 90 days), crew briefing protocol.
These markers are the feature space the generator needs to detect and score. Most are binary (present / absent / partial); a few are graded (e.g., SMS depth, currency evidence strength).
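As a minimal sketch of that scoring, assume each marker arrives from the detector as absent / partial / present, with an optional depth grade for the graded markers. The marker names, the 0 / 0.5 / 1 scale, and the "cap depth by presence" rule are illustrative assumptions, not values drawn from CAP 722A.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Presence(Enum):
    """Detection state for a structural marker (hypothetical 0 / 0.5 / 1 scale)."""
    ABSENT = 0.0
    PARTIAL = 0.5
    PRESENT = 1.0


@dataclass(frozen=True)
class MarkerResult:
    name: str                      # illustrative marker ID, e.g. "sms_depth"
    presence: Presence
    grade: Optional[float] = None  # 0.0-1.0 depth score, graded markers only


def score_markers(results: list[MarkerResult]) -> dict[str, float]:
    """Return a score in [0, 1] per marker.

    Binary markers score on presence alone. Graded markers (e.g. SMS depth)
    are capped by presence, so an absent section never earns depth credit.
    """
    scores: dict[str, float] = {}
    for r in results:
        score = r.presence.value
        if r.grade is not None:
            score = min(score, r.grade)
        scores[r.name] = score
    return scores


results = [
    MarkerResult("accountable_manager_named", Presence.PRESENT),
    MarkerResult("sms_depth", Presence.PRESENT, grade=0.75),
    MarkerResult("emergency_procedures_library", Presence.PARTIAL),
]
print(score_markers(results))
# {'accountable_manager_named': 1.0, 'sms_depth': 0.75, 'emergency_procedures_library': 0.5}
```

Capping the grade by presence is one design choice among several; it stops a thin, half-present SMS section from scoring as if it were complete.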
3.2 SORA case (refuse) — the failure pattern
The CTRL-ALT-DEFEAT submission is instructive because it looks like a SORA — it has the section numbering — but the content does not support the operation it describes. The defects fall into five families:
| Defect family | Specific instances in the sample | Which AC catches it |
|---|---|---|
| Version / standard mismatch | Claims SORA v2.0, whereas JARUS SORA v2.5 is the current version and carries 24 OSOs; v2.0 also lists 24 OSOs, but under a different robustness rubric. CAP 722 v9 superseded. | AC1, AC2 |
| Risk/SAIL incoherence | 5-vehicle autonomous swarm, BVLOS, with deliberate GNSS and C2 denial → declared SAIL I. The combination of autonomy, swarm, and loss-of-reference testing should drive a much higher SAIL irrespective of airspace segregation. | AC2 |
| Missing artefacts | No DAA evidence (AC3), no C2 link budget / latency / crypto posture (AC4), no swarm-behaviour envelope with emergent-behaviour analysis (AC5), no CAP 1616 / TDA / NOTAM detail (AC6), no AMLAS / SACE autonomy assurance case (AC7), no Article 36 review (noted, although not in scope for this operation), no Range Safety Case appendix. | AC1, AC3, AC4, AC5, AC6, AC7 |
| Partial OSO coverage | 6 of 24 OSOs listed; OSO 10 declared "Partial — justified" with a one-line rationale. OSOs 5, 7 (proper), 8, 11–24 absent. For an autonomous swarm, OSOs 18–20 (automatic flight-envelope protection, safe recovery from human error, Human Factors evaluation and HMI) and OSO 13 (external services supporting the operation) are particularly load-bearing and missing. | AC2, AC7 |
| Accountability gaps | No named Accountable Manager, no Duty Holder (DDH/ODH), no RAISO sign-off, no Range Safety Officer identity, no remote pilot qualification evidence. "Operator: CTRL-ALT-DEFEAT" is the only organisational identifier. | AC1, AC7, AC8 |
A sixth, subtler defect: the submission treats segregated airspace as a blanket risk-eraser. Segregation mitigates mid-air collision with third-party traffic. It does not mitigate emergent swarm behaviour, geofence breach, or uncontrolled flight termination toward the range boundary, which is exactly what a GNSS/C2-denied trial is designed to probe.
3.3 Detection implication
A well-constructed generator must be able to say, in plain engineering English:
"The operation as described is SAIL V-adjacent under JARUS v2.5 (autonomous swarm BVLOS with deliberate loss-of-reference); the submission asserts SAIL I. This asymmetry between operational complexity and declared assurance is the dominant reason for refusal. It is not curable by panel deliberation — it requires resubmission with a re-derived SAIL, the corresponding OSO coverage, and the artefacts listed in AC3–AC7 above."
The generator should not simply red-flag line items; it should explain the logic of refusal.
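A minimal sketch of that "explain, don't just flag" behaviour for the SAIL asymmetry follows. It assumes the derived SAIL has already been produced by the SORA evaluator (component 5) and does not encode the GRC × ARC grid itself; the function name, the driver strings, and the example derived value of SAIL V are hypothetical.

```python
from dataclasses import dataclass

# SAIL levels in ascending order of required assurance.
SAIL_ORDER = ["I", "II", "III", "IV", "V", "VI"]


@dataclass(frozen=True)
class SailCheck:
    declared: str
    derived: str
    asymmetric: bool
    explanation: str


def check_sail(declared: str, derived: str, drivers: list[str]) -> SailCheck:
    """Compare the applicant's declared SAIL with the evaluator-derived SAIL.

    `derived` is assumed to come from component 5's GRC x ARC walk, and
    `drivers` lists the operational features that pushed the derivation up.
    """
    if SAIL_ORDER.index(declared) >= SAIL_ORDER.index(derived):
        return SailCheck(declared, derived, False,
                         f"Declared SAIL {declared} meets or exceeds derived SAIL {derived}.")
    reasons = "; ".join(drivers) if drivers else "no drivers recorded"
    return SailCheck(
        declared, derived, True,
        f"The submission asserts SAIL {declared}, but the operation derives SAIL "
        f"{derived} ({reasons}). Auto-green is refused; resubmission with a "
        f"re-derived SAIL and matching OSO coverage is required.",
    )


result = check_sail("I", "V", ["autonomous swarm", "BVLOS", "deliberate GNSS/C2 denial"])
print(result.asymmetric)  # True
```

The explanation string carries the drivers forward, so the refusal text names why the SAIL moved rather than only that it moved.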
▌4. Architecture of the generator
4.1 High-level flow
v1.2 note: the pipeline is the same in both UX postures: Applicant and Regulator modes share components 0–8. The modes differ in what the engine outputs (§21), not in what it computes.
[Submission pack]
|
v
[0. Baseline configurator (v1.2)] --> (selects clause sub-library based on
| operation type, airspace, regulator)
v
[1. Ingest & segmentation] --> (sections, tables, figures, metadata)
|
v
[2. Operation classifier] --> (category, airspace, BVLOS, autonomy, swarm, payload)
|
v
[3. Artefact detector] --> (OSC, SORA, DAA, C2, swarm env, AMLAS/SACE, CAP1616, Article36)
|
v
[4. Regulatory mapper] --> (clause-level coverage matrix, RAG rated)
|
v
[5. SORA evaluator] --> (GRC, ARC, SAIL, OSO-by-OSO compliance check)
|
v
[6. Gap & extrapolation analyser]
|
v
[7. Decision drafter] --> (approve-with-conditions | refuse-with-reasons | return-for-clarification)
|
v
[8. Evidence traceability binder]
|
+---- (Applicant mode) --> [Operator Feedback Report + Readiness Dashboard] (§21)
|
+---- (Regulator mode) --> [Structured Decision Record + Coverage Matrix +
Gap Report + Panel Deliberation Log] (§7, §21)
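The "same computation, different posture" split can be sketched as one engine result rendered two ways. This is a minimal illustration under assumed names (`EngineResult`, `Workbench.render`, the dictionary report shape); it is not a defined Workbench API, and a plain list stands in for the immutable decision log.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    APPLICANT = "applicant"  # private, iterative, non-punitive
    REGULATOR = "regulator"  # logged, decision-bearing


@dataclass(frozen=True)
class EngineResult:
    """Output of components 0-8; computed identically in both modes (hypothetical shape)."""
    flags: tuple[str, ...]
    draft_decision: str


@dataclass
class Workbench:
    decision_log: list[dict] = field(default_factory=list)

    def render(self, result: EngineResult, mode: Mode) -> dict:
        if mode is Mode.APPLICANT:
            # Feedback only: no decision verb, nothing written to the regulator log.
            return {"report": "Operator Feedback Report", "gaps": list(result.flags)}
        record = {
            "report": "Structured Decision Record",
            "decision": result.draft_decision,
            "gaps": list(result.flags),
        }
        self.decision_log.append(dict(record))  # only regulator mode writes to the log
        return record
```

Keeping the branch at the rendering step, rather than inside the analysers, is what lets the applicant see exactly the gaps a regulator would see without any of it becoming a logged decision.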
4.2 Component responsibilities
v1.1 note: components 5 and 6 now run two passes in sequence — (i) regulatory-coverage pass, as originally designed, and (ii) DTFEA stress-test pass per §15. Component 7 (decision drafter) consumes both.
1. Ingest & segmentation. Accept MD / TXT / PDF / DOCX. Normalise to a sectioned internal representation. Preserve page/paragraph anchors: every later citation must resolve to an original-document coordinate. Re-use the pdf skill for PDFs and the docx skill for Word documents; markdown/text pass through a header-aware splitter.
2. Operation classifier. Extract structured facts from the submission:
- Operator identity; Accountable Manager; Duty Holder; RAISO
- Aircraft make/model, quantity, MTOM, configuration
- Operation type (photo/survey/delivery/trial), airspace class, segregated or not, BVLOS/VLOS
- Autonomy posture (teleoperated / supervised autonomy / full autonomy / swarm)
- Payload (benign, hazardous, weapon-bearing → triggers AOP-15 / Article 36)
- Range and duration; day/night; populated-area overflight
- Declared category (Open / Specific / Certified) and standard scenario (PDRA-01/02, STS-01/02)
3. Artefact detector. Check for presence and basic structural adequacy of each required evidence artefact. A binary "present" is not enough — e.g., a DAA section that is three paragraphs long with no sensor specs fails the detector's minimum-content heuristic.
4. Regulatory mapper. Uses a configuration-controlled clause library. Every artefact section is linked to the clause(s) it purports to satisfy, and vice versa. Produces a red/amber/green matrix per AC1. Clauses drawn from:
- CAP 722 — UAS operations in UK airspace (current: v9, Dec 2022)
- CAP 722A Second Edition — Operating Safety Cases (Dec 2022)
- CAP 722B Fifth Edition — RAE(PC) / BVLOS-relevant sections (2025)
- CAP 1616 — Airspace Change Process
- DEFSTAN 00-970 Part 9, Issue 13 — RPAS design & airworthiness
- RA 1000 / 1600 / 2300 series — MAA Regulatory Articles (current dated as late as 31 Mar 2026 in supplied source)
- JSP 936 Part 1 v1.1 — Dependable AI in Defence (Nov 2024)
- JARUS SORA v2.5 — 24 OSOs
- AMLAS (6-stage) and SACE (8-stage) — University of York Centre for Assuring Autonomy (CfAA) autonomy assurance methodologies
- ASTM F3442 / EUROCAE ED-267 — DAA technical frameworks
- AOP-15 and Article 36 (AP I Geneva) — weapon review
5. SORA evaluator. Walks the JARUS SORA v2.5 algorithm end-to-end:
- Verify declared iGRC against population density + MTOM table
- Verify mitigations M1 (ground-level risk reduction), M2 (effect of ground impact), M3 (ERP)
- Derive final GRC and cross-check against declared value
- Verify ARC derivation and tactical mitigations
- Derive SAIL from the GRC × ARC grid; refuse auto-green if declared SAIL < derived SAIL
- Walk all 24 OSOs; each must be Met / Enhanced / High / Not-applicable-with-rationale; any partial or unmet OSO is flagged for mandatory panel deliberation and a written rationale before any green rating is issued
- Flag when OSO robustness level falls below SAIL requirement
6. Gap & extrapolation analyser. Four checks:
- Coverage gap — required artefact or clause missing
- Strength gap — artefact present but evidence too thin to carry the claim
- Envelope extrapolation — operational envelope (altitude, speed, density, duration, number of agents) exceeds the tested envelope, especially for DAA sensors and swarm behaviour
- DTFEA failure-mode gap (v1.1) — submission passes regulatory coverage but fails one or more pillars (Data / Trust / Failure / Evidence / Authority) per §15.
Each flag explains which rule fired, which evidence it drew on (or where it looked and found nothing), and the regulator's next action in plain engineering English.
7. Decision drafter. Produces one of three outputs:
- Approve with conditions — full decision record per AC8 with limits, triggers, and signatures block
- Refuse with reasons — structured refusal enumerating each unresolvable blocker with the clause it breaches and what would be required to convert to approval
- Return for clarification — where defects are recoverable by addition rather than resubmission; lists exactly what is required and sets a clock
8. Evidence traceability binder. Every quantitative claim in the decision record resolves to an evidence artefact, page, and paragraph. No claim without a pointer; no pointer without a claim. This is a hard invariant — the drafter refuses to emit an unsourced quantitative assertion.
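The binder invariant can be sketched as a check the drafter runs before emitting any statement. The "any digit means quantitative" test is a deliberately crude heuristic assumed for illustration, and the `Pointer` fields and example coordinates are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

QUANT = re.compile(r"\d")  # crude heuristic: any digit marks a quantitative claim


@dataclass(frozen=True)
class Pointer:
    artefact: str
    page: int
    paragraph: int


class BinderViolation(ValueError):
    """Raised when the drafter would emit an untraceable statement."""


def check_binder(claims: list[tuple[str, Optional[Pointer]]],
                 evidence_index: set[Pointer]) -> None:
    for text, ptr in claims:
        if ptr is None:
            # No claim without a pointer (for quantitative claims).
            if QUANT.search(text):
                raise BinderViolation(f"no pointer for quantitative claim: {text!r}")
            continue
        # No pointer without a claim.
        if not text.strip():
            raise BinderViolation(f"pointer without a claim: {ptr}")
        # Every pointer must resolve to an indexed original-document coordinate.
        if ptr not in evidence_index:
            raise BinderViolation(f"pointer does not resolve to the evidence index: {ptr}")
```

Raising rather than warning mirrors the "hard invariant" wording: the draft simply cannot be produced while an unsourced number remains in it.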
4.3 Human-in-the-loop markers
Per NFRs in the user story (override-first UI, mandatory panel deliberation on every amber/red flag): the generator never auto-closes amber or red. It produces the draft; the joint panel signs. The tool logs every comment / accept / challenge / override immutably.
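A minimal sketch of the deliberation lock and the immutable log follows, assuming a SHA-256 hash chain over JSON entries. Hash chaining only makes tampering detectable; genuine immutability would also need write-once storage, which this sketch does not attempt. `DeliberationLog`, `close_flag`, and the `"system"` actor name are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field


def _digest(entry: dict) -> str:
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()


@dataclass
class DeliberationLog:
    """Append-only, hash-chained log of comment / accept / challenge / override actions."""
    entries: list[dict] = field(default_factory=list)

    def append(self, actor: str, action: str, flag_id: str, note: str) -> dict:
        prev = self.entries[-1]["hash"] if self.entries else "0" * 64
        body = {"actor": actor, "action": action, "flag": flag_id, "note": note, "prev": prev}
        body["hash"] = _digest(body)  # hash covers every field except itself
        self.entries.append(body)
        return body

    def verify(self) -> bool:
        prev = "0" * 64
        for e in self.entries:
            unhashed = {k: v for k, v in e.items() if k != "hash"}
            if e["prev"] != prev or e["hash"] != _digest(unhashed):
                return False
            prev = e["hash"]
        return True


def close_flag(log: DeliberationLog, flag_id: str, rating: str,
               actor: str, rationale: str) -> None:
    # Amber and red flags close only by a named human with a written rationale.
    if rating in {"amber", "red"} and (actor == "system" or not rationale.strip()):
        raise PermissionError(f"{rating} flag {flag_id} requires panel deliberation")
    log.append(actor, "close", flag_id, rationale)
```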
▌5. Knowledge base construction
The generator is only as good as its clause library. Five indices must be built and kept under joint MAA/CAA configuration control:
| Index | Contents | Source | Update cadence |
|---|---|---|---|
| Clause library | Every cited clause, paragraph, or table from the regulatory anchor corpus, with issue/date | CAP 722 family, RA 1000/1600/2300 series, DEFSTAN 00-970 Part 9, JSP 936 Part 1, JARUS SORA v2.5, CAP 1616, AOP-15 | Quarterly MAA/CAA alignment review; on-publication hot-patch for named amendments |
| Artefact schema | What a compliant DAA report / swarm envelope / AMLAS case must contain at minimum | Derived from CAP 722A §3 & §4, AMLAS/SACE templates, EUROCAE ED-267 §5 | On methodology revision |
| Requirements catalogue (v1.1) | REQ001–REQ020 — the CAA operational-authorisation-pack requirement set, with per-REQ acceptance gates | requirements.csv, extensible by joint MAA/CAA configuration control | Per programme; version-pinned to submission under review |
| DTFEA rubric (v1.1) | 5-pillar stress-test questions (Dana/Trevor/Fiona/Evan/Aria) with severity × confidence scoring | STRESS-Report.md methodology | On methodology revision |
| Historical decision ledger | Every previous decision record (approve/refuse/return) plus the evidence pack and the panel's deliberation notes | The Workbench itself, with a read-back loop | Continuous; used for calibration audits |
Superseded revisions are archived, not silently updated — per the NFR on traceability.
5.1 Version pinning — non-negotiable
The generator tags every clause citation with the publication revision in force on the date of review. Example: a review on 23 April 2026 cites CAP 722A Second Edition (Dec 2022) and RA 2305 Issue 7 (updated 31 Mar 2026 per the supplied FLY 2000 series print). A re-review 12 months later may cite different revisions; the two decision records remain individually coherent.
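Version pinning reduces to "latest revision whose in-force date is on or before the review date". The sketch below assumes each clause-library entry carries such a date; the RA 2305 Issue 7 date is taken from the example above, while "Issue 8" and its date are hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Revision:
    document: str
    issue: str
    in_force_from: date


def pinned_revision(revisions: list[Revision], document: str, review_date: date) -> Revision:
    """Return the latest revision of `document` in force on `review_date`."""
    candidates = [r for r in revisions
                  if r.document == document and r.in_force_from <= review_date]
    if not candidates:
        raise LookupError(f"no revision of {document} in force on {review_date}")
    return max(candidates, key=lambda r: r.in_force_from)


revisions = [
    Revision("RA 2305", "Issue 7", date(2026, 3, 31)),
    Revision("RA 2305", "Issue 8 (hypothetical)", date(2027, 1, 15)),
]
print(pinned_revision(revisions, "RA 2305", date(2026, 4, 23)).issue)  # Issue 7
```

Because the lookup is keyed on the review date rather than "now", re-rendering an old decision record reproduces its original citations.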
▌6. Mapping AC1–AC8 and REQ001–REQ020 onto generator capability
This is the coverage claim. Each user-story acceptance criterion maps to one or more generator components and produces a named section of the output decision record.
| AC | User-story requirement (one line) | Generator components involved | Output section | REQs that feed into it |
|---|---|---|---|---|
| AC1 | Submission completeness & clause-coverage RAG matrix | 1, 3, 4 | Coverage Matrix appendix | REQ001, REQ013, REQ014, REQ016, REQ020 |
| AC2 | SORA ground assessment: GRC / ARC / SAIL / 24 OSOs | 5 | SORA Evaluation section | REQ002, REQ003 |
| AC3 | DAA evidence with envelope coverage test | 3, 6 | Detect & Avoid section | REQ007 |
| AC4 | C2 link, loss-of-link behaviour, swarm-consensus flag | 3, 6 | Command & Control section | REQ005, REQ006, REQ011 |
| AC5 | Swarm-specific emergent behaviour, containment, live-flight coverage | 3, 6 | Swarm Behaviour Envelope section | REQ005, REQ008 |
| AC6 | Airspace integration, CAP 1616, NOTAM, e-conspicuity, third-party risk | 4, 6 | Airspace & Third-Party Risk section | REQ004, REQ013, REQ017, REQ019 |
| AC7 | AMLAS/SACE posture, JSP 936 RAISO sign-off, Meaningful Human Control, ODD | 3, 4 | Autonomy Assurance Posture section | REQ003, REQ008, REQ009, REQ010 |
| AC8 | Structured decision record with conditions, triggers, signatures, full traceability | 7, 8 | Decision Record (the main document) | REQ012, REQ015, REQ018, REQ020 |
This cross-map is load-bearing: the AC1–AC8 axis comes from the user story (what the Workbench must do); the REQ001–REQ020 axis comes from the requirements catalogue (what an operational-authorisation pack must contain). The generator evaluates both. A submission can pass AC1 (all artefacts present) and still fail a REQ (e.g., REQ009 — AMLAS container named but no re-evidencing cycle) — that's exactly the class of defect the v1.0 plan missed and v1.1 closes.
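One way to make the two axes interact is a worst-of rule: an AC can never be rated greener than any REQ that feeds it. That rule is an assumption for illustration, not something the user story specifies. The mapping below is a subset of the table above.

```python
RANK = {"green": 0, "amber": 1, "red": 2}

# Subset of the AC -> REQ cross-map above.
AC_TO_REQS = {
    "AC1": ["REQ001", "REQ013", "REQ014", "REQ016", "REQ020"],
    "AC7": ["REQ003", "REQ008", "REQ009", "REQ010"],
}


def combined_rating(ac: str, ac_rating: str, req_ratings: dict[str, str]) -> str:
    """Worst-of rule (assumed): the AC takes the worst of its own rating and its REQs'."""
    ratings = [ac_rating] + [req_ratings[r] for r in AC_TO_REQS[ac] if r in req_ratings]
    return max(ratings, key=RANK.__getitem__)


# AMLAS container present (AC7 artefact check green) but no re-evidencing cycle (REQ009 red).
print(combined_rating("AC7", "green", {"REQ009": "red", "REQ003": "green"}))  # red
```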
6.1 Walkthrough against the two samples
Scotland sample through the generator:
- AC1: Green — all required PDRA-01 artefacts present (OSC complete; no DAA/C2/swarm/AMLAS required for this standard scenario)
- AC2: N/A (PDRA-01 path) — operation is a standard scenario under UKPDRA-01, so the full 24-OSO walk is replaced by the PDRA assurance pack
- AC3–AC5: N/A — VLOS only, single-vehicle, no autonomy
- AC6: Green — Class G/D covered with NSF process where needed
- AC7: N/A — no autonomy load-bearing
- AC8: Approve with conditions — conditions are essentially the PDRA-01 operational limits (daylight, 400 ft above the surface, 500 m VLOS, 50 m from uninvolved persons, etc.)
SORA case through the generator:
- AC1: Red — DAA, C2 link analysis, swarm envelope, AMLAS/SACE all absent
- AC2: Red — SORA v2.5 not used; SAIL declaration incoherent with operation; 18 of 24 OSOs absent
- AC3: Red — DAA envelope not tested (none provided)
- AC4: Red — loss-of-link is intentional; no link budget; no crypto posture; no individual-vehicle vs swarm-level behaviour table — and a GNSS/C2 denial trial that depends on continued swarm consensus is explicitly flagged as high-risk by AC4
- AC5: Red — no emergent-behaviour analysis, no sim-plus-flight coverage matrix, no containment evidence beyond "hard geofencing" (claim without derivation)
- AC6: Amber — segregated airspace is claimed; NOTAM and TDA not evidenced but plausibly recoverable
- AC7: Red — no AMLAS/SACE case; no RAISO sign-off; no ODD; no MHC concept of use at vehicle/swarm/mission layers
- AC8: Refuse — the submission is not curable by panel deliberation; resubmission required with re-derived SAIL
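The two walkthroughs suggest a simple drafting rule, sketched below under stated assumptions: a red flag not curable by addition forces refusal, a curable red forces a return for clarification, and amber flags never block but are always handed to the panel. The `Flag` fields and the rule itself are hypothetical; the output is a draft, and the panel still signs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flag:
    ac: str
    rating: str                # "green" | "amber" | "red"
    curable_by_addition: bool  # True if missing evidence can be added without resubmission


def draft_decision(flags: list[Flag]) -> tuple[str, list[Flag]]:
    """Return (draft outcome, flags the panel must deliberate on)."""
    needs_panel = [f for f in flags if f.rating in ("amber", "red")]
    if any(f.rating == "red" and not f.curable_by_addition for f in flags):
        return "refuse-with-reasons", needs_panel
    if any(f.rating == "red" for f in flags):
        return "return-for-clarification", needs_panel
    return "approve-with-conditions", needs_panel


sora = [Flag("AC2", "red", False), Flag("AC6", "amber", True)]
print(draft_decision(sora)[0])  # refuse-with-reasons
```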
▌7. Output — the decision record
A single structured Markdown document that renders to PDF. Fixed section order so reviewers, Duty Holders, and AAIB readers can find any element without a table of contents search.
Decision Record — CAA/MAA Joint Approval Workbench
==================================================
0. Front matter
0.1 Submission ID, applicant, operation title, review date
0.2 Panel composition and signatures block
0.3 Classification marking (highest element in pack)
1. Decision
1.1 APPROVE WITH CONDITIONS | REFUSE | RETURN FOR CLARIFICATION
1.2 One-paragraph rationale in engineering English
2. Operation summary
2.1 Concept of Operations (machine-extracted, human-verified)
2.2 Classification: category, SAIL, BVLOS, swarm, autonomy, payload
3. Regulatory basis
3.1 Cited clauses with revision tags
3.2 Applicability map
4. Coverage matrix (AC1)
4.1 Artefact presence table (RAG)
4.2 Clause-to-evidence trace
5. SORA evaluation (AC2)
5.1 GRC derivation and mitigations
5.2 ARC derivation and mitigations
5.3 SAIL determination
5.4 24 OSO walk with evidence pointers
6. Detect & Avoid (AC3)
7. Command & Control and loss-of-link (AC4)
8. Swarm behaviour envelope (AC5)
9. Airspace and third-party risk (AC6)
10. Autonomy assurance posture (AC7)
11. Residual risks and ALARP rationale
12. Conditions (if approve)
12.1 Operational limits (weather, day/night, max airborne agents, volume, payload, crew qualifications)
12.2 Review triggers (loss-of-link, containment breach, weather excursion, software update)
12.3 Reporting obligations
13. Reasons (if refuse) — itemised blockers, clause, and remedy
14. Signatures
14.1 MAA Lead Inspector
14.2 CAA co-signer (non-segregated legs)
14.3 Duty Holder acknowledgement
14.4 RAISO acknowledgement (where AI load-bearing)
Appendix A — Coverage matrix (detailed)
Appendix B — Evidence index with page/paragraph pointers
Appendix C — Panel deliberation log (immutable, timestamped)
Appendix D — Gap report (flags raised, resolution notes)
Every quantitative claim in sections 1–14 resolves to an Appendix B entry. This is the traceability invariant restated at the output level.
7.1 Additional output sections (v1.1)
Two new sections are inserted into the decision record to reflect the requirements catalogue and the DTFEA stress-test:
15. Requirements catalogue compliance (REQ001–REQ020)
15.1 Per-REQ verdict table with evidence pointers
15.2 Absent-by-default REQs flagged (REQ017 public engagement, REQ018 classification, REQ019 environmental, REQ020 close-out)
16. DTFEA stress-test matrix
16.1 Per-pillar maturity % (D / T / F / E / A)
16.2 Critical / major / minor finding counts
16.3 Top-20 findings (severity × confidence) with recommendations
16.4 Keystone failure-mode detector results (§16 of plan)
Appendix E — Requirements-to-AC cross-reference matrix. Appendix F — DTFEA findings in full, per-REQ, per-pillar.
▌8. Validation strategy — how we prove it works
8.1 Twin-sample regression
The supplied Scotland and SORA samples are the first regression cases. Expected behaviour:
| Scenario | Expected decision | Expected failure families flagged (if refuse) |
|---|---|---|
| scotland-sample-case.md as-is | Approve with PDRA-01 conditions | None |
| scotland-sample-case.md with §3.3 (personnel) redacted | Return for clarification — AC1 §3 missing | Named personnel absent |
| scotland-sample-case.md escalated to BVLOS claim | Refuse — operation exceeds authorisation | Operation type vs authorisation mismatch |
| sora-safety-case.txt as-is | Refuse | AC1 artefacts, AC2 SAIL incoherence, AC3, AC4, AC5, AC7 |
| sora-safety-case.txt re-submitted at SAIL V with full SORA v2.5 OSO pack, swarm envelope, AMLAS case | Return → Approve with heavy conditions | Remaining gaps (e.g., Article 36 if payload changes) |
If the generator approves the SORA case as-submitted, or refuses the Scotland case, the regression fails and the build is blocked.
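The regression contract can be sketched as a table-driven check in which any deviation blocks the build. Case IDs are illustrative, the generator is stood in for by a hypothetical callable from case ID to draft outcome, and the two-stage resubmission row is left out for brevity.

```python
from typing import Callable

# Expected outcomes from the regression table above, keyed by illustrative case IDs.
REGRESSION_CONTRACT = {
    "scotland-as-is": "approve-with-conditions",
    "scotland-personnel-redacted": "return-for-clarification",
    "scotland-bvlos-escalation": "refuse-with-reasons",
    "sora-as-is": "refuse-with-reasons",
}


def run_regression(generator: Callable[[str], str]) -> tuple[bool, list[str]]:
    """Run every contract case; return (build_blocked, failing case IDs)."""
    failures = [case for case, expected in REGRESSION_CONTRACT.items()
                if generator(case) != expected]
    return bool(failures), failures


# A generator that approves everything fails on the SORA case, among others.
blocked, failing = run_regression(lambda case: "approve-with-conditions")
print(blocked, "sora-as-is" in failing)  # True True
```

The same function is what the drift monitor would re-run whenever the clause library changes.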
8.2 Red-team probe
Per Definition of Done in the parent story, an independent red-team run against:
- Automation bias — can an inspector silently accept a red flag? (No — mandatory deliberation lock.)
- Evidence fabrication — does the generator cite a non-existent page? (Binder invariant should block.)
- Classification leakage — does a SECRET paragraph end up in an OFFICIAL export? (Classification-aware pipeline, two-person downgrade.)
- Prompt injection — does content in the submission cause the generator to skip checks? (Guardrails isolate submission text from control flow.)
8.3 Calibration audit
Held-out historical decisions (from the MAA RPAS Regulatory Branch archive, if released) re-run through the generator and compared to the human panel's original verdict and rationale. Disagreements are the training ground — not to make the generator agree with the human (the human might have been wrong), but to force explicit rationale on both sides.
8.4 Drift monitoring
Per NFRs: the generator's own AI components carry a drift-monitoring plan with an owner. When regulatory text, clause numbering, or OSO structure shifts, the clause library update triggers a re-run of the twin-sample regression before the new version is released to inspectors.
▌9. Assurance of the tool itself
The generator is itself a load-bearing component in a safety-regulatory decision loop. Per the user story's Definition of Done:
- The Workbench's own assurance case must be signed off by MAA-CTS or CAA Innovation Hub.
- Drift monitoring on AI components with a named owner.
- UNCLASSIFIED lessons-learned published to the wider UK aviation regulatory community.
We treat the generator as an AMLAS-stage artefact in its own right: ODD (inspector-facing review of UK UAS submissions), training/validation data provenance (historical decisions + synthetic edge cases), residual-risk claims (what the generator does not catch and the compensating controls that catch it instead).
▌10. Implementation phases
The plan stops short of writing code but sets up the work.
Phase 0 — Clause library boot (1 sprint)
Ingest the supplied regulatory corpus. Tag every citable paragraph with revision and date. Produce a machine-readable clause index. Deliverable: clause_library_v0.json + a human-browsable HTML mirror.
Phase 1 — Ingest + classify (1 sprint)
Components 1 & 2. Target accuracy: ≥95% correct extraction of operator identity, MTOM, airspace class, BVLOS/VLOS, swarm/single on the twin samples plus 5 synthetic variants.
Phase 2 — Artefact detector + regulatory mapper (2 sprints)
Components 3 & 4. Produces the AC1 coverage matrix. Passes the twin-sample regression at coverage-matrix level.
Phase 3 — SORA evaluator (2 sprints)
Component 5. End-to-end JARUS SORA v2.5 walk. SAIL derivation cross-checker. OSO-by-OSO walk. Passes the twin-sample regression at SORA-evaluator level.
Phase 4 — Gap analyser + decision drafter (2 sprints)
Components 6 & 7. Produces the full decision record document. Passes the twin-sample regression at document level.
Phase 5 — Traceability binder + HITL (1 sprint)
Component 8. Binder invariant enforcement. Panel deliberation log. Accessibility pass (WCAG 2.2 AA).
Phase 6 — Red-team + calibration + drift monitor (1 sprint)
Validation §§8.2–8.4. Release candidate.
Phase 7 — Joint MAA/CAA panel trial (1 sprint, live)
First real submission through the Workbench without fallback to manual workflow. Decision record accepted by Duty Holder without rework on structure or traceability. Definition of Done — first checkbox met.
Total: ~11 sprints (1 + 1 + 2 + 2 + 2 + 1 + 1 + 1) from zero to first live joint-panel approval. This is deliberately padded: regulatory tool development is not a place for optimistic estimates.
▌11. Risks to the plan
| Risk | Mitigation |
|---|---|
| Clause library drifts out of sync with live publications | Quarterly MAA/CAA alignment review; on-publication hot-patch discipline; version-pinned citations |
| Over-reliance on pattern-matching to declare a submission "complete" when content is thin | Strength gap check (component 6) forces minimum-content heuristics per artefact type |
| Generator produces a plausible-sounding refusal for a valid submission (false refuse) | Calibration audit + human-in-the-loop override-first UI; no auto-close on amber/red |
| Generator approves a bad submission (false approve) | Binder invariant plus mandatory SAIL cross-check; the SORA sample is explicitly the canonical false-approve regression case |
| Classification leakage in multi-level submissions | Classification-aware pipeline from ingest onwards; two-person downgrade for any export |
| Scope creep into MPTF issuance | Generator hard-stops at the draft decision record; signature remains human by design |
▌12. Open questions — need regulator input before Phase 2
- What is the authoritative source-of-truth for the clause library — a live CAA/MAA feed, or manually curated under joint configuration control? (Assumed the latter for now.)
- How does the Workbench ingest SECRET and STRAP submissions in practice — secure enclave, offline instance, or classification-aware sanitisation? (Assumed secure enclave on UK sovereign infrastructure.)
- Where does the Range Safety Case sit in the decision record — embedded, or cross-referenced as an out-of-band artefact that the Workbench acknowledges but does not evaluate? (Assumed the latter; consistent with the NFR that Workbench unavailability must never block a live trial.)
- For coalition swarm variants (AUKUS/NATO), does the decision record format need to render to an interop schema for peer regulators? (Out of scope for v1; parent story lists it as a related story.)
▌13. Definition of plan-complete
This plan is ready for implementation when:
- The build team and a named MAA/CAA reviewer agree the scope in §2 is correct.
- The mapping in §6 from AC1–AC8 to generator components is accepted.
- The twin-sample expected behaviour in §8.1 is accepted as the regression contract.
- Open questions in §12 are either answered or explicitly deferred with ownership.
- A hackathon timebox decision is made — full Phase 0–7 will not fit; the hackathon target is most likely Phase 0 + Phase 1 + a partial Phase 2 producing the AC1 coverage matrix, with the two supplied samples as the live demo.
End of v1.0 plan body. Implementation begins only after sign-off on §13.
v1.1 Addendum — Requirements catalogue & DTFEA stress-test
Incorporating requirements.csv (20 REQs) and STRESS-Report.md (DTFEA methodology, 100 findings across 5 pillars)
▌14. Requirements catalogue REQ001–REQ020
The requirements catalogue is a CAA-oriented pack-level view that sits below AC1–AC8 and above individual clause checks. Each REQ is phrased as a user story from a specific CAA role (case officer, SORA assessor, safety-case reviewer, airspace regulator, spectrum specialist, DAA assessor, autonomy reviewer, AI/ML specialist, human factors inspector, flight-ops inspector, flight safety investigator, authorisation officer, licensing inspector, CAA/MAA liaison, programme manager). The generator walks the catalogue for every submission and produces a per-REQ verdict.
14.1 Catalogue summary
| REQ | Name | CAA role | Primary AC binding |
|---|---|---|---|
| REQ001 | Application intake & completeness check | Case officer | AC1 |
| REQ002 | SORA 2.0 assessment for swarm | SORA assessor | AC2 |
| REQ003 | ConOps & ODD review | Safety-case reviewer | AC2, AC7 |
| REQ004 | Airspace change & TDA coordination | Airspace regulator | AC6 |
| REQ005 | Geofence / containment enforcement | Airspace regulator | AC4, AC5 |
| REQ006 | C2 link & spectrum assurance | Spectrum specialist | AC4 |
| REQ007 | Detect-and-avoid performance | DAA assessor | AC3 |
| REQ008 | Swarm decision-making architecture | Autonomy reviewer | AC5, AC7 |
| REQ009 | ML assurance case (AMLAS / SACE) | AI/ML assurance specialist | AC7 |
| REQ010 | Human oversight & crew fitness | Human factors inspector | AC7 |
| REQ011 | Contingency & flight termination | Flight-ops inspector | AC4 |
| REQ012 | Data, MOR & DPIA obligations | Flight safety investigator | AC8 |
| REQ013 | Insurance & ground-environment survey | Authorisation officer | AC1, AC6 |
| REQ014 | Crew training & competency | Licensing inspector | AC1 |
| REQ015 | CAA/MAA interface management | CAA/MAA liaison | AC8 |
| REQ016 | Phased trial build-up | Programme manager | AC1 |
| REQ017 | Public & stakeholder engagement | absent-by-default — CAP 1616 / GDPR implied | AC6 |
| REQ018 | Security classification handling | absent-by-default — CAP 722H / JSP 440 | AC8 |
| REQ019 | Environmental impact | absent-by-default — noise, species, battery fire | AC6 |
| REQ020 | Post-trial assurance & close-out | absent-by-default — lessons learned, evidence preservation | AC1, AC8 |
14.2 The absent-by-default quartet (REQ017–REQ020)
REQ017–REQ020 are not in the user-story set — they were identified as gaps by the stress-test. The generator treats them as proactive checks: for every submission, it asks "is there evidence of community consultation? classification handling? environmental impact? post-trial closeout plan?" — and flags absence even when the applicant has not been asked for them. This is a deliberate asymmetric behaviour: better to surface a gap the applicant missed than let it surface during a live incident.
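A minimal sketch of the per-REQ verdict walk, assuming a four-level scale (N/A, green, amber, red) and assuming that each AC takes the worst verdict of the REQs bound to it in §14.1. The names `Verdict`, `ReqResult`, `validate` and `roll_up_by_ac` are hypothetical; the rule that N/A needs a written rationale follows the risk register in §18.

```python
from dataclasses import dataclass
from enum import IntEnum

class Verdict(IntEnum):
    # Ordered so that max() returns the worst verdict; N/A ranks lowest.
    NA = 0
    GREEN = 1
    AMBER = 2
    RED = 3

@dataclass
class ReqResult:
    req_id: str
    acs: tuple[str, ...]   # primary AC binding(s) from the catalogue table
    verdict: Verdict
    rationale: str = ""    # mandatory when the verdict is N/A

def validate(result: ReqResult) -> ReqResult:
    """Reject an N/A verdict that arrives without a written rationale."""
    if result.verdict is Verdict.NA and not result.rationale.strip():
        raise ValueError(f"{result.req_id}: N/A requires a rationale")
    return result

def roll_up_by_ac(results: list[ReqResult]) -> dict[str, Verdict]:
    """Worst per-REQ verdict for each acceptance criterion (hypothetical roll-up rule)."""
    by_ac: dict[str, Verdict] = {}
    for r in map(validate, results):
        for ac in r.acs:
            by_ac[ac] = max(by_ac.get(ac, Verdict.NA), r.verdict)
    return by_ac
```

Taking the worst verdict per AC is one conservative choice; a weighted scheme would need agreement from the MAA/CAA reviewers named in §19.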
14.3 Using the catalogue against the two samples
- Scotland sample: REQ001–REQ003 effectively green (PDRA-01 scenario has the catalogue built in via CAP 722A conformance). REQ004–REQ011 mostly N/A (VLOS, single-vehicle, no autonomy). REQ012–REQ014 green (MOR via ECCAIRS, U.M. Association insurance, GVC + currency). REQ015 N/A (civil-only). REQ016 N/A. REQ017 partial (no explicit community engagement but CAA Skywise subscription present). REQ018 N/A (unclassified). REQ019 partial (LiPo fire procedure present; noise/species not). REQ020 amber (change log present; formal close-out practice implicit). Net: green with a few amber or partial notes (REQ017, REQ019, REQ020); approve.
- SORA case: REQ001 red (no named accountable manager). REQ002 red (SORA v2.0 not v2.5; swarm-specific evidence absent). REQ003 red (ConOps sparse; ODD not defined). REQ004 amber (segregated airspace claimed; NOTAM / TDA process not evidenced). REQ005 red (hard geofencing claimed; independence not demonstrated; see keystone K1 in §16). REQ006 red (no link budget, no crypto posture, no inter-vehicle link analysis). REQ007 red (no DAA evidence of any kind; arguably N/A in segregated airspace, but still required for containment-breach scenarios). REQ008 red (no architecture review, no emergent-behaviour evidence). REQ009 red (no AMLAS/SACE case). REQ010 red (no MHC operationalisation). REQ011 red (termination logic one line; no independence). REQ012 red (MOR not addressed). REQ013 red (no insurance statement, no ground survey). REQ014 red (personnel named by role only, no qualifications). REQ015 red (CAA/MAA apportionment not signed; this is a civil trial on a military range, so apportionment is mandatory). REQ016 N/A (single trial window). REQ017 red. REQ018 red (classified-range submission; no protocol). REQ019 red. REQ020 red. Net: refuse, not curable without re-submission.
▌15. DTFEA stress-test pass
15.1 The methodology
Per STRESS-Report.md: every requirement (and, by extension, every submission section) is stress-tested across five independent pillars. Each pillar carries a named analyst persona so the rubric can be held and applied consistently:
| Pillar | Analyst | Question |
|---|---|---|
| D — Data | Dana | What data does this depend on? Is provenance known? Are bias, drift, class imbalance and sensor coverage surfaced? |
| T — Trust | Trevor | Who has to trust this output? Are model cards, uncertainty bounds, explainability and meaningful-human-control triggers in place for each consumer role? |
| F — Failure | Fiona | What happens when this fails, silently or loudly? Is there redundancy, independent containment, out-of-distribution detection, forensic audit, canary/sandbox? |
| E — Evidence | Evan | What evidence will convince a regulator? Acceptance criteria, test plans, re-evidencing cycle, state-space coverage, go/no-go thresholds? |
| A — Authority | Aria | Who is accountable when this goes wrong? Is the human-in-the-loop role named? Is inter-organisational apportionment signed? |
Working assumption (explicit): the AI will sometimes be wrong. This applies to the onboard autonomy, to the applicant's AI, and — importantly — to the generator itself.
15.2 Scoring and maturity
Each finding carries a severity (critical / major / minor) and a confidence (0–10). The per-pillar maturity percentage is a function of:
- Count and severity of findings
- Coverage of the pillar's question set
- Traceability of evidence to finding
In the supplied stress-test example: D 35%, T 40%, F 32%, E 21%, A 45%, giving an overall "Not suitable (yet)". These percentages and the finding counts (28 critical / 50 major / 20 minor) show the output shape (per-pillar maturity plus severity-banded finding counts) that the generator should reproduce for any submission.
15.3 Gates derived from DTFEA
The generator applies two gates on top of the regulatory coverage gate:
Gate 1 — Pillar maturity floor. Minimum 60% per pillar for approval. Below 60% forces refuse or return-for-clarification. The stress-test example's 21–45% range falls far below the floor; the "not suitable (yet)" verdict is mathematically determined rather than judgement-based.
Gate 2 — No unmitigated critical. Any critical finding with confidence ≥9 blocks approval until explicitly mitigated or formally accepted by the panel with a written rationale in the decision record.
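Both gates are simple enough to sketch directly. The thresholds (60% floor, confidence ≥9) come from the gates above; the `Finding` fields, the `resolution` convention (empty string means neither mitigated nor panel-accepted) and the function name `open_gates` are assumptions for illustration.

```python
from dataclasses import dataclass

PILLAR_FLOOR = 60.0        # Gate 1: minimum maturity per DTFEA pillar, in percent
CRITICAL_CONFIDENCE = 9    # Gate 2: critical findings at or above this confidence block

@dataclass
class Finding:
    pillar: str            # "D", "T", "F", "E" or "A"
    severity: str          # "critical", "major" or "minor"
    confidence: int        # 0 to 10
    resolution: str = ""   # mitigation reference or panel acceptance rationale; empty = unresolved

def open_gates(maturity: dict[str, float], findings: list[Finding]) -> list[str]:
    """Reasons approval is blocked; an empty list means neither gate is open."""
    blocks = [f"Gate 1: pillar {p} at {m:.0f}% is below {PILLAR_FLOOR:.0f}%"
              for p, m in maturity.items() if m < PILLAR_FLOOR]
    for f in findings:
        if (f.severity == "critical" and f.confidence >= CRITICAL_CONFIDENCE
                and not f.resolution.strip()):
            blocks.append(f"Gate 2: unresolved critical finding in pillar {f.pillar} "
                          f"(confidence {f.confidence})")
    return blocks
```

Fed the stress-test example (D 35, T 40, F 32, E 21, A 45), every pillar trips Gate 1, which is the "not suitable (yet)" outcome described above without any judgement call.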
15.4 Wiring into the decision record
Output sections 15 and 16 of the decision record (see §7.1) render the REQ table and the DTFEA matrix directly. The top-20 findings with severity × confidence, per the STRESS-Report format, form the body of output section 16.3.
▌16. Keystone failure-mode detectors
The stress-test surfaced eight critical patterns that recur across submissions. The generator hard-codes a detector for each. These are not discretionary — any one of K1–K8 blocks approval until explicitly resolved.
K1 — Geofence / containment independence (REQ005, Fiona-critical, conf 10)
Detector asks: Does the geofence enforcement path run through the same compute and software stack as the primary autonomy? If yes — single-point failure — refuse. Expected evidence: Dual-independent geofence (e.g., L2 swarm-coordinator + L3 safety-critical module hardened against corruption and spoof), with test evidence of enforcement under degraded states. Relevance to samples: SORA case claims "hard geofencing" without independence evidence — red.
K2 — DAA ML provenance and calibration (REQ007, Dana- and Fiona-critical, conf 10)
Detector asks: Does the DAA assurance pack contain (a) a training dataset sheet with source, size, encounter types, sensor platforms and label accuracy metrics, (b) a calibration study at 10 confidence thresholds with an empirical ROC, and (c) runtime distributional-shift detection? If absent — refuse. Expected evidence: Full DO-365 / F3442 mapping matrix, acceptance criteria table, multi-agent deconfliction confidence model, out-of-distribution monitor at runtime. Relevance: SORA case — no DAA content at all — red.
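To make item (b) concrete, the sketch below shows what a conforming calibration artefact computes, not how the K2 detector inspects it: (threshold, false-positive rate, true-positive rate) points from scored DAA encounters. The evenly spaced thresholds 0.0 to 0.9, the 0/1 conflict labels and the name `empirical_roc` are assumptions.

```python
def empirical_roc(scores: list[float], labels: list[int], n_thresholds: int = 10):
    """Return (threshold, FPR, TPR) at n evenly spaced thresholds in [0, 1).

    labels[i] is 1 if encounter i was a genuine conflict, else 0; a score at or
    above the threshold counts as a positive detection.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both conflict and non-conflict encounters")
    points = []
    for i in range(n_thresholds):
        t = i / n_thresholds
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, fp / neg, tp / pos))
    return points
```

A real study would also report per-encounter-type curves, since the detector asks for encounter types in the dataset sheet.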
K3 — Swarm emergent behaviour bounds (REQ008, Dana- and Fiona-critical, conf 10)
Detector asks: Is the swarm's collective state space explored in simulation and live flight? Are emergent-behaviour monitors present at runtime? Is a named human accountable for the emergent-behaviour envelope? If any no — refuse. Expected evidence: State-space coverage report, simulation + flight-test matrix, runtime monitor for inter-agent consensus and cohesion, named role (Supplier for design, Operator for real-time supervision). Relevance: SORA case — no emergent-behaviour analysis — red.
K4 — ML forensic explainability (REQ009, Trevor- and Fiona-critical, conf 10)
Detector asks: Is there an onboard logging design that captures every input to DAA, swarm-decision, and geofence modules with intermediate confidence scores, timestamped, and exportable for post-incident forensic reconstruction? If no — refuse. Expected evidence: AMLAS explainability module mapping each model component to consumer + explanation strategy; onboard-log schema; post-incident reconstruction playbook. Relevance: SORA case — no AMLAS case — red.
K5 — Meaningful Human Control operationalisation (REQ010, Trevor-critical, conf 10)
Detector asks: Is MHC defined with a decision-responsibility matrix (AI-sole / human-supervised / human-sole), measurable override latency (<1 s target), and workload thresholds for the human operator? If absent or narrative-only — refuse. Expected evidence: Quantified override latency with test evidence, workload instrumentation plan, take-over simulator runs. Relevance: SORA case — "Remote Pilot / Mission Supervisor" named without MHC model — red.
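A minimal sketch of the K5 check, assuming the decision-responsibility matrix arrives as a mapping from decision to one of the three allocations, that override latency is judged on the worst measured take-over run (a percentile would be an alternative), and that any non-empty set of workload thresholds counts as present. `Authority`, `k5_reasons` and the threshold key names are hypothetical.

```python
from enum import Enum

class Authority(Enum):
    AI_SOLE = "AI-sole"
    HUMAN_SUPERVISED = "human-supervised"
    HUMAN_SOLE = "human-sole"

OVERRIDE_TARGET_S = 1.0   # K5 override-latency target, in seconds

def k5_reasons(matrix: dict, override_latencies_s: list[float],
               workload_thresholds: dict[str, float]) -> list[str]:
    """Reasons K5 fires; an empty list means the MHC evidence is operationalised."""
    reasons = []
    if not matrix:
        reasons.append("no decision-responsibility matrix")
    # Free-text entries instead of an Authority value count as narrative-only.
    narrative = [d for d, a in matrix.items() if not isinstance(a, Authority)]
    if narrative:
        reasons.append(f"narrative-only allocation for: {', '.join(narrative)}")
    if not override_latencies_s:
        reasons.append("override latency not measured")
    elif max(override_latencies_s) >= OVERRIDE_TARGET_S:
        reasons.append(f"worst override latency {max(override_latencies_s):.2f} s "
                       f"misses the <1 s target")
    if not workload_thresholds:
        reasons.append("no operator workload thresholds")
    return reasons
```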
K6 — Independent flight termination (REQ011, Fiona-critical, conf 10)
Detector asks: Is termination independent of the primary flight stack? Are there redundant termination channels (e.g., RC + software relay, plus independent RF kill)? Is final authority for termination explicitly human? If no or ambiguous — refuse. Expected evidence: Redundant termination channels, independent power/compute paths, explicit statement that final override is human-exclusive. Relevance: SORA case — "termination" mentioned in passing; independence unevidenced — red.
K7 — CAA/MAA apportionment as a mandatory gate (REQ015, Aria-critical, conf 10)
Detector asks: For any operation that crosses civil/military regulatory boundaries (e.g., military range in civil airspace, civil operator on military aircraft), has a signed CAA/MAA Apportionment Agreement been produced before trial authorisation? If not — refuse. Expected evidence: Signed apportionment naming regulatory ownership per leg, per event-class, per payload-class. Relevance: SORA case — Spadeadam military range operation with no apportionment evidence — red. (This is one of the highest-severity findings in the whole stress-test.)
K8 — Classification custodian & apportionment (REQ018, Aria-critical, conf 10)
Detector asks: Where any classified MOD evidence is presented to CAA, is a named Security Classification Custodian (joint MOD/CAA) identified, is a classification guide produced, and are CAP 722H / JSP 440 handling procedures explicit? If not — refuse. Expected evidence: Custodian named, classification guide present, redaction protocol documented, two-person downgrade rule in force. Relevance: SORA case — classified-adjacent range operation; no custodian — red.
16.1 How K1–K8 change the twin-sample expected behaviour
The §8.1 regression table stands, with one refinement: the SORA case refusal is now over-determined, because any one of K1 through K8 fires independently with confidence 10 and blocks approval. The detector set is designed to be individually sufficient and jointly redundant, so an applicant cannot clear approval by satisfying one detector while ignoring the others.
The Scotland sample passes K1–K8 by exemption rather than evidence — the PDRA-01 scenario does not carry the autonomy, swarm, or classification surface that the keystones guard. This is the correct behaviour: keystones should not penalise submissions that simply do not engage the failure mode they protect against.
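The pass / exempt / fire distinction, and the conditional K7 described in the §18 risk register, can be sketched as below. The three-state result, the submission keys `crosses_civil_military_boundary` and `signed_apportionment`, and the function names are assumptions; only K7 is shown, and the other detectors would plug into the same dictionary.

```python
from enum import Enum
from typing import Callable

class KResult(Enum):
    PASS = "pass"      # evidence satisfies the invariant
    EXEMPT = "exempt"  # submission does not engage this failure mode (rationale recorded)
    FIRE = "fire"      # blocks approval

def k7_apportionment(sub: dict) -> KResult:
    # K7 applies only when the operation crosses a civil/military boundary.
    if not sub.get("crosses_civil_military_boundary", False):
        return KResult.EXEMPT
    return KResult.PASS if sub.get("signed_apportionment") else KResult.FIRE

def blocking_keystones(sub: dict, detectors: dict[str, Callable[[dict], KResult]]) -> list[str]:
    """Each detector runs independently; any single FIRE blocks approval."""
    return [k for k, detect in detectors.items() if detect(sub) is KResult.FIRE]
```

Under this sketch the Scotland sample lands on EXEMPT for K7 and the SORA case on FIRE, matching the expected behaviour above.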
▌17. Updates to implementation phases (supersedes §10 where in conflict)
Three new phases are inserted:
Phase 2b — Requirements catalogue evaluator (1 sprint, parallel to Phase 2)
Component addition: walks REQ001–REQ020 and emits the per-REQ verdict table. Passes twin-sample regression at the REQ-verdict level.
Phase 3b — DTFEA stress-test evaluator (2 sprints, parallel to Phase 3)
Component addition: runs the five-pillar rubric, produces per-pillar maturity, finding list with severity × confidence, and applies Gates 1 and 2. Passes twin-sample regression at the stress-test level.
Phase 4b — Keystone detector hardening (1 sprint, parallel to Phase 4)
Component addition: K1–K8 hard-coded detectors with their own unit tests. Each detector must individually and independently refuse the SORA sample.
Revised total: ~12 sprints to first live joint-panel approval (was ~10). The extra two sprints buy a much sharper detection surface — and a far more defensible decision record for the refusal case.
▌18. Updated risk register (supersedes §11 additions)
| Risk | Mitigation |
|---|---|
| Generator passes regulatory coverage and fails DTFEA quietly | Gate 1 (pillar floor) and Gate 2 (no unmitigated critical) are blocking; the drafter cannot emit "approve" while either gate is open |
| Pillar maturity scores become a single number that hides detail | Each pillar score renders with its full finding list in the decision record; the number is a summary, never the whole story |
| Keystone detectors overfit to the SORA sample and become brittle | K1–K8 are expressed as invariants (e.g., "independence required") not patterns (e.g., "contains word 'independent'"); unit tests include paraphrase adversarials |
| REQ017–REQ020 (absent-by-default) produce noise on submissions where they genuinely don't apply | Each absent-by-default REQ has an N/A-with-rationale path; the generator requires the rationale, not the evidence, in N/A cases |
| DTFEA stress-test itself ages out of date as failure modes evolve | The rubric lives in the knowledge base (§5) under joint MAA/CAA configuration control; the stress-test is itself stress-tested annually by a red team |
| Generator fails K7 (CAA/MAA apportionment) too aggressively on civil-only trials | K7 is conditional on the operation crossing a regulatory boundary; civil-only and military-only operations skip K7 with an explicit rationale line |
▌19. Definition of plan-v1.1-complete
Supersedes §13. The plan is ready for implementation when:
- v1.0 definition of plan-complete (§13) items are all satisfied.
- The requirements catalogue in §14.1 is agreed by MAA and CAA reviewers as the right pack-level taxonomy.
- The DTFEA rubric in §15 is adopted as the second-axis check, with Gates 1 and 2 accepted as blocking.
- The keystone detectors K1–K8 are agreed — both the list and each detector's refusal logic.
- The twin-sample regression in §8.1 is re-run mentally against K1–K8 and the expected over-determined refusal for the SORA case is accepted.
- The revised ~12-sprint implementation schedule is resourced.
End of v1.1 plan body. Implementation begins only after sign-off on §19.
v1.2 Addendum — Service blueprint, dual-mode engine, readiness gate, learning loop
Incorporating user journey.txt — the 10-step, 3-phase end-to-end service blueprint for the Workbench
▌20. Service blueprint — the 10 steps and 3 phases
The v1.0 and v1.1 plans treated the generator as a regulator-facing tool that runs once per submission. The blueprint corrects this: the Workbench is a service with three phases and ten steps, and the generator runs more than once in more than one posture.
20.1 Phase 1 — Operator Journey (Prepare · Improve · Submit)
Personas: Operator · Safety Manager · Autonomy Engineer · Airspace Consultant · Duty Holder
| Step | Operator action | Generator action | Outcome |
|---|---|---|---|
| 1. Create Application Workspace | Selects operation type, aircraft/swarm size, airspace, regulator(s) | Component 0 (Baseline Configurator, v1.2) pulls the relevant clause sub-library (CAP 722/722A/722B, SORA v2.5, RA 1600/2300, CAP 1616 triggers, JSP 936/AMLAS/SACE expectations) | Operator sees exactly what is expected for this operation — no guesswork |
| 2. Upload Evidence Pack | Uploads Safety Case, SORA, C2 link analysis, DAA evidence, swarm envelope, AMLAS/SACE case, CAP 1616 artefacts, Range Safety Case | Components 1–4 run; produces a live coverage matrix (RAG) | Operator immediately sees what's missing, weak, or unclear |
| 3. Assisted Pre-Submission Review | Reads AI feedback, revises documents, uploads new evidence, adds justifications, iterates | Components 5–6 run; emits operator-friendly feedback ("OSO 10 partially met — loss-of-link behaviour depends on swarm consensus"); private, non-punitive, invisible to the regulator | Fewer weak submissions; higher-quality applications; reduced regulator rework |
| 4. Submission Readiness Check | Runs the readiness check, captures Duty Holder acknowledgements, submits formally | System (§22) confirms all mandatory artefacts present, all red/amber items acknowledged, all required signatures captured | A clean, structured, regulator-ready submission enters the approval pipeline |
20.2 Phase 2 — Regulator Journey (Review · Challenge · Decide)
Personas: MAA Lead Inspector (Chair) · CAA Inspector (AAA / FOI) · RAISO · Range Safety Officer · Independent Technical Advisor
| Step | Panel action | Generator action | Outcome |
|---|---|---|---|
| 5. Joint Review Panel Opens Case | Inspectors open a single, structured application view | Presents clause-level coverage against SORA OSOs, CAP 722, RA/DEFSTAN, AI assurance anchors; evidence traceable to page and paragraph; flags residual risks, extrapolations, inconsistencies | Inspectors start at engineering substance, not document hunting |
| 6. Deep-Dive Reviews | Panel examines SORA / DAA / C2 / Swarm / Airspace / Autonomy — one sub-panel per area | Runs AC2–AC7 checks; highlights uncovered collision geometries, swarm-level link dependencies, extrapolation beyond validated envelope, AMLAS/SACE completeness, RAISO sign-off status | Panel focuses on judgement and challenge, not clerical checking |
| 7. Panel Deliberation & Overrides | Inspectors accept / override / add concerns to every AI finding | All actions logged, attributed, justified in plain English; no auto-close on amber/red | Automation bias actively mitigated; accountability stays human |
| 8. Decision Record Generation | Lead Inspector closes review | Component 7 drafts the decision record per §7 and §7.1 — approval/refusal, regulatory basis, residual risks + ALARP, limits, conditions, review triggers, signature blocks | Defensible, auditable, inquiry-ready decision artefact produced in hours, not weeks |
20.3 Phase 3 — Post-Decision & Learning
| Step | Action | Generator action | Outcome |
|---|---|---|---|
| 9. Export, Archive, Audit | Decision record exported as read-only PDF; audit trail retained; clause library versioned (§23) | Generates classification-marked PDF; pushes immutable audit to the ledger; pins the clause-library revision used | Strong institutional memory and consistency across cases |
| 10. Continuous Improvement | Lessons-learned captured; AI flags compared with regulator judgements; model performance calibrated under configuration control — never via self-learning in safety decisions (§23) | Calibration audits run against held-out cases; drift monitor alerts on divergence; clause-library updates follow joint MAA/CAA change control | The system improves without drifting regulation or authority |
20.4 How the service blueprint changes the plan
Three structural consequences:
- The engine runs more than once per submission. In Phase 1 it runs iteratively as the operator revises. In Phase 2 it runs once formally for the panel. In Phase 3 it feeds the calibration loop. Same engine, different cadence and different output posture.
- Mode matters more than function. What the engine computes is identical in Applicant and Regulator modes; what it reveals, logs, and exports is not. §21 formalises this.
- There is a gate between the phases. §22 makes the Submission Readiness Check a discrete capability with its own acceptance criteria.
▌21. The dual-mode engine
21.1 One codebase, two UX postures
| Property | Applicant mode (Phase 1) | Regulator mode (Phase 2) |
|---|---|---|
| Audience | Operator, Safety Manager, Autonomy Engineer, Airspace Consultant, Duty Holder | Joint Review Panel + advisers |
| Tone | Helpful, non-punitive, coaching | Defensible, precise, challenge-ready |
| Visibility of findings | Private to the applicant; not visible to the regulator until submission | Visible to all panel members; immutable once the case is opened |
| Iteration | Unbounded re-runs as documents change | One formal run per review cycle; re-open requires panel action |
| Output type | Operator Feedback Report (revise-and-resubmit) + Readiness Dashboard (gauges) | Structured Decision Record + Coverage Matrix + Gap Report + Deliberation Log |
| Logging | Draft-scoped; discarded on submit unless the applicant opts in | Immutable; timestamp + identity + rationale on every action |
| Decision language | "This is likely to be challenged because ..." | "Approve with conditions / Refuse / Return for clarification" |
| Data retention | Per applicant policy; no regulator-side retention pre-submission | UK sovereign infrastructure; handled at the highest classification in the pack |
21.2 Applicant-mode output — Operator Feedback Report
A living document the operator can revise against. Sections:
- Readiness Gauges — one per AC (AC1–AC8), one per pillar (DTFEA D/T/F/E/A), one per REQ (REQ001–REQ020), plus a keystone-detector panel (K1–K8).
- Top Issues — highest-severity gaps, paraphrased as revise-next hints. Tone matches the blueprint examples: "OSO 10 partially met — loss-of-link behaviour depends on swarm consensus", "DAA tested envelope does not cover proposed operational geometry", "Autonomy failure modes not bounded to SACE severity levels".
- Suggested Evidence — for each red/amber item, the artefact class that would clear it (not fabricated content — a pointer to what's missing).
- Duty Holder Checklist — the acknowledgements the operator will need to capture before Step 4 will pass.
21.3 Critical constraint — no leakage between modes
The engine must not leak Applicant-mode findings into the Regulator-mode decision record. Two enforcements:
- Data isolation — applicant workspaces are logically separate from the regulator workspace; the only bridge is the formally submitted pack at Step 4.
- Provenance — every finding in the Regulator-mode decision record must cite a submitted artefact, never an Applicant-mode draft or revision history, unless the applicant has explicitly opted to include their change log.
This is not a policy preference; it is a hard invariant the drafter enforces at emit time.
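One way the emit-time invariant could look: the drafter refuses to emit a Regulator-mode record if any finding is uncited or cites something outside the Submission Manifest (plus the change log, only when the applicant opted in). The `Citation` and `RegulatorFinding` shapes and `enforce_provenance` are hypothetical names for this sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Citation:
    artefact_id: str   # identifier of a document in the Submission Manifest
    locator: str       # page and paragraph within that document

@dataclass
class RegulatorFinding:
    text: str
    citations: list[Citation]

def enforce_provenance(findings: list[RegulatorFinding], manifest_ids: set[str],
                       opted_in_changelog_ids: frozenset[str] = frozenset()) -> list[RegulatorFinding]:
    """Refuse to emit unless every finding cites a formally submitted artefact."""
    allowed = set(manifest_ids) | set(opted_in_changelog_ids)
    for f in findings:
        if not f.citations:
            raise ValueError(f"uncited finding: {f.text!r}")
        for c in f.citations:
            if c.artefact_id not in allowed:
                raise ValueError(f"{f.text!r} cites non-submitted artefact {c.artefact_id!r}")
    return findings
```

Raising rather than silently dropping the offending finding keeps the failure visible to the build team, which suits the high-scrutiny posture planned for Phase 5b.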
21.4 Why this matters
The Applicant mode is what turns the Workbench from a gatekeeper into a service. Operators get a private place to stress-test their own submission before it hits the formal pipeline. Regulators get fewer poor-quality submissions and cleaner decisions. The quality of the regulatory conversation improves because both sides start further forward.
▌22. Submission Readiness Check
22.1 The gate
The Submission Readiness Check is a pass/fail gate run by the operator at the end of Phase 1. It passes only when all of the following are true:
| Gate condition | How it's checked |
|---|---|
| All mandatory artefacts present for the operation type selected at Step 1 | Artefact detector (component 3) + baseline (component 0); no red on AC1 |
| All red and amber items acknowledged with a written rationale from the operator | Each AC/REQ/DTFEA/K flag in the Operator Feedback Report carries an "acknowledged" marker with free-text; empty acknowledgements fail |
| Duty Holder acknowledgement captured | Signed declaration bound to the submission; must be present for DDH/ODH, RAISO where AI is load-bearing, RSO where a range is involved |
| Classification markings consistent | Every document carries a marking; the highest marking in the pack drives handling downstream |
| Submission pinned to a clause-library revision | Revision ID captured at submit; the panel will review against this revision even if the library is updated later |
| No open edits | All artefacts finalised; no draft flags |
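The six conditions translate directly into a Readiness Gap List. In this sketch the upstream components are assumed to have already produced the inputs (missing artefacts, flag acknowledgements, captured signatures, markings, revision ID, draft flags); the `Submission` fields and `readiness_gaps` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Submission:
    missing_mandatory: list[str]            # AC1 red items from artefact detector + baseline
    flags: dict[str, str]                   # red/amber flag id -> operator acknowledgement text
    required_signatories: set[str]          # e.g. {"DDH", "RAISO", "RSO"}, depending on operation
    signatures: set[str]                    # Duty Holder declarations actually captured
    doc_markings: dict[str, Optional[str]]  # document -> classification marking, None if unmarked
    clause_library_revision: Optional[str]  # revision ID pinned at submit
    draft_documents: list[str] = field(default_factory=list)

def readiness_gaps(s: Submission) -> list[str]:
    """Readiness Gap List; an empty result means all six conditions hold."""
    gaps = [f"mandatory artefact missing: {a}" for a in s.missing_mandatory]
    gaps += [f"flag {k} not acknowledged" for k, ack in s.flags.items() if not ack.strip()]
    gaps += [f"Duty Holder acknowledgement missing: {r}"
             for r in sorted(s.required_signatories - s.signatures)]
    gaps += [f"document unmarked: {d}" for d, m in s.doc_markings.items() if not m]
    if not s.clause_library_revision:
        gaps.append("submission not pinned to a clause-library revision")
    gaps += [f"open edits: {d}" for d in s.draft_documents]
    return gaps
```

An empty list would trigger production of the Submission Manifest; a non-empty list is the recoverable Readiness Gap List described in §22.2.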
22.2 What the gate produces
On pass: a Submission Manifest — the operator's signed, structured, regulator-ready package. The manifest is what enters Phase 2; the Applicant-mode feedback history does not cross the gate unless the operator opts in.
On fail: a Readiness Gap List — itemised, with the specific condition that blocked submission and the minimum action that would clear it. No punitive framing; the failed check is a recoverable state.
22.3 Why a formal gate rather than "submit whenever"
Three reasons:
- Panel efficiency. The Phase 2 panel opens a case expecting structure. A free-for-all submission costs panel time to sort.
- Accountability snapshot. The gate freezes the Duty Holder / RAISO acknowledgements at a specific moment. Later changes require a new submission, not a silent edit.
- Regression contract. The gate is the point at which the engine's outputs must be coherent across the three axes — AC coverage, REQ catalogue, DTFEA pillars. If any axis is inconsistent, the gate holds.
▌23. Post-decision loop — export, archive, audit, and calibrated improvement
23.1 Export & archive (Step 9)
On decision close, the engine emits:
- Read-only PDF of the Decision Record (the whole document per §7 and §7.1), classification-marked to the highest element in the pack.
- Evidence bundle — the submitted artefacts as they were at the gate in §22, with cryptographic hashes.
- Deliberation log — immutable timestamped record of every panel action.
- Clause-library revision pin — the exact revision in force at review date, so any future re-read of the decision can reproduce the regulatory baseline.
All artefacts go to the read-only archive on UK sovereign infrastructure. Retention follows the MAA/CAA records schedule plus any specific inquiry-holding requirement.
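The evidence bundle and revision pin can be sketched with standard SHA-256 hashing. The manifest layout, the canonical-JSON bundle hash and the function names are assumptions; the point is that any later change to an archived artefact, or to the pinned revision, becomes detectable.

```python
import hashlib
import json

def build_evidence_bundle(artefacts: dict[str, bytes], clause_library_revision: str) -> dict:
    """Hash each artefact as frozen at the §22 gate and pin the clause-library revision."""
    hashes = {name: hashlib.sha256(data).hexdigest() for name, data in sorted(artefacts.items())}
    manifest = {"clause_library_revision": clause_library_revision, "artefact_sha256": hashes}
    # Hashing a canonical JSON form of the manifest covers the pin and every artefact hash.
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode()
    manifest["bundle_sha256"] = hashlib.sha256(canonical).hexdigest()
    return manifest

def verify_artefact(bundle: dict, name: str, data: bytes) -> bool:
    """True if the artefact bytes still match the hash recorded at decision close."""
    return bundle["artefact_sha256"].get(name) == hashlib.sha256(data).hexdigest()
```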
23.2 Calibrated improvement (Step 10)
The continuous-improvement loop has one absolute constraint from the blueprint: no self-learning in safety decisions. The engine does not update its own decision logic on the basis of live operational outcomes without explicit human review and configuration-controlled release. In practice:
| Signal | Handling |
|---|---|
| Regulator accepts an AI flag | Logged; feeds calibration statistics (precision/recall of flags) |
| Regulator overrides an AI flag | Logged with panel rationale; triggers a quarterly calibration review if the same override pattern repeats |
| AI misses a defect that only surfaces in live operations | Post-incident review; feeds a candidate new detector (e.g., a new keystone K9) through the clause library / detector change control, not into the model directly |
| Clause or OSO changes publish | Hot-patch to clause library; twin-sample regression re-run before release |
| DTFEA rubric evolves | Annual red-team review; rubric updated under joint MAA/CAA configuration control |
| Drift monitor alerts on any AI component | Named owner investigates; the Workbench's own assurance case is updated |
Calibration audits run against held-out historical cases — never live ones. The engine's own AI components carry the same AMLAS posture the engine demands of applicants (§9), and this is itself stress-tested by the red-team probe referenced in the Definition of Done.
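The calibration statistics in the table above might be computed as follows, assuming panel acceptance counts as a true positive, an override as a false positive, and a later-surfacing defect that no flag caught as a false negative for overall recall. "The same override pattern repeats" is approximated as the same detector being overridden at least a hypothetical three times. The output is for human review only; nothing here alters detector logic.

```python
from collections import Counter

REPEAT_OVERRIDES = 3   # hypothetical threshold for "the same override pattern repeats"

def calibration_report(flag_events: list[tuple[str, str]], missed_defects: int):
    """flag_events: (detector_id, "accepted" | "overridden") from held-out, closed cases.

    Returns per-detector precision with a review trigger, plus overall recall.
    """
    accepted = Counter(d for d, o in flag_events if o == "accepted")
    overridden = Counter(d for d, o in flag_events if o == "overridden")
    per_detector = {}
    for det in sorted(set(accepted) | set(overridden)):
        tp, fp = accepted[det], overridden[det]
        per_detector[det] = {"precision": tp / (tp + fp), "review_due": fp >= REPEAT_OVERRIDES}
    total_tp = sum(accepted.values())
    recall = total_tp / (total_tp + missed_defects) if total_tp + missed_defects else None
    return per_detector, recall
```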
23.3 Institutional memory
The archive is the memory. Inspectors can retrieve any historical decision and replay the clause-library revision in force at that date. Decisions never silently update; superseded revisions are archived. This guarantees the "consistency across cases" outcome the blueprint promises.
▌24. Revised implementation phases (supersedes §10 and §17 where in conflict)
Five new phases are inserted to cover the operator journey, the readiness gate, mode isolation, and blueprint Phase 3 (post-decision).
Phase 0 — Clause library boot (unchanged)
Phase 1 — Ingest + classify (unchanged)
Phase 1b — Baseline configurator (1 sprint)
Component 0. Given operation type, aircraft, airspace, regulator, pulls the relevant clause sub-set. Deliverable: operator-facing workspace wizard.
Phase 2 — Artefact detector + regulatory mapper (unchanged)
Phase 2b — Requirements catalogue evaluator (unchanged, v1.1)
Phase 2c — Applicant-mode output (Operator Feedback Report + Readiness Dashboard) (2 sprints)
Renders §21.2 from the same engine. Private. Iterable. Passes twin-sample regression at the Applicant-mode level.
Phase 3 — SORA evaluator (unchanged)
Phase 3b — DTFEA stress-test evaluator (unchanged, v1.1)
Phase 4 — Gap analyser + decision drafter (unchanged)
Phase 4b — Keystone detector hardening (unchanged, v1.1)
Phase 4c — Submission Readiness Check (1 sprint)
Gate logic per §22. Duty Holder signature capture, classification-marking checker, clause-revision pin, manifest producer. Passes twin-sample regression: Scotland passes the gate to Submit; SORA case fails at the Readiness Check stage with a clean gap list.
Phase 5 — Traceability binder + HITL (unchanged)
Phase 5b — Mode isolation enforcement (1 sprint)
Hard boundary between Applicant and Regulator workspaces per §21.3. Security review gate — no leakage between modes. This is a small-code, high-scrutiny phase.
Phase 6 — Red-team + calibration + drift monitor (unchanged)
Phase 6b — Post-decision archive & calibration loop (1 sprint)
Phase 3 of the blueprint. Export, archive, clause-library revision pinning, calibration statistics harness. No self-learning.
Phase 7 — Joint MAA/CAA panel trial (unchanged)
Revised total: ~18 sprints from zero to first live joint-panel approval (was ~12 in v1.1, ~10 in v1.0).
The increase is not padding; it reflects the blueprint's insistence that the Workbench is a service, not a batch tool. The Applicant-mode UX, the mode-isolation guarantee, and the post-decision loop all carry real implementation weight that v1.0 and v1.1 underestimated.
24.1 Hackathon cut (revised)
Target for a hackathon demo:
- Phase 0 (clause library boot) + Phase 1 (ingest/classify) + Phase 1b (baseline configurator) + part of Phase 2 (artefact detector producing AC1 coverage matrix) + a minimal Phase 2c (Operator Feedback Report with readiness gauges).
- Live demo narrative: operator creates workspace → uploads the two samples → sees immediate private feedback → Scotland goes green on the readiness gauges, SORA case lights up red across K1, K2, K3, K7, K8.
- Regulator-mode and Phase 3 are out of scope for the hackathon but represented in the plan as the next natural increment.
▌25. Definition of plan-v1.2-complete
Supersedes §19. The plan is ready for implementation when:
- v1.1 definition of plan-complete (§19) items are all satisfied.
- The 10-step / 3-phase service blueprint in §20 is accepted as the target user journey.
- The dual-mode architecture in §21 is agreed — one engine, two UX postures, hard isolation between them.
- The Submission Readiness Check in §22 is accepted as the formal Phase 1 → Phase 2 gate with its six pass conditions.
- The post-decision loop in §23 is agreed — no self-learning in safety decisions is a hard constraint.
- The revised ~18-sprint schedule in §24 is resourced, or the hackathon-cut in §24.1 is explicitly accepted as the first-milestone target.
End of v1.2 plan. Implementation begins only after sign-off on §25.