$ ./stress-test --subject="caa-bvlos-uas-swarm"

AI STRESS-TEST

CAA BVLOS UAS Swarm: Operational Authorisation

DTFEA methodology · generated 2026-04-22 16:04 · OpenRequirements.AI

❌ Not suitable (yet)

Programme-level verdict · ≥20% of requirements unsuitable OR critical failure-mode gap present

// executive-summary

EXECUTIVE SUMMARY

Requirements

100

Findings

✅ Proceed

⚠️ Pilot only

❌ Not suitable

// dtfea

THE 5 STRESS-TEST PILLARS

cat ./pillars.json

D · Data

Dana

35%

20 findings · 3c · 7M · 10m

T · Trust

Trevor

40%

20 findings · 7c · 12M · 0m

F · Failure

Fiona

32%

20 findings · 6c · 10M · 4m

E · Evidence

Evan

21%

20 findings · 7c · 13M · 0m

A · Authority

Aria

45%

20 findings · 5c · 8M · 6m

// matrix

PER-REQUIREMENT DECISION MATRIX

REQ	Name	Verdict	Value	Risk	Gap	Score	Reason
`REQ001`	Application Intake & Completeness Check	❌ Not suitable	2.2	2	2.4	-2.8	Moderate assurance gap (score -2.8)
`REQ002`	SORA 2.0 Assessment for Swarm	❌ Not suitable	3.6	3	3.4	-3.9	One critical finding in pillar E
`REQ003`	ConOps & ODD Review	❌ Not suitable	3	3.2	3.4	-4.5	Critical failure-mode gap (Fiona)
`REQ004`	Airspace Change & TDA Coordination	❌ Not suitable	2.2	2.4	2.4	-3.2	Moderate assurance gap (score -3.2)
`REQ005`	Geofence / Containment Enforcement	❌ Not suitable	4.4	4.6	4.4	-6.5	Critical failure-mode gap (Fiona)
`REQ006`	C2 Link and Spectrum Assurance	❌ Not suitable	3.4	3.4	3	-4.0	Moderate assurance gap (score -4.0)
`REQ007`	Detect-and-Avoid Performance	❌ Not suitable	4.8	4.6	4.4	-6.4	Critical failure-mode gap (Fiona)
`REQ008`	Swarm Decision-Making Architecture Review	❌ Not suitable	4.8	5	4.8	-7.5	Critical failure-mode gap (Fiona)
`REQ009`	ML Assurance Case	❌ Not suitable	5	4.8	4.4	-6.1	Critical failure-mode gap (Fiona)
`REQ010`	Human Oversight and Crew Fitness	❌ Not suitable	3.6	3.8	3.8	-5.1	One critical finding in pillar T
`REQ011`	Contingency and Flight Termination	❌ Not suitable	4	4	4	-5.4	Critical failure-mode gap (Fiona)
`REQ012`	Data MOR & DPIA Obligations	❌ Not suitable	4.2	3.2	3.4	-3.8	2 critical findings across pillars
`REQ013`	Insurance & Ground-Environment Survey	❌ Not suitable	2.2	2.4	2	-2.8	Moderate assurance gap (score -2.8)
`REQ014`	Crew Training & Competency	❌ Not suitable	2.6	2.4	3	-3.4	Moderate assurance gap (score -3.4)
`REQ015`	CAA/MAA Interface Management	❌ Not suitable	2.6	3.2	2.8	-4.3	Critical accountability gap (Aria)
`REQ016`	Phased Trial Build-Up	❌ Not suitable	3.4	3.2	3.4	-4.3	One critical finding in pillar E
`REQ017`	Public and Stakeholder Engagement	❌ Not suitable	3	3.6	4.2	-6.2	2 critical findings across pillars
`REQ018`	Security Classification Handling	❌ Not suitable	2.6	3.6	3.8	-6.2	Critical accountability gap (Aria)
`REQ019`	Environmental Impact	❌ Not suitable	1.6	2.8	3.4	-5.2	Moderate assurance gap (score -5.2)
`REQ020`	Post-Trial Assurance & Close-out	❌ Not suitable	2.6	2.2	3.6	-4.0	Moderate assurance gap (score -4.0)
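The programme-level rule quoted in the header (at least 20% of requirements unsuitable, or a critical failure-mode gap present) can be sketched as follows. This is a minimal illustration and not the tool's actual implementation: the `Row` type, the `programme_not_suitable` function, and the string match on the Reason column are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Row:
    req: str       # requirement ID, e.g. "REQ005"
    verdict: str   # "Proceed", "Pilot only" or "Not suitable"
    reason: str    # free-text Reason column from the matrix


def programme_not_suitable(rows, threshold=0.20):
    """Apply the stated rule: >=20% unsuitable OR any critical failure-mode gap."""
    if not rows:
        raise ValueError("no requirements to assess")
    unsuitable = sum(r.verdict == "Not suitable" for r in rows)
    # Assumption: a failure-mode gap is recognised by its Reason text.
    critical_gap = any("Critical failure-mode gap" in r.reason for r in rows)
    return unsuitable / len(rows) >= threshold or critical_gap
```

With the matrix above, both conditions hold: every row is marked Not suitable and several rows carry a critical failure-mode gap.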

// findings

ANALYST FINDINGS

CRITICAL: DAA ML training data provenance, sensor-coverage gaps, and bias vectors completely unspecified

Pillar D · Analyst Dana · REQ007

critical · conf 10 · #1

Analysis

REQ007 invokes DO-365/F3442 alignment but does not require documented evidence of: (a) what data (real-world encounters, simulation, synthetic) the DAA ML model was trained on; (b) whether that data covers cooperative and non-cooperative encounters at swarm scales; (c) sensor-platform representativeness (radar, optical, ADS-B assumed in training vs. actual platform mix); (d) known failure modes (e.g. class imbalance on rare encounter types). If DAA training data is inaccessible, or biased toward fair-weather or single-aircraft scenarios, the model will fail unpredictably on novel swarm encounters.

Recommendation

Add REQ007.1: 'DAA assurance case must include: (a) training dataset sheet (source, size, encounter types, sensor platforms, label accuracy metrics); (b) test-set composition and reservation for independent T&E; (c) bias analysis (coverage of cooperative/non-cooperative, rare encounter types, multi-aircraft scenarios); (d) failure modes analysis (what happens when sensors disagree, or encounter is outside training distribution); (e) drift-monitoring plan for model performance over trial lifetime.'

CRITICAL: Swarm ML training data, class imbalance, emergent-behaviour labelling unspecified

Pillar D · Analyst Dana · REQ008

critical · conf 10 · #2

Analysis

REQ008 asks for 'emergent-behaviour evidence' but does not mandate documentation of: (a) what training data (real swarms, simulation, synthetic multi-agent data) the coordination/decision models learned from; (b) how 'correct' swarm behaviour was labelled (who, by what criteria, dispute resolution); (c) class imbalance (e.g. 99% cooperative scenarios, 1% conflict). No requirement for graceful degradation when coordination data is incomplete or contradictory. Swarm models will exhibit unpredictable emergent failures if trained on narrow, biased data.

Recommendation

Add REQ008.1: 'Swarm decision-making assurance must include: (a) training data composition (simulation platform, real-world data sources, synthetic data generation method); (b) label provenance and accuracy (how emergent behaviours were classified, inter-rater agreement, dispute resolution); (c) class balance analysis and over/under-sampling rationale; (d) adversarial scenario testing (what if one aircraft sends false coordination data); (e) failure-mode analysis (what happens when swarm data is ambiguous or incomplete).'
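Item (c), class balance analysis, could start from something as simple as the sketch below. It is illustrative only: the `class_balance` helper and the label names are hypothetical, and the 99/1 split simply mirrors the example given in the analysis.

```python
from collections import Counter


def class_balance(labels):
    """Return per-class share and the majority/minority imbalance ratio."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("empty label set")
    total = sum(counts.values())
    shares = {cls: n / total for cls, n in counts.items()}
    ratio = max(counts.values()) / min(counts.values())
    return shares, ratio


# Hypothetical scenario labels mirroring the 99% / 1% example.
labels = ["cooperative"] * 99 + ["conflict"] * 1
shares, ratio = class_balance(labels)
```

A ratio this high would be the cue to document the over- or under-sampling rationale that the recommendation asks for.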

CRITICAL: MOR obligations name recording but do not specify data classification, bias audit trails, or model-update provenance

Pillar D · Analyst Dana · REQ012

critical · conf 10 · #3

Analysis

REQ012 is the data-governance anchor but is severely under-specified. It says 'complete recorded evidence' and 'DPIA' but does not mandate: (a) what sensor telemetry, AI logs, and training updates will be recorded and classified (for GDPR, IP, national security); (b) bias audit logs (if crew re-labels AI errors in-flight, how is that bias tracked); (c) data freshness and integrity checks during recording; (d) post-incident data forensics SOP; (e) data retention and reproducibility plan for independent T&E (e.g., can models/data be released for third-party validation?). DPIA is mentioned but scope (what data, whose privacy, how long retained) is not defined.

Recommendation

Expand REQ012: '(a) MOR must record: all sensor feeds, AI model decisions/confidence, crew inputs, system state snapshots. Each recording must carry timestamp, data-quality flags, and classification (OUO/IP/GDPR-sensitive). (b) Bias audit: capture any crew re-labelling of AI outputs in-field, with rationale. (c) Post-incident: preserve recording for forensics; define access control (CAA only, or third-party T&E?). (d) DPIA must address: sensor/telemetry privacy risk, crew/civilian PII leakage, model-update frequency and its impact on model explainability and auditability. (e) Data retention and releasability: define what can be shared for independent T&E reproducibility (e.g., synthetic training data, anonymised incidents).'

AMLAS Explainability Module Missing Consumer-Specific Explanation Strategies

Pillar T · Analyst Trevor · REQ009

critical · conf 10 · #4

Analysis

REQ009: CAA AI/ML assurance specialist wants AMLAS/SACE case + frozen-model manifest + explainability evidence for assurance and forensics. This is the central pillar of AI trust. Critical gaps: (1) REQ009 mentions 'explainability' but does not specify: which models, which consumers, which explanation style. (2) Regulatory trust (CAA assessor) requires: causal feature importance, uncertainty bounds, failure modes. (3) Operator trust requires: post-hoc decision rationale in human-readable form, confidence thresholds. (4) Investigator trust requires: decision-replay logs with model weights/inputs for reconstruction. (5) AMLAS/SACE frameworks typically focus on assurance *of the process*, not explainability *of outputs*. The finding: frozen model manifest is necessary but insufficient.

Recommendation

AMLAS case must be supplemented with: (1) Explainability module mapping each model component (swarm consensus, DAA, geofence, etc.) to consumer + explanation style, (2) Model cards per model (accuracy, failure modes, uncertainty estimates), (3) Decision-log schema for MOR: timestamp, model inputs, inferred label, confidence, actual outcome, (4) Operator display mockups showing confidence thresholds and override options.
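The decision-log schema in item (3) might look like the record below. This is a sketch under stated assumptions: the class name, the field names, the `model_id` field, and the JSON-lines serialisation are illustrative choices, not something the requirement pack specifies.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class DecisionLogEntry:
    timestamp_utc: str                    # ISO 8601 timestamp of the decision
    model_id: str                         # assumption: identifies which model produced it
    model_inputs: dict                    # inputs as seen by the model
    inferred_label: str                   # model output
    confidence: float                     # model confidence in [0, 1]
    actual_outcome: Optional[str] = None  # filled in after the event, if known

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")

    def to_json_line(self) -> str:
        """Serialise to one JSON line with stable key order."""
        return json.dumps(asdict(self), sort_keys=True)
```

Keeping `actual_outcome` optional lets the entry be written at decision time and completed during post-flight review.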

Meaningful Human Control Not Operationalised: Override Latency and Workload Thresholds Undefined

Pillar T · Analyst Trevor · REQ010

critical · conf 10 · #5

Analysis

REQ010: CAA human factors inspector wants workload and take-over evidence to demonstrate meaningful human control. Critical trust gap: 'meaningful human control' is a regulatory doctrine but the requirement does not operationalise it. Trust consumer: Operator. Questions: (1) What is the operator's role—passive monitor or active decision-maker? (2) At what swarm-confidence threshold does the AI defer to the operator? (3) What is the operator's override latency (target <1s? <5s?), and is it tested under high workload? (4) How does the operator know when the AI is uncertain and needs oversight? REQ010 says 'take-over evidence' but does not specify: measured latency, workload assessment tools, or confidence thresholds that trigger handoff.

Recommendation

Human-oversight case must include: (1) Decision-responsibility matrix: which decisions are AI-sole, human-supervised, or human-sole? (2) Measurable override latency targets per decision type (e.g., 'formation change <2s, TDA breach <1s'), (3) Workload assessment: NASA-TLX or equivalent showing operator cognitive load remains <70% nominal, (4) Operator interface mockups showing AI confidence state and override buttons, (5) Take-over trial evidence from crew under high workload (simulated distractions).
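Checking measured override latencies against per-decision targets, as in item (2), can be sketched as below. The target values are the examples quoted in the recommendation. The function name, the dictionary keys, and the choice to judge on the worst observed sample are assumptions.

```python
# Example targets quoted in the recommendation, in seconds.
TARGETS_S = {"formation_change": 2.0, "tda_breach": 1.0}


def latency_report(measured):
    """measured: {decision_type: [latency_s, ...]} -> {decision_type: (worst_s, passed)}."""
    report = {}
    for decision_type, target in TARGETS_S.items():
        samples = measured.get(decision_type, [])
        if not samples:
            # Assumption: missing evidence is treated as a failure, not a pass.
            report[decision_type] = (None, False)
            continue
        worst = max(samples)
        report[decision_type] = (worst, worst < target)
    return report
```

Judging on the worst case rather than the mean is deliberately conservative. A real take-over trial would also record the workload condition under which each sample was taken.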

Public Trust Completely Absent: No CAP 1616 Community Consultation or Prior Notification Plan

Pillar T · Analyst Trevor · REQ017

critical · conf 10 · #6

Analysis

REQ017: No user story addresses community consultation, press/media handling, or prior landowner notification. Trust consumer: General public + affected landowners + media. This is a critical institutional trust gap. Public trust in autonomous swarms is low; lack of transparency invites post-incident backlash. CAP 1616 (airspace change) mandates consultation; GDPR Article 13 (automated decision-making) implies consent. Finding: pack contains zero mention of public engagement plan, media strategy, or landowner notification.

Recommendation

Public engagement case (to be created) must include: (1) CAP 1616 consultation report showing community engagement and response handling, (2) Prior-notification protocol: SMS/email to registered property owners 24h before flight, (3) Media/press strategy: plain-language swarm brief, FAQ on safety/autonomy, contact for incident reporting, (4) Transparency report post-trial: incident summary, lessons learned, public website.

CRITICAL: Geofence enforcement lacks independence; single-point failure unmitigated

Pillar F · Analyst Fiona · REQ005

critical · conf 10 · #7

Analysis

Requirement states 'automatic containment so a fly-away cannot become uncontrolled BVLOS.' Standard implementation: swarm-coordinator software checks aircraft position against geofence polygon, commands descent if breach detected. Failure modes not addressed: (1) GPS spoofing makes aircraft believe they are in-bounds when not, (2) geofence polygon in swarm-coordinator memory corrupted (single-bit error pushes boundary 100m), (3) swarm-decision algorithm interprets geofence breach as 'navigation error' and compensates further out-of-bounds, (4) entire swarm-coordinator reboots post-radio-glitch, geofence logic not re-initialized. No independent mechanism. No ground-commanded override. Recovery: none; the swarm flies out of airspace uncontrolled. Consequence: collision with manned aircraft or impact in a populated area.

Recommendation

Dual-independent geofence: (1) Airborne L2: swarm-coordinator software (as-is), (2) Airborne L3: separate safety-critical module (hardened against corruption, GPS-spoofing resistant via multi-GNSS voting), (3) Ground-commanded L4: CAA/ATSU ground operator can remotely trigger geofence override, forcing all aircraft to level-flight descent at <2 m/s. Proof-of-concept: redundant geofence module on one aircraft, flight-test with GPS jitter/spoofing. Post-incident requirement: if any aircraft exits geofence, mandatory forensic unpack (why did L2 and L3 both fail?) before resumption. This is a gatekeeper requirement; its absence forces NOT_SUITABLE.
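The logic of an independent containment check can be illustrated as below: a vote across several position fixes followed by a point-in-polygon test. This is a simplified sketch, not a spoofing-resistant design. It assumes local planar coordinates in metres, uses a per-axis median as a stand-in for "multi-GNSS voting", and both function names are hypothetical.

```python
from statistics import median


def voted_position(fixes):
    """Per-axis median of independent fixes [(x, y), ...]; tolerates one outlier among three."""
    if len(fixes) < 3:
        raise ValueError("need at least three independent fixes to vote")
    return median(f[0] for f in fixes), median(f[1] for f in fixes)


def inside(point, polygon):
    """Ray-casting point-in-polygon test; polygon is a list of (x, y) vertices."""
    x, y = point
    hit = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count edges that a horizontal ray from the point towards +x crosses.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                hit = not hit
    return hit
```

In an L3 module of the kind recommended, a check like this would run on hardware separate from the swarm coordinator, with the polygon stored and integrity-checked independently.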

CRITICAL: DAA ML lacks distributional-shift detection and post-incident forensics

Pillar F · Analyst Fiona · REQ007

critical · conf 10 · #8

Analysis

DO-365 evidence required but does not mandate: (1) out-of-distribution detection (if sensor input distribution drifts from training envelope, DAA must alert crew, not silently degrade), (2) confidence calibration (high-confidence output must match empirical true-positive rate; miscalibration causes false negatives to be trusted), (3) forensic explainability (post-incident, reconstruct DAA decision chain: what sensor inputs, what intermediate features, what confidence scores led to collision/miss?), (4) adversarial robustness (can spoofed radar bearings make DAA miss intruder?). Consequence: false-negative DAA (AI confident it sees no intruder when one is present) causes mid-air collision. No requirement for runtime detection or containment.

Recommendation

Mandatory: (1) Calibration study: measure true-positive, false-negative rate at 10 confidence thresholds, plot empirical ROC, ensure selected threshold matches acceptable false-negative rate (e.g. <1e-5), (2) distributional-shift monitor: onboard sensor novelty detector (e.g. Mahalanobis distance to training distribution), alert crew if anomaly score exceeds threshold (e.g. 3-sigma), (3) adversarial robustness testing: introduce GPS, radar, IFF spoofing in simulation, measure DAA failure rate, (4) forensic audit trail: log all inputs (radar plots, IFF, ADS-B, GPS), all intermediate confidences, final decision, timestamped, cryptographically sealed post-flight, (5) high-confidence-wrong escalation: if DAA outputs high confidence but crew observes contradiction (e.g. radar plots show intruder but DAA says clear), mandatory post-flight forensic review and model retraining gate before next flight. Absence of any element = NOT_SUITABLE.
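The threshold sweep in item (1) can be sketched in a few lines. The example assumes that scores are DAA confidences in [0, 1] and that label 1 means an intruder was truly present. The function names and the evenly spaced thresholds are illustrative choices.

```python
def threshold_sweep(scores, labels, n_thresholds=10):
    """Return [(threshold, tpr, fnr)] for evenly spaced thresholds in (0, 1)."""
    positives = sum(labels)
    if positives == 0:
        raise ValueError("need at least one true intruder encounter")
    rows = []
    for k in range(1, n_thresholds + 1):
        t = k / (n_thresholds + 1)
        # A true positive is an intruder encounter scored at or above the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tpr = tp / positives
        rows.append((t, tpr, 1.0 - tpr))
    return rows


def pick_threshold(rows, max_fnr):
    """Highest threshold whose false-negative rate stays within max_fnr, else None."""
    ok = [t for t, _, fnr in rows if fnr <= max_fnr]
    return max(ok) if ok else None
```

A sweep like this only supports a false-negative rate as small as the 1e-5 example if the test set contains far more positive encounters than a toy dataset does. The sketch shows the mechanics, not the required sample size.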

CRITICAL: ML assurance lacks forensic explainability and post-incident audit trail

Pillar F · Analyst Fiona · REQ009

critical · conf 10 · #9

Analysis

Requirement specifies 'AMLAS/SACE case, frozen-model manifest, explainability evidence.' But no mandate: (1) forensic audit trail (full input, intermediate activations, confidence, output logged per decision for post-incident analysis), (2) model-versioning control (which model ran on which flight?), (3) post-incident explainability contract (if incident occurs, can you reconstruct decision path within 24 hours?), (4) model retraining governance (after incident, how long before model is retrained and re-qualified?), (5) distributional-shift telemetry (is training distribution representative of actual trial conditions?). AMLAS/SACE checklist covers process; does not enforce runtime forensics. Consequence: mid-air collision or fly-away occurs, investigator cannot explain why ML system failed. No accountability.

Recommendation

ML assurance case must include forensic audit trail design: (1) Onboard logging: every input to DAA, swarm-decision, geofence module; every intermediate confidence/feature; every output and timestamp; recorded with redundancy (dual SD card, cloud upload post-flight), (2) Model manifest per-flight: serial number of DAA model, swarm model, training-data hash, commit ID if version-controlled, (3) Explainability-on-demand protocol: post-incident, investigator can request 'explain decision at 15:34:27 UTC,' system returns input, feature activation maps (saliency heatmaps), decision path, confidence scores, (4) Retraining gate: post-incident, frozen retrain period (e.g. 1 week forensic analysis + crew re-certification), retraining does not resume until root-cause identified (e.g. 'sensor saturation at high altitude' → retrain with synthetic high-altitude data), (5) Industry disclosure: critical incident with ML root-cause must be disclosed (anonymized) to CAA AI/ML working group for community learning. Absence of forensic audit trail = NOT_SUITABLE.
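A per-flight model manifest of the kind described in item (2) could be built roughly as follows. This is a sketch under assumptions: the manifest layout, the field names, and the use of SHA-256 over canonical JSON as a "seal" are illustrative and are not specified by the requirement.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(flight_id, models, training_data: bytes):
    """models: {name: weight_bytes}. Returns (manifest dict, digest of the manifest)."""
    manifest = {
        "flight_id": flight_id,
        "models": {name: sha256_hex(blob) for name, blob in sorted(models.items())},
        "training_data_sha256": sha256_hex(training_data),
    }
    # Canonical JSON (sorted keys) so the same content always yields the same digest.
    canonical = json.dumps(manifest, sort_keys=True).encode("utf-8")
    return manifest, sha256_hex(canonical)
```

A plain digest only detects change. Proving who produced the manifest would additionally need a signature or keyed MAC, which is outside this sketch.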

CRITICAL: Flight termination not independently assured; contingency matrix incomplete

Pillar F · Analyst Fiona · REQ011

critical · conf 10 · #10

Analysis

Requirement states 'independent termination, contingency matrix, ATSU coordination.' But 'independent' is undefined: standard design is single-channel RC receiver + software kill-switch (onboard or ground-commanded). Failure modes not addressed: (1) RC receiver failure or jamming, (2) software kill-switch corrupted (stuck 'live' state), (3) kill command issued but not executed (software in wrong state to receive kill), (4) termination = power cut to motors, but swarm is mid-coordinated dive (aircraft falls vertically, increases collision risk within swarm). No specification: (1) required-success rate (must termination work >99.9% of attempts?), (2) maximum latency (how fast must entire swarm cease thrust?), (3) post-termination trajectory (if falling from 400ft, can swarm glide or impact terrain?), (4) ATSU coordination protocol (who decides to terminate, when, and how is decision transmitted?). Consequence: swarm cannot be made safe in off-nominal state (e.g. fire on one aircraft, geofence compromised, mid-air collision imminent). Uncontrolled descent into populated area.

Recommendation

Flight termination is last line of defence; must be independent and assured: (1) Redundant termination channels: (a) RC receiver + software relay (pilot manual), (b) onboard hardware watchdog (timer-based, terminates if no valid heartbeat from swarm-coordinator in 5s), (c) ground-command backdoor (independent transceiver + hardened receiver logic, dedicated button at ATSU), (d) parachute/ballistic recovery (if time/altitude permits), (2) Testing: ground tests (verify each channel independently halts all motors within 1s), flight tests (test termination at 5 points in mission profile: hover, cruise, climb, descent, coordinated maneuver), measure variance, gate if 99.5% success rate not met, (3) Contingency matrix: documented 15-20 off-nominal scenarios (fire, structural failure, mid-air collision detected, geofence breach, swarm-consensus deadlock, C2 link loss >30s), for each: decision-maker (crew or ground AI?), termination trigger, expected aircraft survival, post-incident investigation priority. (4) ATSU protocol: signed MOU, authority limits (can ATSU terminate without crew consent?), voice procedure, logging. Absence of independent L3/L4 termination = NOT_SUITABLE.
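The timer-based watchdog in item (1)(b) follows a simple latch pattern, modelled below in software for illustration. The real channel is described as onboard hardware, so this only sketches the logic. The class name, the injected `now` timestamps, and the rule that a tripped watchdog never re-arms are assumptions.

```python
class HeartbeatWatchdog:
    """Trips if no valid heartbeat arrives within timeout_s (5 s in the text above)."""

    def __init__(self, timeout_s=5.0, now=0.0):
        self.timeout_s = timeout_s
        self.last_beat = now
        self.tripped = False

    def heartbeat(self, now, valid=True):
        # Invalid heartbeats are ignored; once tripped, the watchdog stays tripped.
        if valid and not self.tripped:
            self.last_beat = now

    def poll(self, now):
        """Return True when termination should be commanded."""
        if now - self.last_beat > self.timeout_s:
            self.tripped = True
        return self.tripped
```

Passing timestamps in explicitly keeps the logic deterministic and testable on the ground, which fits the per-channel ground tests the recommendation calls for.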

SORA 2.0 framework named but swarm-specific evidence thresholds absent

Pillar E · Analyst Evan · REQ002

critical · conf 10 · #11

Analysis

REQ002 calls for 'swarm-aware SORA evidence' but does not specify what distinguishes swarm SORA from single-aircraft SORA (e.g., coordinated separation assurance, communication failure modes, emergent Loss of Signal scenarios, multi-agent decision-making risk). SORA 2.0 is the primary evidence framework for CAA ops authorisation; without swarm-specific thresholds, SAIL/OSO defensibility collapses. The requirement conflates assessment (identifying residual risks) with assurance (proving acceptability).

Recommendation

Produce swarm-specific SORA evidence artefacts: (a) per-aircraft operational risks (as conventional SORA), (b) inter-aircraft coordinated-manoeuvre risks, (c) swarm-loss-of-cohesion recovery, (d) communication-loss formation holding, (e) quantified SAIL per aircraft and swarm separation minimum. Evidence must cite DO-178C + swarm-ops papers (e.g., ACAS Xu, airborne collision avoidance for multiple agents).

DO-365 / F3442 alignment claimed; swarm de-confliction test plan and acceptance criteria missing

Pillar E · Analyst Evan · REQ007

critical · conf 10 · #12

Analysis

REQ007 is the safety-critical requirement for autonomy. It demands 'DO-365 / F3442-aligned evidence' but lacks: (a) mapping to specific DO-365 objectives (e.g., Obj 3.1 'detect non-cooperative targets', Obj 3.2 'avoid collision'), (b) acceptance criteria (sensitivity, false-alarm rate, time-to-manoeuvre), (c) swarm de-confliction specifics (ACAS Xu test scenarios or applicant equivalent?), (d) test report (live or high-fidelity sim), (e) independent audit trail. DO-365 is prescriptive; claiming alignment without traceability is insufficient.

Recommendation

DAA/swarm de-confliction assurance: (a) DO-365 mapping matrix (each DO-365 obj → test case ID, test report section), (b) acceptance criteria table (detection range >X m, false-alarm rate <Y%, swarm collision avoidance success ≥Z%), (c) test design per cooperative (ADS-B/V2X) and non-cooperative (radar) targets, (d) swarm encounter scenarios (intruder at edge of formation, within formation, head-on 3-ship, cross-pattern), (e) simulation + live-trial test reports, both signed by independent T&E authority, (f) forensic data from trials to validate post-hoc.

AMLAS/SACE container named; governance, re-evidencing, and deployment monitoring absent

Pillar E · Analyst Evan · REQ009

critical · conf 10 · #13

Analysis

REQ009 explicitly requires AMLAS / SACE case but the requirement does not specify: (a) ML components in scope (e.g., object detection for DAA, swarm state estimation, coordinated manoeuvre optimization?), (b) model governance (version control, frozen-model manifest, re-evidence triggers post-update), (c) assurance artefacts (AMLAS sections: usage, data, testing, monitoring), (d) explainability evidence (feature importance, decision boundaries, failure case analysis), (e) post-deployment monitoring (drift detection, performance degradation thresholds, retraining governance). AMLAS/SACE is the evidence backbone for AI/ML; incompleteness here cascades through REQ007–REQ008.

Recommendation

AMLAS assurance case covering: (a) model inventory (each ML component, training dataset provenance, validation metrics AUROC/Precision/Recall), (b) data governance (train/test split ratios, class imbalance handling, ODD coverage in training data), (c) AMLAS technical sections (Safety Argument, Testing, Assurance Monitoring), (d) frozen-model manifest with hash/metadata, (e) re-evidence triggers (e.g., model retraining triggered after ≥100 flight hours or ≥5% performance drift), (f) drift-detection + monitoring dashboard post-deployment, (g) explainability report (e.g., SHAP for adversarial robustness, confusion matrices for ODD corner cases), (h) independent audit by CAA-approved ML assurance specialist.
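The example re-evidence triggers in item (e) reduce to a two-condition check, sketched below. The recommendation does not say how "performance drift" is measured. Treating it as the relative change in a single validation metric is an assumption, as are the function and parameter names.

```python
def needs_re_evidence(flight_hours_since_evidence, baseline_metric, current_metric,
                      max_hours=100.0, max_drift=0.05):
    """True if either example trigger fires: >=100 flight hours or >=5% drift."""
    if baseline_metric <= 0:
        raise ValueError("baseline metric must be positive")
    # Assumption: drift is the relative change against the evidenced baseline.
    drift = abs(current_metric - baseline_metric) / baseline_metric
    return flight_hours_since_evidence >= max_hours or drift >= max_drift
```

In practice the baseline would come from the frozen-model manifest's validation metrics, so that the check is always made against the evidenced model and not the previous flight.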

Requirement entirely absent from pack; community consultation and media management unaddressed

Pillar E · Analyst Evan · REQ017

critical · conf 10 · #14

Analysis

REQ017 explicitly notes that no user story addresses community consultation, press/media handling, or landowner notification. CAP 1616 (CAA's consultation framework) and GDPR (privacy of identifiable persons in trial area) both mandate stakeholder engagement. Without evidence of consultation, CAA cannot satisfy parliamentary accountability or fend off public challenge. This is a regulatory showstopper if unaddressed before trial commencement.

Recommendation

Stakeholder engagement plan: (a) community impact assessment (identify nearby residents, businesses, schools, hospitals affected by trial), (b) consultation strategy (public notice, open house, feedback channels, complaint procedure), (c) media statement and FAQs (evidence-based risk communication), (d) landowner consent or compensation protocol (if flying over land), (e) evidence archive (notices issued, responses received, CAA/applicant sign-off on adequacy). This plan must be approved by CAA before trial start.

Requirement entirely absent; classified MOD evidence cannot be integrated without procedure

Pillar E · Analyst Evan · REQ018

critical · conf 10 · #15

Analysis

REQ018 notes that no procedure exists for presenting classified MOD evidence (e.g., swarm algorithms, ML models, test results) to the CAA without compromise (CAP 722H, JSP 440). If MOD-apportioned requirements (e.g., REQ008 swarm architecture, REQ009 ML assurance) rest on classified technology, CAA cannot audit or approve them transparently. This creates a fundamental assurance-governance gap.

Recommendation

Security classification protocol: (a) joint CAA/MOD/Applicant classification review (which evidence is classified? at what level?), (b) redaction procedure (does CAA need full algorithms or only performance summary?), (c) independent verification channel (e.g., GCHQ security-cleared auditor provides abstracted findings to CAA), (d) signed agreement defining what CAA can/cannot publish post-trial, (e) precedent review (how have prior MOD-CAA regulatory programmes handled this?). Protocol must be approved before trial evidence submission.

CRITICAL: Architecture reviewed; no human accountable role named for emergent swarm behaviour

Pillar A · Analyst Aria · REQ008

critical · conf 10 · #16

Analysis

Requirement demands 'decision-making architecture airborne limits and emergent-behaviour evidence' be reviewed so CAA 'can assure what the swarm will do collectively.' This is the right governance question. BUT: no requirement states who is accountable when the swarm behaves in an emergent way not explicitly modelled or predicted. Emergent behaviour is, by definition, not fully predictable from individual rules. Requirement assures the architecture is bounded; does not name who is accountable for unintended collective outcomes. Is it: operator (deployed it)? Supplier (designed it)? CAA (approved it)? Gaps: (1) no named duty-holder, (2) no real-time override mechanism defined, (3) no post-incident model of emergent behaviour reconstruction.

Recommendation

CRITICAL: Add: '(a) Supplier is accountable for swarm architecture design and emergent-behaviour modelling limits; (b) Operator is accountable for real-time swarm override authority—explicitly named person must hold manual control to break swarm if emergent behaviour is detected off-nominal; (c) CAA is accountable for pre-trial audit of override timing and operator workload; (d) On incident, CAA investigator must reconstruct the swarm state and explain emergence post-hoc.' Add explicit role: Real-Time Swarm Safety Officer (operator crew) with exclusive authority to override all swarm decisions.

CRITICAL: Apportionment deferred; regulatory accountability gap unresolved

Pillar A · Analyst Aria · REQ015

critical · conf 10 · #17

Analysis

Requirement states 'As a CAA/MAA liaison I want a signed apportionment so that regulatory ownership is gap-free.' But requirement does not mandate the apportionment document exist before trial start; it only states what is wanted. For BVLOS swarm in UK airspace, MOD/MAA may have classification or safety interests (e.g., if trial data is OFFICIAL-SENSITIVE or affects military airspace). Critical gaps: (1) No signed apportionment document pre-trial means accountability seams are unresolved until incident (too late); (2) No definition of 'gap-free'—does MAA have veto authority over CAA decisions or vice versa?; (3) No named veto-holder if CAA and MAA disagree; (4) If trial is joint UK/allied, international apportionment is unaddressed. This is a pre-trial requirement that must be signed or trial cannot proceed.

Recommendation

CRITICAL: Rewrite REQ015 as mandatory gate: 'GATE: Signed CAA/MAA Apportionment Agreement must be executed before trial authorisation, specifying: (a) Regulatory veto authority (e.g., MAA has veto over military-airspace or classified-data decisions); (b) Incident investigation authority (e.g., CAA leads unless MOD assets damaged); (c) Data classification handling (who is custodian of OFFICIAL-SENSITIVE evidence); (d) Post-trial knowledge dissemination (can CAA publish lessons learned or does MAA restrict?). Agreement must be signed by named CAA/MAA representatives with authority to commit their organisations.' Pre-trial, governance sign-off includes apportionment proof.

CRITICAL: Classified MOD evidence accountability custodian undefined; CAP 722H/JSP 440 unimplemented

Pillar A · Analyst Aria · REQ018

critical · conf 10 · #18

Analysis

Requirement correctly identifies: no user story addresses classified MOD evidence handling under CAP 722H (MOD–CAA Interface) / JSP 440 (MOD security policy). This is critical for any trial with MOD involvement or classified airspace/data. Accountability gaps: (1) No named custodian for classified evidence (MOD, CAA, operator, or split custody?); (2) CAA case officer and inspectors may not have security clearance to review classified ConOps/ODD; (3) No mechanism for unclassified 'sanitised' safety case to be reviewed by CAA civil staff while MOD retains classified details; (4) Post-trial, classified data cannot be published in incident reports; who decides what is shareable?; (5) Incident investigation: if incident involves classified data, who leads investigation (CAA civil investigator, MOD, or joint team)? This is unresolved and blocks MOD trials.

Recommendation

CRITICAL: Add REQ018 enforcement: 'Named role: Security Classification Custodian (MOD and CAA jointly). Pre-trial mandatory: (a) Classification guide produced specifying OFFICIAL/OFFICIAL-SENSITIVE/SECRET domains; (b) Sanitisation strategy: unclassified safety case extracted so CAA civil staff can review generic risk; (c) MOD Security Liaison (RAISO-equivalent) sits on CAA Safety Committee for classified-domain decisions (override veto on MOD airspace/data items); (d) Incident investigation protocol: if classified data involved, MOD and CAA jointly designate investigator (MOD if MOD assets/data, CAA if civil airspace/safety); (e) Post-trial publication: CAA can publish unclassified lessons learned; MOD classifies remaining findings. Apportionment must be signed by CAA and MOD pre-trial.'

Geofence Enforcement Not Paired with AI Override Capability

Pillar T · Analyst Trevor · REQ005

critical · conf 9 · #19

Analysis

REQ005: Automatic containment as fail-safe. Trust consumer: CAA airspace regulator (high stakes). Critical trust gap: AI swarm control and geofence enforcement must be independent. If swarm AI drives towards geofence, trust depends on: (1) geofence is physically enforced (not software-only), (2) operator can override swarm maneuver with immediate containment command, (3) confidence bounds on geofence triggers are explicit. No mention of who trusts the geofence logic—operator or regulator? If regulator-only, operator has no override, violating REQ010 (meaningful human control).

Recommendation

Geofence architecture must specify: (1) hardware/firmware geofence layer independent of AI, (2) operator override latency (<1s target), (3) test evidence showing geofence holds under swarm-AI maximum velocity and acceleration, (4) post-incident geofence log forensics in MOR (REQ012).

DAA Evidence Does Not Address Multi-Agent Deconfliction Confidence

Pillar T · Analyst Trevor · REQ007

critical · conf 9 · #20

Analysis

REQ007: CAA DAA assessor wants DO-365/F3442-aligned evidence + coordinated swarm de-confliction for predictable encounter resolution. Trust consumer: CAA DAA assessor (high-stakes regulatory authority) + DAA de-confliction counterparties (other airspace users). Critical gap: Single-vehicle DO-365 evidence is well-established; swarm adds multi-agent coordination layer. Trust question: if swarm AI coordinates with external DAA relays, what confidence does the external de-confliction partner have that the swarm will follow its assigned slot? No mention of: (1) swarm coordination confidence intervals, (2) model cards for the multi-agent decision logic, (3) forensic replay capability for failed encounters.

Recommendation

DAA submission must include: (1) per-aircraft DO-365 evidence (single-vehicle), (2) multi-agent coordination model card (swarm consensus confidence, latency), (3) uncertainty bounds on swarm trajectory prediction for external de-confliction (e.g., '+/- 50m 3-sigma'), (4) MOR protocol for recording all DAA alerts, maneuvers, and inter-swarm messaging.

Emergent Behaviour Evidence Lacks Uncertainty Quantification and Operator Explainability

Pillar T · Analyst Trevor · REQ008

critical · conf 9 · #21

Analysis

REQ008: CAA autonomy reviewer wants airborne limits and emergent-behaviour evidence to assure what the swarm will do collectively. Trust consumer: CAA autonomy reviewer (regulatory authority) + Operator (daily trust). Critical gap: 'Emergent behaviour' is notoriously difficult to assure; trust requires: (1) Regulatory: causal explainability (why did swarm re-form as V instead of line?), (2) Operator: post-hoc explainability (what happened and can I override next time?), (3) Investigator: forensic reconstruction (what decision sequence led to the incident?). No mention of: (a) confidence bounds on swarm formation stability, (b) operator-facing explainability, (c) forensic replay logs.

Recommendation

Swarm architecture case must include: (1) Deterministic consensus protocol with mathematical proof or >99.9% simulation confidence on stable formations, (2) Per-agent decision model cards + inter-agent messaging log, (3) Operator display showing current swarm confidence state and formation alternatives, (4) Forensic replay capability in MOR data (REQ012).

MOR Data Schema Does Not Mandate ML Decision Replay Logs; DPIA Assumes Passive Surveillance

Pillar T · Analyst Trevor · REQ012

critical · conf 9 · #22

Analysis

REQ012: CAA flight safety investigator wants complete recorded evidence, accepted MOR obligations, DPIA for incident investigation and privacy. Trust consumer: Investigator (post-incident trust, forensics). Critical gap: Standard aircraft MOR (altitude, heading, airspeed) is insufficient for AI-enabled swarm. Investigator must be able to replay: (1) Per-agent decision sequence and model confidence, (2) Inter-agent messages and consensus process, (3) Inputs to each ML model at incident moment. DPIA assumes passive flight recording; swarm adds on-board decision logs that may be classified as surveillance. Finding: MOR data schema and DPIA do not address forensic reconstruction of swarm decisions.

Recommendation

MOR case must include: (1) Extended data schema: per-agent model inputs/outputs, confidence scores, consensus messages, (2) Decision-replay capability: frozen model weights + MOR logs must reconstruct incident sequence, (3) DPIA supplement addressing on-board decision logging (is this surveillance? consent model?), (4) Forensic playback tool specification (who has access, under what conditions?).
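
The extended data schema in item (1) could take a shape like the record below. This is a minimal sketch only: the field names, the use of SHA-256 digests to reference frozen weights and raw inputs, and the JSON-lines serialisation are all assumptions for illustration, not part of REQ012 or any CAA MOR format.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import List


def digest(payload: bytes) -> str:
    """SHA-256 hex digest used to reference large artefacts stored elsewhere."""
    return hashlib.sha256(payload).hexdigest()


@dataclass(frozen=True)
class DecisionRecord:
    """One per-agent decision entry in a hypothetical extended MOR log."""
    agent_id: str
    timestamp_ms: int
    model_sha256: str    # digest of the frozen model weights in use
    inputs_sha256: str   # digest of the raw model inputs at decision time
    output: str          # decision label, e.g. a manoeuvre name
    confidence: float    # model-reported confidence in [0, 1]
    consensus_msgs: List[str] = field(default_factory=list)

    def to_json_line(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


record = DecisionRecord(
    agent_id="uas-03",
    timestamp_ms=1000,
    model_sha256=digest(b"weights-placeholder"),
    inputs_sha256=digest(b"sensor-frame-placeholder"),
    output="hold_formation",
    confidence=0.97,
    consensus_msgs=["vote:hold_formation"],
)
line = record.to_json_line()
restored = DecisionRecord(**json.loads(line))
```

Recording digests rather than raw inputs keeps each entry small while still letting an investigator confirm that the replayed inputs and weights match those in use at the incident moment. It also narrows what the DPIA supplement in item (3) has to cover for the log itself.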

ODD defined for platform, not for AI subsystem; scope creep unguarded

Pillar F · Analyst Fiona · REQ003

critical · conf 9 · #23

Analysis

ConOps defines airspace, weather, crew competency, nominal swarm formation. ODD missing: ML training data envelope (e.g. max aircraft separation, lighting, speed range), DAA sensor performance (SAR saturation, GPS dropout bands), swarm decision diversity threshold (how many consensus-breaking aircraft before auto-recovery?). No explicit boundary: when does autonomy revert to RC-only? What happens if mission scope creeps, for operational expediency, beyond the original DAA training distribution? Operators under schedule pressure may fly outside the ODD without the crew detecting it.

Recommendation

Expand ODD with: (1) ML training-data bounds (distribution of aircraft density, speed, altitude, lighting, sensor noise), (2) DAA performance envelope (sensor latency, accuracy, false-alarm rate per scenario), (3) swarm consensus bounds (N-aircraft failure tolerance, minimum consensus threshold), (4) explicit scope-creep gates (crew must re-certify any mission deviating from original ODD; deviation triggers manual supervision). Operator manual: hard-rule examples of out-of-ODD signs (e.g. 'if intruder closing speed >150 kts, revert to RC').

CRITICAL: Swarm emergent behaviour unmonitored; no runtime assurance or containment

Pillar F · Analyst Fiona · REQ008

critical · conf 9 · #24

Analysis

Requirement asks for 'airborne limits and emergent-behaviour evidence.' But no runtime enforcement: swarm decision-maker (multi-agent consensus algorithm or neural network policy) operates closed-loop in-flight with no human monitor of intermediate states. Failure modes: (1) consensus algorithm diverges (aircraft vote for contradictory maneuvers, resulting in collision within swarm), (2) emergent cascading failure (one aircraft misinterprets formation-keep command, other aircraft react to reposition, spiral out of control), (3) latent distributional shift (training was on 4-aircraft scenarios, trial flies 8; consensus becomes unstable), (4) operator misuse (crew overrides AI-recommended descent with aggressive climb, AI does not anticipate human input, swarm decision becomes invalid). Recovery: manual crew control. Detection: none specified.

Recommendation

Mandatory swarm-assurance gates: (1) Simulator multi-agent stress test: run swarm decision algo against adversarial scenarios (one aircraft sensor failure, cascading conflict, GPS outlier, integrator wind-up), log decision divergence metric (e.g. variance of intended headings), gate if divergence exceeds 15 degrees, (2) Progressive trial build (REQ016): fly 2-aircraft shadowed by RC, log swarm-decision outputs every 100ms, flag anomalies (rapid consensus flip, persistent disagreement >5s), (3) Runtime swarm-health monitor onboard: compute real-time decision diversity (divergence of aircraft intents), alert crew if diversity spikes unexpectedly (sign of instability), (4) Post-flight swarm-decision forensics: replay recorded consensus votes, identify decision points where swarm diverged from nominal, classify as nominal/edge-case/failure, feed back to simulator, (5) Emergent-behaviour gate between N and N+2 aircraft: do not scale until decision diversity and robustness re-validated. Absence = NOT_SUITABLE for multi-aircraft release.
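
The divergence metric in gates (1) to (3) can be sketched as follows. Headings wrap at 360 degrees, so this sketch swaps the plain variance mentioned above for the circular standard deviation of intended headings. That substitution, the function names, and the integer-millisecond bookkeeping are assumptions; the 15-degree threshold, 100 ms sampling and 5 s persistence window are taken from the recommendation.

```python
import math
from typing import Iterable


def heading_divergence_deg(headings_deg: Iterable[float]) -> float:
    """Circular standard deviation of intended headings, in degrees."""
    radians = [math.radians(h) for h in headings_deg]
    if not radians:
        raise ValueError("at least one heading is required")
    mean_cos = sum(math.cos(h) for h in radians) / len(radians)
    mean_sin = sum(math.sin(h) for h in radians) / len(radians)
    resultant = min(1.0, math.hypot(mean_cos, mean_sin))
    if resultant == 0.0:
        return float("inf")  # headings cancel out completely
    return math.degrees(math.sqrt(-2.0 * math.log(resultant)))


def persistent_disagreement(
    divergences_deg: Iterable[float],
    threshold_deg: float = 15.0,
    period_ms: int = 100,
    max_duration_ms: int = 5000,
) -> bool:
    """True if divergence stays above the threshold for longer than the window."""
    run = 0
    for divergence in divergences_deg:
        run = run + 1 if divergence > threshold_deg else 0
        if run * period_ms > max_duration_ms:
            return True
    return False


tight = heading_divergence_deg([350.0, 10.0])  # wraps correctly around north
split = heading_divergence_deg([0.0, 90.0])    # well above a 15-degree gate
```

The same function could serve the simulator gate, the 100 ms trial log and the onboard health monitor, which would keep the three gates comparable.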

Architecture review scope named; emergent-behaviour test design and acceptance criteria absent

Pillar E · Analyst Evan · REQ008

critical · conf 9 · #25

Analysis

REQ008 asks for 'airborne limits and emergent-behaviour evidence' but does not define: (a) what 'emergent behaviour' means (decentralised swarm consensus algorithm? coordinated manoeuvre? leader-follower dynamics?), (b) state-space boundaries (what swarm configurations are possible/impossible?), (c) failure modes (e.g., leader-loss → who becomes leader? consensus oscillation? breakaway?), (d) test methodology (simulation only? hardware-in-loop? flight test?), (e) acceptance criteria ('stable formation recovery within T seconds', 'no collision in N failure scenarios'). Without these, assurance of 'what the swarm will do' is intuitive, not rigorous.

Recommendation

Swarm architecture assurance: (a) formal specification of decision-making (e.g., consensus algorithm, comms topology, loss-of-quorum rules), (b) FMEA of swarm failure modes (leader loss, link partition, member dropout), (c) acceptance criteria (state convergence time, formation recovery latency, deadlock-freedom proof or test evidence), (d) Monte Carlo or deterministic test design covering ODD + edge-of-envelope + multi-failure scenarios, (e) simulation report covering ≥10,000 virtual encounters, (f) live-trial data from phased gates validating sim correlation.

Phased gates named; success criteria, go/no-go decision rules, and evidence-progression plan absent

Pillar E · Analyst Evan · REQ016

critical · conf 9 · #26

Analysis

REQ016 is the meta-requirement for evidence accumulation: phased gates allow evidence to build and risks to be retired progressively. However, it does not specify: (a) phase definitions (is 'Phase 1' simulation-only? does 'Phase 2' require shadow BVLOS before autonomous?), (b) success criteria per phase (REQ007 DAA test pass rate ≥90%? REQ009 ML model AUROC >X? REQ010 crew takeover latency <T s?), (c) go/no-go decision rule (unanimous CAA/MAA? or applicant-led with CAA audit?), (d) evidence archive (are trial results locked for audit after each phase?), (e) reversibility (can programme pause/restart if evidence degrades?). Without gates, evidence is ad-hoc, not planned.

Recommendation

Phased gates protocol: (a) define phases: Phase 0 (sim only, ML validation), Phase 1 (supervised GNSS flight, manual override available), Phase 2 (shadow BVLOS, crew monitoring, no comms jamming), Phase 3 (autonomous BVLOS, full ODD), (b) per-phase success criteria tied to each REQ (e.g., Phase 1 exit: REQ007 DAA ≥95% success in 1000 encounters, REQ010 crew workload ≤6 NASA-TLX, REQ012 data integrity 100%), (c) go/no-go gate review (date, named CAA/MAA reviewers, decision criteria, documented), (d) trial evidence repository (locked after each gate for audit trail), (e) hold/pause trigger (if performance degrades, how is restart decided?).
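
The per-phase success criteria in item (b) lend themselves to a machine-checkable form, which would also make the go/no-go record in item (c) easier to audit. The sketch below encodes the Phase 1 exit example quoted above; the metric names and the rule that missing evidence counts as a failure are assumptions.

```python
import operator
from typing import Callable, Dict, List, NamedTuple


class Criterion(NamedTuple):
    req: str
    metric: str
    compare: Callable[[float, float], bool]
    threshold: float


# Phase 1 exit example from the recommendation; metric names are hypothetical.
PHASE_1_EXIT = [
    Criterion("REQ007", "daa_success_rate", operator.ge, 0.95),
    Criterion("REQ010", "crew_workload_tlx", operator.le, 6.0),
    Criterion("REQ012", "data_integrity", operator.ge, 1.0),
]


def evaluate_gate(criteria: List[Criterion], evidence: Dict[str, float]) -> List[str]:
    """Return the failed criteria; an empty list means the gate passes.

    Missing evidence is treated as a failure rather than silently skipped.
    """
    failures = []
    for criterion in criteria:
        value = evidence.get(criterion.metric)
        if value is None or not criterion.compare(value, criterion.threshold):
            failures.append(f"{criterion.req}:{criterion.metric}")
    return failures


failures = evaluate_gate(
    PHASE_1_EXIT,
    {"daa_success_rate": 0.96, "crew_workload_tlx": 5.5},
)
print(failures)  # prints "['REQ012:data_integrity']" because that evidence is missing
```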

No human accountable role named for containment failure; AI autonomous barrier critical to safety

Pillar A · Analyst Aria · REQ005

critical · conf 9 · #27

Analysis

Requirement states 'automatic containment' must prevent fly-away. But 'automatic' implies AI or firmware decides containment without human intervention in real-time. If geofence fails, who is accountable? Requirement does not name a human duty-holder responsible for: geofence design, testing, runtime monitoring, or failure response. Supplier designs it; operator deploys it; but who is answerable if it fails? Accountability gap is critical because this is safety-critical autonomy.

Recommendation

CRITICAL: Add named human accountability: '(a) Supplier is accountable for geofence design and firmware assurance; (b) Operator is accountable for geofence deployment and test validation pre-trial; (c) CAA inspector is accountable for pre-trial audit of geofence test evidence; (d) If containment fails, CAA investigator will determine fault. All roles must be pre-assigned and understood.'

CRITICAL: Termination independent; but human vs. AI authority for termination trigger is undefined

Pillar A · Analyst Aria · REQ011

critical · conf 9 · #28

Analysis

Requirement demands 'independent termination' and 'contingency matrix'—excellent governance. But critical ambiguity: Can AI initiate termination, or is termination human-exclusive? Requirement says 'flight ops inspector wants independent termination… so trial can be made safe in any off-nominal state.' Does 'independent' mean 'redundant human channels' or 'independent from AI'? If AI detects off-nominal state and auto-terminates without human approval, is that 'making safe' or 'autonomous lethality'? For a swarm, termination (command to descend, hover, return) is a collective action; if one aircraft refuses to terminate, does the trial continue? No requirement specifies who is accountable if termination fails or if AI's termination decision is unintended. RAISO (Responsible AI Systems Officer) accountability unaddressed.

Recommendation

CRITICAL: Explicitly state: '(a) Human termination authority is exclusive—no AI shall auto-terminate the trial without explicit crew command; (b) Any AI recommendation to terminate must be logged and immediately escalated to crew for veto/override within 5 seconds; (c) Termination command is pilot-exclusive via hardwired circuit or secure C2 link; (d) Named role: Trial Safety Officer (crew) holds exclusive termination authority; (e) Contingency matrix must include: AI fails to execute pilot termination command—escalation to ATSU. Post-incident, Safety Officer is accountable for all termination decisions made or deferred.'

MAJOR: ML assurance named but data provenance, bias audit, and drift monitoring are optional, not mandatory

Pillar D · Analyst Dana · REQ009

major · conf 10 · #29

Analysis

REQ009 is the only requirement explicitly addressing ML, invoking AMLAS and model freezing. However, it does not mandate: (a) training data audit trail (what data went in, when, from where); (b) label accuracy acceptance criteria; (c) bias analysis (demographic, sensor, mission-context, adversarial); (d) drift-monitoring plan for model performance degradation over trial lifetime; (e) graceful fallbacks if ML confidence drops below acceptance thresholds. The 'frozen model manifest' could hide data quality issues if data lineage is not independently verified.

Recommendation

Require REQ009 to include: 'ML assurance case must document: (a) complete data provenance (sources, timestamps, versions, chain-of-custody for training/test splits); (b) label accuracy (metrics, inter-rater agreement, re-labelling procedures); (c) bias analysis across all identified vectors (sensor, platform, environmental, demographic if applicable); (d) performance acceptance criteria by safety-criticality level; (e) continuous drift monitoring plan (metrics, frequency, re-training triggers, fallback policy); (f) data retention and reproducibility plan for T&E forensics.'

NO USER STORY: accountability for public/stakeholder communication unaddressed

Pillar A · Analyst Aria · REQ017

major · conf 10 · #30

Analysis

Requirement correctly notes: no user story addresses community consultation, press/media, or landowner notification. CAP 1616 (consultation) and GDPR (privacy) imply obligations, but this requirement is orphaned with no accountable role named. Gap: (1) No named role accountable for stakeholder messaging; (2) No requirement for pre-trial public notification of trial (required by CAP 1616 for airspace changes); (3) No media strategy or incident communication plan; (4) If media reports trial negatively before CAA responds, who is accountable for narrative control? (5) GDPR: if trial collects personal data (e.g., location logs of ground crew or observers), who is data controller (operator or CAA)? This is a major governance gap.

Recommendation

Add REQ017 enforcement: 'Named role: Stakeholder Engagement Officer (CAA or operator delegate). Pre-trial mandatory: (a) CAP 1616 consultation document published; (b) Landowner notification sent with trial details and safety briefing; (c) Media relations plan approved by CAA (incident communication protocol); (d) GDPR data controller designated (operator or CAA) and privacy notice published; (e) Post-trial, lessons-learned summary sent to stakeholders. Operator is accountable for timely execution; CAA is accountable for approval.'

No data completeness criteria defined for AI/ML artefacts

Pillar D · Analyst Dana · REQ001

major · conf 9 · #31

Analysis

REQ001 demands a 'complete' intake pack but does not enumerate what data artefacts the CAA case officer must verify for AI systems. Missing: training dataset documentation, model weights/code manifest, test data splits, label provenance. Without intake completeness criteria for AI, a case could advance with undocumented training data provenance or missing bias analyses.

Recommendation

Add acceptance checklist to REQ001: 'Intake pack must include: (a) training data provenance sheet (source, size, sensors/labels used); (b) model manifest (architecture, hyperparameters, freezing timestamp); (c) independent test dataset statement; (d) bias/fairness analysis scope.'

Contingency Logic Not Paired with Confidence Thresholds; AI vs. Operator Role Ambiguous

Pillar T · Analyst Trevor · REQ011

major · conf 9 · #32

Analysis

REQ011: CAA flight-ops inspector wants independent termination, contingency matrix, ATSU coordination for off-nominal safety. Trust consumer: Flight-ops inspector + Operator. Gap: contingency matrix will list triggers (e.g., 'if C2 loss >60s, RTB'), but does not specify: (1) Who initiates each contingency—AI, operator, or both? (2) What confidence does the AI have that the contingency will succeed? (3) How is independent termination assured when swarm is distributed? (4) What does 'ATSU coordination' mean for an autonomous swarm—pre-loaded flight plans, or real-time handoff?

Recommendation

Contingency case must include: (1) Per-contingency responsibility assignment (AI vs. operator), (2) Confidence thresholds: e.g., 'RTB only if consensus confidence >90%, else emergency termination', (3) Independent hard-kill mechanism (firmware/hardware) verified independently of flight control, (4) ATSU pre-coordination script and swarm position replay for post-incident analysis.
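
Items (1) and (2) can be combined into one contingency-matrix row that names who initiates the response and which action is taken at a given consensus confidence. This is a minimal sketch; the field names and the example row are hypothetical. The assignment of the initiator to the AI is purely illustrative, since whether the AI may initiate any contingency is exactly what the matrix must settle.

```python
from dataclasses import dataclass
from enum import Enum


class Initiator(Enum):
    AI = "ai"
    OPERATOR = "operator"
    BOTH = "both"


@dataclass(frozen=True)
class Contingency:
    trigger: str
    initiator: Initiator
    preferred_action: str
    fallback_action: str
    min_confidence: float  # preferred action requires confidence strictly above this


# Example row mirroring "RTB only if consensus confidence >90%".
C2_LOSS = Contingency(
    trigger="c2_loss_over_60s",
    initiator=Initiator.AI,  # illustrative assumption, not a recommendation
    preferred_action="return_to_base",
    fallback_action="emergency_termination",
    min_confidence=0.90,
)


def select_action(contingency: Contingency, consensus_confidence: float) -> str:
    """Pick the preferred action only when confidence clears the threshold."""
    if not 0.0 <= consensus_confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if consensus_confidence > contingency.min_confidence:
        return contingency.preferred_action
    return contingency.fallback_action
```

Making each row explicit in this way forces the submission to state, per trigger, both the responsible party and the behaviour at the threshold boundary.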

Regulatory Apportionment Does Not Specify Trust Transfer Mechanism or AI Assurance Handoff

Pillar T · Analyst Trevor · REQ015

major · conf 9 · #33

Analysis

REQ015: CAA/MAA liaison wants signed apportionment for gap-free regulatory ownership. Trust consumer: CAA and MAA (inter-regulator trust). Gap: Traditional apportionment (CAA airspace, MAA military ops) assumes static roles. AI-enabled swarm raises: (1) If MOD-provided AI model is used, who assures it—CAA or MOD? (2) Trust transfer: does CAA accept MOD assurance chain (e.g., DO-365 evidence from MOD lab), or require independent CAA review? (3) Classified evidence: how is AI assurance evidence shared under CAP 722H if it contains military ML insights?

Recommendation

Apportionment agreement must specify: (1) AI assurance responsibility: CAA or MOD? (2) Trust-transfer basis: which regulatory accreditations (AMLAS, SACE, DO-365) does CAA accept from MOD without independent re-review? (3) Classified-evidence protocol: redaction/summary rules for CAA review, (4) Incident-investigation responsibility: if swarm fails mid-trial, who owns the forensics?

Classified MOD Evidence Handoff to CAA Not Operationalised; Trust Asymmetry Unresolved

Pillar T · Analyst Trevor · REQ018

major · conf 9 · #34

Analysis

REQ018: No user story addresses how classified MOD evidence (e.g., AI model development, adversarial testing) is presented to CAA under CAP 722H/JSP 440 without compromise. Trust consumer: CAA assessor + MOD (inter-agency trust). Gap: If the swarm AI was developed by MOD, CAA needs sufficient evidence to assure it—but MOD may resist declassification of model details. Trust asymmetry: MOD trusts its own assurance chain; CAA needs independent verification. CAP 722H allows 'sanitised' evidence, but leaves CAA unable to fully validate.

Recommendation

Security-handling protocol must specify: (1) Classification level of each evidence component (e.g., model architecture: SECRET, training data: UNCLASSIFIED), (2) Sanitisation rules: what can be redacted for CAA review without losing assurance value? (3) Trust bridge: independent auditor (e.g., QinetiQ) assures MOD evidence and vouches to CAA? (4) Incident-investigation protocol: if classified model is involved in incident, how does MOR data stay classified while allowing CAA investigation?

Intake checklist exists but evidence completeness criteria are absent

Pillar E · Analyst Evan · REQ001

major · conf 9 · #35

Analysis

REQ001 assumes case officer will receive a 'complete pack' but the pack definition—what constitutes evidence sufficiency for each requirement—is undefined. No acceptance criteria for completeness (e.g., 'safety case present', 'ML model frozen', 'test reports signed'). This is a meta-gap: if case officer cannot verify completeness, later requirements cannot be audited.

Recommendation

Define an intake checklist with evidence artefacts required for each requirement (REQ002–REQ020). Each artefact must name the owner, sign-off role, and acceptance standard (e.g., 'AMLAS case by applicant', 'signed by CAA/MAA SQEP').

Requirement entirely absent; noise, species disturbance, and hazardous-material risk unaddressed

Pillar E · Analyst Evan · REQ019

major · conf 9 · #36

Analysis

REQ019 explicitly notes absence of environmental-impact evidence: noise (especially dusk/night operations), species disturbance (protected bird/bat flight corridors?), and battery-fire hazard on ground. Environmental Impact Assessment (EIA) is statutory under Environmental Assessment (General) Regulations 2017 if trial impacts designated sites. Without evidence, trial may face legal challenge or halt.

Recommendation

Environmental assurance: (a) baseline noise/species survey at trial location, (b) modelling of trial noise impact (e.g., ISO 3744 for aircraft noise, cumulative 4–8 hours/day during sensitive hours), (c) protected-species consultation (if near SPA/SAC, or migration routes, ecological report required), (d) hazardous-material risk (battery fire on ground, fuel spill, debris fall into water/vegetation), (e) mitigation measures (flight hour restrictions, season restrictions, emergency response plan), (f) post-trial monitoring (noise complaints log, species observation if applicable), (g) CAA + Natural England / NatureScot (formerly Scottish Natural Heritage) sign-off before trial start.

Requirement entirely absent; lessons-learned, evidence preservation, and authorisation close-out unaddressed

Pillar E · Analyst Evan · REQ020

major · conf 9 · #37

Analysis

REQ020 notes that post-trial governance is absent: no procedure for lessons-learned debrief, evidence archival, final safety case closure, or authority removal. Without this, trial evidence cannot feed into future small-swarm ops certification; CAA loses institutional knowledge; applicant may not demonstrate operational maturity post-trial.

Recommendation

Post-trial assurance plan: (a) mandatory lessons-learned review (CAA + applicant + MAA + crew, within 4 weeks of trial end), (b) final safety case update incorporating trial outcomes, (c) evidence preservation and archival (trial data, incident reports, root-cause analyses, third-party audit findings stored in CAA-accessible repository for ≥7 years), (d) authorisation close-out checklist (all REQ001–020 satisfied? residual risks closed? regulatory obligations discharged?), (e) publication of non-confidential trial report (for regulatory precedent and industry learning), (f) formal CAA/MAA sign-off authorising end of trial and removal of airspace/operational restrictions.

MAJOR: ML components assurable but post-incident forensic accountability chain incomplete

Pillar A · Analyst Aria · REQ009

major · conf 9 · #38

Analysis

Requirement demands AMLAS/SACE case, frozen-model manifest, and explainability evidence. This is strong assurance governance. But: (1) 'Explainability' does not mandate who is accountable for explaining the model after an incident; (2) Frozen-model manifest does not require version-control custodianship or model-state recovery capability; (3) No requirement states who holds the model provenance record or can testify to model correctness post-failure. Accountability gap: assurance exists for pre-trial, but forensic accountability (reconstructing what the ML did, why, and when it failed) is unaddressed.

Recommendation

Extend REQ009: '(a) Supplier must maintain version-controlled model registry with explainability artifacts (e.g., SHAP, attention maps) and commit hashes for all deployed versions; (b) Operator must log which model version was active during trial; (c) CAA must retain model snapshots for incident investigation; (d) Post-incident, a named ML Forensics Officer (CAA or delegated) is accountable for model-state reconstruction and explanation.' Add mandatory: 'Model explainability logs must be recorded at runtime and cross-referenced with incident timeline.'

MAJOR: Workload/duty/take-over evidenced; meaningful human control keystone but no veto-authority guarantee

Pillar A · Analyst Aria · REQ010

major · conf 9 · #39

Analysis

Requirement is strong: 'meaningful human control demonstrated' is the right threshold. Evidence of workload, duty, take-over timing is required. However: (1) 'Meaningful human control' is defined in the User Story but not operationalised—is it 'human in the loop' (can slow decisions), 'on the loop' (can override after AI acts), or pre-decision approval? (2) No guarantee that crew has authority to override all AI-informed decisions immediately without technical barriers; (3) No human-factors test of automation-bias risk (will crew actually override when needed, or rubber-stamp AI decisions under workload?). Assurance is present but incomplete.

Recommendation

Operationalise REQ010: '(a) Define meaningful human control as "pre-decision veto authority": human approval required before AI-informed swarm actions execute, except link-loss autonomy (which is pre-approved); (b) CAA human-factors inspector must test crew take-over timing under high-workload scenarios; (c) Automation-bias test mandate: simulate AI failure modes and measure crew detection/override rate; (d) Crew fitness assessment must include swarm-specific decision load (e.g., coordinating multi-aircraft de-confliction judgements).'

NO USER STORY: noise/environmental accountability unaddressed

Pillar A · Analyst Aria · REQ019

major · conf 9 · #40

Analysis

Requirement correctly notes: no user story addresses noise, species disturbance, or battery-fire environmental risk. These are environmental law obligations (Environmental Impact Assessment Regulations, Wildlife and Countryside Act for dusk/night operations, Environmental Permitting for hazardous waste). Accountability gaps: (1) No named role for environmental compliance; (2) No EA (Environmental Assessment) or SEA (Strategic Environmental Assessment) required despite swarm being novel operation; (3) Dusk/night operations (when species are most vocal) may trigger Ecological Impact Assessment under CAP 1616; (4) Battery fire risk on ground is hazardous waste: who is accountable for containment/cleanup? (5) If trial causes ecological damage (e.g., bird disturbance during nesting season), who is liable? This is unaddressed and creates reputational/legal risk.

Recommendation

Add REQ019 enforcement: 'Named role: Environmental Compliance Officer (operator or CAA designate). Pre-trial mandatory: (a) Environmental Scoping Report produced (noise, species impact, waste hazards); (b) If dusk/night ops proposed, Ecological Impact Assessment required; (c) Battery hazard plan: storage, spill containment, disposal custodian named; (d) Noise Impact Assessment for ground crew and local residents; (e) Post-trial, environmental monitoring data (sound levels, wildlife observations) collected and reported to CAA. Operator is accountable for environmental compliance; CAA inspector audits evidence pre-trial.'

NO USER STORY: post-trial governance, lessons-learned, authorisation close-out unaddressed

Pillar A · Analyst Aria · REQ020

major · conf 9 · #41

Analysis

Requirement correctly notes: no user story covers post-trial lessons-learned, evidence preservation, or authorisation close-out. This is a critical accountability gap. Post-trial, accountability continues: (1) Who is accountable for lessons-learned synthesis and publication? (2) Data preservation: what happens to trial data after authorisation ends? (3) Investigability: if post-trial incident occurs (e.g., aircraft discovered to be unsafe), can CAA re-open case and retrieve trial evidence? (4) Authorisation close-out: does CAA publish findings or does trial end with no public record? (5) Supplier liability: post-trial, what is supplier's obligation to monitor fleet-wide issues discovered in trial? (6) If trial revealed AI model inadequacy, who is accountable for communicating that risk to other operators using the same model? No accountability structure for post-trial phase.

Recommendation

Add REQ020 enforcement: 'Named role: Post-Trial Assurance Officer (CAA). Pre-trial, trial closure protocol must be agreed: (a) Data stewardship: operator retains all trial data for 7 years; CAA retains safety findings indefinitely; (b) Lessons-learned synthesis: within 6 months post-trial, CAA publishes sanitised findings (excluding classified material); (c) Supplier notification: if trial identified model inadequacy or safety issue, supplier must issue advisory to all fleet operators; (d) Authorisation closure: CAA formally closes authorisation and notifies stakeholders; (e) Persistent risk: if post-trial analysis uncovers unresolved risk, CAA has authority to re-open investigation. Authorisation close-out cannot occur until lessons-learned are documented.'

SORA data: swarm-specific ODD completeness not linked to ML training distribution

Pillar D · Analyst Dana · REQ002

major · conf 8 · #42

Analysis

SORA 2.0 defines an Operational Design Domain (ODD), but REQ002 does not require evidence that swarm ML models (coordination, DAA) were trained on data representative of that ODD. Risk: swarm detection or coordination models trained on single-aircraft data will fail on multi-agent scenarios. No requirement that SORA ODD and ML training distribution be cross-validated.

Recommendation

Extend REQ002: 'SORA evidence must include: statement that each ML component's training data covers the claimed ODD (weather, airspace, encounter types, swarm sizes), with gaps and fallbacks documented.'

Geofence AI: no robustness requirement against GPS spoofing or ambiguous position data

Pillar D · Analyst Dana · REQ005

major · conf 8 · #43

Analysis

REQ005 mandates automatic containment but does not specify how the geofence AI will handle incomplete, noisy, or adversarial position data. If GPS signal is weak or spoofed, or position estimates are inconsistent across the swarm, the containment decision-making could fail. No requirement for sensor fusion validation, cross-check logic, or graceful fallback.

Recommendation

Require: 'Geofence enforcement must verify GPS freshness and consistency across swarm before containment decisions. Acceptance criteria: (a) detect GPS spoofing/anomalies within X seconds; (b) fall back to dead-reckoning if GPS becomes unavailable; (c) test performance in low-signal conditions (urban canyon, RF interference).'
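
The freshness and cross-swarm consistency checks could be sketched as below. The sketch assumes positions are expressed in a local east/north frame in metres, and that an independent inter-vehicle range measurement (for example from the mesh radio) is available to cross-check GPS-derived distances. Both assumptions, and the default tolerances, are illustrative rather than drawn from REQ005.

```python
import math
from typing import Dict, List, Tuple

Fix = Tuple[float, float, float]  # (east_m, north_m, timestamp_s), local frame


def stale_agents(fixes: Dict[str, Fix], now_s: float, max_age_s: float = 1.0) -> List[str]:
    """Agents whose latest position fix is older than max_age_s."""
    return sorted(agent for agent, (_, _, t) in fixes.items() if now_s - t > max_age_s)


def inconsistent_pairs(
    fixes: Dict[str, Fix],
    measured_ranges_m: Dict[Tuple[str, str], float],
    tolerance_m: float = 10.0,
) -> List[Tuple[str, str]]:
    """Pairs whose GPS-derived separation disagrees with the measured range."""
    suspect = []
    for (a, b), measured in measured_ranges_m.items():
        ax, ay, _ = fixes[a]
        bx, by, _ = fixes[b]
        gps_range = math.hypot(ax - bx, ay - by)
        if abs(gps_range - measured) > tolerance_m:
            suspect.append((a, b))
    return suspect


fixes = {"a": (0.0, 0.0, 10.0), "b": (30.0, 40.0, 10.0), "c": (0.0, 100.0, 8.0)}
ranges = {("a", "b"): 50.0, ("a", "c"): 160.0}
print(stale_agents(fixes, now_s=10.5))    # prints "['c']"
print(inconsistent_pairs(fixes, ranges))  # prints "[('a', 'c')]"
```

A disagreement on one pair does not by itself say which aircraft is spoofed. The fallback in criterion (b) would need a voting or exclusion rule on top of this check.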

GDPR-relevant data (public notification, landowner info) not governed; privacy by design missing

Pillar D · Analyst Dana · REQ017

major · conf 8 · #44

Analysis

REQ017 explicitly notes 'Strongly implied by CAP 1616 and GDPR but absent.' Public engagement will involve collecting personal data (names, contact info for affected landowners/communities). No requirement for: (a) GDPR legal basis (consent, legitimate interest); (b) data minimisation (what PII is necessary); (c) retention and deletion policy; (d) privacy-impact assessment. Risk: regulatory rejection, enforcement action, reputational damage.

Recommendation

Add REQ017: 'Public and stakeholder engagement must include: (a) DPIA supplement for engagement data (what PII, legal basis, retention, deletion); (b) consent/contact mechanism (opt-in preferred); (c) transparent communication about trial (risk, data use, privacy); (d) process for landowner objections; (e) data log of all external data collection and disposition.'

SORA Evidence Does Not Explicitly Quantify Confidence in Swarm-Aware Models

Pillar T · Analyst Trevor · REQ002

major · conf 8 · #45

Analysis

REQ002 asks CAA SORA assessor to set SAIL/OSO defensibly on swarm-aware evidence. Trust consumer: SORA assessor (regulatory authority). Basis: Guidance does not specify: (1) whether swarm coordination is deterministic or probabilistic, (2) how consensus failures map to SORA risk levels, (3) what confidence the SORA assessor should have in model generalisation beyond the trial airspace. SORA framework assumes single-vehicle decisions; swarm adds emergent complexity. No mention of uncertainty bounds on multi-vehicle interactions.

Recommendation

Augment SORA submission with: (a) swarm-coordination confidence intervals (e.g., 'consensus achieved >99% in n=10k Monte Carlo trials'), (b) sensitivity analysis on ODD boundary violations, (c) explicit statement of OSO assumptions (e.g., 'assumes <10 concurrent non-cooperative encounters').
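
A claim such as 'consensus achieved >99% in n=10k Monte Carlo trials' in item (a) is stronger when reported as an interval rather than a point estimate. The sketch below uses the Wilson score interval, one common choice for a binomial proportion. The choice of interval, the 95% level (z = 1.96) and the example counts are assumptions for illustration.

```python
import math


def wilson_lower_bound(successes: int, trials: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a success proportion."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("require 0 <= successes <= trials and trials > 0")
    p = successes / trials
    z2 = z * z
    centre = p + z2 / (2 * trials)
    spread = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials))
    return (centre - spread) / (1 + z2 / trials)


# Hypothetical results from 10,000 simulated encounters.
strong = wilson_lower_bound(9950, 10_000)  # observed 99.5%
weak = wilson_lower_bound(9910, 10_000)    # observed 99.1%
print(strong > 0.99, weak > 0.99)          # prints "True False"
```

In the second case the observed rate exceeds 99%, yet the lower bound does not, so the evidence would not support the '>99%' claim at this confidence level. An assessor setting SAIL/OSO would want the bound, not only the observed rate.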

ConOps ODD Nominal/Worst-Case Descriptions Lack AI Decision Thresholds

Pillar T · Analyst Trevor · REQ003

major · conf 8 · #46

Analysis

REQ003: CAA safety-case reviewer needs ConOps & ODD to judge nominal and worst-credible behaviour. Trust consumer: Safety reviewer (regulatory authority). ConOps will describe nominal swarm formation and contingency transitions. Gap: a ConOps typically does not specify the confidence thresholds at which the AI commits to decisions (e.g., 'swarm relocates only if consensus confidence >95%'). The worst-credible case may also not address graceful degradation when AI confidence drops below the safe operating envelope.

Recommendation

ConOps must include: (1) AI decision confidence thresholds for each swarm maneuver, (2) graceful degradation logic when confidence falls below threshold (e.g., 'return-to-base if consensus <85%'), (3) operator visibility into current AI confidence state.
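
The threshold logic described above could be expressed as a small decision function. This is a minimal sketch: the 95% and 85% values are the illustrative figures quoted in this finding, while the intermediate 'hold_position' behaviour and the name `select_action` are assumptions introduced here.

```python
COMMIT_THRESHOLD = 0.95  # example from the analysis: relocate only above 95%
RTB_THRESHOLD = 0.85     # example from the recommendation: return-to-base below 85%


def select_action(consensus_confidence: float) -> str:
    """Map a consensus confidence in [0, 1] to a swarm action."""
    if not 0.0 <= consensus_confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if consensus_confidence > COMMIT_THRESHOLD:
        return "execute_maneuver"
    if consensus_confidence < RTB_THRESHOLD:
        return "return_to_base"
    # Assumed intermediate behaviour: neither commit nor abort.
    return "hold_position"
```

Making the band between the two thresholds explicit is the point of the sketch: a ConOps that states only the commit threshold leaves this region undefined.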

TDA Consultation Does Not Address AI-Driven Airspace Violations

Pillar T · Analyst Trevor · REQ004

major · conf 8 · #47

Analysis

REQ004: CAA airspace regulator needs TDA & consultation package to segregate trial from other users. Trust consumer: Airspace coordinator. TDA boundary is typically static. Trust gap: if the AI (e.g., swarm avoidance logic) dynamically rebalances positions near the TDA, consultation stakeholders need to understand: (1) what boundary excursions are tolerable, (2) what confidence the AI has in staying within TDA, (3) how failures trigger handoff to external de-confliction (DAA/ADS-B relays).

Recommendation

TDA submission must include: (1) AI-driven boundary-crossing risk model (e.g., 'swarm position error 3-sigma bound <100m'), (2) coordination protocol with Airspace Change Proposal (ACP) trigger points, (3) operator workload to monitor and correct boundary incursions.
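
A bound such as 'swarm position error 3-sigma bound <100m' can be checked against logged position errors. The sketch below assumes the bound is read as mean plus three sample standard deviations of the error magnitude in metres; that reading, and the function names, are assumptions rather than anything the pack defines.

```python
import statistics


def three_sigma_bound(errors_m: list[float]) -> float:
    """Mean plus three sample standard deviations of position error (metres)."""
    if len(errors_m) < 2:
        raise ValueError("need at least two samples")
    return statistics.mean(errors_m) + 3 * statistics.stdev(errors_m)


def within_tda_tolerance(errors_m: list[float], limit_m: float = 100.0) -> bool:
    """True if the 3-sigma bound is strictly below the stated limit."""
    return three_sigma_bound(errors_m) < limit_m
```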

C2 Link Loss Survivability Not Calibrated to AI Decision Latency

Pillar T · Analyst Trevor · REQ006

major · conf 8 · #48

Analysis

REQ006: CAA spectrum specialist wants monitored C2 and inter-vehicle links to survive link loss. Trust consumer: Spectrum specialist + Operator. Gap: if swarm AI is onboard and makes decisions autonomously, trust depends on at least one of: (1) operator confidence in pre-loaded contingency logic, (2) C2 link-loss duration being shorter than the AI decision horizon, (3) swarm coordination continuing on the inter-vehicle mesh during a C2 blackout. No mention of how the operator trusts the swarm's self-coordination when the C2 link is down.

Recommendation

C2 assurance case must address: (1) maximum tolerable C2 loss duration before swarm enters safe-hold mode, (2) operator visibility into C2 link status and swarm autonomy mode, (3) inter-vehicle mesh protocol confidence bounds (e.g., 'mesh coordinates with >98% accuracy for <30s blackout').

Insurance Does Not Address AI Failure Modes; Ground Survey Baseline Not Linked to Risk Monitoring

Pillar T · Analyst Trevor · REQ013

major · conf 8 · #49

Analysis

REQ013: CAA authorisation officer wants insurance + ground survey for third-party risk transferability and factual grounding. Trust consumer: Insurance underwriter + Risk authority. Gap: Insurance policy typically excludes autonomous AI failures (manufacturer limits). Ground survey establishes population/property baseline, but does not link to swarm AI risk: (1) If swarm goes rogue due to AI bug, is it covered? (2) If ground survey assumed static population but real-time monitoring shows new structures, how does risk tier change?

Recommendation

Insurance submission must address: (1) AI failure carve-outs: list which AI-failure modes are covered vs. excluded, (2) Per-engagement risk tier linked to real-time ground monitoring (e.g., 'if new structures detected, mission suspended pending re-survey'), (3) Third-party notification protocol (pre-flight SMS to registered buildings).

Crew Competency Does Not Include AI-Mode Failure Scenarios or Swarm Choreography

Pillar T · Analyst Trevor · REQ014

major · conf 8 · #50

Analysis

REQ014: CAA licensing inspector wants evidenced crew competency for qualified operation. Trust consumer: Licensing inspector. Gap: Traditional UAS pilot training covers manual flight, recovery from link loss, emergency procedures. Swarm AI adds novel modes: (1) Consensus failure (e.g., 'swarm splits into two competing consensus groups'), (2) Model rollback (e.g., 'ML model reverted to previous version, behaviour may differ'), (3) Swarm choreography (formation changes, waypoint sequencing). Crew must trust and understand these failure modes.

Recommendation

Training syllabus must include: (1) Swarm consensus mechanics and failure modes (scenario-based), (2) AI confidence-state interpretation (what does 'confidence 63%' mean?), (3) Manual recovery from consensus failure (return-to-base override tests), (4) Model rollback procedures and expected behaviour changes.

Phased Gates Do Not Define AI-Specific Readiness Criteria; Reversibility Not Testable

Pillar T · Analyst Trevor · REQ016

major · conf 8 · #51

Analysis

REQ016: CAA programme manager wants phased gates for progressive evidence build-up and reversibility. Trust consumer: Programme authority. Gap: Phased gates (e.g., 'single-vehicle hover, then 2-vehicle formation, then 10-vehicle swarm') are standard. But AI trust requires additional gates: (1) When does the swarm ML model graduate from simulation-only to flight-tested? (2) At what confidence threshold is the model deemed 'trusted' for next phase? (3) 'Reversibility' means: if phase N fails, do we revert to phase N-1 software, or hold the ML model and revert only the swarm size? Decision-reversal triggers are not specified.

Recommendation

Phased trial plan must include: (1) AI-specific gates: per-phase ML model validation (e.g., 'phase 2 gate: consensus confidence >95% in 100h of sim + 10h of 2-vehicle flight'), (2) Reversal triggers: objective metrics (e.g., 'if consensus drops <80% for >3 consecutive maneuvers, revert to phase N-1 + re-validate model'), (3) Model freeze points: at what phase is the ML model frozen for flight ops (no more online learning)?
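
An objective reversal trigger of the kind quoted above ('consensus drops <80% for >3 consecutive maneuvers') can be evaluated mechanically from a per-maneuver log. This is a minimal sketch; the function name and the list-of-scores input format are assumptions.

```python
def reversal_triggered(
    consensus_scores: list[float],
    threshold: float = 0.80,
    max_consecutive: int = 3,
) -> bool:
    """True if more than `max_consecutive` maneuvers in a row fall below `threshold`."""
    run = 0
    for score in consensus_scores:
        # Extend the current low-confidence run, or reset it on a good maneuver.
        run = run + 1 if score < threshold else 0
        if run > max_consecutive:
            return True
    return False
```

Note that exactly three low-consensus maneuvers in a row do not trip this trigger; the example wording says 'more than 3', and a trial plan should state that boundary explicitly.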

Environmental Risk Not Addressed; Noise/Wildlife Baseline and Monitoring Plan Missing

Pillar T · Analyst Trevor · REQ019

major · conf 8 · #52

Analysis

REQ019: No user story covers noise, species disturbance (dusk/night ops), or battery-fire environmental risk. Trust consumer: Environmental authority + affected community. Gap: swarms are noisier than single aircraft due to multi-rotor, multi-agent acoustics. Night operations may disrupt nocturnal species. Battery fires pose an ignition risk. None of this is addressed in the pack. This is an institutional trust gap (environmental stewardship) overlaid on public trust (REQ017).

Recommendation

Environmental assessment case must include: (1) Baseline noise survey (dB profile by location/time), (2) Pre-flight wildlife survey (nesting, migration), (3) Swarm noise model (projected dB vs. single aircraft), (4) Mitigation: flight-altitude limits, time-window restrictions, observer on-site, (5) Post-flight environmental monitoring plan and incident reporting.

Post-Trial Lessons-Learned and Evidence Preservation Not Specified; Trust Cannot Transfer to Future Trials

Pillar T · Analyst Trevor · REQ020

major · conf 8 · #53

Analysis

REQ020: No user story covers post-trial lessons-learned, evidence preservation, or authorisation close-out. Trust consumer: CAA (institutional learning) + Future trial planners + Regulatory precedent. Gap: after trial concludes, swarm data (MOR, model logs, incident reports, crew feedback) must be preserved for: (1) Investigator root-cause analysis, (2) Regulatory learning (what worked, what failed?), (3) Trust transfer: can future trials cite this precedent? Without a close-out and lessons-learned protocol, the trial becomes an isolated event with no institutional impact.

Recommendation

Close-out case must include: (1) MOR data archival plan: who holds the data, for how long, under what access controls? (2) Lessons-learned report: per-phase summary of evidence gaps, model failures, crew feedback, (3) Regulatory precedent: formal CAA-MOD debrief documenting which assurance artifacts (AMLAS case, model cards, MOR schema) were most valuable for future trials, (4) Trust-transfer statement: explicit clearance for future trials to reference this precedent.

No AI-specific completeness criteria for intake

Pillar F · Analyst Fiona · REQ001

major · conf 8 · #54

Analysis

Intake form checks for complete ConOps, SORA, C2 evidence, but lacks AI-specific submission gates: ML model manifest freeze, DAA test data traceability, swarm simulator logs, failure-mode registry. An application lacking forensic-explainability evidence could advance to assessment undetected. High-confidence-wrong events (e.g. DAA false-negative collision) would then lack an audit trail for investigation. Intake is the first line of defence against latent AI risks.

Recommendation

Add AI-specific intake checklist: (1) frozen ML model manifest with training data hash, (2) DAA test coverage map (ODD scenarios), (3) ML failure-mode register (false-positive/false-negative thresholds, mitigation), (4) swarm simulator scenario library, (5) forensic audit-trail design (inputs, confidences, decisions logged per flight). Reject intake if any item is missing.
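
The 'reject if any missing' rule lends itself to an automated completeness check at intake. The sketch below is illustrative only: the artefact keys are hypothetical identifiers for the five checklist items, not names taken from any CAA form.

```python
# Hypothetical identifiers for the five checklist items above.
REQUIRED_ARTEFACTS = frozenset({
    "ml_model_manifest",
    "daa_test_coverage_map",
    "ml_failure_mode_register",
    "swarm_simulator_scenario_library",
    "forensic_audit_trail_design",
})


def check_intake(submitted: set[str]) -> tuple[bool, list[str]]:
    """Return (accepted, missing_items); accept only if nothing is missing."""
    missing = sorted(REQUIRED_ARTEFACTS - set(submitted))
    return (not missing, missing)
```

Returning the list of missing items, rather than a bare pass/fail, gives the applicant an actionable rejection notice.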

SORA swarm-aware but lacks AI failure-consequence mapping

Pillar F · Analyst Fiona · REQ002

major · conf 8 · #55

Analysis

SORA 2.0 swarm extension covers coordination and cooperative intent, but assumes correct AI decisions. No mapping of: false-negative DAA (swarm misses intruder, collision), false-positive (swarm false-scatters, emergency maneuver injures), high-confidence-wrong (swarm confident in bad geofence interpretation). SAIL and OSO thus defend only against nominal failures, not against degraded-AI or distributed-sensor-failure modes. Assessor lacks failure-consequence severity language.

Recommendation

Extend SORA to include an AI-failure row: map each DAA/swarm-decision failure mode to its SAIL/OSO impact (e.g. false-negative DAA → mid-air collision → SAIL: reduce airspace; OSO: require continuous radar overlay). Assign pilot/sensor/AI-assurance mitigation credit only if forensic explainability and runtime detection are contractual.

TDA airspace segregation does not protect against AI-misdirected swarm

Pillar F · Analyst Fiona · REQ004

major · conf 8 · #56

Analysis

TDA achieves segregation from manned traffic, but cannot defend against geofence malfunction or swarm-decision software bug that drives entire fleet out of bounds. Example: swarm-coordinator interprets false GPS anomaly as 'we have drifted east, correct west' and all aircraft push westward into populated zone. TDA airspace alone is not containment.

Recommendation

TDA approval contingent on independent low-level geofence enforcement (REQ005 critical). TDA document must include: (1) swarm failure-response procedure (if any aircraft exits TDA boundary, force entire swarm descent to 50ft AGL and hover), (2) radar plot verification during trial setup (confirm GPS-derived position matches ground radar), (3) ATSU alert protocol if swarm detected approaching boundary (ATSU triggers ground-command geofence override).
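
The swarm failure-response procedure in item (1) is a whole-swarm rule triggered by any single breach. A minimal sketch follows, assuming a rectangular TDA in local x/y metres and hypothetical command strings; a real TDA would be an arbitrary polygon with vertical limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Assumed rectangular TDA footprint in a local x/y frame (metres)."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def swarm_commands(positions: dict[str, tuple[float, float]], tda: Box) -> dict[str, str]:
    """If any aircraft is outside the TDA, command the entire swarm to descend and hover."""
    breached = any(not tda.contains(x, y) for x, y in positions.values())
    command = "descend_50ft_agl_and_hover" if breached else "continue"
    return {aircraft_id: command for aircraft_id in positions}
```

The check relies on the same position source the finding warns about, so it only has value when fed from an independent reference such as the ground radar plot in item (2).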

C2 link-loss survival relies on AI fallback with no proven robustness

Pillar F · Analyst Fiona · REQ006

major · conf 8 · #57

Analysis

Requirement: 'link loss is survivable.' Survivability relies on: onboard swarm decision-maker maintains formation and executes last-known mission, DAA avoids intruders autonomously, geofence enforces containment. But no test mandate: what happens if link loss occurs during swarm reconfiguration? DAA confidence drops below threshold? GPS/IMU fusion drifts? All three failure modes plus link loss = CFIT risk. No specification of link-loss timer (how long before swarm auto-descends?) or fallback-to-RC protocol.

Recommendation

C2 assurance must include: (1) link-loss scenario matrix (loss during ascent, cruise, descent, coordinated maneuver), (2) onboard fallback FSM (state machine for each scenario, including timeout to auto-descent if link not restored in 60s), (3) field test: deliberately drop C2 link at 10 points in mission profile, verify swarm behaviour matches fallback FSM, (4) pilot training on manual recovery from link loss, (5) post-flight log analysis to detect confidence-degradation signs during link-loss.
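
The onboard fallback FSM in item (2) can be sketched as a pure transition function, which makes it straightforward to test against the field results in item (3). The three modes, the latching of auto-descent, and all names below are assumptions; only the 60 s timeout comes from the recommendation.

```python
from enum import Enum


class Mode(Enum):
    NOMINAL = "nominal"
    SAFE_HOLD = "safe_hold"
    AUTO_DESCENT = "auto_descent"


AUTO_DESCENT_TIMEOUT_S = 60.0  # timeout quoted in the recommendation


def next_mode(mode: Mode, link_up: bool, seconds_since_loss: float) -> Mode:
    """Return the next fallback mode given link status and time since loss."""
    if mode is Mode.AUTO_DESCENT:
        # Assumption: once descent starts it is latched, even if the link returns.
        return Mode.AUTO_DESCENT
    if link_up:
        return Mode.NOMINAL
    if seconds_since_loss >= AUTO_DESCENT_TIMEOUT_S:
        return Mode.AUTO_DESCENT
    return Mode.SAFE_HOLD
```

Whether auto-descent should latch is exactly the kind of decision the scenario matrix needs to settle for each flight phase.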

Crew take-over authority not independent from autonomy; override assurance weak

Pillar F · Analyst Fiona · REQ010

major · conf 8 · #58

Analysis

Requirement asks for 'workload, duty and take-over evidence.' But no specification: (1) take-over time (how long for crew to recognize AI failure and manually recover?), (2) take-over authority (when crew overrides AI, is override guaranteed to execute? Can AI fight back?), (3) false confidence trap (crew trusts AI output without independent verification, misses contradiction between AI and sensor reality), (4) task saturation (during coordinated swarm maneuver, can crew monitor 8 DAA channels + swarm consensus + geofence simultaneously?), (5) training on disagreement protocol (what is crew trained to do when AI-recommended action contradicts crew intuition?). Workload studies are passive; do not enforce operator limits.

Recommendation

Human-oversight assurance must include: (1) Takeover protocol: explicit decision tree (if crew commands descent, does AI obey or second-guess?), onboard control authority matrix (crew can always assert: emergency descent, link-loss fallback, single-aircraft isolation), (2) Override testing: field validation (crew commands 10 emergency maneuvers under representative workload, measure response time, verify AI releases control in <500 ms), (3) Contradiction detection training: scenario-based sim before trial (crew practices detecting/resolving AI-sensor contradictions, e.g. DAA reports clear but crew sees an intruder, or radar plot shows a conflict), (4) Workload cap: empirical measurement during trial build-up (log crew inputs, response latencies, missed alarms), gate further expansion if the workload metric exceeds threshold, (5) Crew briefing on AI limitations: documented examples of DAA false-output scenarios (sensor glint, cooperative aircraft without IFF, weather radar clutter), so crew expects and recognizes anomalies.
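
The override test in item (2) reduces to a pass/fail check over logged release latencies. A minimal sketch, assuming latencies are recorded in milliseconds per commanded maneuver; the function name is hypothetical.

```python
def override_test_passes(
    release_latencies_ms: list[float],
    required_trials: int = 10,
    limit_ms: float = 500.0,
) -> bool:
    """Pass only if enough maneuvers were run and every AI release was under the limit."""
    if len(release_latencies_ms) < required_trials:
        return False
    return all(latency < limit_ms for latency in release_latencies_ms)
```

The check is on the worst case, not the average: a single slow release fails the test.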

Crew training absent AI-failure response and edge-case decision scenarios

Pillar F · Analyst Fiona · REQ014

major · conf 8 · #59

Analysis

Requirement: 'evidenced competency so trial is flown by qualified crew.' Standard training covers: airspace rules, swarm formation, C2 procedures, emergency descent. But no mandate: (1) AI-failure scenario training (what does crew do if DAA outputs 'clear' but crew visually sees intruder?), (2) edge-case decision authority (if swarm consensus is tied/deadlocked, who breaks tie: onboard logic or crew manual override?), (3) override-and-recovery (crew overrides AI descent command with climb; how does crew ensure AI accepts override and relinquishes control?), (4) scenario drills (simulated DAA false-negative mid-swarm maneuver, crew must detect and recover within 15 seconds). Competency sign-off is binary; does not confirm crew readiness for AI-failure modes.

Recommendation

Crew training curriculum expansion: (1) AI-failure scenario module: 5-10 scripted scenarios (DAA false-negative, swarm consensus deadlock, geofence malfunction, C2 link loss + DAA distrust), crew response measured (detection time, decision quality, manual recovery time), gate progression unless all scenarios are successfully completed, (2) Override protocol drills: crew trains on forcing manual control in each flight phase (hover, cruise, coordinated maneuver), verifies AI release in <500 ms, practices recovery from AI-resisting-override scenario, (3) Contradiction detection: crew briefed on scenarios where AI output contradicts sensor (radar conflict, IFF dropout, GPS outlier), trained to recognize and escalate, (4) Competency re-certification post-incident: if crew makes a misjudgment during trial (e.g. trusts false DAA output), mandatory retraining + sim check before resumption. Certification doc must detail AI failure scenarios covered.

ConOps/ODD structure present; edge-of-envelope and failure scenarios underspecified

Pillar E · Analyst Evan · REQ003

major · conf 8 · #60

Analysis

REQ003 asks for 'nominal and worst-credible behaviour' but does not define ODD boundaries quantitatively (e.g., 'wind speed <15 m/s, visibility >1 km, temperature -10 to +40°C') or list off-nominal scenarios (e.g., single-swarm-member GPS denial, partial C2 link degradation, non-cooperative intruder at edge of geofence). Without explicit ODD and failure-mode scenarios, V&V test design cannot be scoped; risk assessment cannot validate 'worst-credible' claims.

Recommendation

ConOps must include: (a) ODD boundary table (all continuous and discrete operating parameters), (b) off-nominal tier (degraded sensors, partial link loss, single-aircraft failure), (c) failure-mode FMEA linked to V&V test cases, (d) swarm reconfiguration strategy (e.g., can 3-ship formation continue as 2-ship?), (e) evidence artefacts (FMEA, test design, simulation report).

TDA and coordination package scope unclear; no airspace segregation acceptance criteria

Pillar E · Analyst Evan · REQ004

major · conf 8 · #61

Analysis

REQ004 nominally satisfied by TDA notice and airspace coordination with ATSU. However, no evidence of residual risk assessment: what air traffic will still be in TDA airspace (e.g., gliders, helicopters, transiting manned)? What separation standards are enforced (procedural, electronic)? How is swarm boundary maintained (geofence + ADS-B broadcast + TDA lateral/vertical limits)? CAA must verify that segregation is 'fit-for-purpose' — this requires quantified residual-traffic risk and separation assurance plan.

Recommendation

TDA coordination package must include: (a) residual air-traffic risk assessment (known VFR, glider, commercial paths in TDA), (b) separation enforcement architecture (geofence tolerances, ADS-B publishing, TDA radar/ATSU monitoring), (c) swarm boundary validation test (e.g., 'aircraft constrained to ±X m lateral, ±Y ft altitude in GPS-denied scenario'), (d) contingency link-loss egress plan, (e) ATSU escalation thresholds.

Containment function specified; no independent validation or failure-mode linkage

Pillar E · Analyst Evan · REQ005

major · conf 8 · #62

Analysis

REQ005 requires 'automatic containment' but does not specify: (a) containment mechanism (airborne geo-fence, ground-based override, both?), (b) accuracy tolerance (how much breach is tolerable before failsafe triggers?), (c) failure modes (GPS spoofing, GCS command injection, GNSS outage), (d) test plan (flying to geofence edge in multiple wind/sensor conditions), (e) independent verification (third-party audit of containment code, swarm-interaction effects). Without these, REQ005 is a design intent, not assurance.

Recommendation

Geofence assurance case must include: (a) architectural description and failsafe design, (b) acceptance criteria (breach tolerance ≤X m, recovery time ≤T s), (c) unit + integration test report (signed), (d) swarm-specific test: all N aircraft breach-tested concurrently; no cascading failure, (e) failure-mode matrix (GPS loss → hardcoded safe altitude egress; GCS comms loss → pre-loaded fence holds), (f) SQEP independent review.

C2 link & spectrum requirements stated; link-loss survivability evidence absent

Pillar E · Analyst Evan · REQ006

major · conf 8 · #63

Analysis

REQ006 requires 'monitored C2 and inter-vehicle links' but does not quantify: (a) link latency (<100 ms? <1 s?), (b) link reliability (bit-error rate, packet loss tolerance), (c) loss-of-link survivability (how long can swarm fly autonomously? does swarm maintain formation or disperse?), (d) spectrum interference risk (shared band test report), (e) swarm-level link redundancy (single link vs. multi-path). 'Non-interfering' is assumed but not demonstrated.

Recommendation

C2 assurance case: (a) link specification document (latency, PER, bandwidth per aircraft), (b) loss-of-link holding test (N aircraft, T-second GNSS-only flight, zero separation violation), (c) spectrum clearance and interference survey (OFCOM certification + applicant test report), (d) swarm-link interdependencies (e.g., can one aircraft's link loss trigger cascading failure? test evidence required), (e) independent RF and systems validation.

Meaningful human control declared; workload assessment and takeover readiness unquantified

Pillar E · Analyst Evan · REQ010

major · conf 8 · #64

Analysis

REQ010 requires 'meaningful human control' and 'take-over evidence' but lacks: (a) definition of 'meaningful' (supervisory monitoring only? loop closure required? decision veto capability?), (b) workload assessment (HMI complexity, situational awareness metrics, fatigue risk during phased trial), (c) takeover latency requirement (<T seconds? trained, emergency-only?), (d) test evidence (crew-in-loop simulation, takeover drills under nominal and failure conditions), (e) competency criteria (pilot hours, swarm-ops specific training sign-off). 'Meaningful' without metrics is unverifiable.

Recommendation

Human factors assurance: (a) operational concept for crew role (supervisory, monitoring frequency, veto authority), (b) HMI design with usability testing (≥X subjects, task success rate ≥Y%), (c) workload study (NASA-TLX or equivalent during all phases), (d) takeover test matrix (nominal, one-member loss, link degradation, intruder alert; takeover latency ≤T s for each), (e) crew training plan with competency sign-off by CAA HF specialist, (f) continuous workload monitoring during trial with red-line thresholds for pausing operations.

Flight termination and contingency concepts stated; acceptance criteria and test evidence absent

Pillar E · Analyst Evan · REQ011

major · conf 8 · #65

Analysis

REQ011 asks for 'independent termination, contingency matrix, and ATSU coordination' but does not specify: (a) termination modes (parachute per aircraft? thrust-vector cutoff? glide-to-landing?), (b) trigger criteria (loss of swarm coherence? single-member failure? crew command?), (c) acceptance criteria (all N aircraft ground-safe within T seconds, separation ≥X m during descent), (d) contingency matrix link to ODD/ConOps (which failure modes trigger which contingency?), (e) test evidence (live drop tests, simulated terminations, recovery success rate metrics). Contingency is treated as generic procedure, not swarm-specific assurance.

Recommendation

Contingency assurance: (a) detailed contingency decision tree (trigger → action → outcome for each ODD scenario), (b) flight termination design per aircraft + swarm behaviour (do all terminate simultaneously? coordinated sequence?), (c) acceptance criteria (descent rate <X ft/s, landing dispersal ≤Y m, zero secondary hazards), (d) test report: simulation covering contingency in ≥50 ODD scenarios, (e) live trials in phased gates (virtual-only first, shadow BVLOS later), (f) ATSU coordination protocol with worked example, (g) independent review by CAA flight ops specialist.

MOR/DPIA stated; failure-mode forensics linkage and data-retention evidence absent

Pillar E · Analyst Evan · REQ012

major · conf 8 · #66

Analysis

REQ012 requires 'complete recorded evidence' and 'MOR obligations' but does not specify: (a) what data is recorded (flight telemetry, ML model decisions, C2 comms, inter-swarm messages, all frames/images? sampling rate?), (b) recording artefacts per ICAO-like data standard (e.g., do requirements reference DO-358 flight-data standards?), (c) failure-mode traceability (if incident occurs, can investigator link data → decision → outcome?), (d) retention periods (how long post-trial?), (e) DPIA scope and evidence of compliance (subject consent? anonymization? third-party sharing controls?). MOR/DPIA are statutory; omitting evidence standards creates investigability risk.

Recommendation

Data assurance case: (a) detailed data dictionary (telemetry, ML inputs/outputs, decision logs, comms records, resolution, frequency), (b) recording architecture diagram with redundancy (on-board + ground + cloud backup), (c) failure-mode → data-requirement matrix (e.g., 'if collision risk suspected, require image buffer ± 30 s + ML confidence traces'), (d) data retention and governance policy (post-trial archive duration, deletion criteria), (e) DPIA compliance report signed by DPO, (f) incident investigation protocol with test evidence (e.g., mock incident investigation on trial data sample), (g) independent audit by CAA data specialist.

Competency evidence required; training content, assessment criteria, and sign-off authority absent

Pillar E · Analyst Evan · REQ014

major · conf 8 · #67

Analysis

REQ014 asks for 'evidenced competency' but does not define: (a) training curriculum (swarm ops specific? formation flying? loss-of-link recovery? simulation hours required?), (b) assessment criteria (practical test pass marks? simulator performance thresholds?), (c) sign-off authority (operator, CAA, independent?), (d) competency maintenance (refresher cycles during phased trial?). Without these, competency is subjective; CAA cannot verify crew are prepared for trial demands.

Recommendation

Crew assurance: (a) training syllabus covering swarm ops, HMI, contingencies, loss-of-link, emergency takeover (simulator + theory), (b) assessment plan with practical and knowledge tests, (c) signing-off process by CAA-approved examiner or operator PIC with CAA oversight, (d) competency records (dates, assessor, scores) retained post-trial, (e) continuous monitoring during phased gates (pass/fail per phase), (f) supplementary training triggered by performance issues or trial phase advancement.

Spectrum specialist accountable; link-loss survival not traced to human override authority

Pillar A · Analyst Aria · REQ006

major · conf 8 · #68

Analysis

User story names 'CAA spectrum specialist' as accountable for C2 assurance. 'Link loss is survivable' implies swarm has programmed autonomy to act without C2 during link-loss. But who is accountable if autonomous link-loss behaviour is unintended or harmful? Requirement does not trace accountability for the autonomous action itself—only for C2 link design. The AI/firmware making autonomous decisions in link-loss state has no named human duty-holder.

Recommendation

Add: 'For each autonomous link-loss behaviour, name the human accountable: (a) Supplier designs autonomy; (b) Operator approves autonomy limits pre-trial; (c) CAA certifies autonomy is safe. Post-incident, operator is accountable for decision to deploy that autonomy.'

DAA assessor accountable for performance; AI decision logic not separated from evidence

Pillar A · Analyst Aria · REQ007

major · conf 8 · #69

Analysis

User story names 'CAA DAA assessor' as accountable for DO-365 alignment. But swarm de-confliction is AI-intensive (collision-avoidance algorithms, cooperative path-planning). Requirement lumps 'evidence' and 'performance' without separating: (1) sensor/detection evidence, (2) AI decision logic, (3) avoidance action. If avoidance fails, which AI layer is accountable? Assessor signs evidence, but who is accountable for AI's avoidance recommendation credibility?

Recommendation

Refactor REQ007 to separate: (a) Detection performance (sensor evidence); (b) AI de-confliction logic (frozen model registry, explainability); (c) Avoidance action authority (human override or AI-autonomous?). Name accountable role for each.

CAA programme manager accountable for phasing gates; gate decision authority undefined

Pillar A · Analyst Aria · REQ016

major · conf 8 · #70

Analysis

Requirement names 'CAA programme manager' as accountable for phased gates. 'Evidence builds progressively and is reversible' implies gates have go/no-go authority. But requirement does not specify: (1) Who decides gate go/no-go (programme manager, safety committee, CAA executive)?; (2) What evidence triggers gate failure—is it objective thresholds or subjective judgment?; (3) If AI is used to assess gate readiness (e.g., ML model predicts trial readiness), who is accountable if AI assessment is wrong?; (4) Reversibility—if Phase 2 fails, can trial revert to Phase 1 or is Phase 1 data invalidated? Accountability for gate decisions is unclear.

Recommendation

Extend REQ016: 'Each gate decision shall be made by a named CAA Safety Committee (programme manager + lead inspector + legal/risk officer). Gate criteria must be objective and measurable. If AI assists gate assessment, it must be explainable; final gate decision is human-exclusive. Gate reversion policy must be pre-trial defined and signed. Failure to meet gate criteria triggers review; if review is inconclusive, trial is paused pending additional evidence.'

C2/spectrum: no data integrity assurance for swarm comms; bit-error rate and latency not constrained

Pillar D · Analyst Dana · REQ006

major · conf 7 · #71

Analysis

REQ006 requires 'monitored C2 and inter-vehicle links' but does not mandate integrity checks on the data flowing over those links. If swarm coordination messages are corrupted (bit errors, reordering), or if latency exceeds the coordination algorithm's assumptions, the swarm ML model will operate on corrupted/stale state data, leading to unpredictable collective behaviour.

Recommendation

Add to REQ006: 'C2/inter-vehicle links must carry: message timestamps, sequence numbers, checksums/CRC. Acceptance criteria: bit-error rate <1e-6, latency jitter <50ms, lost-message rate detectable and logged. Swarm coordination must include data-freshness checks.'
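
A message frame carrying the three fields named above (timestamp, sequence number, CRC) plus a data-freshness check might look like the following sketch. The wire layout, the use of CRC-32, and the `max_age_s` parameter are assumptions for illustration; note that message age is a separate quantity from the latency-jitter criterion, and the real limit would come from the coordination algorithm's stated assumptions.

```python
import struct
import zlib

# Assumed header layout, network byte order: sequence number (uint32),
# send timestamp in seconds (float64), payload length (uint16).
HEADER = struct.Struct("!IdH")


def encode(seq: int, timestamp: float, payload: bytes) -> bytes:
    """Build a frame: header, payload, then a CRC-32 over both."""
    body = HEADER.pack(seq, timestamp, len(payload)) + payload
    return body + struct.pack("!I", zlib.crc32(body))


def decode(frame: bytes, now: float, max_age_s: float = 0.5) -> tuple[int, float, bytes]:
    """Verify CRC, length and freshness; raise ValueError on any failure."""
    if len(frame) < HEADER.size + 4:
        raise ValueError("frame too short")
    body = frame[:-4]
    (crc,) = struct.unpack("!I", frame[-4:])
    if zlib.crc32(body) != crc:
        raise ValueError("CRC mismatch")
    seq, timestamp, length = HEADER.unpack(body[:HEADER.size])
    payload = body[HEADER.size:]
    if len(payload) != length:
        raise ValueError("length mismatch")
    if now - timestamp > max_age_s:
        raise ValueError("stale message")
    return seq, timestamp, payload
```

A receiver would additionally compare each `seq` with the last one seen, so that lost or reordered messages are detectable and can be logged as the acceptance criteria require.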

Classified data and IP handling: no protocol for presenting MOD evidence to CAA without compromise

Pillar D · Analyst Dana · REQ018

major · conf 7 · #72

Analysis

REQ018 identifies a genuine gap: MOD may contribute classified or IP-sensitive evidence (sensor specs, algorithm details, training data sources). No mechanism defined for: (a) sanitizing evidence for CAA review (removing classified details); (b) compartmentalising access (who sees what); (c) independent T&E reproducibility without leaking classified data. This blocks third-party assurance and incident investigation.

Recommendation

Add REQ018: 'Security classification protocol must define: (a) classified evidence assessment (what cannot be disclosed, why); (b) open-equivalent substitute (can unclassified proxy data be provided); (c) CAA-only access protocols for classified evidence; (d) third-party T&E on open-equivalent data, with CAA verification on classified subset; (e) MOD/CAA/Industry data-sharing agreement (who, what, when, how).'

Data recording and MOR obligations present; AI forensic data not foregrounded

Pillar F · Analyst Fiona · REQ012

major · conf 7 · #73

Analysis

Requirement addresses 'complete recorded evidence, MOR obligations, DPIA.' Data recording is typical: flight logs (position, altitude, attitude), C2 messages, radar plots, ADS-B. But no specification: (1) AI-forensic data (DAA decision logs, swarm consensus votes, geofence checks, ML confidence scores), (2) retention policy (how long are logs kept? Who has access?), (3) post-incident analysis SLA (within how many hours must incident investigator receive forensic extract?), (4) DPIA scope (are crew personal performance metrics (reaction time, override rate) subject to DPIA?). MOR process is administrative; does not enforce model-retraining governance or incident-to-learning loop.

Recommendation

Data recording and MOR expansion: (1) AI-forensic data mandate: all AI subsystem logs (DAA, swarm, geofence, ML confidence scores) recorded at 10 Hz minimum, dual-redundant storage, (2) Retention policy: all logs retained for 2 years post-incident (or 5 years if no incident), encrypted, access controlled via MOU between operator, CAA, insurance, (3) DPIA: explicitly scope crew performance data (reaction time, override rate) as potentially sensitive, anonymize before sharing with third parties, (4) Post-incident analysis protocol: incident investigator has SLA access to forensic extract within 12 hours, forensic report due within 1 week, includes root-cause (was it AI failure, operator error, sensor glitch, systematic issue?), learning (how to retrain/redesign to prevent recurrence?), (5) MOR closure loop: post-incident finding feeds back to REQ009 ML retraining gate; cannot resume until closure confirmed.

Insurance adequacy unclear for AI-failure scenarios

Pillar F · Analyst Fiona · REQ013

major · conf 7 · #74

Analysis

Requirement: 'insurance and ground survey so third-party risk is transferable and grounded.' Ground survey establishes baseline: structures, people, impact zones. But insurance must cover AI-failure consequences (mid-air collision with manned aircraft, fly-away into populated area, cascading swarm-collision fatality). Standard UAS liability policies often exclude 'loss of control' or 'software failure' scenarios. No specification: (1) insured loss limits per scenario (collision = GBP X, fly-away = GBP Y?), (2) insurance underwriter review of ML assurance case (does insurer require independent DAA certification?), (3) excess/deductible (who absorbs cost of first incident?), (4) claim investigation protocol (insurer cooperates with CAA forensic investigation?).

Recommendation

Insurance assurance: (1) Obtain an Errors & Omissions or Cyber policy explicitly covering autonomous flight-control and swarm-coordination failures, with the general 'loss of control' exclusion removed, (2) Insurance underwriter to review the ML assurance case (REQ009), the AI failure mode register, and the geofence/termination design (REQ005, REQ011); underwriter approval required before trial, (3) Claims protocol: insurer to cooperate with CAA MOR (REQ012) without delay; claim settlement contingent on root-cause disclosure, (4) Ground survey extended: include GPS spoofing risk zones, RF interference sources, and secondary impact zones in case the swarm exits the primary TDA by the design-failure distance (e.g. 200 m).

Phased gates present; AI-specific evidence gates not detailed

Pillar F · Analyst Fiona · REQ016

major · conf 7 · #75

Analysis

Requirement: 'phased gates so evidence builds progressively and is reversible.' Standard gates: single-aircraft RC, 2-aircraft close formation, 4-aircraft open formation, 8-aircraft dispersed, autonomous decision-making. But there are no explicit AI-evidence gates: at which point does DAA autonomous detection activate (2-aircraft or 4)? When does the swarm-decision consensus algorithm take control (4-aircraft or 8)? What forensic data must be analyzed after Gate 0 before Gate 1 approval (false-alarm rate on 1000 sim encounters? Geofence boundary-condition test passed?)? As written, progression can advance to autonomous swarm release without validating each AI subsystem.

Recommendation

Phased trial gates with AI-specific evidence requirements: (1) Gate 0 (RC + ground sim): DAA simulator performance validated (ROC curve, false-alarm <1%, false-miss <1e-5), swarm-decision consensus tested on 20+ scenarios, geofence boundary tested (GPS jitter ±10 m, altitude ±5 m, confirm containment), (2) Gate 1 (1-aircraft autonomous DAA): 10 flight hours, log all DAA decisions, acceptance threshold is zero false negatives (if any close encounter is not detected, investigate and retrain), confidence calibration validated (high-confidence output matches true-positive rate), (3) Gate 2 (2-aircraft swarm consensus shadowed by RC): 10 flight hours, log consensus votes, measure decision diversity, analyze edge cases (GPS dropout, one-aircraft sensor loss), (4) Gate 3 (4-aircraft swarm autonomous, DAA + consensus live): 20 flight hours, forensic analysis of all decisions (post-flight, classify each decision as nominal/edge-case/anomaly), anomaly count <5% of decisions, (5) Gate 4 (8-aircraft swarm autonomous): requires post-Gate 3 forensic analysis accepted and CAA/MAA sign-off. Each gate failure triggers an investigative freeze and a redesign/retrain cycle. This prevents scope creep into autonomous release.
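The numeric gate criteria above lend themselves to an automated pass/fail check. The following is a minimal sketch covering Gate 0 and the Gate 3 anomaly criterion only; the class, field, and function names are hypothetical, and the anomaly criterion is assumed to be a fraction of all classified decisions.

```python
from dataclasses import dataclass


@dataclass
class Gate0Evidence:
    false_alarm_rate: float          # fraction of simulated encounters
    false_miss_rate: float           # fraction of simulated encounters
    consensus_scenarios_tested: int
    geofence_contained: bool         # containment held under GPS/altitude jitter


def gate0_failures(evidence):
    """Return the list of Gate 0 criteria that are not met (empty means pass)."""
    failures = []
    if not evidence.false_alarm_rate < 0.01:
        failures.append("false_alarm_rate")
    if not evidence.false_miss_rate < 1e-5:
        failures.append("false_miss_rate")
    if evidence.consensus_scenarios_tested < 20:
        failures.append("consensus_scenarios_tested")
    if not evidence.geofence_contained:
        failures.append("geofence_contained")
    return failures


def gate3_passes(decision_labels, max_anomaly_fraction=0.05):
    """Gate 3: anomalies must be under 5% of classified decisions."""
    if not decision_labels:
        raise ValueError("no classified decisions to assess")
    anomalies = sum(1 for label in decision_labels if label == "anomaly")
    return anomalies / len(decision_labels) < max_anomaly_fraction
```

Returning the list of failed criteria, rather than a single boolean, gives the investigative freeze a concrete starting point.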

Post-trial forensic close-out and lessons-learned governance not specified

Pillar F · Analyst Fiona · REQ020

major · conf 7 · #76

Analysis

Requirement notes that a post-trial close-out protocol is absent. An end-of-trial checklist typically covers: return aircraft, close airspace, release crew. But no specification: (1) forensic data retention and analysis (all trial logs analyzed for AI anomalies, edge cases, model confidence drift?), (2) lessons-learned mandatory disclosure (critical findings on DAA robustness, swarm-decision consensus instability, geofence GPS-spoofing risk published to CAA AI/ML community?), (3) model retraining authorization (if trial revealed model limitations, is retraining mandated before commercialization?), (4) regulatory close-out (CAA sign-off on trial completion, findings accepted, is trial repeatable or one-off?).

Recommendation

Post-trial assurance and close-out: (1) Forensic close-out report (within 6 weeks of trial end): all AI subsystem logs analyzed, anomalies classified (nominal / edge-case / unexplained), false-positive/negative rates computed, swarm-decision diversity trends examined, geofence boundary-condition robustness summarized, (2) Lessons-learned package: critical findings (e.g. 'DAA false-alarm rate 1.2% under GPS-challenged conditions') published in anonymized form to the CAA AI/ML working group, (3) Model retraining decision gate: if the trial revealed distributional shift or model weakness, retraining and re-qualification are mandatory before commercialization; if there are no significant findings, the model can be released, (4) Regulatory authorization close-out: CAA issues a formal trial completion certificate specifying that findings are accepted, no further investigation is needed, trial data is preserved for 2 years, and future BVLOS swarm trials can reference this trial's evidence and lessons.
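The false-positive/negative rates in the close-out report could be computed from classified encounter counts along the following lines. This is a sketch under stated assumptions: the function name is hypothetical, and the rates are assumed to follow the usual confusion-matrix definitions (false alarms over non-threat encounters, misses over real-threat encounters), which the report does not define.

```python
def error_rates(true_pos, false_neg, false_pos, true_neg):
    """Return (false_alarm_rate, false_miss_rate) from encounter counts."""
    negatives = false_pos + true_neg   # encounters with no real threat
    positives = true_pos + false_neg   # encounters with a real threat
    if negatives == 0 or positives == 0:
        raise ValueError("need at least one threat and one non-threat encounter")
    return false_pos / negatives, false_neg / positives
```

Stating the denominators explicitly in the report avoids a common ambiguity, since a "1.2% false-alarm rate" means different things per encounter, per alert, or per flight hour.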

Insurance and survey scope declared; residual risk quantification and coverage validation absent

Pillar E · Analyst Evan · REQ013

major · conf 7 · #77

Analysis

REQ013 nominally requires insurance + ground survey but does not link these to residual risk: (a) ground survey scope undefined (what hazards? population density? critical infrastructure?), (b) insurance coverage assumptions unclear (does policy cover swarm ops? agreed valuation of trial aircraft and third-party liability?), (c) residual risk assessment post-mitigation absent (after geofence, DAA, C2 backup, what third-party risk remains? is insurance limit adequate?). Without this linkage, REQ013 is procedural compliance, not assurance.

Recommendation

Insurance & environmental assurance: (a) detailed ground survey (critical infrastructure, dwelling density, vehicle traffic, water, wildlife sensitivity, at trial location + 1 km buffer), (b) residual risk assessment per ConOps (probability of uncontrolled impact × impact consequence = residual risk), (c) insurance policy review with broker confirmation of swarm-ops coverage and limits, (d) third-party risk analysis per CAP 722H (shadow model or QRA) linking residual hazards to insurance adequacy, (e) survey report signed by competent surveyor, (f) CAA validation that risk is acceptable or mitigations sufficient.
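Item (b) and the link to insurance adequacy can be sketched as a small calculation. The example is illustrative: the function names, the scenario structure, and the comparison of each scenario's consequence against a single policy limit are all assumptions, and a real QRA under the applicable guidance would be considerably richer.

```python
def residual_risk(p_uncontrolled_impact, consequence_gbp):
    """Expected third-party loss for one scenario: probability x consequence."""
    if not 0.0 <= p_uncontrolled_impact <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return p_uncontrolled_impact * consequence_gbp


def uncovered_scenarios(scenarios, policy_limit_gbp):
    """Names of scenarios whose consequence exceeds the policy limit.

    scenarios maps a name to a (probability, consequence_gbp) pair.
    """
    return [name for name, (_, consequence) in scenarios.items()
            if consequence > policy_limit_gbp]
```

Keeping the two checks separate reflects the point made in the analysis: a low expected loss does not by itself show that the insurance limit is adequate for the worst credible consequence.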

Interface alignment required; apportionment signed but scope and dispute resolution absent

Pillar E · Analyst Evan · REQ015

major · conf 7 · #78

Analysis

REQ015 requires 'signed apportionment' but does not define: (a) scope of MOD apportionment (is ML assurance MOD-owned or shared? who certifies swarm algorithms?), (b) dispute resolution if CAA/MAA interpretation diverges during trial, (c) evidence handoff (how are classified MOD test results presented to CAA?). Without clarity, regulatory misalignment post-trial risks programme delay.

Recommendation

Apportionment document must include: (a) functional apportionment table (each requirement → CAA owner, MOD owner, or shared), (b) evidence ownership per requirement (who produces? who signs? who validates?), (c) interface control (e.g., 'MOD provides sworn classified test report; CAA reviews redacted summary'), (d) escalation path (named CAA/MAA technical contacts for disputes), (e) joint signature by CAA and MAA technical authorities.
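Items (a) and (b) amount to a table that can be checked for gaps before signature. A minimal sketch, assuming a hypothetical dictionary layout keyed by requirement ID; the field names `owner`, `produces`, `signs`, and `validates` are illustrative, not taken from any apportionment template.

```python
VALID_OWNERS = {"CAA", "MOD", "shared"}
EVIDENCE_ROLES = ("produces", "signs", "validates")


def apportionment_gaps(requirement_ids, table):
    """Return requirement IDs lacking a valid owner or any evidence role."""
    gaps = []
    for req in requirement_ids:
        entry = table.get(req)
        if entry is None or entry.get("owner") not in VALID_OWNERS:
            gaps.append(req)
        elif not all(entry.get(role) for role in EVIDENCE_ROLES):
            gaps.append(req)
    return gaps
```

An empty result is a necessary condition for a gap-free apportionment, though it says nothing about whether the named owners actually accept the role.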

CAA case officer accountability clear; AI completeness-check not named

Pillar A · Analyst Aria · REQ001

minor · conf 8 · #79

Analysis

User story names 'CAA case officer' as accountable for case opening. This is human-named accountability. However, if an AI system assists completeness checking (e.g., form validation, checklist cross-reference), no requirement states who is accountable if the AI incorrectly clears an incomplete pack. Is the case officer liable for trusting the AI? Likely yes, but not explicit.

Recommendation

Add: 'The CAA case officer remains accountable for case validity; AI completeness assistance must be auditable and override-able without penalty.'

SORA assessor accountable; AI evidence synthesis not scoped

Pillar A · Analyst Aria · REQ002

minor · conf 8 · #80

Analysis

User story names 'CAA SORA assessor' as accountable for SAIL/OSO setting. But swarm-aware SORA evidence may be synthesised or filtered by AI (e.g., ML-assisted swarm risk classification). No accountability chain for evidence quality or bias. Assessor could over-rely on AI's preliminary risk ranking without meaningful human control over evidence evaluation.

Recommendation

Add: 'The SORA assessor must demonstrate independent review of all AI-synthesised evidence before SAIL/OSO approval; syntheses must be explainable and challengeable.'

Safety-case reviewer accountability named; AI input to ODD not bounded

Pillar A · Analyst Aria · REQ003

minor · conf 8 · #81

Analysis

User story names 'CAA safety-case reviewer' as accountable for ConOps/ODD validation. But if AI generates ODD candidate scenarios (e.g., via Monte Carlo swarm simulation), who is accountable for scenario completeness? Reviewer signs off, but may not be able to audit all AI-generated scenarios in bounded time. Automation bias risk: reviewer assumes AI covered all cases.

Recommendation

Add: 'CAA reviewer must certify that ODD was independently validated; AI-generated scenarios must be segregated and explicitly accepted by human sign-off.'

CAA authorisation officer accountable for third-party risk grounding; accountability chain clear

Pillar A · Analyst Aria · REQ013

minor · conf 8 · #82

Analysis

User story names 'CAA authorisation officer' as accountable for insurance/ground-survey validation. Implication: if ground environment is wrongly assessed and third-party injury occurs, CAA officer's authorisation is basis of liability. This is named accountability. However: no requirement specifies who is accountable if AI-assisted ground survey (e.g., drone-based environmental mapping or risk heat-mapping) is inaccurate. If AI recommends 'risk acceptable' and CAA officer relies on that recommendation, who is liable? Likely the officer, but not explicit.

Recommendation

Add: 'If AI assists ground-environment survey or risk assessment, the CAA authorisation officer remains accountable for survey accuracy and must independently validate all AI recommendations. AI-assisted risk models must be explainable and challengeable.'

ConOps nominal/worst-case does not specify data failure modes

Pillar D · Analyst Dana · REQ003

minor · conf 7 · #83

Analysis

REQ003 asks for 'nominal and worst-credible behaviour' but does not require analysis of what happens when sensor data is incomplete, delayed, or ambiguous. Worst-case should include: GPS jitter, camera occlusion, radar interference, comms dropout. AI systems must degrade gracefully into those states.

Recommendation

Add to REQ003 ConOps narrative: 'Worst-credible cases must include sensor data failures (GPS loss, camera blind spots, swarm comms delay >X ms). For each, describe AI fallback (e.g. reduced swarm autonomy, return-to-home)'.

Flight termination must tolerate sensor data loss; no contingency data defined

Pillar D · Analyst Dana · REQ011

minor · conf 7 · #84

Analysis

REQ011 requires contingency and termination procedures, but does not specify: (a) what sensor data must be available for safe termination (GPS, comms, power state); (b) fallback termination if data is incomplete (dead-stick landing, parachute); (c) contingency matrix data (e.g. triggering conditions, decision latency, success criteria). Risk: if data is corrupted during an off-nominal event, termination logic may fail.

Recommendation

Add: 'Contingency procedures must define: (a) minimum-data termination mode (what if GPS/comms lost); (b) termination decision logic with explicit data inputs and validity checks; (c) acceptance criteria (latency, success rate in simulation/field test); (d) training data for crew on contingency scenarios.'
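Items (a) and (b) can be illustrated with a small decision function that validates its inputs before choosing a termination mode. This is a sketch only: the mode names, the input fields, and the ordering of fallbacks are hypothetical and would need to come from the operator's contingency matrix.

```python
from dataclasses import dataclass


@dataclass
class SensorState:
    gps_valid: bool
    c2_link_up: bool
    battery_fraction: float  # expected range 0.0 to 1.0


def select_termination_mode(state):
    """Pick a termination mode, degrading as input data becomes unavailable."""
    # Validity check: an out-of-range battery reading means the data path
    # cannot be trusted, so fall through to the minimum-data mode.
    if not 0.0 <= state.battery_fraction <= 1.0:
        return "parachute"
    if state.gps_valid and state.c2_link_up:
        return "crew_commanded_landing"
    if state.gps_valid:
        return "autonomous_return_to_home"
    # Minimum-data termination mode: no trusted position available.
    return "parachute"
```

The design point is that every branch ends in a defined mode, so corrupted or missing data cannot leave the termination logic without an answer.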

No data on crew training scenarios; gap between training distribution and ODD

Pillar D · Analyst Dana · REQ014

minor · conf 7 · #85

Analysis

REQ014 requires 'evidenced competency' but does not mandate: (a) what training scenarios (data) crew trained on; (b) whether those scenarios cover the ODD; (c) recency of training (drift in crew skill over time). Crew trained on sim-only data may not handle real-world swarm dynamics.

Recommendation

Add: 'Crew competency evidence must include: (a) training scenario database (coverage of ODD, swarm sizes, off-nominal events); (b) pass/fail criteria linked to incident-free flight; (c) recurrent training schedule and performance tracking.'

Gate criteria do not explicitly include data/AI assurance milestones

Pillar D · Analyst Dana · REQ016

minor · conf 7 · #86

Analysis

REQ016 requires 'phased gates' for progressive evidence building, but gate criteria are not specified. Risk: trial advances without confirming DAA model generalization, swarm-behaviour stability, or MOR completeness. Each gate should verify AI data assumptions have not been violated.

Recommendation

Define gate criteria explicitly: 'Each phase gate must verify: (a) AI model performance on holdout test data meets acceptance criteria; (b) no data drift detected in prior phase (confidence scores, misclassification rates stable); (c) MOR data complete and correctly classified; (d) crew training up-to-date on observed off-nominal swarm behaviours.'
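Criterion (b) needs an agreed drift test. The sketch below uses a deliberately crude proxy, a shift in mean model confidence between phases; the function name and the 0.05 threshold are assumptions, and a real programme would likely use a proper distributional test chosen with the assessor.

```python
from statistics import mean


def confidence_drift(previous_scores, current_scores, max_mean_shift=0.05):
    """Flag drift when mean model confidence moves more than max_mean_shift."""
    if not previous_scores or not current_scores:
        raise ValueError("both phases need at least one confidence score")
    return abs(mean(current_scores) - mean(previous_scores)) > max_mean_shift
```

A mean-shift check will miss changes in the shape of the distribution, so it is better treated as a tripwire than as evidence that no drift occurred.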

No data preservation or lessons-learned requirement; reproducibility lost after trial ends

Pillar D · Analyst Dana · REQ020

minor · conf 7 · #87

Analysis

REQ020 notes 'evidence preservation and authorisation close-out' are missing. If trial data (MOR, AI models, test results) is not archived with metadata and access controls, independent T&E, incident investigation, and regulatory audit become impossible. Lessons-learned process must capture: bias observations, data quality issues, model drift, crew feedback.

Recommendation

Add to REQ020: 'Post-trial close-out must include: (a) data archive (MOR, models, test data) with metadata and classification; (b) access-control policy (CAA, operators, third-party T&E); (c) lessons-learned report (data quality issues, AI surprises, bias observations, crew feedback); (d) recommendations for future trials (data governance, model update frequency, monitoring baselines); (e) authorisation close-out sign-off.'

Community awareness gap; AI risk not part of public narrative

Pillar F · Analyst Fiona · REQ017

minor · conf 7 · #88

Analysis

Requirement notes that public/stakeholder engagement is absent (CAP 1616, GDPR implied). There is no explicit mention of communicating AI risk to the community: local residents may not understand that the swarm trial involves autonomous flight control and AI-based collision avoidance. If an incident occurs (e.g. a fly-away caused by a geofence AI malfunction), community trust collapses and future autonomous trials become politically impossible.

Recommendation

Public engagement with AI transparency: (1) Community leaflet explaining trial scope: 'This trial includes AI-based collision detection and autonomous swarm coordination. Safety measures include redundant geofence, flight termination, and crew oversight. If an incident occurs, it will be investigated and findings disclosed', (2) Pre-trial landowner consultation: notify everyone within a 1 km radius, offer an open day to observe setup and meet the crew, with Q&A on AI safety, (3) Incident communication plan: if an incident occurs, operator to publish preliminary findings within 7 days and a final report within 4 weeks, including root-cause determination (was it AI, sensor, operator, environment?), with lessons learned shared with industry, (4) Opt-in monitoring: offer consenting residents the ability to view trial progress (number of flights, incident count, etc.) on a public dashboard.

CAA airspace regulator accountable; inter-regulator coordination role unclear

Pillar A · Analyst Aria · REQ004

minor · conf 7 · #89

Analysis

User story names 'CAA airspace regulator' as accountable for segregation. But TDA coordination may involve MOD/MAA (if military airspace) or civilian TDA operators. No requirement names the accountable role if coordination fails or boundaries are breached. Accountability seam between CAA and other airspace users is unaddressed.

Recommendation

Add: 'CAA must name the coordination point and veto authority for TDA overlaps; inter-regulator apportionment for airspace breach accountability must be signed.'

Licensing inspector accountable; swarm-specific competency may be under-assessed

Pillar A · Analyst Aria · REQ014

minor · conf 7 · #90

Analysis

User story names 'CAA licensing inspector' as accountable for competency validation. But swarm operation is new; competency assessment may under-weight swarm-specific skills (e.g., multi-aircraft decision-making, swarm override authority, AI failure-mode recognition). Requirement does not mandate swarm-specific competency standards or test AI-interaction fitness. Accountability is named but scope may be insufficient.

Recommendation

Extend REQ014: 'CAA licensing inspector must validate swarm-specific competencies: (a) Crew can recognise and override AI-recommended de-confliction if unintended; (b) Crew understand emergent swarm behaviour limits and can detect off-nominal emergence; (c) Crew can execute trial termination under high workload. Swarm-specific competency test must include AI failure scenarios.' Add: 'Trainer is accountable for ensuring crew meets swarm-specific competency gates.'

TDA coordination: no data sync protocol for real-time airspace-use data

Pillar D · Analyst Dana · REQ004

minor · conf 6 · #91

Analysis

REQ004 addresses TDA regulatory coordination but does not specify how real-time airspace data (NOTAMs, conflicting-user position feeds) will be ingested into the trial's geofence and coordination models. If live airspace data is stale, incomplete, or corrupted, the AI may not update geofence boundaries or swarm avoidance in time.

Recommendation

Add to REQ004: 'TDA package must define: data sources for live airspace updates, latency SLAs, fallback behaviour if data feed is lost or stale, and validation checks on incoming airspace-use data.'
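The latency SLA, validation check, and fallback behaviour named here can be expressed as a single decision on each incoming update. A minimal sketch, assuming a hypothetical 5-second maximum age and hypothetical action names; neither comes from the requirement.

```python
def airspace_feed_action(update_age_s, payload_valid, max_age_s=5.0):
    """Decide how to treat one incoming airspace-use update."""
    if not payload_valid:
        # Failed validation: keep the last known-good geofence.
        return "reject_update"
    if update_age_s > max_age_s:
        # Stale data breaches the latency SLA: trigger the defined fallback.
        return "enter_fallback"
    return "apply_update"
```

Validation is checked before age so that a corrupted but fresh update is never applied.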

No data on crew's situational-awareness inputs (AI explanations); bias in human decision-making not addressed

Pillar D · Analyst Dana · REQ010

minor · conf 6 · #92

Analysis

REQ010 requires 'meaningful human control' but does not specify what data (AI confidence scores, sensor fusion visualizations, swarm state) the crew will receive, how accurate those explanations must be, or whether crew training data (scenarios) represents the ODD. Risk: crew over-trusts AI if explanations are incomplete; over-corrects if explanations are noisy.

Recommendation

Add: 'Crew oversight must include: (a) real-time AI explanation fidelity acceptance criteria; (b) training scenario data covering nominal and off-nominal swarm states; (c) human-factors study of crew decision-making when AI confidence is low or contradictory; (d) procedure for crew to query AI reasoning (explainability SLA).'

Ground survey data may be stale; no requirement for live environmental monitoring

Pillar D · Analyst Dana · REQ013

minor · conf 6 · #93

Analysis

REQ013 requires 'ground survey' data to ground third-party risk assessment, but survey data (buildings, obstacles, population) can become stale. If survey data used to train obstacle-avoidance models is outdated, or if live ground monitoring (e.g. occupied/unoccupied status) is not available, the AI's risk assessment will be biased toward old conditions.

Recommendation

Add: 'Ground survey must include: (a) recency stamp (survey date, planned refresh); (b) live-update mechanism for high-change areas (e.g., construction, temporary structures); (c) data-quality assurance (confidence bounds on obstacle positions, population density).'

No data-sharing protocol between CAA and MAA; data governance boundary unclear

Pillar D · Analyst Dana · REQ015

minor · conf 6 · #94

Analysis

REQ015 requires 'signed apportionment' of regulatory ownership, but does not define what data (incident reports, sensor logs, model updates) will flow between CAA and MAA, or how classified MOD data is handled. Risk: data siloed or misclassified, slowing incident investigation.

Recommendation

Add data-governance annex: 'CAA/MAA data sharing must define: (a) what data flows in each direction (incident reports, sensor logs, model performance metrics); (b) classification handling (OUO, IP, national-security material); (c) SLA for data exchange; (d) secure data channels and audit logging.'

No data on environmental monitoring; noise, species, battery-fire risk unquantified

Pillar D · Analyst Dana · REQ019

minor · conf 6 · #95

Analysis

REQ019 notes missing requirements for 'noise species disturbance at dusk/night or battery-fire environmental risk.' No baseline environmental data, monitoring protocol, or incident-reporting mechanism defined. Risk: undetected environmental harm, legal liability.

Recommendation

Add environmental-monitoring requirement: 'Trial must include: (a) baseline environmental survey (noise, species presence/activity, ground conditions); (b) monitoring protocol (noise loggers, wildlife observers, ground inspection) during trial; (c) incident-reporting SOP (fire, spill, wildlife harm); (d) remediation plan if thresholds exceeded.'

Regulatory apportionment non-specific on AI failure accountability

Pillar F · Analyst Fiona · REQ015

minor · conf 6 · #96

Analysis

Requirement: 'signed apportionment so regulatory ownership is gap-free.' Typical apportionment: CAA owns airspace, MAA owns platform capability, operator owns mission execution. But no specification: if an AI component (e.g. ML DAA) fails and causes an incident, who is accountable for the investigation (CAA, MAA, or vendor)? Who mandates model retraining (CAA AI/ML working group, MAA)? Who decides if the trial resumes post-incident (CAA/MAA jointly)? Ambiguity could delay incident response or create finger-pointing.

Recommendation

Apportionment agreement to explicitly include AI accountability: (1) DAA failure: MAA responsible for model qualification and retraining proposal, CAA approves retraining before resumption, (2) Swarm-decision failure: operator responsible for scenario build-up and operator training, MAA responsible for algorithm robustness, CAA approves edge-case resolution, (3) Geofence/termination failure: MAA responsible for independent assurance (L2/L3/L4 design), CAA approves before trial, (4) Post-incident: joint CAA/MAA forensic investigation, root-cause report due within 2 weeks, decision on resumption within 1 week post-closure, (5) Disclosure: critical AI-failure findings disclosed to industry via CAA AI/ML working group (anonymized).

No protocol for classified MOD AI data in CAA case review

Pillar F · Analyst Fiona · REQ018

minor · conf 6 · #97

Analysis

Requirement notes that a protocol for CAP 722H / JSP 440 classified material is absent. If MOD provides classified DAA training data or swarm-algorithm documentation for CAA review, how is this handled? No specification: (1) CAA AI/ML assessor security clearance, (2) review location (secure facility?), (3) evidence acceptance (can CAA certify 'classified DAA assurance' without owning evidence?), (4) post-incident access (can investigator access classified AI logs post-incident?).

Recommendation

Security protocol: (1) MOD and CAA to sign Security Protocol Agreement pre-trial, defining: review location (MOD secure facility or CAA cleared space?), assessor clearance requirement (SC or equivalent), permitted note-taking (sanitized findings only, no classification details in CAA file), (2) Alternative: MOD provides unclassified redacted summary of DAA assurance case; MOD retains classified AI logs, provides them to CAA investigator only post-incident under DPA/Official Secrets Act, (3) Post-incident access: MOU signed pre-trial governing investigator access to classified logs, investigation report (unclassified) published within 8 weeks.

Environmental impact not addressed; noise and fire risk not mitigated

Pillar F · Analyst Fiona · REQ019

minor · conf 6 · #98

Analysis

Requirement notes that an environmental impact assessment is absent (noise, species disturbance, battery-fire risk). Not strictly an AI risk, but: (1) if the swarm malfunctions and loiters over sensitive habitat at dusk (when birds are active), AI containment failure causes environmental harm, (2) if a geofence AI malfunction causes the swarm to land in woodland, battery-fire risk increases. Environmental compliance is a separate gating item.

Recommendation

Environmental assessment: (1) Noise survey: baseline measurements, trial noise profile during mission phases (ascent, cruise, descent), impact on adjacent residential/sensitive zones, mitigation (altitude/routing restrictions), (2) Species survey: ornithologist to assess dusk/night disturbance risk if trial extends to low-light ops, (3) Battery/fire risk: swarm landing procedure (automated safe landing or manual recovery?) if geofence enforces containment descent, confirm landing zone clear of high-fire-risk vegetation, post-landing battery inspection protocol, (4) Environmental incident protocol: if AI malfunction causes unplanned landing in sensitive area, operator must notify local authority + wildlife agency within 4 hours.

STRONG: Complete recorded evidence & investigability mandate; accountability for forensic reconstruction clear

Pillar A · Analyst Aria · REQ012

info · conf 10 · #99

Analysis

Requirement is strong. 'Complete recorded evidence', 'accepted MOR obligations', 'DPIA', and 'incident is investigable' name investigability as accountability driver. Implication: CAA flight-safety investigator is accountable for incident reconstruction. Data must be complete enough to reconstruct swarm state at incident time. This is the backbone of post-hoc accountability. Minor gap: no requirement specifies how long data must be retained, who is data custodian, or what happens if data is lost/corrupted.

Recommendation

Minor. Add: 'Operator is data custodian and accountable for data integrity for 7 years post-trial; CAA-designated investigator has authority to seize data and forensic logs at any time. Data retention timeline and custodian roles must be signed off pre-trial.'

Intake Completeness Does Not Govern Trust Consumer Readiness

Pillar T · Analyst Trevor · REQ001

info · conf 9 · #100

Analysis

REQ001 ensures the CAA case officer has a complete pack to open a valid case. Trust consumer: CAA intake clerk (low stakes). Basis: checklist completeness, not substantive trust in AI components. No explainability requirement here—this is a gating function. Completeness gates downstream assessment, but does not itself establish trust in AI outputs.

Recommendation

Ensure the intake checklist explicitly references REQ009 (AMLAS case) and REQ012 (MOR data) completeness as mandatory gates. Do not proceed to SORA assessment (REQ002) until ML assurance evidence is certified complete.
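The gating rule in this recommendation is simple enough to encode directly in the intake checklist tooling. A minimal sketch, with hypothetical names and an assumed mapping of requirement ID to a completeness flag:

```python
# Requirements whose evidence must be certified complete before REQ002 starts.
MANDATORY_BEFORE_SORA = ("REQ009", "REQ012")


def may_proceed_to_sora(evidence_complete):
    """Return (allowed, missing) for progression to SORA assessment."""
    missing = [req for req in MANDATORY_BEFORE_SORA
               if not evidence_complete.get(req, False)]
    return len(missing) == 0, missing
```

A requirement absent from the mapping is treated as incomplete, so the gate fails closed.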