AXIS Framework

Project Overview

Industrial Control Systems (ICS) are increasingly vulnerable to sophisticated cyber-physical attacks as they integrate with modern IT infrastructure. While AI-based anomaly detection systems can identify threats effectively, they provide only abstract numerical alerts that lack the operational context human operators need for rapid response.

AXIS bridges this critical semantic gap by leveraging Retrieval-Augmented Generation (RAG) with Large Language Models to transform opaque numerical alerts into context-rich, actionable natural language explanations that operators can understand and act upon immediately.

SWaT Testbed, Python, TensorFlow, LlamaIndex, GPT-4o-mini, MITRE ATT&CK, Retrieval-Augmented Generation

The Semantic Gap in ICS Security

Current AI-based Industrial Anomaly Detection (IAD) systems face a critical limitation: they excel at detecting anomalies but fail to explain them in operationally meaningful ways. Operators receive alerts like "anomaly detected at timestamp T with reconstruction error 0.85" without understanding:

What component is affected and its role in the process
Why the anomaly occurred and potential root causes
What the impact could be on system safety and operations
How to respond effectively with specific mitigation steps

Limitations of Current Approaches

Existing explainable AI (XAI) methods provide feature attribution scores but still fall short:

Feature rankings lack narrative structure and operational context
No connection to known adversarial tactics or attack patterns
Require significant domain expertise to interpret correctly
Increase cognitive load rather than reducing it

AXIS Framework

AXIS operates through a three-stage pipeline that systematically transforms low-level anomaly detections into high-level operational intelligence:

Stage I: Knowledge Base Curation

Curates a multi-source knowledge base from system documentation, equipment specifications, and cybersecurity threat intelligence (MITRE ATT&CK for ICS). Documents are parsed, normalized, and embedded for semantic retrieval.

Implementation: Multi-source metadata extraction and knowledge base construction from ICS documentation and cybersecurity frameworks. Implements document parsing, normalization, and vector embedding storage.

Tech Stack: Python, LlamaExtract, OpenAI Embeddings, LlamaIndex

View Repository →
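The curation steps described above (parse, normalize, embed, retrieve) can be illustrated with a minimal sketch. It is not the AXIS implementation: the bag-of-words `embed` function is a toy stand-in for a learned embedding model, and the component names (`LEVEL_SENSOR_1`, `PUMP_1`), document texts, and metadata keys are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    metadata: dict
    vector: Counter = field(default_factory=Counter)


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so equivalent passages compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption: stands in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", normalize(text)))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_knowledge_base(documents: list) -> list:
    """Normalize each document, keep its metadata, and attach its vector."""
    return [Chunk(normalize(d["text"]), d["metadata"], embed(d["text"])) for d in documents]


def retrieve(kb: list, query: str, top_k: int = 1) -> list:
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    return sorted(kb, key=lambda c: cosine(q, c.vector), reverse=True)[:top_k]


# Hypothetical documents from two source types (system docs and threat intelligence).
documents = [
    {"text": "LEVEL_SENSOR_1 is a level transmitter that reports the raw water tank level.",
     "metadata": {"source": "system_docs", "component": "LEVEL_SENSOR_1"}},
    {"text": "PUMP_1 is a pump that moves raw water from the tank to the next stage.",
     "metadata": {"source": "system_docs", "component": "PUMP_1"}},
    {"text": "Adversaries may manipulate control parameters to disrupt a physical process.",
     "metadata": {"source": "threat_intel"}},
]

kb = build_knowledge_base(documents)
best = retrieve(kb, "tank level sensor reading")[0]
print(best.metadata)
```

Keeping the metadata next to each vector is what later allows retrieval to be filtered by component or source rather than by text similarity alone.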

Stage II: Anomaly Detection & Attribution

Uses LSTM autoencoders for unsupervised anomaly detection on time-series ICS data. Employs ensemble attribution methods (MSE, Saliency Maps, LEMNA) to identify influential system components.

Implementation: LSTM-based anomaly detection with ensemble feature attribution methods. Fork of established research codebase with modifications for AXIS integration and improved attribution ensemble.

Tech Stack: Python, TensorFlow/Keras, LSTM, SHAP, LEMNA

View Repository →
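To make the attribution step concrete, the sketch below computes a per-feature mean squared reconstruction error for one window and combines it with a second set of scores. It assumes the autoencoder's reconstruction is already available; the feature names, the threshold, the saliency values, and the averaging rule used for the ensemble are all hypothetical choices for illustration, not the method used in the AXIS codebase.

```python
from statistics import mean


def per_feature_mse(window, reconstruction):
    """Mean squared reconstruction error of each feature across the time window."""
    n_features = len(window[0])
    return [
        mean((x[j] - r[j]) ** 2 for x, r in zip(window, reconstruction))
        for j in range(n_features)
    ]


def normalize_scores(scores):
    """Scale non-negative scores so they sum to 1 (an all-zero input stays zero)."""
    total = sum(scores)
    return [s / total if total else 0.0 for s in scores]


def ensemble_attribution(score_sets):
    """Average the normalized scores of several attribution methods (assumed rule)."""
    normalized = [normalize_scores(s) for s in score_sets]
    return [mean(column) for column in zip(*normalized)]


# Hypothetical three-step window over three features, and its reconstruction.
features = ["LEVEL_SENSOR_1", "PUMP_1", "VALVE_1"]
window = [[0.50, 1.0, 0.0], [0.52, 1.0, 0.0], [0.54, 1.0, 0.0]]
reconstruction = [[0.50, 1.0, 0.0], [0.51, 1.0, 0.0], [0.14, 0.9, 0.0]]

mse_scores = per_feature_mse(window, reconstruction)
window_error = mean(mse_scores)

THRESHOLD = 0.01  # hypothetical; in practice tuned on attack-free data
is_anomaly = window_error > THRESHOLD

saliency_scores = [0.7, 0.2, 0.1]  # hypothetical output of a second attribution method
combined = ensemble_attribution([mse_scores, saliency_scores])
ranking = sorted(zip(features, combined), key=lambda pair: pair[1], reverse=True)
print(is_anomaly, ranking)
```

Normalizing each method's scores before averaging keeps a method with a large numeric range from dominating the ensemble.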

Stage III: RAG-Enhanced Explanation

Synthesizes feature attributions with retrieved domain knowledge using advanced RAG techniques. Generates structured natural language explanations with root causes, impacts, and mitigation strategies.

Implementation: Advanced retrieval-augmented generation pipeline for context-aware explanation synthesis. Implements metadata-driven retrieval, tactic inference, and structured LLM prompting.

Tech Stack: Python, LlamaIndex, OpenAI GPT-4o-mini, RAG

View Repository →
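The three ideas named above (metadata-driven retrieval, tactic inference, and structured prompting) can be sketched without any external service. Everything here is an illustrative assumption: the `Entry` type, the metadata keys, the knowledge-base texts, and the prompt wording are hypothetical, and the mapping from a sensor to the "Impair Process Control" tactic is an example only. The resulting string would then be sent to an LLM, a step the sketch leaves out.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    text: str
    metadata: dict


# Hypothetical knowledge-base entries with metadata attached during curation.
KB = [
    Entry("LEVEL_SENSOR_1 reports the raw water tank level.",
          {"kind": "component", "component": "LEVEL_SENSOR_1"}),
    Entry("PUMP_1 moves raw water to the next stage.",
          {"kind": "component", "component": "PUMP_1"}),
    Entry("Adversaries may alter process values to impair control.",
          {"kind": "threat", "tactic": "Impair Process Control",
           "component_types": ["sensor"]}),
]


def component_context(kb, components):
    """Metadata-driven retrieval: keep entries tagged with an attributed component."""
    return [e for e in kb
            if e.metadata.get("kind") == "component"
            and e.metadata.get("component") in components]


def infer_tactics(kb, component_types):
    """Collect tactics whose threat entries cover any affected component type."""
    tactics = []
    for e in kb:
        if e.metadata.get("kind") != "threat":
            continue
        if set(e.metadata.get("component_types", [])) & set(component_types):
            tactics.append((e.metadata["tactic"], e.text))
    return tactics


def build_prompt(alert, attributed, kb):
    """Assemble a structured prompt; attributed maps component name to component type."""
    context = component_context(kb, set(attributed))
    tactics = infer_tactics(kb, set(attributed.values()))
    lines = [
        "You are assisting an ICS operator.",
        f"Alert: {alert}",
        "Most influential components: " + ", ".join(attributed),
        "Component context:",
        *[f"- {e.text}" for e in context],
        "Candidate adversarial tactics:",
        *[f"- {name}: {text}" for name, text in tactics],
        "Respond with three sections: Root Cause, Impact, Mitigation.",
    ]
    return "\n".join(lines)


prompt = build_prompt(
    "anomaly detected at timestamp T with reconstruction error 0.85",
    {"LEVEL_SENSOR_1": "sensor"},
    KB,
)
print(prompt)
```

Filtering on metadata first means only context for the attributed components reaches the prompt, which is one plausible way to keep the explanation grounded in the affected part of the process.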

Key Results

Overall Explanation Quality Improvement: 42%
Adversarial Context Accuracy Increase: 150%
Reduction in Cognitive Load: 33%

Quantitative Content Analysis

The metadata-enhanced RAG pipeline (ME-RAG) produced higher-quality explanations than the naive RAG baseline (N-RAG) across three key dimensions:

Process Grounding Accuracy: 7.14% improvement through targeted component retrieval
Physical Impact Accuracy: 25% improvement in predicting operational consequences
Adversarial Context Accuracy: 150% improvement in linking anomalies to attack techniques

Evaluation Metric	N-RAG (Avg. Score)	ME-RAG (Avg. Score)	Improvement of ME-RAG over N-RAG (%)
Process Grounding Accuracy (out of 2)	1.75	1.875	7.14
Physical Impact Accuracy (out of 2)	1.00	1.25	25
Adversarial Context Accuracy (out of 2)	0.75	1.875	150
Overall Score (out of 6)	3.5	5	42
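The improvement column follows from the two score columns as a relative change, (ME-RAG − N-RAG) / N-RAG × 100. The short check below recomputes it from the table values; the function and variable names are illustrative. The overall figure works out to about 42.86%, which the table reports as 42%.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Percentage change of `improved` relative to `baseline`."""
    return (improved - baseline) / baseline * 100


# (N-RAG, ME-RAG) average scores taken from the table above.
scores = {
    "Process Grounding Accuracy": (1.75, 1.875),
    "Physical Impact Accuracy": (1.00, 1.25),
    "Adversarial Context Accuracy": (0.75, 1.875),
}

improvements = {
    name: round(relative_improvement(n_rag, me_rag), 2)
    for name, (n_rag, me_rag) in scores.items()
}

# The overall score is the sum of the three per-metric averages.
overall_n = sum(n_rag for n_rag, _ in scores.values())
overall_me = sum(me_rag for _, me_rag in scores.values())
overall = relative_improvement(overall_n, overall_me)

for name, value in improvements.items():
    print(f"{name}: {value}%")
print(f"Overall: {overall_n} -> {overall_me} ({overall:.2f}%)")
```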

User-Centric Evaluation

A user study with six participants showed improved explanation usability:

Confidence: Increased from 3.0 to 4.5 (5-point scale)
Actionability: Rated 4.33 vs baseline raw XAI output
Cognitive Load: Reduced from 3.5 to 2.33
Trustworthiness: Perfect score (5.0) vs 3.5 for naive approach

Operational Overhead Analysis

The metadata-enhanced pipeline (ME-RAG) trades higher latency, cost, and token usage for improved explanation quality:

Latency: 34% increase (11.97s → 16.02s average)
Cost: 89% increase ($0.00034 → $0.00064 per explanation)
Token Usage: 217% increase in input tokens for richer context

Operational Metric	N-RAG (Average)	ME-RAG (Average)	Increase for ME-RAG over N-RAG (%)
Latency (s)	11.97	16.02	33.82
Input Tokens	446.875	1415.25	216.70
Output Tokens	343.625	483.25	40.63
Monetary Cost ($)	0.0003375	0.0006375	88.89

Dissertation

The complete literature review, research methodology, implementation details, and comprehensive evaluation are documented in the dissertation submitted as part of the MSc Computing Science degree at the University of Glasgow.

View Dissertation

Download PDF

AXIS: Anomaly eXplanations for Industrial control Systems
