Leveraging Retrieval-Augmented Large Language Models for Generating Context-Aware Anomaly Explanations in Industrial Control Systems
Siddhartha Pratim Dutta | September 2025
Industrial Control Systems (ICS) are increasingly vulnerable to sophisticated cyber-physical attacks as they integrate with modern IT infrastructure. While AI-based anomaly detection systems can identify threats effectively, they provide only abstract numerical alerts that lack the operational context human operators need for rapid response.
AXIS bridges this critical semantic gap by leveraging Retrieval-Augmented Generation (RAG) with Large Language Models to transform opaque numerical alerts into context-rich, actionable natural language explanations that operators can understand and act upon immediately.
Current AI-based Industrial Anomaly Detection (IAD) systems face a critical limitation: they excel at detecting anomalies but fail to explain them in operationally meaningful ways. Operators receive alerts like "anomaly detected at timestamp T with reconstruction error 0.85" without understanding:
Existing explainable AI (XAI) methods provide feature attribution scores but still fall short:
AXIS operates through a three-stage pipeline that systematically transforms low-level anomaly detections into high-level operational intelligence:
Curates a multi-source knowledge base from system documentation, equipment specifications, and cybersecurity threat intelligence (MITRE ATT&CK for ICS). Documents are parsed, normalized, and embedded for semantic retrieval.
Implementation: Multi-source metadata extraction and knowledge base construction from ICS documentation and cybersecurity frameworks. Implements document parsing, normalization, and vector embedding storage.
Tech Stack: Python, LlamaExtract, OpenAI Embeddings, LlamaIndex
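The parse, normalize, embed, and retrieve flow described for this stage can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the project lists OpenAI Embeddings and LlamaIndex, whereas this sketch substitutes a toy hashed bag-of-words embedding so it runs offline. The names `normalize`, `embed`, `KnowledgeBase`, and both sample documents are hypothetical.

```python
import hashlib
import math
import re
from dataclasses import dataclass, field

DIM = 256  # size of the toy embedding vector (arbitrary choice for this sketch)


def normalize(text):
    """Lowercase and collapse whitespace so equivalent passages embed alike."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def embed(text):
    """Toy stand-in for an embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", normalize(text)):
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class KnowledgeBase:
    # Each entry is (normalized text, metadata dict, embedding vector).
    entries: list = field(default_factory=list)

    def add(self, text, metadata):
        clean = normalize(text)
        self.entries.append((clean, metadata, embed(clean)))

    def search(self, query, top_k=1):
        """Return the top_k entries by cosine similarity (vectors are unit length)."""
        q = embed(query)
        scored = [
            (sum(a * b for a, b in zip(q, vec)), text, meta)
            for text, meta, vec in self.entries
        ]
        return sorted(scored, key=lambda item: item[0], reverse=True)[:top_k]


# Hypothetical sample documents, one per source type.
kb = KnowledgeBase()
kb.add("Pump P101 feeds raw water into   Tank T101.", {"source": "system_docs"})
kb.add("Adversaries may manipulate setpoints to damage equipment.",
       {"source": "threat_intel"})

best_score, best_text, best_meta = kb.search("which pump feeds tank t101")[0]
```

Storing the source type as metadata next to each vector is what lets a later stage filter or weight results by origin; a production system would swap `embed` for a real embedding model and the list for a vector store.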
Uses LSTM autoencoders for unsupervised anomaly detection on time-series ICS data. Employs ensemble attribution methods (MSE, Saliency Maps, LEMNA) to identify influential system components.
Implementation: LSTM-based anomaly detection with ensemble feature attribution methods. Fork of established research codebase with modifications for AXIS integration and improved attribution ensemble.
Tech Stack: Python, TensorFlow/Keras, LSTM, SHAP, LEMNA
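To make the detection and attribution step concrete, the sketch below scores a window by reconstruction error and then combines several attribution methods. It rests on several assumptions: the autoencoder output is hard-coded rather than produced by an LSTM, the saliency and LEMNA scores are placeholder numbers, the threshold is arbitrary, and rank averaging is only one plausible way to build an ensemble (the source does not state how the methods are combined). All feature names are hypothetical.

```python
from statistics import mean


def per_feature_mse(observed, reconstructed):
    """Mean squared reconstruction error for each feature over a time window."""
    n_features = len(observed[0])
    return [
        mean((obs[j] - rec[j]) ** 2 for obs, rec in zip(observed, reconstructed))
        for j in range(n_features)
    ]


def to_ranks(scores):
    """Convert scores to ranks, where rank 1 is the highest-scoring feature."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    ranks = [0] * len(scores)
    for rank, j in enumerate(order, start=1):
        ranks[j] = rank
    return ranks


def ensemble_ranking(score_sets, feature_names):
    """Average each feature's rank across methods; lowest mean rank comes first."""
    rank_sets = [to_ranks(scores) for scores in score_sets]
    mean_ranks = [mean(r[j] for r in rank_sets) for j in range(len(feature_names))]
    return sorted(zip(feature_names, mean_ranks), key=lambda pair: pair[1])


features = ["flow_sensor", "level_sensor", "valve"]  # hypothetical names

# Normalized readings (rows are time steps) and a stand-in for the
# autoencoder's reconstruction of the same window.
observed = [[0.50, 0.40, 1.0], [0.52, 0.70, 1.0], [0.50, 0.95, 1.0]]
reconstructed = [[0.50, 0.41, 1.0], [0.51, 0.45, 1.0], [0.51, 0.46, 1.0]]

mse_scores = per_feature_mse(observed, reconstructed)
window_error = mean(mse_scores)
THRESHOLD = 0.01  # assumed; normally calibrated on attack-free data
is_anomalous = window_error > THRESHOLD

# Placeholder outputs standing in for the saliency-map and LEMNA methods.
saliency_scores = [0.10, 0.70, 0.20]
lemna_scores = [0.05, 0.80, 0.15]

ranking = ensemble_ranking([mse_scores, saliency_scores, lemna_scores], features)
```

Averaging ranks rather than raw scores sidesteps the fact that the three methods produce values on different scales. Here `level_sensor` ranks first under every method, so it leads the ensemble.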
Synthesizes feature attributions with retrieved domain knowledge using advanced RAG techniques. Generates structured natural language explanations with root causes, impacts, and mitigation strategies.
Implementation: Advanced retrieval-augmented generation pipeline for context-aware explanation synthesis. Implements metadata-driven retrieval, tactic inference, and structured LLM prompting.
Tech Stack: Python, LlamaIndex, OpenAI GPT-4o-mini, RAG
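The synthesis step can be illustrated with a small sketch that filters the knowledge base by metadata and assembles a structured prompt. It is an assumption-laden outline rather than the project's pipeline: the `Doc` schema, the component tags, and the prompt wording are all hypothetical, tactic inference is omitted, and no LLM is actually called.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    text: str
    components: frozenset  # component tags attached at ingestion (assumed schema)
    source: str


def retrieve_by_metadata(docs, top_features, limit=3):
    """Keep only documents tagged with at least one attributed component."""
    wanted = set(top_features)
    hits = [d for d in docs if d.components & wanted]
    # Documents sharing more components with the attribution come first.
    hits.sort(key=lambda d: len(d.components & wanted), reverse=True)
    return hits[:limit]


def build_prompt(alert, top_features, context_docs):
    """Assemble a prompt that asks for a fixed three-section explanation."""
    context = "\n".join(f"- [{d.source}] {d.text}" for d in context_docs)
    return (
        "You are assisting an ICS operator.\n"
        f"Alert: {alert}\n"
        f"Most influential components: {', '.join(top_features)}\n"
        f"Retrieved context:\n{context}\n"
        "Respond with three sections: Root Cause, Impact, Mitigation."
    )


# Hypothetical knowledge-base entries.
docs = [
    Doc("The level sensor reports the water level of the raw water tank.",
        frozenset({"level_sensor"}), "system_docs"),
    Doc("The inlet valve controls flow into the raw water tank.",
        frozenset({"valve"}), "system_docs"),
    Doc("The dosing pump adds chemicals downstream.",
        frozenset({"dosing_pump"}), "system_docs"),
]

top_features = ["level_sensor", "valve"]
context_docs = retrieve_by_metadata(docs, top_features)
prompt = build_prompt(
    "anomaly detected at timestamp T with reconstruction error 0.85",
    top_features,
    context_docs,
)
```

Filtering on component tags before prompting keeps unrelated documentation (here, the dosing pump entry) out of the context window, which is the intuition behind metadata-driven retrieval.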
The metadata-enhanced RAG pipeline (ME-RAG) scored higher than the N-RAG baseline on all three evaluation dimensions, with the largest gain in adversarial context accuracy:
Evaluation Metric | N-RAG (Avg. Score out of 2) | ME-RAG (Avg. Score out of 2) | Improvement of ME-RAG over N-RAG (%) |
---|---|---|---|
Process Grounding Accuracy | 1.75 | 1.875 | 7.14 % |
Physical Impact Accuracy | 1.00 | 1.25 | 25 % |
Adversarial Context Accuracy | 0.75 | 1.875 | 150 % |
Overall Score (sum, out of 6) | 3.5 | 5 | 42.86 % |
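
The improvement column is the relative change of ME-RAG over N-RAG, that is, (ME-RAG − N-RAG) / N-RAG × 100. The short check below recomputes it from the table values; the function and variable names are illustrative only.

```python
def relative_change_pct(baseline, new):
    """Percentage change of `new` relative to `baseline`."""
    return (new - baseline) / baseline * 100


# (N-RAG, ME-RAG) average scores taken from the table above.
scores = {
    "Process Grounding Accuracy": (1.75, 1.875),
    "Physical Impact Accuracy": (1.00, 1.25),
    "Adversarial Context Accuracy": (0.75, 1.875),
}

improvements = {
    name: round(relative_change_pct(n_rag, me_rag), 2)
    for name, (n_rag, me_rag) in scores.items()
}

# The overall score is the sum of the three per-metric averages.
overall_n_rag = sum(n_rag for n_rag, _ in scores.values())
overall_me_rag = sum(me_rag for _, me_rag in scores.values())
overall_improvement = round(relative_change_pct(overall_n_rag, overall_me_rag), 2)
```

The per-metric results come out as 7.14, 25.0, and 150.0, and the overall totals of 3.5 and 5.0 give an improvement of 42.86 %.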
A user study with six participants showed significant improvements in explanation usability:
ME-RAG adds about 4 seconds of latency and $0.0003 per explanation compared with N-RAG, a manageable operational cost for the improved explanation quality:
Operational Metric | N-RAG (Average) | ME-RAG (Average) | Increase for ME-RAG over N-RAG (%) |
---|---|---|---|
Latency (s) | 11.97 | 16.02 | 33.82 % |
Input Tokens | 446.875 | 1415.25 | 216.70 % |
Output Tokens | 343.625 | 483.25 | 40.63 % |
Monetary Cost ($) | 0.0003375 | 0.0006375 | 88.89 % |
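
To put the per-explanation figures in operational terms, the sketch below derives the absolute overhead of ME-RAG and scales the cost to a batch of alerts. The batch size of 1,000 is an arbitrary illustration, and the names are hypothetical; only the latency and cost averages come from the table.

```python
# Average per-explanation figures from the table above.
n_rag = {"latency_s": 11.97, "cost_usd": 0.0003375}
me_rag = {"latency_s": 16.02, "cost_usd": 0.0006375}

# Absolute overhead of ME-RAG for a single explanation.
extra_latency_s = me_rag["latency_s"] - n_rag["latency_s"]
extra_cost_usd = me_rag["cost_usd"] - n_rag["cost_usd"]


def batch_cost_usd(cost_per_explanation, n_explanations):
    """Total cost if every alert in the batch triggers one explanation."""
    return cost_per_explanation * n_explanations


ALERTS = 1000  # assumed batch size, for illustration only
n_rag_batch = batch_cost_usd(n_rag["cost_usd"], ALERTS)
me_rag_batch = batch_cost_usd(me_rag["cost_usd"], ALERTS)

print(f"Extra latency per explanation: {extra_latency_s:.2f} s")
print(f"Extra cost per explanation: ${extra_cost_usd:.4f}")
print(f"Cost per {ALERTS} explanations: N-RAG ${n_rag_batch:.4f}, ME-RAG ${me_rag_batch:.4f}")
```

Under these assumptions, explaining 1,000 alerts costs roughly $0.34 with N-RAG and $0.64 with ME-RAG, so the added latency of about 4 seconds per explanation is likely the more relevant constraint for operators than the monetary cost.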
The complete literature review, research methodology, implementation details, and comprehensive evaluation are documented in the dissertation submitted as part of the MSc Computing Science degree at the University of Glasgow.