Paper reading ... note the paper briefly.
Detecting and processing anomalies in big data in real time is difficult. The paper starts from two considerations: (1) given the data's volume and velocity, it is hard for typical algorithms to scale while retaining the data's real-time characteristics; (2) many existing algorithms consider only the content of the data source.
The proposed contextual anomaly detection framework consists of two steps: (1) content detection; (2) context detection.
The research was evaluated against: (1) real-world sensor data-sets; (2) an open-source toolbox.
Introduction and Background:
Types of anomalies: (1) point / content anomalies; (2) context anomalies; (3) collective anomalies.
Types of algorithms to detect anomalies: (1) unsupervised; (2) supervised; (3) semi-supervised.
The usual ways to handle contextual anomaly applications: (1) transform the problem into a point anomaly problem and apply separate point anomaly detection techniques to the same data-set within different contexts; however, the contexts of normal and anomalous records must be defined beforehand; (2) utilize the existing structure within the records to detect anomalies using all the data concurrently; however, this carries a higher computational complexity.
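Approach (1) can be sketched concretely: partition the records by a pre-defined context label, then run a plain point detector inside each partition. The z-score detector, the "day"/"night" labels, and the threshold below are my illustrative choices, not the paper's; the point is that the same reading (21.0) is normal in one context and anomalous in another.

```python
import statistics

# Hypothetical temperature readings tagged with a context label.
readings = [
    ("day", 21.0), ("day", 21.5), ("day", 22.0), ("day", 21.2),
    ("night", 10.0), ("night", 10.5), ("night", 10.2), ("night", 10.4),
    ("night", 9.9), ("night", 10.1), ("night", 21.0),
]

def point_anomalies_by_context(readings, z_thresh=2.0):
    """Split records by context, then run a simple z-score point
    detector separately inside each context."""
    by_ctx = {}
    for ctx, value in readings:
        by_ctx.setdefault(ctx, []).append(value)
    anomalies = []
    for ctx, values in by_ctx.items():
        mu = statistics.mean(values)
        sigma = statistics.pstdev(values) or 1e-9  # guard against zero spread
        for v in values:
            if abs(v - mu) / sigma > z_thresh:
                anomalies.append((ctx, v))
    return anomalies
```

Note the stated drawback: the context labels must exist before detection can run, which is exactly the prior-definition requirement mentioned above.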
Many previous anomaly detection algorithms in the sensor domain use the sequential information of the readings to predict a likely value and then compare it to the actual reading. Little work has been done on context-aware anomaly detection algorithms, though some authors have approached the problem from different angles, e.g. attribute graphs and neural networks.
Research design and methodology:
Illustrates the process of the technique at the component level:
Algorithm: Contextual Anomaly Detection
content = UnivariateGaussianPredictor(SensorValue);
if IsAnomalous(content) || IsRandomContextCheck(content) then
    profile = GetSensorProfile(SensorValue);
    context = MultivariateGaussianPredictor(SensorValue, profile);
    if IsAnomalous(context) then
        return Anomaly = true;
    return Anomaly = false;
return Anomaly = false;
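The two-step flow can be sketched in Python. This is a minimal sketch, not the paper's implementation: I assume Gaussian mean/variance thresholding for both predictors, a configurable random-check rate, and a pooled set of readings from the sensor's profile in place of a full multivariate Gaussian.

```python
import math
import random

def gaussian_fit(values):
    """Fit a univariate Gaussian (mean, variance) to a list of readings."""
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, max(var, 1e-9)  # floor the variance to avoid division issues

def is_anomalous(value, mu, var, k=3.0):
    # Flag readings more than k standard deviations from the mean.
    return abs(value - mu) > k * math.sqrt(var)

def detect(value, history, profile_history,
           context_check_rate=0.05, rng=random.random):
    """Two-step contextual detection: a cheap content check on the
    sensor's own history, then the more expensive context check against
    readings pooled from the sensor's profile, run only for content
    anomalies or a random sample of readings."""
    mu, var = gaussian_fit(history)                 # content step
    content_anomaly = is_anomalous(value, mu, var)
    if content_anomaly or rng() < context_check_rate:
        pmu, pvar = gaussian_fit(profile_history)   # context step
        return is_anomalous(value, pmu, pvar)       # anomaly only if confirmed
    return False
```

A reading that is a content anomaly but fits the behaviour of its profile (e.g. similar sensors also spiking) is cleared by the context step, which mirrors the false-positive filtering the second check is there for.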
Overview of the algorithm:
Primary reasons for separating concerns between content and context: (1) scalability; (2) a sensor may appear non-anomalous within its own history of values, yet be anomalous when viewed alongside sensors with a similar context.
The concept of a sensor profile: a contextually aware representation of the sensor, as a subset of its attributes, defined using a multivariate clustering algorithm.
Result and evaluation:
The framework was evaluated using three data-sets: (1) a set of HVAC electricity sensors; (2) a set of temperature sensors; (3) a set from a traffic system in California. Cross-validation on the real-world data and comparison with an open-source toolbox yielded positive results.
Future work to explore:
(1) wider data-sets; (2) implement the framework within a working business environment that streams live data to the central repository; (3) exploit the anomalous information; (4) modify or update the proposed modules with other types of algorithms; (5) add further modules to the framework itself.