Paper reading ... brief notes on the paper.


Detecting and processing anomalies in big data in real time is difficult. The paper starts from two considerations: (1) because of the data's volume and velocity, it is difficult for typical algorithms to scale while retaining the data's real-time characteristics; (2) many existing algorithms consider only the content of the data source, not its context.

The proposed contextual anomaly detection framework consists of two steps: (1) content detection; (2) context detection.

The framework was evaluated using (1) real-world sensor data sets and (2) a comparison with an open-source toolbox.

Introduction and Background:

Types of anomalies: (1) point / content anomalies; (2) contextual anomalies; (3) collective anomalies.

Types of anomaly detection algorithms: (1) unsupervised; (2) supervised; (3) semi-supervised.

The usual ways to handle contextual anomaly detection: (1) transform it into a point anomaly problem by applying separate point anomaly detection techniques to the same data set within different contexts; however, the contexts of normal and anomalous records must be defined beforehand; (2) use the existing structure within the records to detect anomalies using all the data concurrently; however, this has a higher computational complexity.
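Approach (1) can be sketched as follows. This is a minimal illustration, assuming a z-score test as the point detector, a hypothetical "summer"/"winter" context key attached to each reading, and an arbitrary threshold of 2.0; none of these come from the paper. The example shows why the contexts must be known in advance: a reading of 30 is unremarkable across the whole data set but anomalous within the winter context.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscore_flags(values, threshold):
    """Point detector (assumed z-score test): indices whose |z| exceeds threshold."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


def contextual_flags(records, threshold=2.0):
    """Run the point detector separately inside each predefined context.

    records: list of (context, value) pairs; the context key is hypothetical.
    Returns indices into `records` that are anomalous within their context.
    """
    groups = defaultdict(list)  # context -> list of (record index, value)
    for idx, (context, value) in enumerate(records):
        groups[context].append((idx, value))
    flagged = []
    for members in groups.values():
        values = [v for _, v in members]
        flagged.extend(members[i][0] for i in zscore_flags(values, threshold))
    return sorted(flagged)


records = [("summer", v) for v in (30, 31, 29, 30, 32, 31)] + \
          [("winter", v) for v in (2, 3, 1, 2, 3, 2, 30)]

print(zscore_flags([v for _, v in records], 2.0))  # global view: []
print(contextual_flags(records))                   # per context: [12]
```

The global test flags nothing because 30 is common in summer; only the per-context test catches the winter reading at index 12.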

Many previous anomaly detection algorithms in the sensor domain use the sequential information of the readings to predict a likely value and then compare this prediction with the actual reading. Little work has been done on context-aware anomaly detection algorithms; other authors have proposed related work or approached the problem from different angles, e.g. attribute graphs and neural networks.
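The predict-then-compare pattern described above can be sketched as below. It is only an illustration: the predictor is assumed to be a simple moving average over the previous `window` readings, and `window` and `tolerance` are hypothetical parameters; the cited sensor-domain work uses its own, generally more sophisticated, predictors.

```python
def predict_and_compare(readings, window=3, tolerance=5.0):
    """Flag index i when the reading deviates from its prediction.

    Assumed predictor: mean of the previous `window` readings.
    A reading is anomalous if |actual - predicted| > tolerance.
    """
    flagged = []
    for i in range(window, len(readings)):
        predicted = sum(readings[i - window:i]) / window
        if abs(readings[i] - predicted) > tolerance:
            flagged.append(i)
    return flagged


readings = [20.0, 20.5, 21.0, 21.2, 35.0, 21.5, 21.3]
print(predict_and_compare(readings))  # [4]
```

Note that this detector looks only at one sensor's own sequence, i.e. the content of the data source; the spike at index 4 also inflates the next two predictions, which is one reason real predictors are usually more robust than a plain moving average.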

Research design and methodology:

The paper illustrates the process of the technique at the component level:

[Figure: component-level process of the technique]

Overview of the algorithm:

[Figure: overview of anomaly detection]

The primary reasons for separating the concerns of content and context: (1) scalability; (2) a sensor may be behaving non-anomalously within its own history of values, but not when viewed alongside sensors with a similar context.
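The scalability argument can be made concrete with a two-stage sketch, assuming the content step acts as a cheap per-sensor filter and the more expensive context step runs only on the readings it flags. Both detectors here are simple stand-ins chosen for illustration (a z-score over the sensor's own history, and a distance from the median of same-profile peers); the function names, thresholds, and profile labels are hypothetical and are not the paper's algorithms.

```python
from statistics import mean, median, pstdev


def is_content_anomaly(history, latest, threshold=3.0):
    """Stage 1 (content): is `latest` unusual for this sensor's own history?"""
    mu, sigma = mean(history), pstdev(history)
    return sigma > 0 and abs(latest - mu) / sigma > threshold


def context_confirms(latest, peer_latest, tolerance):
    """Stage 2 (context): is `latest` also far from what same-profile peers report?"""
    return abs(latest - median(peer_latest)) > tolerance


def detect(sensors, profiles, tolerance=5.0):
    """sensors: id -> (history, latest reading); profiles: id -> profile label."""
    anomalies = []
    for sid, (history, latest) in sensors.items():
        if not is_content_anomaly(history, latest):
            continue  # most readings stop here, so no contextual work is done
        peers = [sensors[p][1] for p in sensors
                 if p != sid and profiles[p] == profiles[sid]]
        if peers and context_confirms(latest, peers, tolerance):
            anomalies.append(sid)
    return anomalies


base = [20, 21, 20, 21, 20]
sensors = {"s1": (base, 30), "s2": (base, 29), "s3": (base, 31),
           "s4": (base, 45), "s5": ([50, 51, 50, 51, 50], 50.5)}
profiles = {"s1": "A", "s2": "A", "s3": "A", "s4": "A", "s5": "B"}
print(detect(sensors, profiles))  # ['s4']
```

Sensors s1 to s4 all jump relative to their own history, but only s4 also disagrees with its profile peers; s5 never reaches the context stage at all.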

The concept of a sensor profile: a contextually aware representation of the sensor as a subset of its attributes. Profiles are defined using a multivariate clustering algorithm.
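Building profiles by clustering can be sketched with plain k-means, used here only as a stand-in for "a multivariate clustering algorithm" (the notes do not say which one the paper uses). The two contextual attributes per sensor and their values are hypothetical, and the first k points serve as initial centroids purely to keep the example deterministic.

```python
from math import dist


def kmeans(points, k, iterations=20):
    """Plain k-means over attribute vectors; returns (labels, centroids)."""
    centroids = [list(p) for p in points[:k]]  # deterministic init for illustration
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical contextual attributes per sensor, e.g. two scaled building features.
sensor_attrs = {"s1": (1.0, 1.2), "s2": (5.0, 5.1), "s3": (1.1, 0.9),
                "s4": (4.8, 5.3), "s5": (0.9, 1.0)}
labels, _ = kmeans(list(sensor_attrs.values()), k=2)
profile = dict(zip(sensor_attrs, labels))
print(profile)  # {'s1': 0, 's2': 1, 's3': 0, 's4': 1, 's5': 0}
```

Sensors that share a profile label are then the peers against which a reading is judged in the context step.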

Results and evaluation:

The framework was evaluated on three data sets: (1) HVAC electricity sensors; (2) temperature sensors; (3) a traffic system in California. Cross-validation on the real-world data and a comparison with an open-source toolbox gave positive results.

Future work to explore:

(1) wider data sets; (2) implement the framework in a working business environment that streams live data to the central repository; (3) exploit the anomalous information; (4) modify and update the proposed modules with other types of algorithms; (5) add further modules to the framework itself.