Data Analytics for Large Personal Lifelogs

Introduction

As wearable technology has become significantly cheaper, people increasingly rely on such devices to record profiles of individual behaviour by capturing and monitoring personal activities. This activity is often referred to as lifelogging. Given the heterogeneous nature of the data created, and the fact that it arrives as constant data streams, lifelogging shares features that are usually attributed to big data. A typical example of a wearable camera is Microsoft’s SenseCam, which can capture a vast personal archive every day. A significant challenge is to organise and analyse such large volumes of lifelogging data, turning the raw data collected by different sensors into information that is meaningful to users.

To date, various aspects of lifelogging have been studied, such as the development of sensors, efficient capture and storage of data, processing and annotating the data to identify events, improved search and retrieval of information, assessment of user experience, and the design of user interfaces for applications such as memory aids, diet monitoring, and analysis of activities of daily living (ADL).

Given the relative success of these efforts, the research challenge has now shifted from broader aspects of data management to that of retrieving information from the vast quantities of captured data. Current applications address this by employing automatic classifiers to segment a whole day’s recording into events and search the historical record, or by building ontology-based multi-concept classifiers and searching for specific events. More recent research suggests the use of statistical mapping from low-level visual features to semantic concepts of personal lifelogs. It is important to note that these approaches are based on training classifiers from a set of annotated ground truth images. Although supervised methods can lead to more accurate outcomes when detecting known patterns, they require prior knowledge from a domain expert to be fed into the system. In addition, the results of the classifier depend heavily on the quality and quantity of the training data, i.e. they are biased towards detecting activities that are defined and known to the domain expert a priori. Given that visual lifelogs usually consist of large and often unstructured collections of multimedia information, such ‘concept-based’ and ‘rule-based’ methods for analysing lifelogging data are not suitable for all use cases. Ideally, an algorithm should be able to detect unknown phenomena occurring at different frequencies in such data.

Data

In this study, the data were generated by one person wearing the SenseCam over a six-day period, from a Saturday to a Thursday. These particular days were chosen in order to include a weekend, when normal home activity differed from events on weekdays or during a working week. Data statistics are reported in Table 4.1 below. To create a ground truth (in machine learning, the term refers to the accuracy of the training set’s classification for supervised learning techniques, and it is used in statistical models to support or reject research hypotheses), the user reviewed her collection and manually marked the boundary image between consecutive events.

[Table 4.1: data statistics]

Methods and Results

Detrended Fluctuation Analysis (DFA) was used initially to analyse the image time series recorded by the SenseCam, and it exposed strong long-range correlation in these collections. This implies that continuous low levels of background information are picked up all the time by the device. Consequently, DFA provides a useful background summary.

[Figure 4.1: log F(n) vs log n for different box sizes]

In the plot of log F(n) vs log n for different box sizes n (Figure 4.1), where F(n) is the root-mean-square fluctuation of the detrended series at box size n, the exponent H = 0.93203 is clearly greater than 0.5 and reflects strong long-range correlation in the images from the SenseCam. That is, the time series is not a random walk (a mathematical formalisation of a path that consists of a succession of random steps) but is cyclical, implying that continuous low levels of background information are picked up constantly by the device. Consequently, DFA provides a measure of many similar ‘typical’ backgrounds or environments.
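
The log F(n) vs log n fit described above can be sketched in a few lines of Python. This is a minimal illustration of standard first-order DFA, not the code used in the study: the function name `dfa_exponent`, the box sizes, and the synthetic white-noise input are all assumptions made for the example, and a real analysis would start from a one-dimensional series derived from the SenseCam images.

```python
import numpy as np

def dfa_exponent(series, box_sizes):
    """Estimate the DFA scaling exponent of a 1-D series (hypothetical helper)."""
    x = np.asarray(series, dtype=float)
    # Step 1: integrate the mean-centred series to obtain the profile.
    profile = np.cumsum(x - x.mean())
    fluctuations = []
    for n in box_sizes:
        n_boxes = len(profile) // n
        # Step 2: split the profile into non-overlapping boxes of length n.
        boxes = profile[: n_boxes * n].reshape(n_boxes, n)
        t = np.arange(n)
        # Step 3: fit and remove a least-squares line in every box at once.
        coeffs = np.polyfit(t, boxes.T, 1)            # shape (2, n_boxes)
        trends = np.outer(t, coeffs[0]) + coeffs[1]   # shape (n, n_boxes)
        residuals = boxes.T - trends
        # Step 4: root-mean-square fluctuation F(n) over all boxes.
        fluctuations.append(np.sqrt(np.mean(residuals ** 2)))
    # Step 5: the exponent is the slope of log F(n) against log n.
    slope, _ = np.polyfit(np.log(box_sizes), np.log(fluctuations), 1)
    return slope, np.array(fluctuations)

# Synthetic check (assumed data): uncorrelated noise should give a slope near 0.5.
rng = np.random.default_rng(0)
white_noise = rng.standard_normal(2 ** 14)
sizes = [16, 32, 64, 128, 256, 512]
alpha_noise, _ = dfa_exponent(white_noise, sizes)
print(f"white-noise exponent: {alpha_noise:.2f}")
```

A slope close to 0.5 on the synthetic noise is the uncorrelated baseline, so a value such as the 0.93 reported for the SenseCam series indicates persistence well above that baseline.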

The dynamics of the largest eigenvalue were examined using the wavelet transform. The technique gives a clear picture of the movements in the image time series by reconstructing the series from each wavelet component. Some peaks were visible across all scales, as shown in Figure 4.2. Studying the largest eigenvalue across all wavelet scales therefore provides a powerful tool for examining the nature of the captured SenseCam data.
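
One way to picture this step is the rough sketch below. The post does not say which matrix the eigenvalue comes from or which wavelet was used, so everything here is an assumption for illustration: the eigenvalue is taken from a correlation matrix of per-image features computed in a sliding window, the decomposition is a plain Haar multiresolution written in NumPy, and the names `largest_eigenvalue_series` and `haar_components` as well as the synthetic data are hypothetical.

```python
import numpy as np

def largest_eigenvalue_series(features, window):
    """Largest eigenvalue of the feature correlation matrix in each sliding window."""
    features = np.asarray(features, dtype=float)
    values = []
    for start in range(features.shape[0] - window + 1):
        block = features[start:start + window]
        corr = np.corrcoef(block, rowvar=False)   # columns are features
        values.append(np.linalg.eigvalsh(corr)[-1])  # eigenvalues are ascending
    return np.array(values)

def haar_components(signal, levels):
    """Split a signal into Haar detail components plus a smooth remainder.

    The returned components sum back to the input signal exactly.
    """
    x = np.asarray(signal, dtype=float)
    if len(x) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    components = []
    current = x
    for level in range(1, levels + 1):
        block = 2 ** level
        # Haar approximation: the mean over each block of length 2**level.
        smooth = np.repeat(x.reshape(-1, block).mean(axis=1), block)
        components.append(current - smooth)  # detail at this scale
        current = smooth
    components.append(current)  # coarsest approximation
    return components

# Synthetic stand-in for image features (assumed): 8 features sharing one factor.
rng = np.random.default_rng(1)
n_features, window = 8, 32
common = rng.standard_normal((256 + window - 1, 1))
feats = 0.6 * common + rng.standard_normal((256 + window - 1, n_features))

lam = largest_eigenvalue_series(feats, window)   # length 256
parts = haar_components(lam, levels=4)           # 4 detail scales + 1 smooth part
```

Plotting each entry of `parts` against time would show at which scales a peak in the largest eigenvalue appears, which is the kind of view the paragraph above describes.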

[Figure 4.2: largest eigenvalue across wavelet scales]
