Gamification, Sensors and Big Data


What is Gamification?

“Gamification is the use of game design elements in non-game contexts,” as defined by Deterding et al. (2011, pp. 9-15). Huotari and Hamari (2012, pp. 17-22) define gamification as “a process of enhancing a service with affordances for gameful experiences in order to support user’s overall value creation”. For example, every time you buy a Starbucks coffee and collect points toward a free coffee, you are being gamified.


Who are Gamers? What are they?

Moss (2015) highlighted several findings at the Rocks Digital Conference: 42% of Americans play video games at least 3 hours per day; the average gamer is 35; nearly 44% of gamers are female; 22 billion dollars are spent on video games; and 960 million games are on smartphones.

Bartle (2003) published Designing Virtual Worlds, in which he differentiated four types of game users (better known as the Richard Bartle player types), which are only tangentially related to the avatars players may choose in video games.
The Richard Bartle player types taxonomy for gamification is shown in the figure below:


In gamification, it is critical to monitor your metrics closely to ensure that your business gets the best possible results and that the different player types are happy.

What can Gamification do?

Gamification uses game mechanics to encourage return visits, more engagement, increased sales, and increased learning and participation.

This means businesses are trying to motivate and engage people to take action in order to achieve a goal or a bigger purpose. Gamification helps a business define a framework for how to win or lose. It recognizes that games are very powerful and sophisticated layers of mechanics that motivate people to achieve. Game mechanics can be applied wherever human behavior needs to be optimized.


Different types of Gamification Mechanics

There are over 52 different types of game mechanics. However, the seven most commonly used mechanics in gamification are:



Sensors, including accelerometers, location detection, wireless connectivity and cameras, offer another big step towards closing the feedback loop in personalized data. Many devices and services help with tracking physical activity, caloric intake, sleep quality, posture, and other factors involved in personal well-being.

In 2006, Nike developed an app called Nike+ which tracks running distance, speed, and time. It stores the data every time you run so you can monitor your progress. You can compete with friends and other Nike+ users to try and run the furthest distance. The purpose is to get people active and running, and they use gamification to do so. There are leaderboards for different time frames (weekly, monthly, and so on). If you are one of the top runners, your profile is displayed on the leaderboard for bragging rights. Nike+ can export the data to other apps (such as food tracking apps for calories burned) and can also publish your information to your social media accounts. Nike+ has become a community of people who all enjoy running and want to push each other to keep improving and live healthy lifestyles.
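The leaderboard mechanic described above can be sketched in a few lines of Python. This is only an illustration: the record layout, the `weekly_leaderboard` function and the choice of ISO weeks as the time frame are assumptions made for this sketch, not details of how Nike+ works internally.

```python
from collections import defaultdict
from datetime import date


def weekly_leaderboard(runs, year, week, top_n=3):
    """Rank runners by total distance (km) logged in one ISO week.

    `runs` is a list of (runner, run_date, distance_km) tuples; this
    layout is hypothetical and chosen only for the sketch.
    """
    totals = defaultdict(float)
    for runner, run_date, distance_km in runs:
        iso_year, iso_week, _ = run_date.isocalendar()
        if (iso_year, iso_week) == (year, week):
            totals[runner] += distance_km
    # Highest total first; ties are broken alphabetically for a stable order.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Made-up sample data.
runs = [
    ("ana", date(2015, 9, 7), 5.0),
    ("ben", date(2015, 9, 8), 8.5),
    ("ana", date(2015, 9, 10), 6.0),
    ("cat", date(2015, 9, 1), 20.0),  # previous ISO week, not counted
]
print(weekly_leaderboard(runs, 2015, 37))
```

A monthly board would follow the same pattern with a different grouping key, for example `(run_date.year, run_date.month)`.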

In order to gain a competitive advantage, Nike added gamification to its marketing mix and was able to secure a controlling portion of the running shoe market. In 2006, Nike controlled 47% of the running shoe market and launched Nike+ in the middle of the year. It controlled 57% of the market in 2007 and 61% in 2009, and has held a controlling portion of the running shoe market ever since. In 2007 there were 500,000 Nike+ members. In 2013 there were 11,000,000 Nike+ members, and the numbers keep growing.



Big Data

The success of Nike+ is an example of how measuring performance is useful for gaining key insights. Users get to understand their running patterns better. For Nike, servers’ worth of Nike+ data can be turned into strategic business decisions to improve company performance.

Nike’s vast amount of data on the physical performance of runners has turned into an initiative to see what more can be done with the data, including making inferences about social and mental behavior. The Nike+ Accelerator program gave software developers access to the data so that new startups could be created around it.

Nike also used the data to perform life-cycle analysis of the shoe material, because not much was known about its sustainability given how long Nike’s supply chain is. By analyzing the data of 7 million users, most of whom are long-term users, Nike was able to help 600 designers make cost-effective, quality-enhancing, and sustainable changes to product material.

All thanks to gamification. Stefan Olander, VP of Nike’s Digital Sport division, said, “People want credit for their physical activity.” It’s true. Not everyone can be a professional, but people are passionate about being active. Nike+ game mechanics turn everyone into smarter self-coaches through motivation and social inspiration. Nike turned one of the most uncomplicated sports in the world, running, into a data-driven social sport that gives users access to tons of data about their personal achievements. Runners can use this data to become better at running, resulting in a healthier lifestyle. In addition, Nike gives software developers open access to this data.


Deterding, S., Dixon, D., Khaled, R., and Nacke, L. (2011) From game design elements to gamefulness: defining gamification. Proceedings of the 15th International Academic MindTrek Conference, pp. 9-15.

Huotari, K., and Hamari, J. (2012) Defining gamification: a service marketing perspective. Proceedings of the 16th International Academic MindTrek Conference, pp. 17-22.

Moss, M. (2015) ‘Digital Marketing with Website Gamification’. Available at:
(Accessed: 06 September 2015).

Bartle, R.A. (2003) Designing Virtual Worlds. 1st edn. New Riders.

Data Analytics for Large Personal Lifelogs


As wearable technology has become significantly cheaper, people increasingly rely on such devices to record profiles of individual behaviour by capturing and monitoring personal activities. This activity is often referred to as lifelogging. Considering the heterogeneous nature of the data created, as well as its appearance in the form of constant data streams, lifelogging shares features that are usually attributed to big data. A typical example of a wearable camera is Microsoft’s SenseCam, which can capture a vast personal archive each day. A significant challenge is to organise and analyse such large volumes of lifelogging data, and to turn the raw data collected by different sensors into information that is meaningful to users.


To date, various aspects of lifelogging have been studied, such as the development of sensors, efficient capture and storage of data, processing and annotating the data to identify events, improved search and retrieval of information, assessment of user experience, design of user interfaces for applications of memory aids, diet monitoring, or analysis of activities of daily living (ADL).

Given the relative success of these efforts, the research challenge has now shifted from broader aspects of data management to that of retrieving information from the vast quantities of captured data. Current applications address this by employing automatic classifiers to segment a whole day’s recording into events and search the historical record, or by building ontology-based multi-concept classifiers and searching for specific events. More recent research suggests the use of statistical mapping from low-level visual features to semantic concepts of personal lifelogs. It is important to note that these approaches are based on training classifiers from a set of annotated ground truth images. Although supervised methods can lead to more accurate outcomes in terms of detecting known patterns, they require prior knowledge from a domain expert to be fed into the system. In addition, the results of the classifier depend heavily on the quality and quantity of the training data, i.e. they are biased towards detecting activities that are defined and known to the domain expert a priori. Given that visual lifelogs usually consist of large and often unstructured collections of multimedia information, such ‘concept-based’ and ‘rule-based’ methods for analysing lifelogging data are not suitable for all use cases. Ideally, an algorithm should be able to detect unknown phenomena occurring at different frequencies in such data.


In this study, the data were generated by one person wearing the SenseCam over a six-day period, from a Saturday to a Thursday. These particular days were chosen in order to include a weekend, when normal home activity varies in comparison to events on weekdays or a working week. Data statistics are reported in Table 4.1. To create a ground truth, the user reviewed her collection and manually marked the boundary image between all events. (In machine learning, the term ground truth refers to the accuracy of the training set’s classification for supervised learning techniques; it is used in statistical models to support or reject research hypotheses.)

Table 4.1: Data statistics.

Methods and Results

Detrended Fluctuation Analysis (DFA) was used initially to analyse the image time series recorded by the SenseCam, and it exposed strong long-range correlation in these collections. This implies that continuous low levels of background information are picked up all the time by the device. Consequently, DFA provides a useful background summary.

Figure 4.1: Plot of log F(n) vs log n for different box sizes.

In the plot of log F(n) vs log n for different box sizes (Figure 4.1), the exponent H = 0.93203 is clearly greater than 0.5 and reflects strong long-range correlation in the images from the SenseCam. This indicates that the time series is not a random walk (a mathematical formalisation of a path that consists of a succession of random steps) but is cyclical, implying that continuous low levels of background information are picked up constantly by the device. Consequently, DFA provides a measure of many similar ‘typical’ backgrounds or environments.
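The procedure behind a log F(n) versus log n plot can be sketched in Python. This is a minimal first-order DFA written for illustration only: the function names, the box sizes and the synthetic white-noise input are assumptions, and it is not the code used in the study.

```python
import math
import random


def linear_fit(xs, ys):
    """Least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def dfa_exponent(series, box_sizes):
    """Estimate the DFA scaling exponent of a one-dimensional series."""
    mean = sum(series) / len(series)
    # Step 1: integrate the mean-centred series to obtain the profile.
    profile, running = [], 0.0
    for value in series:
        running += value - mean
        profile.append(running)

    log_n, log_f = [], []
    for n in box_sizes:
        xs = list(range(n))
        boxes = len(profile) // n
        squared_residuals = 0.0
        for b in range(boxes):
            segment = profile[b * n:(b + 1) * n]
            # Step 2: remove the local linear trend inside each box.
            slope, intercept = linear_fit(xs, segment)
            squared_residuals += sum(
                (y - (slope * x + intercept)) ** 2 for x, y in zip(xs, segment)
            )
        # Step 3: root-mean-square fluctuation F(n) for this box size.
        f_n = math.sqrt(squared_residuals / (boxes * n))
        log_n.append(math.log(n))
        log_f.append(math.log(f_n))
    # Step 4: the exponent is the slope of log F(n) against log n.
    slope, _ = linear_fit(log_n, log_f)
    return slope


random.seed(0)
white_noise = [random.gauss(0.0, 1.0) for _ in range(4096)]
exponent = dfa_exponent(white_noise, [8, 16, 32, 64, 128, 256])
print(round(exponent, 2))  # expected to be near 0.5 for uncorrelated noise
```

Uncorrelated noise serves as the baseline here: an exponent well above 0.5, such as the value reported for the SenseCam images, is what signals long-range correlation.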

The dynamics of the largest eigenvalue were examined using the Wavelet Transform method. The technique gives a clear picture of the movements in the image time series by reconstructing them using each wavelet component. Some peaks were visible across all scales, as shown in Figure 4.2. Studying the largest eigenvalue across all wavelet scales provides a powerful tool for examining the nature of the captured SenseCam data.

Figure 4.2: The largest eigenvalue across wavelet scales.

Large Personal Lifelogs

What is the Quantified Self?

Our body is constantly sending out signals that, if listened to carefully, allow us to better understand the state of our personal well-being. For example, feeling weak and tired after a long sleep can be seen as an indication that the quality of the sleep was low. It might reveal evidence regarding our personal fitness level or our mental state. Being aware of the importance of our body signals, medical doctors rely heavily on them when they have to make a diagnosis. Adopting these methods, a growing number of people have now started to constantly measure the fitness level of their bodies, using a variety of equipment to collect and store the data. These individuals can be considered part of the Quantified Self (QS) movement, which uses instruments to record numerical data on all aspects of our lives: inputs (food consumed, surrounding air quality), states (mood, arousal, blood oxygen levels), and performance (mental, physical). The data are acquired through technology: wearable sensors, mobile apps, software interfaces, and online communities.

The technology research and advisory company Gartner predicts that by 2017, the Quantified Self movement will evolve into a new trend with around 80% of consumers collecting or tracking personal data. Moreover, they predict that by 2020, the analysis of consumer data collected from wearable devices will be the foundation for up to five percent of sales from the Global 1000 companies. Given these predictions, it comes as no surprise that more and more companies are trying to enter the market with novel wearable devices. In fact, a multitude of devices, services and apps are now available that track almost everything we do.



Memory is the process by which information is encoded, stored, and retrieved. Given enough stimuli and rehearsal, humans can remember information for many years and recall that information whenever required. However, not all stimuli are strong enough to generate a memory that can be recalled easily. One of the most valuable and effective ways to reduce the impact of Age-related Memory Impairment on everyday functioning is by using external memory aids. Lifelogging technologies can significantly contribute to the realisation of these external memory aids. Lifelogging is the process of automatically, passively and digitally recording aspects of our life experiences. This includes visual lifelogging, where the lifeloggers wear head-mounted cameras or cameras mounted in front of the chest, that capture personal activities through the medium of images or video. Despite its relative novelty, visual lifelogging is gaining popularity because of projects like the Microsoft SenseCam.

The SenseCam is a small, lightweight wearable device that automatically captures a wearer’s every moment as a series of images and sensor readings. It has been shown recently that such images and other data can be periodically reviewed to recall and strengthen individuals’ memory. Normally, the SenseCam captures an image at the rate of one every 30 seconds and collects about 4,000 images in a typical day. It can also be set up to take more frequent images by detecting sudden changes in the wearer’s environment, such as significant changes in light level, motion and ambient temperature. The SenseCam generates a very large amount of data in a single typical day.


BIG Lifelogging data

“Big Data” applications are generally believed to have four elements which popularly characterise a big data scenario: volume, variety, velocity and veracity. In this section we examine how lifelogging does, or does not, conform to those four characteristics, because there are certain advantages which “Big Data” technologies could bring to lifelogging applications.


Lifelogging is essentially about generating and capturing data, whether it comes from sensors, our information accesses, our communications, and so on. One characteristic which makes lifelogging a big data application, and poses both challenges and opportunities for data analytics, is the variety in the data sources.

Primary data includes sources such as physiological data from wearable sensors (heart rate, respiration rate, galvanic skin response, etc.), movement data from wearable accelerometers, location data, nearby Bluetooth devices, WiFi networks and signal strengths, temperature sensors, communication activities, data activities, environmental context, and images or video from wearable cameras. That does not take into account the secondary data that can be derived from this primary lifelog data through semantic analysis. All these data sources are tremendously varied and different. In lifelogging, all these varied sources merge and combine to form a holistic personal lifelog where the variety across data sources is normalised and eliminated.

The velocity of data refers to the subtle shifting changes in patterns within a data source or stream, and this is not much of an issue for lifelog data, yet, because most lifelogging analysis and processing work is not done in applications which require identifying a pattern or a change in real time. This is one of the trends for future work; real-time pattern analysis could potentially be employed for healthcare monitoring and real-time interventions.

Lifelogging generates continuous streams of data on a per-person basis; however, despite the potential for real-time interactions, most of the lifelogging applications we have seen to date do not yet operate in a real-time mode. So while lifelogging does not yet have huge volume, the volume of data is constantly increasing as more and more people lifelog. For a single individual, the data volumes can be large when considered as a Personal Information Management challenge, but in terms of big-data analysis, the data volumes for a single individual are small. If we consider the lifelogs of many people, thousands or perhaps millions, all centrally stored by a service provider, then data analytics over such huge archives becomes a real big-data challenge in terms of volume.

Finally, veracity refers to the accuracy of data and to it sometimes being imprecise and uncertain. In the case of lifelogging, much of our lifelog data is derived from sensors which may be troublesome, or have issues of calibration and sensor drift, as described in Byrne and Diamond (2006). Hence, lifelogging does have issues of data veracity which must be addressed. Semantically, such data may not be valuable without additional processing. In applications of wireless sensor networks in environmental monitoring, for example, trust and reputation frameworks have been developed to handle issues of data accuracy, such as RFSN (Reputation-based Framework for High Integrity Sensor Networks) by Ganeriwal et al. (2008). Similarly, data quality is a major issue in enterprise information processing.

Byrne, R. and Diamond, D. (2006). Chemo/bio-sensor networks. Nature Materials, 5(6):421-424.

Ganeriwal, S., Balzano, L. K., and Srivastava, M. B. (2008). Reputation based framework for high integrity sensor networks. ACM Transactions on Sensor Networks, 4(3):1-37.

Try R: Analysing Google Scholar citation data

What is R?

R is a data analysis tool: data scientists, statisticians, analysts, and others who need to make sense of data use R for statistical analysis, data visualization, and predictive modeling.

R is an open-source software project. It’s totally FREE. The source code of R is also open for inspection and modification to anyone who wants to see how the methods and algorithms work under the covers.

Learn R?

There are thousands of websites that offer tutorials for learning R. It’s best to start off simple with something like Code School’s “Try R”, which is a step-by-step guide to the basics of R. There are a total of 8 Chapter Badges that can be “earned”. Once you have completed the tutorial, you will be awarded badges like the ones below:


Example using R?

As a researcher, Google Scholar Citations lets you track citations to your publications over time. Recently, I found an interesting R package called scholar.

The scholar package provides functionality to extract citation data from Google Scholar. It allows you to compare multiple scholars and predict future h-index values. There’s a full guide on GitHub (along with the source code).

Now I will show you step by step how to use this package to extract information.

1. Download and Install Scholar package

The package is available from CRAN. You can download and install it by typing install.packages("scholar") in the R Console.

2. Get profile data from Google Scholar

Once you have opened a Google Scholar profile page, the URL will contain a string that ends with user=qj74uXkAAAAJ. To use the scholar package, you need to reference scholars by this id. The id above, for example, gives access to Stephen W. Hawking’s data.
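The id can also be pulled out of a profile URL programmatically. Here is a minimal Python sketch using only the standard library; the helper name `scholar_id_from_url` is made up for this example, and the URL layout around the `user=` parameter is an assumption.

```python
from urllib.parse import parse_qs, urlparse


def scholar_id_from_url(profile_url):
    """Return the value of the `user` query parameter, or None if absent."""
    query = parse_qs(urlparse(profile_url).query)
    values = query.get("user")
    return values[0] if values else None


# The id is the one quoted in the text; the rest of the URL is an assumed layout.
url = "https://scholar.google.com/citations?hl=en&user=qj74uXkAAAAJ"
print(scholar_id_from_url(url))  # qj74uXkAAAAJ
```

Parsing the query string, rather than slicing the end of the URL, keeps the helper working when the parameters appear in a different order.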


3. Compare multiple scholars

You can also compare multiple scholars, for example Stephen W. Hawking and Albert Einstein:


Result is:


4. Predicting future h-index values

A scholar’s h-index is the largest n such that they have published at least n papers that have each been cited at least n times. The scholar package can be used to predict future h-index values.
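To make the h-index definition concrete, here is a small Python sketch that computes it from a list of citation counts. It is separate from the R scholar package and does not perform any prediction; the function name and the sample counts are made up for illustration.

```python
def h_index(citations):
    """Largest n such that n papers have at least n citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, count in enumerate(ranked, start=1):
        if count >= position:
            h = position
        else:
            break
    return h


# Made-up citation counts for five papers.
print(h_index([10, 8, 5, 4, 3]))  # 4
```

Four of the five papers have at least four citations each, but there are not five papers with at least five citations, so the h-index is 4.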


The result of the prediction is:



Discovering knowledge from data using Google Fusion Tables


Google Fusion Tables is a free web service provided by Google for data management. Data is stored in the form of tables that web users can view and download. In this blog post, I illustrate how to discover knowledge from data using Google Fusion Tables by visualising the total population and population density of Ireland’s 26 counties.

Data Sets

Three data sets were used for this study. The first data set, containing the population numbers of Ireland’s 26 counties, was obtained from the Central Statistics Office website. It consists of a table listing the 2011 Census population data. This was imported into an Excel 2010 spreadsheet and saved as “Ireland_Population” on Google Drive and the local machine. The second data set was downloaded from the Irish Independent website. It comes in the form of a KML file and contains geographic information about Ireland’s county boundaries. I saved this data as “map_lead.KML” on Google Drive and the local machine. The last data set was named “Ireland_Population_Density”. It includes four columns, namely county name, population, area and population density. In this data set, the county name and population were obtained from the first data set. The area was obtained from Wikipedia. The population density of each county was calculated as population divided by area.
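The density column can be reproduced with a short Python sketch. The county names and figures below are placeholders rather than census values, and the function name is made up for this illustration.

```python
def population_density(population, area_km2):
    """People per square kilometre."""
    if area_km2 <= 0:
        raise ValueError("area must be positive")
    return population / area_km2


# Placeholder rows (name, population, area in km^2); not census values.
rows = [("County A", 120000, 2400.0), ("County B", 50000, 5000.0)]
table = [
    (name, population, area, population_density(population, area))
    for name, population, area in rows
]
for row in table:
    print(row)
```

Each output row has the same four columns as the “Ireland_Population_Density” data set: name, population, area and density.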


There are several steps involved to create a heat map using Fusion Tables.

Step One: Prepare data for Fusion Tables

Not every data set is perfect when it is obtained, i.e., we have to “clean” the raw data and prepare it for further analysis. For example, the Ireland Population data set that we obtained from the Central Statistics Office website also includes Province, Males, Females and Total Persons. It can be reduced to two spreadsheet columns, namely province and total number of persons, and to 26 rows, one for each county. In this data set, I corrected County Laoighis to Laois and merged Tipperary North and Tipperary South into Tipperary. The total number of persons for Tipperary was calculated by summing the totals for Tipperary North and Tipperary South.
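The renaming and merging steps can be sketched in Python as follows. The function name and its arguments are made up for this example, and the totals are placeholders, not census figures.

```python
def clean_population(raw_totals, renames, merges):
    """Rename counties and sum the totals of counties that are merged.

    raw_totals: dict mapping county name to total persons
    renames:    dict mapping old name to corrected name
    merges:     dict mapping merged name to the list of names it replaces
    """
    part_to_target = {
        part: target for target, parts in merges.items() for part in parts
    }
    cleaned = {}
    for name, total in raw_totals.items():
        name = renames.get(name, name)
        name = part_to_target.get(name, name)
        cleaned[name] = cleaned.get(name, 0) + total
    return cleaned


# Placeholder totals, used only to show the mechanics.
raw = {"Laoighis": 100, "Tipperary North": 200, "Tipperary South": 300, "Carlow": 50}
cleaned = clean_population(
    raw,
    renames={"Laoighis": "Laois"},
    merges={"Tipperary": ["Tipperary North", "Tipperary South"]},
)
print(cleaned)
```

Keeping the corrections in two small dictionaries makes the cleaning step repeatable if the raw table is downloaded again.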

Step Two: Apply data to Fusion Tables

When you click “Create” on the start page of Google Fusion Tables, a dialogue window appears, as shown below. As the next step, you have to locate the Ireland_Population data set and click “Next” at the bottom of the window. Once the data set is imported, clicking “Next” will open the next screen. Here, you can edit basic information such as the table name. Once finished, click “Finish”.


Next, you need to upload the second table, map_lead.KML, which is the one that contains geographic information of Ireland’s county boundaries. To import the second table, you need to open a new browser window or tab and repeat the process above.

Step Three: Merge and upload two tables

Now that the two tables have been imported to Fusion Tables, the next step is to merge them into one table that has both population numbers and county boundary information. Go back to the uploaded Ireland_Population table, click on “File” and, in the drop-down menu, select “Merge”.

In the next window, Fusion Tables asks what and where the second table is. Then, select the second table, map_lead.KML, and click “Next”.

In the next window, we have to match the sources. We match county “Name” column of the first table with county “Name” column of the second table. Next, click “Merge”. Then, click “View table”. The merged table is created. Then we click on the “Map of geometry” tab and a heatmap diagram is generated.
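Conceptually, matching on the “Name” column is a join between the two tables. The Python sketch below shows an inner join on a shared key; it is an illustration under that assumption, with made-up rows, and Fusion Tables’ own handling of unmatched rows may differ.

```python
def merge_on_key(left_rows, right_rows, key):
    """Inner join of two lists of dicts on a shared column."""
    right_by_key = {row[key]: row for row in right_rows}
    merged = []
    for row in left_rows:
        match = right_by_key.get(row[key])
        if match is not None:
            # Columns from both tables end up in one combined row.
            merged.append({**row, **match})
    return merged


# Made-up rows; "geometry" stands in for the KML boundary data.
population = [
    {"Name": "Carlow", "Population": 50},
    {"Name": "Laoighis", "Population": 100},
]
boundaries = [
    {"Name": "Carlow", "geometry": "polygon-1"},
    {"Name": "Laois", "geometry": "polygon-2"},
]
merged = merge_on_key(population, boundaries, "Name")
print(merged)
```

The “Laoighis” row finds no partner because the boundary table spells the county “Laois”, which is why the names are corrected during data preparation.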

Step Four: Style and publish the heat map

We can click on the “Change feature styles” button and manually break the range of population numbers into six buckets, add the map legend as well and finally click on “Done”.
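One simple way to choose the six bucket boundaries is to split the range of values into equal-width intervals. The Python sketch below assumes that equal-width scheme and uses placeholder values; in Fusion Tables the boundaries are entered by hand and need not be equal in width.

```python
def equal_width_edges(values, bucket_count=6):
    """Return bucket_count + 1 edges spanning min(values) to max(values)."""
    low, high = min(values), max(values)
    width = (high - low) / bucket_count
    return [low + i * width for i in range(bucket_count + 1)]


def bucket_index(value, edges):
    """Index of the bucket containing value; the top edge falls in the last bucket."""
    last = len(edges) - 2
    for i in range(last):
        if value < edges[i + 1]:
            return i
    return last


# Placeholder values, not census figures.
values = [0, 10, 25, 60]
edges = equal_width_edges(values)
print(edges)
print([bucket_index(v, edges) for v in values])
```

Each bucket index can then be mapped to one colour in the map legend.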

By default, the map can only be viewed by the creator. In order to share the map, we can click the “Share” label on the upper-right corner of the map, select the visibility option of “Public on the web” and click on “save”.

After that, we can click on “Map of geometry” tab and then “Publish”. The first heat map of Ireland’s Population is created.

Repeating the process above creates the second heat map, showing Ireland’s population density.


The first map shows the Total Population of Ireland’s 26 counties.

The second map shows the Population Density of Ireland’s 26 counties.


As we can see from the results, the Dublin area ranks highly among the most densely populated counties. Almost 1/3 of Ireland’s population lives in or around Dublin. There are a number of reasons for that, including:
– Dublin is the capital of Ireland
– It’s Ireland’s main port.
– It’s Ireland’s main transport focus.
– It’s Ireland’s most important educational, cultural, and commercial centre.
– It’s a zone of attraction for migrants, and foreign direct investment.

In the west of Ireland the population is lower as shown on the first map. The population density is significantly lower. Some reasons for the lower population density in the west of Ireland include:
– Lack of good transport routes
– Distance from large markets like Dublin and the UK
– Lack of third level colleges for education
– Lack of job opportunities