Try R: Analysing Google Scholar citation data

What is R?

R is a data analysis tool: data scientists, statisticians, analysts, and others who need to make sense of data use R for statistical analysis, data visualization, and predictive modeling.

R is an open-source software project. It’s totally FREE. The source code of R is also open for inspection and modification to anyone who wants to see how the methods and algorithms work under the covers.

Learn R?

There are many websites that offer tutorials for learning R. It’s best to start off simple with something like “Try R code school”, which is a step-by-step guide to the basics of R. There are a total of 8 Chapter Badges that can be “earned”. Once you have completed the tutorial, you will be awarded badges like the ones below:

[Image: Try R completion badges]

Example using R?

If you are a researcher, Google Scholar Citations lets you track citations to your publications over time. Recently, I found an interesting R package called scholar.

The scholar package provides functionality to extract citation data from Google Scholar. It allows you to compare multiple scholars and predict future h-index values. There’s a full guide on GitHub (along with the source code).

Now I will show you step by step how to use this package to extract information.

1. Download and install the scholar package

The package is available on CRAN. You can download and install it in one step by typing the following in the R Console, then load it with library():
install.packages("scholar")
library(scholar)

2. Get profile data from Google Scholar

When you open a Google Scholar profile page, the URL contains a parameter such as user=qj74uXkAAAAJ. The scholar package references scholars by this id. For example, Stephen W. Hawking’s data:

[Figure f1: retrieving Stephen W. Hawking’s profile data]
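The id can also be pulled out of a profile URL programmatically. The following is a minimal sketch in base R; the helper name extract_scholar_id and the example URL are my own illustrative choices, not part of the scholar package. The get_profile() call at the end needs network access and the scholar package, so it is guarded to run only in an interactive session.

```r
# Hypothetical helper (not part of the scholar package): pull the value of
# the "user" query parameter out of a Google Scholar profile URL.
extract_scholar_id <- function(url) {
  match <- regmatches(url, regexpr("user=[^&#]+", url))
  if (length(match) == 0) {
    stop("No 'user=' parameter found in URL")
  }
  sub("^user=", "", match)
}

# Example URL built around the id quoted in the text
profile_url <- "https://scholar.google.com/citations?user=qj74uXkAAAAJ"
id <- extract_scholar_id(profile_url)
print(id)

# The lookup itself talks to Google Scholar, so it only runs interactively
# and only if the scholar package is installed.
if (interactive() && requireNamespace("scholar", quietly = TRUE)) {
  profile <- scholar::get_profile(id)
  print(profile$name)
  print(profile$h_index)
}
```

Keeping the id extraction separate from the lookup makes it easy to check the id before any request is sent.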

3. Compare multiple scholars

You can also compare multiple scholars, for example Stephen W. Hawking and Albert Einstein:

[Figure f2: comparing the two scholars]

The result is:

[Figure f3: comparison result]
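A comparison like this might be written roughly as follows. This is a sketch, not necessarily the exact code used for the figures: the first id is the one quoted earlier, while the second id is my assumption for Albert Einstein’s profile and should be checked against his profile URL before use. The calls need network access, so they are guarded to run only in an interactive session.

```r
# First id: quoted in the text above.
# Second id: ASSUMED to be Albert Einstein's profile id; verify it first.
ids <- c("qj74uXkAAAAJ", "qc6CJjYAAAAJ")

if (interactive() && requireNamespace("scholar", quietly = TRUE)) {
  # Citation records of both scholars, organised by calendar year
  by_year <- scholar::compare_scholars(ids)
  print(head(by_year))

  # The same comparison aligned by career year instead of calendar year
  by_career <- scholar::compare_scholar_careers(ids)
  print(head(by_career))
}
```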

4. Predicting future h-index values

A scholar’s h-index is n if they have published at least n papers that have each been cited at least n times. Now we show how to use the scholar package to predict future h-index values:

[Figure f4: predicting future h-index values]

The result is:

[Figure f5: prediction result]
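To make the h-index definition concrete, here is a small base R sketch that computes it from a vector of citation counts. The function name h_index and the citation counts are made up for illustration. The guarded call at the end shows how the prediction itself might be requested from the scholar package; it needs network access, so it only runs in an interactive session.

```r
# Illustrative helper: the h-index is the largest n such that at least
# n papers have at least n citations each.
h_index <- function(citations) {
  sorted <- sort(citations, decreasing = TRUE)
  # Rank r counts when the r-th most cited paper has at least r citations
  sum(sorted >= seq_along(sorted))
}

# Made-up citation counts for five papers
example_citations <- c(10, 8, 5, 4, 3)
print(h_index(example_citations))  # 4: four papers have at least 4 citations

# Prediction via the scholar package, using the id quoted earlier
if (interactive() && requireNamespace("scholar", quietly = TRUE)) {
  print(scholar::predict_h_index("qj74uXkAAAAJ"))
}
```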

References:
http://www.r-bloggers.com
http://tryr.codeschool.com
https://cran.r-project.org/web/packages/scholar/index.html

Discovering knowledge from data using Google Fusion Tables

Introduction

Google Fusion Tables is a free Web service provided by Google for data management. Data is stored in the form of tables that web users can view and download. In this post, I illustrate how to discover knowledge from data by applying Google Fusion Tables. I visualise the total population and population density of Ireland’s 26 counties.

Data Sets

Three data sets were used for this study. The first data set, containing the population numbers of Ireland’s 26 counties, was obtained from the Central Statistics Office website. It consists of a table listing the 2011 Census population data. This was imported into an Excel 2010 spreadsheet and saved as “Ireland_Population” on Google Drive and the local machine. The second data set was downloaded from the Irish Independent website. It comes in the form of a KML file and contains geographic information on Ireland’s county boundaries. I saved this data as “map_lead.KML” on Google Drive and the local machine. The last data set was named “Ireland_Population_Density”. It includes four columns, namely county name, population, area and population density. In this data set, the county name and population were taken from the first data set. The area was obtained from Wikipedia. The population density of each county was calculated as population divided by area.
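The density calculation for the third data set can be sketched in a few lines of R. The county names and figures below are placeholders, not census values, and I assume the area is given in square kilometres, which the description above does not state.

```r
# Placeholder rows, NOT real census figures; area unit assumed to be km^2
counties <- data.frame(
  county     = c("County A", "County B"),
  population = c(500000, 30000),
  area_km2   = c(1000, 1500),
  stringsAsFactors = FALSE
)

# Population density = population divided by area
counties$density <- counties$population / counties$area_km2
print(counties)
```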

Methods

Creating a heat map with Fusion Tables involves several steps.

Step One: Prepare data for Fusion Tables

Raw data is rarely ready to use as obtained, so we have to “clean” it and prepare it for further analysis. For example, the Ireland population data set that we obtained from the Central Statistics Office website includes the columns Province, Males, Females and Total Persons. It can be reduced to two spreadsheet columns, namely Province and Total Persons, and to 26 rows, one row for each county. In this data set, County Laoighis was corrected to Laois, and Tipperary North and Tipperary South were merged into Tipperary. The total number of persons for Tipperary was calculated by summing the totals of Tipperary North and Tipperary South.
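I did this clean-up in the spreadsheet, but the same renaming and summing could be sketched in R as follows. The numbers are placeholders rather than the census totals, and the column names county and total_persons are my own.

```r
# Placeholder totals, NOT the real census values
raw <- data.frame(
  county        = c("Laoighis", "Tipperary North", "Tipperary South", "Kerry"),
  total_persons = c(100, 40, 60, 80),
  stringsAsFactors = FALSE
)

# Correct the county name and collapse the two Tipperary rows onto one name
raw$county[raw$county == "Laoighis"] <- "Laois"
raw$county <- sub("^Tipperary (North|South)$", "Tipperary", raw$county)

# Sum the totals per county, so the two Tipperary rows become one
clean <- aggregate(total_persons ~ county, data = raw, FUN = sum)
print(clean)
```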

Step Two: Apply data to Fusion Tables

When you click “Create” on the start page of Google Fusion Tables, a dialogue window appears, as shown below. Next, locate the Ireland_Population data set and click “Next” at the bottom of the window. Once the data set is imported, clicking “Next” again opens the next screen. Here, you can edit basic information such as the table name. Once finished, click “Finish”.

[Screenshot s1: Fusion Tables import dialogue]

Next, you need to upload the second table, map_lead.KML, which is the one that contains geographic information of Ireland’s county boundaries. To import the second table, you need to open a new browser window or tab and repeat the process above.

Step Three: Merge and upload two tables

Now that the two tables have been imported to Fusion Tables, the next step is to merge them into one table that has both the population numbers and the county boundary information. Go back to the uploaded Ireland_Population table, click on “File” and, in the drop-down menu, select “Merge”.

In the next window, Fusion Tables asks which table to merge with. Select the second table, map_lead.KML, and click “Next”.

In the next window, we have to match the sources. We match the county “Name” column of the first table with the county “Name” column of the second table. Next, click “Merge”, then click “View table”. The merged table is created. Finally, we click on the “Map of geometry” tab and a heat map is generated.
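Matching on the “Name” column only works for rows whose names are spelled identically in both tables. As a rough local analogue (not a description of how Fusion Tables performs the merge internally), here is a sketch in R with placeholder rows. It joins two tables on Name and lists the names that found no partner, which is a quick way to catch spelling mismatches before uploading.

```r
# Placeholder tables; the geometry strings stand in for KML boundary data
population <- data.frame(
  Name       = c("County A", "County B", "County C"),
  population = c(500000, 30000, 120000),
  stringsAsFactors = FALSE
)
boundaries <- data.frame(
  Name     = c("County A", "County B", "County D"),
  geometry = c("polygon A", "polygon B", "polygon D"),
  stringsAsFactors = FALSE
)

# Keep only the rows whose Name appears in both tables
merged <- merge(population, boundaries, by = "Name")
print(merged)

# Names in the population table that have no matching boundary
unmatched <- setdiff(population$Name, boundaries$Name)
print(unmatched)
```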

Step Four: Style and publish the heat map

We can click on the “Change feature styles” button, manually break the range of population numbers into six buckets, add the map legend and finally click on “Done”.

By default, the map can only be viewed by its creator. In order to share the map, we can click “Share” in the upper-right corner of the map, select the visibility option “Public on the web” and click on “Save”.

After that, we can click on the “Map of geometry” tab and then “Publish”. The first heat map, of Ireland’s population, is created.

Repeating the process above with the Ireland_Population_Density data set creates the second heat map, of Ireland’s population density.
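Breaking a numeric range into six buckets is the same idea as binning with cut() in R. The sketch below uses placeholder population values and break points of my own choosing. They are not the boundaries I entered in Fusion Tables.

```r
# Placeholder values and hypothetical break points, for illustration only
population <- c(30000, 75000, 120000, 160000, 250000, 500000, 1200000)
breaks <- c(0, 50000, 100000, 150000, 200000, 500000, Inf)

# Six buckets; each interval includes its upper bound (right = TRUE)
bucket <- cut(population, breaks = breaks,
              labels = paste("Bucket", 1:6), right = TRUE)
print(data.frame(population, bucket))
```

With right = TRUE a value that sits exactly on a boundary, such as 500000 here, falls into the lower bucket.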

Results

The first map shows the Total Population of Ireland’s 26 counties.

The second map shows the Population Density of Ireland’s 26 counties.


Discussion

As the results show, the Dublin area ranks among the most densely populated counties. Almost a third of Ireland’s population lives in or around Dublin. There are a number of reasons for that, including:
– Dublin is the capital of Ireland.
– It’s Ireland’s main port.
– It’s Ireland’s main transport focus.
– It’s Ireland’s most important educational, cultural, and commercial centre.
– It’s a zone of attraction for migrants, and foreign direct investment.

In the west of Ireland, the population is lower, as shown on the first map, and the population density is significantly lower too. Some reasons for the lower population density in the west of Ireland include:
– Lack of good transport routes
– Distance from large markets like Dublin and the UK
– Lack of third-level colleges for education
– Lack of job opportunities

References
http://www.mulinblog.com/google-maps-tutorial-part-1-what-fusion-tables-is-and-does/
http://www.cso.ie/en/statistics/population/populationofeachprovincecountyandcity2011/
http://www.independent.ie/editorial/test/map_lead.kml
https://en.wikipedia.org/wiki/List_of_Irish_counties_by_area