Manipulating IPEDS data to work in Tableau

Through my Institutional Research internship at Denison University, I work a lot with IPEDS data. IPEDS, or the Integrated Postsecondary Education Data System, is a website that has facts such as retention rates, size, religious affiliation, and revenues, for thousands of public and private colleges. This data is useful and goes back decades, but it is stored in summary form; or wide format rather than long. Example:

Institution.Name Alabama Alaska Arizona Arkansas California Colorado
Abilene Christian University 0 2 5 0 17 16
Adrian College 0 0 0 0 3 0
Adventist University of Health Sciences 1 0 0 0 1 1
Agnes Scott College 4 0 0 2 11 0
Alaska Pacific University 0 10 0 0 2 1

The problem I run into when using this kind of data in Tableau is that I can’t “color” maps or area charts by the location variable because every state/location is stored in a separate column.

Tableau works most smoothly with data that is stored as individual observations (i.e. a row for every student represented in the table above, with columns for ‘StudentID’ ‘College’ and ‘Location’) because it can quickly aggregate the observations and create that summary data on its own. To transform the summary data shown above into something that tableau can deal with, we have to “melt” it.

This will not create individual observations for each student, but it will combine all the states into one column called “Location”, and all the corresponding values into a column called “Value”.

When loading this data frame into Tableau, “Location” becomes the variable and “Value” is your measure.

The only package you will need to install to transform the data is “reshape”.

require(reshape)

Creating the list

To begin, we need to read in the data tables. Data downloaded from IPEDS is stored as a ‘csv’, so that makes importing it easy, but if your data is stored as an Excel workbook, you can either convert it (save as “.csv”) or install the package XLconnect, which can import excel sheets into R.

This code will store each of your data frames as a separate element in a list called “Residences”, and then name each element with the year it reflects.

temp = list.files(pattern="*.csv")
Residences <- list()
for (k in 1:length(temp)){
 Residences[[k]] <- read.csv(temp[k])
}
names(Residences) <- c(1986,1988,seq.int(1992,2014,2))

“list.files” is a function that retrieves the names of any file type you specify. It defaults its search to your working directory, but you can edit that to any folder on your computer.

#temp = list.files(path = "User/You/Your_Path", pattern="*.any_file_type_you_want", full.names = TRUE)

If you do this, make sure to specify that “full.names” equals TRUE, or “read.csv” won’t work when reading the vector elements.

Melting the list

To melt this list, use “melt.list”. This will recursively melt each data frame in the list, and then combine them all into one long data frame.

The argument “id.vars” tells the function which columns you want to use as your identifying variables, or which ones you don’t want going in the “Location” column.

Specifying “level = ‘Year’” tells the function to name our new column “Year”.

RESI <- melt.list(Residences, id.vars = c("UnitID", "Institution.Name"), level = "Year")
UnitID Institution.Name Location Value Year
222178 Abilene Christian University Alabama 1 1986
168528 Adrian College Alabama 0 1986
133872 Adventist University of Health Sciences Alabama 0 1986
138600 Agnes Scott College Alabama 7 1986
102669 Alaska Pacific University Alabama 0 1986
188526 Albany College of Pharmacy and Health Sciences Alabama 0 1986
128498 Albertus Magnus College Alabama 0 1986
168546 Albion College Alabama 0 1986
210571 Albright College Alabama 0 1986
237118 Alderson Broaddus University Alabama 0 1986

Just as a personal preference, I like to order my tables alphabetically.

RESI <- RESI[order(RESI$Institution.Name),]
UnitID Institution.Name Location Value Year
222178 Abilene Christian University Alabama 1 1986
222178 Abilene Christian University Alaska 0 1986
222178 Abilene Christian University Arizona 12 1986
222178 Abilene Christian University Arkansas 1 1986
222178 Abilene Christian University California 18 1986

As you can see, this new table stores the state names under a single column “Location”, but it still tells you how many students live in each state by connecting it with the column “Value. This column ‘Location’ will become the variable in Tableau that you can use to color Area Charts or Maps, while the ‘Value’ column will become the measure variable in Tableau. Tableau’s”Sum” function works correctly here because it groups its operation by school and by Location.

That’s all- if you have any comments or suggestions please let me know, as I am still learning R and Tableau