DAiR_Workshop2024

Modified: July 15, 2024

Bioinformatics and Data Science Summer Workshops 2024

Author: Dr. Hamed Abdollahi  
PI: Dr. Homayoun Valafar  

 

Datasets/Data Sources of Interest

Sample Datasets for the Workshop


R built-in data sets

R ships with many built-in data sets (in the datasets package, which is attached by default). They are commonly used as demo data for experimenting with R functions.

To list the available built-in data sets:
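A minimal sketch of listing the built-in data sets. Calling `data()` with no arguments is the usual approach; in the R console or RStudio it opens a listing of every data set in the attached packages. Inside a script, the same information can be taken from the object that `data()` returns. The variable name `ds` is only an illustrative choice.

```r
# Interactive use: opens a listing of all data sets in the attached packages
if (interactive()) data()

# Script-friendly alternative: the listing as a character matrix
ds <- data(package = "datasets")$results
head(ds[, c("Item", "Title")])   # data set names and short descriptions
```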

To load a built-in data set:
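A short sketch using the built-in `mtcars` data set, chosen here only as an example; any name from the listing works the same way. Because the datasets package is attached by default, `mtcars` can also be used directly. Calling `data()` places an explicit copy in the global environment.

```r
# Load the mtcars data set explicitly into the global environment
data("mtcars")

# Peek at the first six rows
head(mtcars)

# Inspect the structure: 32 observations of 11 numeric variables
str(mtcars)
```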


Other Data sets

Examples of Other Datasets

| Type | Package | Author(s) | CRAN Status | Objective | Installation (CRAN / GitHub) |
|---|---|---|---|---|---|
| Baseball | mlbgameday | Kris Eberwein | Archived | Data collection | install.packages("mlbgameday") or devtools::install_github("keberwein/mlbgameday") |
| Basketball | ballr | Ryan Elmore | Archived | Current and historical basketball data | install.packages("ballr") or devtools::install_github("rtelmore/ballr") |
| Hockey | nhlapi | Jozef Hajnala | Active | NHL API | install.packages("nhlapi") or remotes::install_github("jozefhajnala/nhlapi") |
| Movie | No package | IMDb | NA | Dataset(s) of movie information | Download from the IMDb website |

Import Data

In this workshop, we recommend the tidyverse approach to learning and using R.

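Below are some commonly used tidyverse packages. The core packages (dplyr, ggplot2, readr, purrr, forcats, lubridate, stringr, tidyr) are attached with library(tidyverse). readxl and haven are installed as part of the tidyverse but are not attached by library(tidyverse); load them separately, e.g. library(readxl).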

| Package | Use | Package | Use |
|---|---|---|---|
| dplyr | data wrangling | forcats | categorical data / factors |
| ggplot2 | visualization | lubridate | dates and times |
| readr | import CSV and other delimited text | stringr | regular expressions / strings |
| purrr | iteration / functional programming | tidyr | pivoting and tidying data |
| readxl | import Excel files | haven | import SPSS/Stata/SAS files |
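A minimal sketch of this setup, assuming the tidyverse is already installed (for example with install.packages("tidyverse")). The grouped summary at the end is only an illustrative use of dplyr on the built-in `mtcars` data; the name `mpg_by_cyl` is hypothetical.

```r
# Attach the core tidyverse packages (dplyr, ggplot2, readr, tidyr, ...)
library(tidyverse)

# readxl and haven are installed with the tidyverse but are not attached
# by library(tidyverse), so load them explicitly when needed
library(readxl)
library(haven)

# List every package that belongs to the tidyverse
tidyverse_packages()

# Illustrative dplyr pipeline: mean miles per gallon for each cylinder count
mpg_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg))
mpg_by_cyl
```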
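Common functions for reading data into R:

  • read.csv: For reading comma-separated value files (“.csv”). Base R.

  • read.delim: For reading delimited text files (“.txt”); tab-separated by default, other delimiters via the sep argument. Base R.

  • scan: For reading data from a file, or from keyboard input, into a vector.

  • read_excel: For reading Excel spreadsheets (“.xls” or “.xlsx”). From the readxl package, which must be loaded with library(readxl).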
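The sketch below tries each of these functions on small files it writes to a temporary directory, so it does not depend on any workshop data. The file paths and object names are illustrative assumptions. For the Excel case it uses the example workbook that ships with readxl, located with readxl_example().

```r
library(readxl)  # read_excel() and readxl_example()

# read.csv(): write a small CSV to a temporary file, then read it back
csv_path <- tempfile(fileext = ".csv")
write.csv(head(mtcars), csv_path, row.names = FALSE)
cars_csv <- read.csv(csv_path)

# read.delim(): expects tab-separated values by default (sep = "\t")
txt_path <- tempfile(fileext = ".txt")
write.table(head(iris), txt_path, sep = "\t", row.names = FALSE)
iris_txt <- read.delim(txt_path)

# scan(): read whitespace-separated numbers into a numeric vector
nums <- scan(text = "3 1 4 1 5", quiet = TRUE)

# read_excel(): read one sheet of the example workbook bundled with readxl
xlsx_path <- readxl_example("datasets.xlsx")
iris_xlsx <- read_excel(xlsx_path, sheet = "iris")
```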