HOPE Tutorial

September 24, 2024

Identifying Student Success Insights Through Data Analysis

Part 2: HOPE Scholarship Diagnostic Tutorial

Introduction

The HOPE scholarship is awarded to Georgia residents who demonstrate academic excellence and offers financial support to students by covering part of their tuition at eligible higher education institutions. However, many students who begin their studies with the HOPE scholarship eventually lose it. Moreover, students who lose the scholarship graduate at much lower rates compared to students who retain the scholarship. This pattern highlights the importance of the HOPE scholarship as an “access” scholarship, providing key financial support that helps students stay enrolled and graduate. Analyzing data to determine (1) how many students lose the scholarship at different credit hour checkpoints, (2) gaps in scholarship retention between different student groups, and (3) differences in graduation rates between HOPE and non-HOPE students can provide valuable insights into this key student success area.

This tutorial will guide you through the creation of multiple data visualizations that provide an overview of outcomes for HOPE scholarship students. An example report featuring these visualizations is available, along with the corresponding example data. While this tutorial focuses on HOPE scholarship data, similar analyses can be applied to a wide range of scholarship programs.

Note

This tutorial assumes some familiarity with R and with the {tidyverse} packages used to clean, analyze, and visualize the data.

Initial Steps

Before creating the visualizations, load the necessary packages and data.

1. Load packages

library(janitor)
library(dplyr)
library(tidyr)
library(glue)
library(ggplot2)
library(scales)

2. Load data

hope_raw <- read.csv("data/hope_tutorial_data.csv", na.strings = "") %>% clean_names()

head(hope_raw, 10)

      cohort pell hope_start hope_30 hope_60 hope_90 grad
1  Fall 2016    0          1       0       0      NA    0
2  Fall 2015    1          1       1       0       0    1
3  Fall 2015    1          1       0       1       1    1
4  Fall 2016    0          0       0       0       0    0
5  Fall 2014    0          1       1       1       1    1
6  Fall 2015    1          1       1       1       1    1
7  Fall 2017    0          0       0       0      NA    0
8  Fall 2018    1          0       0       0       1    1
9  Fall 2014    1          0       0      NA      NA    0
10 Fall 2013    0          1       1       1       1    1

The data set contains 10,000 first-time freshmen who enrolled in Fall terms from 2013 to 2018 (6-year graduation years of 2019 to 2024). Each row of the data represents one student and includes a value for the following variables:

cohort: The fall term the student first enrolled at the institution.
pell: A binary value indicating if the student was eligible to receive a Pell Award at the time of enrollment (1 = yes; 0 = no).
hope_start: A binary value indicating if the student had the HOPE scholarship at the time of enrollment (1 = yes; 0 = no).
hope_30: A binary value indicating if the student had the HOPE scholarship at the 30 credit hour checkpoint (1 = yes; 0 = no; NA = not enrolled).
hope_60: A binary value indicating if the student had the HOPE scholarship at the 60 credit hour checkpoint (1 = yes; 0 = no; NA = not enrolled).
hope_90: A binary value indicating if the student had the HOPE scholarship at the 90 credit hour checkpoint (1 = yes; 0 = no; NA = not enrolled).
grad: A binary value indicating if a student graduated within 6 years (1 = yes; 0 = no).
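
Before analyzing the data, it can help to confirm that the columns follow the coding described above. The sketch below is one possible check, not part of the original tutorial. It uses a small made-up data frame (`toy`, a hypothetical stand-in for the tutorial data) with the same column names, counts the NA values at each checkpoint, and verifies that every binary column contains only 0, 1, or NA.

```r
library(dplyr)

# Hypothetical toy data with the same columns as the tutorial file
toy <- data.frame(
  cohort     = c("Fall 2013", "Fall 2013", "Fall 2014", "Fall 2014"),
  pell       = c(1, 0, 1, 0),
  hope_start = c(1, 1, 0, 1),
  hope_30    = c(1, 0, 0, 1),
  hope_60    = c(1, NA, 0, 1),
  hope_90    = c(0, NA, NA, 1),
  grad       = c(1, 0, 0, 1)
)

# Count missing values (students not enrolled) at each checkpoint
na_counts <- toy %>%
  summarise(across(hope_30:hope_90, ~ sum(is.na(.x))))

# Confirm every binary column contains only 0, 1, or NA
binary_ok <- toy %>%
  summarise(across(c(pell, starts_with("hope_"), grad),
                   ~ all(.x %in% c(0, 1, NA)))) %>%
  unlist() %>%
  all()
```

In this toy example the NA counts grow at later checkpoints, which is the pattern to expect if NA marks students who are no longer enrolled.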

Note

HOPE scholarship recipients must meet specific academic requirements, such as maintaining a 3.0 GPA, to remain eligible for the scholarship. Eligibility is assessed at three checkpoints: when the student has attempted 30, 60, and 90 credit hours.

Visualizations

We will create four figures that help identify patterns in the data and effectively communicate the insights to leadership.

The first figure is a donut plot. This type of figure is used to display the relative proportions of different categories. In our case, we will use it to show the number of students who enroll with the HOPE scholarship compared to those who enroll without it.
The second figure is a bar chart. We will use it to compare the number of students with the HOPE scholarship at different checkpoints to determine how many students maintain the scholarship over time.
The third figure is a line chart, which is useful for identifying trends over time. We will use it to examine potential equity gaps in HOPE scholarship retention across the credit hour checkpoints.
The fourth figure is a dumbbell plot, which will illustrate the difference in graduation rates between HOPE and non-HOPE students for each cohort.

Creating the Donut Plot

Clean the data

The first figure will show the proportion of students who enroll with the HOPE scholarship. To calculate those numbers, use the logic in the following code chunk. First, group the data by ‘hope_start’ and use tally() to count the number of students in each ‘hope_start’ group. Next, inside mutate(), format the counts with commas, calculate the percentage of each group relative to the total count, create a text label for each group to include the formatted count and percentage, and calculate the vertical position for each label location.

hope_donut <- hope_raw %>%
  group_by(hope_start) %>%
  tally() %>%
  mutate(n_pretty = prettyNum(n, big.mark = ",", scientific = FALSE),
         percent = round(n/sum(n)*100, 0),
         # hope_start is numeric (0/1), so compare against numbers rather than strings
         label = ifelse(hope_start == 1, glue("HOPE\n{n_pretty}\n({percent}%)"), glue("Non-HOPE\n{n_pretty}\n({percent}%)")),
         # place each label at the midpoint of its slice along the stacked y-axis
         label_y_location = ifelse(hope_start == 0, sum(n)-(n/2), n/2))
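
To make the label arithmetic concrete, here is a small worked sketch using made-up counts (`toy_counts` is hypothetical, not the tutorial data). It assumes ggplot2's default stacking order, in which the first factor level ("0", non-HOPE) sits on top of the stack, so the HOPE slice runs from 0 to its count and the non-HOPE slice fills the remainder.

```r
library(dplyr)
library(glue)

# Hypothetical counts: 4,000 non-HOPE (0) and 6,000 HOPE (1) students
toy_counts <- data.frame(hope_start = c(0, 1), n = c(4000, 6000))

toy_donut <- toy_counts %>%
  mutate(
    n_pretty = prettyNum(n, big.mark = ",", scientific = FALSE),
    percent = round(n / sum(n) * 100),
    group = ifelse(hope_start == 1, "HOPE", "Non-HOPE"),
    label = as.character(glue("{group}\n{n_pretty}\n({percent}%)")),
    # HOPE slice spans 0 to 6,000, so its midpoint is n / 2 = 3,000
    # Non-HOPE slice spans 6,000 to 10,000, so its midpoint is sum(n) - n / 2 = 8,000
    label_y_location = ifelse(hope_start == 0, sum(n) - n / 2, n / 2)
  )
```

Each label lands halfway through its own slice, which is why the non-HOPE position is measured back from the total rather than up from zero.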

Render the donut plot

Next, follow the steps outlined below to create the plot. An important component is coord_polar(), where setting theta = "y" will apply polar coordinates to the y-axis and transform the standard bar chart into a donut plot.

hope_donut %>%
  ggplot(aes(x = 1.5, y = n, fill = as.factor(hope_start))) +  # 1
  theme_void() +  # 2
  theme(plot.title = element_text(size = 13, color = "black", face = "bold",
                                  hjust = 0.5, margin = margin(t = 10, b = 5))) +
  labs(title = "HOPE Status Distribution in First Fall Term\nAmong First-Time Freshmen, 2013-2018 Entering Cohorts") +  # 3
  geom_bar(stat = "identity", color = "white", width = 0.8) +  # 4
  coord_polar(theta = "y") +  # 5
  xlim(c(0, 2.5)) +  # 6
  scale_fill_manual(values = c("#58595B", "#0554A3")) +  # 7
  guides(fill = "none") +
  geom_text(  # 8
    aes(x = 2.5, y = label_y_location, label = label),
    fontface = "bold",
    color = c("#58595B", "#0554A3"),
    size = 4
  )

1: Initialize the ggplot object using the ‘hope_donut’ data and set the mapping aesthetics.
2: Use theme_void() to remove most of the non-data figure pieces for a clean look, and then customize the plot title with theme().
3: Add the title text.
4: Add the bar with white borders and a specific width. Use stat = "identity" to make the bar height proportional to the value of the y aesthetic (in this case the value of ‘n’).
5: Convert the bar chart into a donut plot by applying polar coordinates with the y-axis as the angular axis.
6: Set the x-axis limits.
7: Manually set the fill colors for the two ‘hope_start’ categories.
8: Add text labels to the plot using the y location we calculated in the previous step and with colors that match the fill colors of the bar.
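
The donut mechanics in the steps above can be isolated in a stripped-down sketch. It uses made-up counts (`toy_counts` is hypothetical) and geom_col(), which is shorthand for geom_bar(stat = "identity"). The point is that the plot starts as a single stacked bar, and the x-axis range left empty below the bar becomes the hole once polar coordinates are applied.

```r
library(ggplot2)

# Hypothetical counts for illustration only
toy_counts <- data.frame(group = c("Non-HOPE", "HOPE"), n = c(4000, 6000))

# A single stacked bar centered at x = 1.5; with width 0.8 it spans x = 1.1 to 1.9
stacked_bar <- ggplot(toy_counts, aes(x = 1.5, y = n, fill = group)) +
  geom_col(color = "white", width = 0.8)

# The same bar in polar coordinates: y becomes the angle, x becomes the radius,
# and the unused range from x = 0 to x = 1.1 becomes the hole in the middle
donut <- stacked_bar +
  coord_polar(theta = "y") +
  xlim(c(0, 2.5)) +
  theme_void()
```

Printing `stacked_bar` and then `donut` shows the transformation side by side. Raising the lower x limit toward the bar shrinks the hole, and the space above x = 1.9 leaves room for the outside labels.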

Creating the Bar Chart

Clean the data

The next figure will show the number of students with the HOPE scholarship at different checkpoints. Inside summarise(), use across() to sum the occurrences where a column value equals 1 for each column that starts with ‘hope_’, and then transform the data frame to long format. The resulting data frame will have one column with four time points and one column with counts indicating the number of students with the HOPE scholarship at each time point.

hope_bar <- hope_raw %>% 
  summarise(across(starts_with("hope_"), ~ sum(. == 1, na.rm = TRUE))) %>%
  pivot_longer(everything(), names_to = "time", values_to = "count")

Next, inside mutate(), calculate the number and the percentage of students who have lost the scholarship at each checkpoint (relative to the starting count), and create a new factor column with ordered labels for each time point.

hope_bar <- hope_bar %>% 
  mutate(running_loss = count[1] - count,
         running_loss_percent = round(running_loss/count[1]*100),
         checkpoint = factor(c("Start with\nHOPE", "HOPE at\n30 Credits", "HOPE at\n60 Credits", "HOPE at\n90 Credits"),
                              levels = c("Start with\nHOPE", "HOPE at\n30 Credits", "HOPE at\n60 Credits", "HOPE at\n90 Credits")))
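
The two cleaning steps can be traced on a small made-up data frame (`toy` is hypothetical). This sketch also shows one possible variation: instead of assigning labels by row position, it looks them up by column name through a named vector (`checkpoint_labels`, an illustrative helper not used in the tutorial), which keeps the labels correct even if the row order changes.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: five students, four of whom start with HOPE
toy <- data.frame(
  hope_start = c(1, 1, 1, 1, 0),
  hope_30    = c(1, 1, 1, 0, 0),
  hope_60    = c(1, 1, 0, NA, 0),
  hope_90    = c(1, 0, NA, NA, 0)
)

# Illustrative lookup table from column name to display label
checkpoint_labels <- c(
  hope_start = "Start with\nHOPE",
  hope_30    = "HOPE at\n30 Credits",
  hope_60    = "HOPE at\n60 Credits",
  hope_90    = "HOPE at\n90 Credits"
)

toy_bar <- toy %>%
  # Count students holding HOPE at each time point, ignoring NA (not enrolled)
  summarise(across(starts_with("hope_"), ~ sum(.x == 1, na.rm = TRUE))) %>%
  pivot_longer(everything(), names_to = "time", values_to = "count") %>%
  mutate(
    # Loss is always measured against the starting count (the first row)
    running_loss = first(count) - count,
    running_loss_percent = round(running_loss / first(count) * 100),
    checkpoint = factor(unname(checkpoint_labels[time]),
                        levels = unname(checkpoint_labels))
  )
```

Here the counts are 4, 3, 2, and 1, so the running loss is 0%, 25%, 50%, and 75% of the starting group.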

Render the bar chart

Follow the steps below to create the plot.

hope_bar %>%
  ggplot(mapping = aes(x = checkpoint, y = count, label = prettyNum(count, big.mark = ",", scientific = FALSE))) +  # 1
  theme_void() +  # 2
  theme(
    plot.title = element_text(size = 13, color = "black",  face = "bold", hjust = 0.5, margin = margin(t = 10, b = 10)),
    axis.text.x = element_text(size = 12, face = "bold", color = "black", margin = margin(t = -5))
  ) +
  labs(title = "HOPE Counts at Checkpoints Among First-Time Freshmen,\n2013-2018 Entering Cohorts") +  # 3
  geom_bar(  # 4
    fill = "#0554A3",
    stat = "identity",
    width = 0.8
  ) +
  geom_text(  # 5
    color = "white",
    vjust = 2,
    size = 5,
    fontface = "bold"
  ) +
  geom_text(  # 6
    data = hope_bar %>% filter(time != "hope_start"),
    aes(x = checkpoint, y = count, label = glue("(-{running_loss_percent}%)")),
    color = "white",
    vjust = 4.5,
    size = 4,
    fontface = "bold"
  )

1: Initialize the ggplot object using the ‘hope_bar’ data and set the mapping aesthetics. In the donut plot example above, the count labels were formatted with commas inside the data frame. Here, we’ll format the labels directly in the aesthetics by setting the label equal to prettyNum(count, big.mark = ",", scientific = FALSE).
2: Apply a clean theme using theme_void() and then customize the plot’s title and x-axis text using theme().
3: Add the plot title text.
4: Add bars with a specific fill color and width.
5: Add text labels that show the counts. Use vjust = 2 to position the labels inside the bars.
6: Add text labels that show the percent loss. We don’t want to display a percent loss at the first time point, so filter out the ‘hope_start’ row. Use glue() to concatenate the label components (parentheses, negative sign, percent, and percent symbol).

Creating the Line Chart

The third figure will show HOPE scholarship retention across the three credit hour checkpoints among two student groups: Pell-eligible and non-Pell eligible students. If additional data are available at your institution, comparisons can also be made for other groups, such as first-generation status, race, ethnicity, gender, and so forth.

Clean the data

First, group the data by ‘pell’ and then use across() with sum() to count the number of students with HOPE at each time point (i.e., each column that starts with ‘hope_’). Since percentages are more effective for comparing the two groups, the next step is to calculate the percentage of students with HOPE at each time point. Use across() inside of mutate() to do this, and create new column names that start with ‘percent_’ using the ‘.names’ argument. Because we initially grouped by ‘pell’, the resulting data frame will contain both the counts and the percentages for Pell and non-Pell students at each of the four time points.

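hope_by_pell <- hope_raw %>%
  group_by(pell) %>%
  # count students holding HOPE at each time point within each Pell group
  summarise(across(starts_with("hope_"),
                   ~ sum(. == 1, na.rm = TRUE), .names = "{.col}")) %>%
  # express each count as a percentage of the group's starting HOPE count
  mutate(across(starts_with("hope_"),
                ~ round(. / hope_start * 100), .names = "percent_{.col}"))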
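
A small worked example may help verify the percentage logic. The data frame below is made up (`toy` is hypothetical): each Pell group has four students who start with HOPE. Because the `.names` argument writes the percentages to new columns, the count in `hope_start` stays available as the denominator for every checkpoint.

```r
library(dplyr)

# Hypothetical data: four Pell (1) and four non-Pell (0) students, all starting with HOPE
toy <- data.frame(
  pell       = rep(c(1, 0), each = 4),
  hope_start = rep(1, 8),
  hope_30    = c(1, 1, 0, 0,   1, 1, 1, 0),
  hope_60    = c(1, 0, 0, NA,  1, 1, 1, 0),
  hope_90    = c(1, 0, NA, NA, 1, 1, 0, NA)
)

toy_by_pell <- toy %>%
  group_by(pell) %>%
  # Counts per group: Pell 4, 2, 1, 1 and non-Pell 4, 3, 3, 2
  summarise(across(starts_with("hope_"), ~ sum(.x == 1, na.rm = TRUE))) %>%
  # Percent of each group's starting HOPE count retained at each time point
  mutate(across(starts_with("hope_"),
                ~ round(.x / hope_start * 100),
                .names = "percent_{.col}"))
```

In this toy data the Pell group retains 50% at 30 credits against 75% for the non-Pell group, the kind of gap the line chart is meant to surface.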

Second, select the ‘pell’ column along with all columns that start with ‘percent’ from the ‘hope_by_pell’ data frame. Then, use pivot_longer() to transform the data frame from a wide to a long format.

hope_by_pell_long <- hope_by_pell %>% 
  select(pell, starts_with("percent")) %>% 
  pivot_longer(cols = starts_with("percent"), 
               names_to = "checkpoint", 
               values_to = "percent")

Third, convert the ‘checkpoint’ and ‘pell’ columns to factors with specific levels and labels, which will be used in the figure.

hope_by_pell_long$checkpoint <- factor(hope_by_pell_long$checkpoint, 
                                       levels = c("percent_hope_start", "percent_hope_30", "percent_hope_60", "percent_hope_90"),
                                       labels = c("All HOPE\nAwardees at\nFirst Term", "HOPE at\n30 Credits", "HOPE at\n60 Credits", "HOPE at\n90 Credits"))

hope_by_pell_long$pell <- factor(hope_by_pell_long$pell, 
                                 levels = c(1, 0),
                                 labels = c("Pell", "non-Pell"))
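
The reshaping and factor steps can be seen together in the following sketch, which uses a made-up two-checkpoint summary (`toy_by_pell` and the shortened labels are hypothetical). The factor conversion matters for ordering: left as plain text, "percent_hope_30" sorts before "percent_hope_start" alphabetically, so the x-axis would show the checkpoints out of sequence.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide summary with two time points
toy_by_pell <- data.frame(
  pell = c(0, 1),
  percent_hope_start = c(100, 100),
  percent_hope_30 = c(75, 50)
)

toy_long <- toy_by_pell %>%
  # One row per group and checkpoint
  pivot_longer(cols = starts_with("percent"),
               names_to = "checkpoint",
               values_to = "percent") %>%
  mutate(
    # levels fixes the display order; labels supplies the text shown on the axis
    checkpoint = factor(checkpoint,
                        levels = c("percent_hope_start", "percent_hope_30"),
                        labels = c("First Term", "30 Credits")),
    # Listing 1 before 0 puts "Pell" first in the legend
    pell = factor(pell, levels = c(1, 0), labels = c("Pell", "non-Pell"))
  )
```

The result has four rows (two groups by two checkpoints), and both columns carry their display order with them into any plot that uses them.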

Lastly, we’ll create a custom theme to use in the plot. This approach can be helpful if a figure type will be used multiple times, as it prevents the need to repeat the theme code each time the figure is rendered. For example, if data for other student groups are available to examine additional equity gaps in HOPE retention rates, the same theme can be reused across all plots.

custom_plot_theme <- function() {
  theme_classic() + 
    theme(
      plot.title = element_text(size = 14, face = "bold", color = "black", hjust = 0.5),
      axis.title.x = element_blank(),
      axis.title.y = element_text(size = 12, face = "bold", color = "black", margin = margin(r = 15)),
      axis.text.x = element_text(size = 12, color = "black", margin = margin(t = 5)),
      axis.text.y = element_text(size = 12, color = "black"),
      axis.line = element_line(linewidth = 1),
      axis.ticks.x = element_blank(),
      axis.ticks.y = element_line(linewidth = 1),
      legend.title = element_blank(),
      legend.position = "top",
      legend.text = element_text(size = 10)
    )
}
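
As a sketch of the reuse idea, the example below applies a theme function to a second, made-up comparison. Everything here is hypothetical: `demo_theme()` is a trimmed-down stand-in for the theme function defined above so that the example runs on its own, and `toy_first_gen` holds invented retention percentages by first-generation status.

```r
library(ggplot2)

# Trimmed-down, hypothetical stand-in for the custom theme function
demo_theme <- function() {
  theme_classic() +
    theme(legend.position = "top", legend.title = element_blank())
}

# Invented retention percentages for a different student grouping
toy_first_gen <- data.frame(
  checkpoint = factor(rep(c("Start", "30 Credits"), times = 2),
                      levels = c("Start", "30 Credits")),
  group = rep(c("First-Gen", "Continuing-Gen"), each = 2),
  percent = c(100, 55, 100, 70)
)

# The theme is added like any other layer, so one call replaces the full theme() block
first_gen_plot <- ggplot(toy_first_gen,
                         aes(x = checkpoint, y = percent, group = group, color = group)) +
  demo_theme() +
  geom_line(linewidth = 1)
```

Because the theme is wrapped in a function, a later styling change only needs to be made in one place to update every plot that calls it.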

Render the line plot

Now we are ready to create the plot using the steps below.

hope_by_pell_long %>%
  ggplot(aes(x = checkpoint, y = percent, label = percent, group = pell, fill = pell, color = pell)) +  # 1
  custom_plot_theme() +  # 2
  labs(  # 3
    title = "Percentage of Pell and non-Pell Students who Retain HOPE,\n2013-2018 Entering Cohorts",
    y = "Percent of Students"
  ) +
  geom_line(linewidth = 1) +  # 4
  geom_label(  # 5
    color = "white",
    size = 5
  ) +
  scale_y_continuous(  # 6
    limits = c(40, 102),
    breaks = seq(40, 100, 20),
    labels = function(x) paste0(x, "%")
  ) +
  scale_fill_manual(values = c("#0554A3", "#58595B")) +  # 7
  scale_color_manual(values = c("#0554A3", "#58595B")) +  # 8
  guides(fill = "none", color = guide_legend(override.aes = list(linewidth = 3)))  # 9

1: Initialize the ggplot object using the ‘hope_by_pell_long’ data and set the mapping aesthetics. Be sure to set the group, fill, and color arguments all equal to ‘pell’.
2: Apply the custom theme function that was defined in the last section.
3: Add the plot title and y-axis text.
4: Add lines to the plot with a specific width.
5: Add text labels that show the percentages.
6: Customize the y-axis scale. Because the label box at the first time point has a y-value of 100, set the max limit to 102 to make sure the label box doesn’t get cut off. Additionally, breaks = seq(40, 100, 20) defines where tick marks will appear, and labels = function(x) paste0(x, "%") formats the labels as percentages.
7: Manually set the fill colors for the pell groups.
8: Manually set the line colors for the pell groups.
9: Customize the legend. Since both color and fill are set to the same colors, we don’t need both in the legend, so use fill = "none" to remove the fill legend. Then use color = guide_legend(override.aes = list(linewidth = 3)) to increase the line width in the legend for better visibility.
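
As an alternative to the paste0() labeller in step 6, the {scales} package loaded earlier offers label_percent(). This is an optional substitution, not what the tutorial code uses. The sketch assumes the values are already on a 0 to 100 scale, which is why `scale = 1` is set; the default of 100 is meant for proportions such as 0.4.

```r
library(scales)

# label_percent() returns a labelling function; scale = 1 because the
# values are already percentages (40 means 40%), not proportions (0.4)
percent_labels <- label_percent(accuracy = 1, scale = 1)

# Apply it to the same breaks used on the y-axis
axis_labels <- percent_labels(c(40, 60, 80, 100))
```

The resulting function can be passed directly as `labels = percent_labels` inside scale_y_continuous().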

Creating the Dumbbell Chart

The final figure will show the difference in graduation rates between HOPE and non-HOPE students for each cohort.

Clean the data

First, filter the data to include only students who are enrolled at the 60 credit hour checkpoint (column ‘hope_60’). This is done by excluding any rows where ‘hope_60’ is NA, because these represent students who are no longer enrolled. Next, group the data by ‘cohort’ and ‘hope_60’ and calculate the total number of students enrolled at this checkpoint, the number of these students who graduate, and the percentage of students who graduate (i.e., the graduation rate).

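hope_grad_60 <- hope_raw %>%
  filter(!is.na(hope_60)) %>%   # keep only students enrolled at the 60 credit hour checkpoint
  group_by(cohort, hope_60) %>%
  summarise(hope_60_count = n(),
            grad_count = sum(grad),
            grad_rate = round(grad_count/hope_60_count*100),
            .groups = "drop")   # return an ungrouped data frame and avoid the grouping message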
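
The graduation rate calculation can be checked by hand on a small made-up data set (`toy` is hypothetical). The sketch also adds an optional step that is not in the tutorial: reshaping the result so the gap between HOPE and non-HOPE students, which is the length of each dumbbell, is available as a number.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: two cohorts, five students each, one per cohort not enrolled at 60 credits
toy <- data.frame(
  cohort  = rep(c("Fall 2013", "Fall 2014"), each = 5),
  hope_60 = c(1, 1, 0, 0, NA,  1, 1, 0, 0, NA),
  grad    = c(1, 1, 1, 0, 0,   1, 1, 0, 0, 0)
)

toy_grad_60 <- toy %>%
  filter(!is.na(hope_60)) %>%   # drop students not enrolled at the checkpoint
  group_by(cohort, hope_60) %>%
  summarise(hope_60_count = n(),
            grad_count = sum(grad),
            grad_rate = round(grad_count / hope_60_count * 100),
            .groups = "drop")

# Optional: one row per cohort with the HOPE minus non-HOPE gap in percentage points
toy_gap <- toy_grad_60 %>%
  select(cohort, hope_60, grad_rate) %>%
  pivot_wider(names_from = hope_60, values_from = grad_rate,
              names_prefix = "hope_60_") %>%
  mutate(gap = hope_60_1 - hope_60_0)
```

In this toy data, the Fall 2013 rates are 100% for HOPE and 50% for non-HOPE students, a 50-point gap, and the Fall 2014 gap is 100 points.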

Render the dumbbell plot

Follow the steps below to create the final plot.

hope_grad_60 %>%
  ggplot(mapping = aes(x = cohort, y = grad_rate, group = cohort, color = factor(hope_60), label = grad_rate)) +  # 1
  custom_plot_theme() +  # 2
  theme(
    axis.title.x = element_text(size = 12, face = "bold", color = "black", margin = margin(t = 10))
  ) +
  labs(  # 3
    title = "Six-Year Graduation Rates by HOPE Status at 60 Credit Hours",
    x = "Graduation Year",
    y = "Graduation Rate"
  ) +
  geom_line(  # 4
    color = "gray70",
    linewidth = 1
  ) +
  geom_point(size = 9) +  # 5
  geom_text(  # 6
    color = "white",
    size = 4
  ) +
  scale_color_manual(  # 7
    values = c("#58595B", "#0554A3"),
    labels = c("Non-HOPE", "HOPE")
  ) +
  scale_x_discrete(labels = seq(2019, 2024, by = 1)) +  # 8
  scale_y_continuous(  # 9
    limits = c(20, 100),
    breaks = seq(20, 100, 20),
    labels = function(x) paste0(x, "%")
  ) +
  guides(color = guide_legend(reverse = TRUE, override.aes = list(size = 4)))  # 10

1: Initialize the ggplot object using the ‘hope_grad_60’ data and set the mapping aesthetics. Because we are creating a dumbbell for each cohort, setting group = cohort will ensure each line connects the two graduation rate points within a cohort.
2: Apply the custom theme function and adjust the formatting of the X-axis title.
3: Add text for the plot title and axis titles.
4: Add lines with a light gray color and a specific width.
5: Add points to represent the graduation rates.
6: Add text labels to show the graduation rates.
7: Manually set the colors for the points.
8: Update the x-axis text labels to reflect the 6-year graduation year.
9: Customize the y-axis scale. Use breaks = seq(20, 100, 20) to display tick marks every 20 percent from 20 to 100, and use labels = function(x) paste0(x, "%") to format the labels as percentages.
10: Reverse the legend order to list the HOPE group first, and increase the size of the legend points for better visibility.
