DFW Tutorial
Identifying Student Success Insights Through Data Analysis
Part 3: High-Enrollment, High-DFW Diagnostic Tutorial
The percentage of students earning grades of D, F, or withdrawing in a course (the DFW rate) is a key performance indicator of student success. In particular, DFW rates in high-enrollment courses can highlight areas where a large proportion of students are struggling. This report identifies (1) which courses have the highest number of DFW grades, (2) whether D and F grades are more frequent than withdrawals, (3) how DFW rates vary across course sections, and (4) the impact of instructional modality on DFW outcomes.
This tutorial will demonstrate how to create key performance indicators and data visualizations that offer insights into DFW rates in critical, high-enrollment courses. An example report is available, as are example data that can be used to follow along.
Visualization Overview
We will create five figures that help identify patterns in the data and effectively communicate the insights to leadership.
The first figure is a bar chart that displays the total number of DFW grades within each course from 2019 to 2024.
The second figure is a stacked bar chart, a figure type that is generally used to display the relationship between two categorical variables. In this case, one variable is the course and the other is the grade category (DF or W). Each bar segment shows that category's percentage of the course's DFW grades. This chart helps determine whether DF or W grades contribute more to the overall number of DFW grades.
The third figure is a scatter plot with quadrants. A scatter plot displays the relationship between two continuous variables (one variable on the x-axis and one variable on the y-axis). Quadrants are used to help identify which of the courses have a relatively high average DFW rate and high variability (measured using standard deviation) across sections.
The fourth figure is a dumbbell plot, which is used to show the difference in DFW rates between face-to-face and online/hybrid sections of the same course.
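The fifth figure is a lollipop plot that displays the odds ratios for receiving a DFW grade in online/hybrid versus face-to-face sections. The odds ratio compares the odds of a DFW grade (the number of DFW grades divided by the number of ABC grades) for students in online/hybrid sections with the same odds for students in face-to-face sections. A value above 1 means the odds of a DFW grade are higher in online/hybrid sections, and a value below 1 means they are lower.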
Initial Steps
Load the necessary packages and the data.
1. Load the packages
2. Load the data
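The package-loading and data-loading code is collapsed on the original page. The packages below are an assumption, inferred from the functions used later in this tutorial (for example, element_markdown() from ggtext, geom_label_repel() from ggrepel, and plot_layout() from patchwork). The CSV file name in the commented-out line is hypothetical. dfw_raw is the name the tutorial later uses for the section-level data.

```r
# Packages inferred from the functions used in this tutorial (assumption:
# the original setup code is not shown)
library(dplyr)      # %>%, filter(), group_by(), summarise(), mutate(), arrange(), slice()
library(tidyr)      # pivot_longer()
library(forcats)    # fct_rev()
library(stringr)    # str_c()
library(ggplot2)    # ggplot() and the geom_*() layers
library(ggtext)     # element_markdown() for colored, markdown-styled titles
library(ggrepel)    # geom_label_repel()
library(glue)       # glue() for building labels
library(scales)     # percent_format()
library(patchwork)  # combining plots with + and plot_layout()

# Hypothetical file name; replace it with the path to the example data
# dfw_raw <- read.csv("dfw_example_data.csv")
```

Calling library(tidyverse) would attach dplyr, tidyr, forcats, stringr, and ggplot2 in a single step.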
   year course        crn total_enroll dfw df  w modality
 1 2019 Course ADAD 81910           21   0  0  0 face
 2 2019 Course AGAB 56634           12   4  4  0 on_hyb
 3 2019 Course BCEJ 90732           11   1  1  0 face
 4 2019 Course ADHF 95693           21   4  2  2 face
 5 2019 Course AGGB 95823           40   1  1  0 face
 6 2019 Course BEHA 88566           45  16 13  3 face
 7 2019 Course EBC  22924          304  58 33 25 on_hyb
 8 2019 Course BBAH 90819           26   5  4  1 face
 9 2019 Course GAE  52466          102   2  2  0 on_hyb
10 2019 Course AFED 16499           21   1  1  0 on_hyb
The data set contains 10,000 (hypothetical) course sections from 2019 to 2024. Each row of the data represents one section and includes a value for the following variables:
- year: The year the course/section was offered.
- course: The course name.
- crn: The course registration number (i.e., the section number).
- total_enroll: The number of students who enrolled in and received a grade (including W) for the section.
- dfw: The number of students who received a D, F, or W grade in the section.
- df: The number of students who received a D or F grade in the section.
- w: The number of students who received a W (withdrawal) grade in the section.
- modality: A categorical variable indicating the instruction modality for the section. (face = face-to-face; on_hyb = online or hybrid)
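Before aggregating, it helps to confirm that the variables fit together as described: every dfw count should equal df + w, and no section should have more DFW grades than students. The base R sketch below runs these checks on five rows copied from the printout above. It is an illustration only; the same checks could be run on the full data frame, which the tutorial calls dfw_raw.

```r
# Five rows copied from the printout above (course names as printed)
sections <- data.frame(
  year = rep(2019, 5),
  course = c("Course ADAD", "Course AGAB", "Course ADHF", "Course BEHA", "Course EBC"),
  crn = c(81910, 56634, 95693, 88566, 22924),
  total_enroll = c(21, 12, 21, 45, 304),
  dfw = c(0, 4, 4, 16, 58),
  df = c(0, 4, 2, 13, 33),
  w = c(0, 0, 2, 3, 25),
  modality = c("face", "on_hyb", "face", "face", "on_hyb")
)

# Each DFW count is the sum of its D/F and W parts ...
stopifnot(all(sections$dfw == sections$df + sections$w))
# ... and can never exceed the section's enrollment
stopifnot(all(sections$dfw <= sections$total_enroll))

# Section-level DFW rate in percent, the quantity used later in the tutorial
sections$dfw_rate <- 100 * sections$dfw / sections$total_enroll
round(sections$dfw_rate, 1)  # 0.0 33.3 19.0 35.6 19.1
```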
Identifying the High-Enrollment, High-DFW Courses
1. Analyze the data
The high-enrollment, high-DFW courses need to be identified before making any visualizations. The metric used to identify them is the total number of DFW grades. First, use group_by() to aggregate by course and summarise() to calculate the DFW counts and a few other key metrics that will be used later in the tutorial. Next, use arrange() to order the courses from the highest to the lowest number of DFW grades. Last, use slice() to select the top 20 courses.
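The code for this step is collapsed on the original page. The following is a minimal dplyr sketch of the steps just described, run on a hypothetical six-section data frame. The summary column names (dfw_count, df_count, w_count) match those used later in the tutorial, but the exact set of metrics computed in the original is not shown. The sketch uses slice_head(n = 20) in place of slice(1:20); both keep the first 20 rows.

```r
library(dplyr)

# Hypothetical toy data in the same shape as dfw_raw (one row per section)
dfw_raw <- data.frame(
  course = c("Course A", "Course A", "Course B", "Course C", "Course C", "Course C"),
  total_enroll = c(30, 25, 40, 20, 22, 18),
  dfw = c(6, 4, 3, 5, 7, 2),
  df = c(4, 3, 2, 3, 5, 2),
  w = c(2, 1, 1, 2, 2, 0)
)

dfw_by_course_top20 <- dfw_raw %>%
  group_by(course) %>%
  summarise(
    total_enroll = sum(total_enroll),  # students across all sections
    dfw_count = sum(dfw),              # the ranking metric
    df_count = sum(df),                # used by the stacked bar chart
    w_count = sum(w)
  ) %>%
  arrange(desc(dfw_count)) %>%         # most DFW grades first
  slice_head(n = 20)                   # keep (at most) the top 20 courses

dfw_by_course_top20
```

With this toy data, Course C (14 DFW grades) ranks first, followed by Course A (10) and Course B (3).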
2. Render the bar chart
Next, display the top 20 courses along with their DFW counts using a bar chart. Use the ggplot() function along with various geom layers to create the chart, as outlined in the steps below the code chunk.
dfw_by_course_top20 %>%
  ggplot(mapping = aes(x = dfw_count, y = reorder(course, dfw_count), label = prettyNum(dfw_count, big.mark = ","))) +  # (1)
  theme_classic() +
  theme(  # (2)
    plot.title = element_markdown(size = 15, color = "black", face = "bold", hjust = 0.5, margin = margin(b = 15)),
    axis.title = element_blank(),
    axis.text.x = element_blank(),
    axis.text.y = element_text(size = 12, color = "black", margin = margin(r = -15)),
    axis.line = element_blank(),
    axis.ticks = element_blank(),
    legend.position = "none"
  ) +
  labs(title = "Courses With the Most DFW Grades Since 2019") +  # (3)
  geom_bar(  # (4)
    stat = "identity",
    fill = "#2B3555"
  ) +
  geom_text(  # (5)
    color = "white",
    size = 5,
    nudge_x = -65
  )
- (1) Initialize the ggplot object using the dfw_by_course_top20 data and set the mapping aesthetics. Use y = reorder(course, dfw_count) to order the courses from top to bottom based on their DFW counts, and set the label argument to prettyNum(dfw_count, big.mark = ",") to display the DFW counts with thousands separators.
- (2) Customize the theme for a clean look and feel.
- (3) Add a descriptive title using the title argument inside the labs() function.
- (4) Add bars with a specific fill color.
- (5) Use geom_text() to add the DFW count label to each bar. Use nudge_x = -65 to move the labels inside the bars.
Creating a Stacked Bar Chart
In this section, we will create a stacked bar chart that shows the percentage of DF grades and W grades relative to the total number of DFW grades in each course. This will help determine whether DF grades or W grades contribute more to the overall number of DFW grades.
1. Prepare the data for the chart
First, use select() to choose only the course, df_count, and w_count columns from the dfw_by_course_top20 data frame. Then use pivot_longer() to transform the data from wide to long form.
Next, group by course and use mutate() to create two new columns: one for the percentage of DF grades or W grades and another for the text label that will be used in the figure.
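The following sketch shows this reshaping on hypothetical counts for two courses. pivot_longer() stacks df_count and w_count into a single count column labeled by count_type, and the grouped mutate() divides each count by the course's DFW total. The column names count_type, count, percent, and text_label match the aesthetics in the plotting code that follows. The label format is an assumption.

```r
library(dplyr)
library(tidyr)

# Hypothetical course-level counts in the shape of dfw_by_course_top20
dfw_by_course_top20 <- data.frame(
  course = c("Course A", "Course B"),
  dfw_count = c(10, 8),
  df_count = c(7, 2),
  w_count = c(3, 6)
)

dfw_by_course_top20_long <- dfw_by_course_top20 %>%
  select(course, df_count, w_count) %>%
  pivot_longer(cols = c(df_count, w_count),
               names_to = "count_type",   # "df_count" or "w_count"
               values_to = "count") %>%
  group_by(course) %>%
  mutate(
    percent = 100 * count / sum(count),       # share of the course's DFW grades
    text_label = paste0(round(percent), "%")  # label drawn on each bar segment
  ) %>%
  ungroup()

dfw_by_course_top20_long
```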
2. Render the stacked bar chart
Follow the steps below to create the chart. Note that a narrative title is used with color highlighting that matches the colors in the chart.
dfw_by_course_top20_long %>%
  ggplot(mapping = aes(x = percent, y = reorder(course, count), fill = fct_rev(count_type), label = text_label)) +  # (1)
  theme_classic() +
  theme(  # (2)
    plot.title = element_markdown(size = 15, color = "black", face = "bold", margin = margin(b = 15)),
    axis.title = element_blank(),
    axis.text.x = element_blank(),
    axis.text.y = element_text(size = 12, color = "black", margin = margin(r = -15)),
    axis.line = element_blank(),
    axis.ticks = element_blank(),
    legend.position = "top"
  ) +
  labs(  # (3)
    title = "High DFWs are driven by a high percentage of
    <span style = 'color:#0554A3;'>DFs</span> rather than
    <span style = 'color: #26A5CA;'>Ws</span>"
  ) +
  geom_bar(  # (4)
    position = "stack",
    stat = "identity"
  ) +
  geom_text(  # (5)
    position = position_stack(vjust = 0.8),
    color = "white",
    size = 5
  ) +
  scale_fill_manual(  # (6)
    # Named values tie each color to its count type regardless of factor order
    values = c("w_count" = "#26A5CA", "df_count" = "#0554A3"),
    labels = c("w_count" = "W", "df_count" = "DF")
  ) +
  guides(fill = guide_legend(title = NULL, reverse = TRUE))  # (7)
- (1) Define the data for the plot and set the mapping. Set the fill argument to fct_rev(count_type) so that the DF and W bars have different fill colors, and reverse the plotting order so that DFs are on the left and Ws are on the right.
- (2) Customize the theme. Set plot.title to element_markdown() to enable markdown formatting in the title.
- (3) Customize the title by highlighting specific text with a color. Use the format <span style = 'color:HEXCOLOR;'>TEXT</span>, replacing HEXCOLOR with the desired color’s hex code and TEXT with the content to highlight.
- (4) Add the geom_bar() layer. Set the position argument to "stack" to create a stacked bar chart.
- (5) Add text labels and set vjust to 0.8 inside position_stack() to place each label toward the right end of its bar segment.
- (6) Specify the fill colors and adjust the legend labels for better readability.
- (7) Remove the legend title using title = NULL and reverse the legend order to match the plot, with DF on the left and W on the right.
Creating a Scatter Plot
The courses with the most DFW grades consist of multiple sections, each of which has its own DFW rate. To explore the variability of DFW rates across sections within a course, we will calculate the average DFW rate and the standard deviation for each course and display these metrics in a scatter plot. Standard deviation measures how spread out the values in a group are around their average. A course whose sections all have similar DFW rates has a small standard deviation, while a course whose sections range from low to high DFW rates has a large one.
1. Prepare the data for the plot
Begin with the dfw_raw data frame, which includes total enrollment and DFW counts for every section. Filter this data frame to include only the courses in the dfw_by_course_top20 data frame. Next, calculate the DFW rate for each section. Finally, group by course and calculate both the average DFW rate and the standard deviation.
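The following dplyr sketch applies these steps to hypothetical data. top20_courses stands in for the course column of dfw_by_course_top20. Here the average DFW rate is the unweighted mean of the section rates, which is an assumption; an enrollment-weighted mean would give large sections more influence.

```r
library(dplyr)

# Hypothetical section-level data; in the tutorial this is dfw_raw
dfw_raw <- data.frame(
  course = c("Course A", "Course A", "Course A", "Course B", "Course B", "Course Z"),
  total_enroll = c(20, 25, 40, 30, 30, 50),
  dfw = c(2, 5, 12, 6, 6, 1)
)
top20_courses <- c("Course A", "Course B")  # stands in for dfw_by_course_top20$course

dfw_top20_rates <- dfw_raw %>%
  filter(course %in% top20_courses) %>%             # drop courses outside the top 20
  mutate(dfw_rate = 100 * dfw / total_enroll) %>%   # section-level rate, in %
  group_by(course) %>%
  summarise(
    dfw_rate_avg = mean(dfw_rate),  # y-axis: average of the section rates
    dfw_rate_sd = sd(dfw_rate)      # x-axis: spread between sections
  )

dfw_top20_rates
```

Course A's sections have rates of 10%, 20%, and 30%, so its average is 20% and its standard deviation is 10 percentage points. Course B's two sections both sit at 20%, so its standard deviation is 0. sd() returns NA for a course with only one section, so such a course would not appear in the plot.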
2. Prepare the scales for the plot
The plot should have quadrants of equal sizes, so we’ll center the x and y axes around the averages of the x and y variables. To accomplish this, it will be helpful to create a few new variables that will be used within the scaling and annotation functions in the next step.
# Assign the x-axis mean, max, and min to new variables.
# These are used below and in a ggplot layer to set the x-axis scale limits;
# the max and min include 1.5 percentage points of padding
x_mean <- mean(dfw_top20_rates$dfw_rate_sd)
x_max <- max(dfw_top20_rates$dfw_rate_sd) + 1.5
x_min <- min(dfw_top20_rates$dfw_rate_sd) - 1.5
# Half-width of the x-axis: the larger distance from the mean to either end.
# This shows the entire range of values while keeping the mean X value
# at the center of the x-axis
x_range <- max(x_max - x_mean, x_mean - x_min)
# Assign the y-axis mean, max, and min to new variables (no padding).
# These set the y-axis scale limits
y_mean <- mean(dfw_top20_rates$dfw_rate_avg)
y_max <- max(dfw_top20_rates$dfw_rate_avg)
y_min <- min(dfw_top20_rates$dfw_rate_avg)
# Half-width of the y-axis, chosen the same way so that the mean Y value
# sits at the center of the y-axis
y_range <- max(y_max - y_mean, y_mean - y_min)
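The range logic above can be wrapped in a small function, which makes the centering rule explicit. Half the axis width is the larger of the two distances from the mean, so the mean always lands at the midpoint and no value falls outside the limits. The function name centered_limits() is hypothetical. Its pad argument plays the role of the 1.5-point padding used for the x-axis; pad = 0 matches the y-axis code.

```r
# Hypothetical helper: symmetric axis limits centered on the mean of x,
# wide enough to include every value plus an optional padding
centered_limits <- function(x, pad = 0) {
  center <- mean(x)
  half_width <- max(max(x) + pad - center, center - (min(x) - pad))
  c(center - half_width, center + half_width)
}

sd_values <- c(4, 6, 14)  # e.g., three courses' standard deviations (mean = 8)
centered_limits(sd_values, pad = 1.5)
```

Under these assumptions, scale_x_continuous(limits = centered_limits(dfw_top20_rates$dfw_rate_sd, pad = 1.5)) would produce the same x-axis limits as the variables above.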
3. Render the quadrant plot
Finally, create the plot using the steps outlined below the code chunk.
dfw_top20_rates %>%
  ggplot(mapping = aes(x = dfw_rate_sd, y = dfw_rate_avg)) +  # (1)
  theme_classic() +
  theme(
    plot.title = element_text(size = 14, color = "black", face = "bold", hjust = 0.5, margin = margin(b = 15)),  # (2)
    plot.subtitle = element_text(size = 12, color = "black", hjust = 0.5, margin = margin(b = 15)),
    axis.title.x = element_text(size = 12, color = "black", face = "bold", margin = margin(t = 10)),
    axis.title.y = element_text(size = 12, color = "black", face = "bold", margin = margin(r = 10)),
    axis.text = element_text(size = 11, color = "black"),
    axis.line = element_line(linewidth = 1),
    axis.ticks = element_line(linewidth = 1)
  ) +
  labs(  # (3)
    title = "Average DFW Rate and Standard Deviation, 2019-2024",
    subtitle = "Label: Course (Average DFW Rate; Standard Deviation)",
    x = "Variation (Standard Deviation Between Sections; %)",
    y = "Average DFW Rate (%)"
  ) +
  geom_hline(  # (4)
    yintercept = y_mean,  # a single line, rather than one line per row via aes()
    color = "#58595B",
    linetype = "dashed"
  ) +
  annotate(
    "text",
    x = 7,  # hard-coded label position; adjust for your data
    y = y_mean,
    label = glue("Avg: {round(y_mean, 1)}%"),
    vjust = -0.5,
    hjust = 0.6
  ) +
  geom_vline(  # (5)
    xintercept = x_mean,
    color = "#58595B",
    linetype = "dashed"
  ) +
  annotate(
    "text",
    x = x_mean,
    y = 12,  # hard-coded label position; adjust for your data
    label = glue("Avg: {round(x_mean, 1)}%"),
    angle = 90,
    vjust = -0.5,
    hjust = 0.5
  ) +
  geom_point(  # (6)
    color = "#0554A3",
    size = 6
  ) +
  geom_label_repel(  # (7)
    aes(label = glue("{course} ({round(dfw_rate_avg, 1)}%; {round(dfw_rate_sd, 1)}%)")),
    max.overlaps = 100,
    min.segment.length = 0,
    size = 2.5,
    fontface = "bold"
  ) +
  scale_x_continuous(  # (8)
    limits = c(x_mean - x_range, x_mean + x_range),
    labels = percent_format(scale = 1)
  ) +
  scale_y_continuous(
    limits = c(y_mean - y_range, y_mean + y_range),
    labels = percent_format(scale = 1)
  )
- (1) Define the data used for the plot and set the mapping aesthetics: standard deviation on the x-axis and average DFW rate on the y-axis.
- (2) Customize the theme.
- (3) Add the title, subtitle, and axis labels.
- (4) Add a horizontal dashed line with a y-intercept equal to the average DFW rate and include a text annotation of the average.
- (5) Add a vertical dashed line with an x-intercept equal to the average standard deviation and include a text annotation of the average.
- (6) Add the points, with a specific color and size.
- (7) Add labels to the points that include the course, the average DFW rate, and the standard deviation.
- (8) Adjust the x and y scales, using the limits calculated in the previous step so that the averages sit at the center of each axis.
Creating a Dumbbell Plot
The next visualization is a dumbbell plot, which will show the difference in DFW rates between face-to-face and online/hybrid instruction modality within the same course.
1. Prepare the data
First, filter the dfw_raw data frame to include only the courses in the dfw_by_course_top20 data frame and calculate the DFW rate for each section. Next, group the data by course and modality, and calculate the total enrollment, DFW count, and average DFW rate for each group. The resulting data frame provides a summary of the average DFW rate for each instruction modality for each course.
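The code for this step is not shown on the original page. The following sketch runs it on hypothetical data, with top20_courses standing in for dfw_by_course_top20$course. The average DFW rate is assumed to be the mean of the section-level rates. The resulting column order (course, modality, total_enroll, dfw_count, dfw_rate) is also an assumption, chosen to match the names used in the code that follows.

```r
library(dplyr)

# Hypothetical section-level data with a modality column, like dfw_raw
dfw_raw <- data.frame(
  course = c("Course A", "Course A", "Course A", "Course A", "Course B", "Course B"),
  modality = c("face", "face", "on_hyb", "on_hyb", "face", "on_hyb"),
  total_enroll = c(20, 30, 25, 25, 40, 40),
  dfw = c(2, 6, 5, 10, 8, 4)
)
top20_courses <- c("Course A", "Course B")

dfw_by_modality_top20 <- dfw_raw %>%
  filter(course %in% top20_courses) %>%
  mutate(section_rate = 100 * dfw / total_enroll) %>%  # DFW rate per section, in %
  group_by(course, modality) %>%
  summarise(
    total_enroll = sum(total_enroll),
    dfw_count = sum(dfw),
    dfw_rate = mean(section_rate),  # average of the section-level rates
    .groups = "drop"
  )

dfw_by_modality_top20
```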
We will make two dumbbell plots and then combine them into a single graphic. One plot will display the courses where face-to-face instruction has a lower DFW rate than online/hybrid instruction, while the other will display courses where face-to-face instruction has a higher DFW rate. To achieve this, group by course and then use case_when() inside of mutate() to record which modality has the lower DFW rate for each course ("equal" if the two rates are the same). Additionally, create another variable, abc_count, which represents the total number of A, B, and C grades (total enrollment minus DFW grades). This variable will be used in the next section to generate the lollipop plot.
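dfw_by_modality_top20 <- dfw_by_modality_top20 %>%
  group_by(course) %>%
  mutate(
    # Which modality has the lower DFW rate within each course
    # (assumes each course has exactly one face row and one on_hyb row)
    lower_dfw = case_when(
      dfw_rate[modality == "face"] < dfw_rate[modality == "on_hyb"] ~ "face",
      dfw_rate[modality == "face"] > dfw_rate[modality == "on_hyb"] ~ "on_hyb",
      TRUE ~ "equal"
    ),
    # Non-DFW (A, B, and C) grades, used later for the odds ratios
    abc_count = total_enroll - dfw_count
  ) %>%
  ungroup()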
Next, create two separate data frames that will be used to create the two dumbbell plots. The first data frame, dfw_by_modality_top20_face, will include courses where face-to-face instruction has a lower or equal DFW rate compared to online/hybrid instruction. The second data frame, dfw_by_modality_top20_on_hyb, will contain courses where online/hybrid instruction has a lower DFW rate than face-to-face instruction.
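The following sketch performs the split with filter(), applied to a hypothetical output of the case_when() step. As described above, courses labeled "equal" go into the face-to-face data frame, so every course ends up in exactly one of the two.

```r
library(dplyr)

# Hypothetical output of the case_when() step (two rows per course)
dfw_by_modality_top20 <- data.frame(
  course = c("Course A", "Course A", "Course B", "Course B", "Course C", "Course C"),
  modality = rep(c("face", "on_hyb"), 3),
  dfw_rate = c(15, 30, 20, 10, 12, 12),
  lower_dfw = c("face", "face", "on_hyb", "on_hyb", "equal", "equal")
)

# Face-to-face rate lower than or equal to online/hybrid
dfw_by_modality_top20_face <- dfw_by_modality_top20 %>%
  filter(lower_dfw %in% c("face", "equal"))

# Online/hybrid rate lower than face-to-face
dfw_by_modality_top20_on_hyb <- dfw_by_modality_top20 %>%
  filter(lower_dfw == "on_hyb")
```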
2. Prepare the scale and titles
Before creating the plots, define a variable to ensure the x-axis scale is consistent across both plots. This ensures that visual comparisons between the two figures are accurate. Additionally, create narrative titles for each figure that include color highlights that match the plot’s colors.
# Set the x-axis max: the larger of the two data frames' maximum DFW rates,
# plus 5 percentage points of padding
x_scale_max <- max(dfw_by_modality_top20_face$dfw_rate,
                   dfw_by_modality_top20_on_hyb$dfw_rate) + 5
# Create figure title text
face_title <- str_c("Courses with *lower* DFW Rates for <span style = 'color:", "#0554A3", ";'>Face-to-Face</span> instruction
compared <br> with <span style = 'color:", "#26A5CA", ";'>Online/Hybrid</span> instruction")
on_hyb_title <- str_c("Courses with *higher* DFW Rates for <span style = 'color:", "#0554A3", ";'>Face-to-Face</span> instruction
compared <br> with <span style = 'color:", "#26A5CA", ";'>Online/Hybrid</span> instruction")
3. Create each plot
The code below creates each plot and assigns each to a ggplot object. Since the only differences between the two plots are the data and the titles, the detailed steps are provided only for the face-to-face figure.
face_to_face_figure <- dfw_by_modality_top20_face %>%
  ggplot(mapping = aes(x = dfw_rate, y = reorder(course, dfw_rate), color = modality, group = course)) +  # (1)
  theme_classic() +
  theme(  # (2)
    plot.title = element_markdown(size = 15, color = "black", face = "bold", margin = margin(b = 15)),
    axis.title.x = element_text(size = 12, color = "black", face = "bold", margin = margin(t = 10)),
    axis.title.y = element_blank(),
    axis.text = element_text(size = 11, color = "black"),
    axis.line = element_line(linewidth = 1),
    axis.ticks = element_line(linewidth = 1)
  ) +
  labs(  # (3)
    title = face_title,
    x = "DFW Rate (%)"
  ) +
  geom_line(  # (4)
    linewidth = 2,  # linewidth replaces the deprecated size argument for lines
    color = "gray"
  ) +
  geom_point(size = 7) +  # (5)
  scale_color_manual(values = c("face" = "#0554A3", "on_hyb" = "#26A5CA")) +  # (6)
  scale_x_continuous(  # (7)
    limits = c(0, x_scale_max),
    labels = percent_format(scale = 1)
  ) +
  guides(color = "none")  # (8)

online_hybrid_figure <- dfw_by_modality_top20_on_hyb %>%
  ggplot(mapping = aes(x = dfw_rate, y = reorder(course, dfw_rate), color = modality, group = course)) +
  theme_classic() +
  theme(
    plot.title = element_markdown(size = 15, color = "black", face = "bold", margin = margin(t = 20, b = 15)),
    axis.title.x = element_text(size = 12, color = "black", face = "bold", margin = margin(t = 10)),
    axis.title.y = element_blank(),
    axis.text = element_text(size = 11, color = "black"),
    axis.line = element_line(linewidth = 1),
    axis.ticks = element_line(linewidth = 1)
  ) +
  labs(
    x = "DFW Rate (%)",
    title = on_hyb_title
  ) +
  geom_line(
    linewidth = 2,
    color = "gray"
  ) +
  geom_point(size = 7) +
  scale_color_manual(values = c("face" = "#0554A3", "on_hyb" = "#26A5CA")) +
  scale_x_continuous(
    limits = c(0, x_scale_max),
    labels = percent_format(scale = 1)
  ) +
  guides(color = "none")
- (1) Define the data and the mapping aesthetics for the plot. Use y = reorder(course, dfw_rate) to order the courses from highest to lowest DFW rate (top to bottom), color = modality to give each modality a distinct color, and group = course to connect the two DFW rate points for each course with a line.
- (2) Customize the theme.
- (3) Set the plot title and label the x-axis.
- (4) Add the geom_line() layer with a specific line width and color.
- (5) Add the geom_point() layer.
- (6) Set the colors for the points.
- (7) Customize the x-axis scale using the predefined x_scale_max variable.
- (8) Remove the legend to reduce clutter.
4. Render the dumbbell plot
Finally, we’ll render the plots by adding them together and using plot_layout() from the {patchwork} package. Set ncol to 1 and nrow to 2 to arrange the plots in one column with two rows. Additionally, set the width to 4 for each plot. Set the height of each plot in proportion to the number of rows (i.e., number of courses) in its data frame so that both figures are displayed at a balanced scale.
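The rendering code for this step is not shown on the original page. The following minimal sketch of the layout assumes ggplot2 and patchwork are installed. It uses two small hypothetical data frames in place of dfw_by_modality_top20_face and dfw_by_modality_top20_on_hyb, and a stripped-down, hypothetical make_dumbbell() function in place of the full plot code above. Only ncol, nrow, and heights are set; the width setting is omitted because its exact form in the original is not shown.

```r
library(ggplot2)
library(patchwork)

# Hypothetical stand-ins for the two dumbbell data frames
face_df <- data.frame(course = rep(c("A", "B", "C"), each = 2),
                      modality = rep(c("face", "on_hyb"), 3),
                      dfw_rate = c(10, 15, 12, 20, 8, 9))
on_hyb_df <- data.frame(course = c("D", "D"),
                        modality = c("face", "on_hyb"),
                        dfw_rate = c(25, 18))

# Hypothetical, simplified version of the dumbbell code above
make_dumbbell <- function(df) {
  ggplot(df, aes(x = dfw_rate, y = course, color = modality, group = course)) +
    geom_line(color = "gray", linewidth = 2) +
    geom_point(size = 7)
}

# One row per course in each panel, so scale the heights by course counts
n_face <- length(unique(face_df$course))
n_on_hyb <- length(unique(on_hyb_df$course))

combined <- make_dumbbell(face_df) + make_dumbbell(on_hyb_df) +
  plot_layout(ncol = 1, nrow = 2, heights = c(n_face, n_on_hyb))
```

Printing combined draws the two panels stacked vertically. The top panel is three times as tall as the bottom one, so each course row has the same height in both panels.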
Creating a Lollipop Plot
In this section, we’ll assess whether there is a difference in the likelihood of receiving a DFW grade based on instruction modality. This is accomplished by calculating the odds ratios and p-values that compare face-to-face versus online/hybrid instruction modalities. We’ll also create a lollipop plot to display the results.
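To make the odds ratio concrete, the following sketch uses hypothetical counts for a single course and computes the odds ratio in two ways. The first computes it directly from the 2x2 table of grades by modality. The second exponentiates the modality coefficient of a weighted logistic regression, with the data laid out like the temp_df data frame used in the loop below. For a model with a single two-level predictor, the two results agree, which is why the loop can read the odds ratio straight off the glm() summary.

```r
# Hypothetical grade counts for one course
abc_face <- 80; dfw_face <- 20      # face-to-face: odds of DFW = 20 / 80 = 0.25
abc_on_hyb <- 60; dfw_on_hyb <- 30  # online/hybrid: odds of DFW = 30 / 60 = 0.50

# Odds ratio computed directly from the 2x2 table
odds_ratio_direct <- (dfw_on_hyb / abc_on_hyb) / (dfw_face / abc_face)
odds_ratio_direct  # 2

# The same counts in the weighted long format used by the loop
temp_df <- data.frame(
  modality = factor(c("face", "face", "on_hyb", "on_hyb")),
  result = factor(c("abc", "dfw", "abc", "dfw")),  # "dfw" (2nd level) is the modeled outcome
  freq = c(abc_face, dfw_face, abc_on_hyb, dfw_on_hyb)
)
fit <- glm(result ~ modality, weights = freq, data = temp_df,
           family = binomial(link = "logit"))

# The modalityon_hyb coefficient is a log odds ratio; exponentiate it
odds_ratio_glm <- unname(exp(coef(fit)["modalityon_hyb"]))
odds_ratio_glm  # approximately 2
```

With these counts, the odds of a DFW grade double, while the DFW probability rises from 20% (20 of 100) to about 33% (30 of 90). An odds ratio of 2 is therefore not the same as being twice as likely to receive a DFW grade.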
1. Analyze the data
First, initialize the odds_df data frame, which will store the odds ratios and p-values for each course. Then initialize another data frame, temp_df, which stores the counts of ABC and DFW grades for each instruction modality. Additionally, initialize a variable, i, to access the correct row in odds_df.
# Initialize a data frame to store the odds ratios and p-values
odds_df <- data.frame(course = unique(dfw_by_modality_top20$course),
                      odds_ratio = NA,
                      p_value = NA)
# Initialize a data frame to store the frequencies, which will be filled in the loop
temp_df <- data.frame(modality = as.factor(c("face", "face", "on_hyb", "on_hyb")),
                      result = as.factor(c("abc", "dfw", "abc", "dfw")),
                      freq = NA)
# Initialize i to start at 1; the loop uses it as the row number in odds_df
i <- 1
Use a for loop to iterate through each of the top 20 courses and calculate the odds ratios and p-values. For each iteration, filter dfw_by_modality_top20 to include data only for the current course. Once filtered, assign the grade frequencies for both modalities to the temp_df data frame. Next, fit a logistic regression with result (ABC vs. DFW) as the dependent variable and modality (face-to-face vs. online/hybrid) as the independent variable, with weights set to the freq variable. Because "dfw" is the second level of result and "on_hyb" is the second level of modality, the modality coefficient is the log of the odds ratio for a DFW grade in online/hybrid versus face-to-face sections. After running the model, extract the odds ratio and p-value from the summary and store them in the odds_df data frame; the odds ratio is obtained by exponentiating the coefficient. Finally, increment i by 1 to move to the next row of odds_df.
for (name in odds_df$course) {
  # Keep only the rows (one face, one on_hyb) for the current course
  filtered_df <- filter(dfw_by_modality_top20, course == name)
  face_row <- filtered_df$modality == "face"
  on_hyb_row <- filtered_df$modality == "on_hyb"
  # Copy the ABC and DFW counts into the logit table, in the same order as temp_df's rows
  temp_df$freq <- c(filtered_df$abc_count[face_row], filtered_df$dfw_count[face_row],
                    filtered_df$abc_count[on_hyb_row], filtered_df$dfw_count[on_hyb_row])
  # Run the weighted logistic regression
  temp_glm <- glm(result ~ modality, weights = freq, data = temp_df, family = binomial(link = "logit"))
  # Assign the summary of the glm to a variable used to extract the coefficients
  temp_summary <- summary(temp_glm)
  # Take the modalityon_hyb coefficient (row 2), convert it from log form by taking the exponent,
  # round to 2 decimals (p-value to 3), and add both to the odds_df data frame
  odds_df$odds_ratio[i] <- round(exp(temp_summary$coefficients[2, 1]), 2)  # odds ratio
  odds_df$p_value[i] <- round(temp_summary$coefficients[2, 4], 3)           # p-value
  i <- i + 1
}
2. Prepare the data for the plot
The next step is to add a few variables to the odds_df data frame that will aid in visualizing the results. First, create a new variable, odds_ratio_category, to categorize the courses based on their p-value and odds ratio. Next, assign colors to each category and format p-values for better readability. Last, generate two variables that will be used to label the plot.
odds_df <- odds_df %>%
  mutate(
    odds_ratio_category = case_when(p_value < 0.05 & odds_ratio >= 1 ~ "above1",
                                    p_value < 0.05 & odds_ratio < 1 ~ "below1",
                                    p_value >= 0.05 ~ "neither"),
    color_id = case_when(odds_ratio_category == "above1" ~ "#0554A3",
                         odds_ratio_category == "below1" ~ "#26A5CA",
                         odds_ratio_category == "neither" ~ "gray"),
    # p-values were rounded to 3 decimals, so only a rounded 0 is truly below 0.001
    p_value_label = case_when(p_value < 0.001 ~ "<0.001",
                              .default = as.character(p_value)),
    above1_label = case_when(odds_ratio >= 1 ~ glue("{odds_ratio} ({p_value_label})"),
                             TRUE ~ ""),
    below1_label = case_when(odds_ratio < 1 ~ glue("{odds_ratio} ({p_value_label})"),
                             TRUE ~ "")
  )
3. Render the lollipop plot
Finally, create the plot.
odds_df %>%
  ggplot(mapping = aes(x = odds_ratio, y = reorder(course, odds_ratio), color = color_id)) +  # (1)
  theme_classic() +
  theme(  # (2)
    plot.title = element_markdown(size = 15, color = "black", face = "bold", margin = margin(b = 15)),
    plot.subtitle = element_markdown(size = 13, color = "black", margin = margin(b = 15)),
    plot.caption = element_text(size = 10, color = "black", hjust = 0.5, margin = margin(t = 10)),
    axis.title.x = element_text(size = 12, color = "black", face = "bold", margin = margin(t = 10)),
    axis.title.y = element_blank(),
    axis.text = element_text(size = 11, color = "black"),
    axis.line = element_line(linewidth = 1),
    axis.ticks = element_line(linewidth = 1)
  ) +
  labs(  # (3)
    title = "In most courses, students in Online/Hybrid sections are <span style = 'color: #0554A3;'>**more likely to DFW**</span>
    <br> rather than <span style = 'color: #26A5CA;'>less likely to DFW</span> compared with Face-to-Face sections",
    subtitle = "In some courses, there is <span style = 'color: gray50;'>**no difference in DFW likelihood**</span>",
    caption = "The text next to each point displays the Odds Ratio with the P-Value in parentheses",
    x = "Odds Ratio"
  ) +
  geom_vline(  # (4)
    xintercept = 1,
    color = "#58595B",
    linetype = "dashed"
  ) +
  geom_segment(  # (5)
    aes(xend = 1, yend = reorder(course, odds_ratio)),  # same ordering as the y aesthetic
    linewidth = 1.5
  ) +
  geom_point(size = 7) +  # (6)
  geom_text(  # (7)
    aes(label = above1_label),
    nudge_x = 0.35,
    size = 4,
    na.rm = TRUE
  ) +
  geom_text(  # (8)
    aes(label = below1_label),
    nudge_x = -0.35,
    size = 4
  ) +
  scale_color_identity() +  # (9)
  scale_x_continuous(limits = c(0, max(odds_df$odds_ratio) + 0.5))  # (10)
- (1) Define the mapping to plot the odds ratio on the x-axis and reorder the courses by odds ratio on the y-axis.
- (2) Customize the theme.
- (3) Set the plot title, subtitle, caption, and x-axis label, with custom styling in the title and subtitle for color highlighting.
- (4) Insert a dashed vertical line at x = 1 to visually separate odds ratios above and below 1.
- (5) Use geom_segment() to connect each point to the vertical reference line at x = 1, with a specified line width.
- (6) Add the data points with a specified size.
- (7) Include text labels for points where the odds ratio is 1 or above, with a horizontal adjustment (nudge_x) and a specified text size.
- (8) Include text labels for points where the odds ratio is below 1, with a horizontal adjustment (nudge_x) and a specified text size.
- (9) Use scale_color_identity() to apply the colors directly from the color_id variable in the data frame.
- (10) Set the limits of the x-axis to range from 0 to slightly beyond the maximum odds ratio, leaving extra space for the labels.