Lesson 20: Getting Help

Introduction

As you continue learning and programming in R, you’re sure to run into issues, errors, and other challenges. One of the best things about R is its large community of users, many of whom have almost certainly run into the same problems you’re having. Many of these people have gotten help and solved their problems through blogs, books, forums, and other resources freely available online. When you need help, finding these resources is the best first step. If you can’t find the answer to your question, the next best approach is to create a reproducible example that demonstrates your issue and post it on a coding question-and-answer site. This lesson gives some tips for finding reliable answers to your coding questions online and demonstrates how to create a good reproducible example.

Start by searching

If you get an error message you don’t understand, copying and pasting the error message into your search engine of choice will often be enough to get you unstuck. Depending on the error message or search term you use, it can be useful to add “R” to the search to restrict results to R programming. Additionally, you may want to add the name of the package you’re using to the search to further refine results.

There are many different approaches to writing R code, so your search may turn up solutions that use different packages or a different code style from what has been presented in this course. One trick that may help you find more familiar code is to add “tidyverse” to your search terms, since this course has relied heavily on functions from the tidyverse packages.

Make a reproducible example

If searching doesn’t lead to an answer, a good next step in getting help is to create a reproducible example (or “reprex”) that shows the problem you’re having. The goal of creating a reprex is to make a simplified version of the question you want to solve so that other people can help you solve it. This involves including all the data and code required for other people to reproduce your problem and attempt to solve it. When making a reprex, it’s important to strip away extra code that’s not causing problems or is not related to your question. Being able to isolate the specific line of code or function that is returning errors is key.

Let’s return to the arrest dataset from previous lessons to make a reprex. Read in the data from GitHub:

library(tidyverse)

arrests_file_url <- "https://github.com/CSGJusticeCenter/va_data/raw/refs/heads/main/courses/intro_r/arrests.csv"
arrests <- read_csv(arrests_file_url)

Next, run this chunk of code and inspect the results and warning messages returned.

arrests |> 
  mutate(
    arrest_date = mdy(arrest_date),
    booking_date = ymd(paste(booking_year, booking_month, booking_day)),
    arrest_date_time = mdy_hm(paste(arrest_date, arrest_time)),
    booking_date_time = ymd_hm(paste(booking_date, booking_time)),
    arrest_to_booking_min = booking_date_time - arrest_date_time
  ) |>
  group_by(charge_group) |> 
  summarize(
    mean = mean(arrest_to_booking_min),
    median = median(arrest_to_booking_min),
  ) |> 
  arrange(desc(mean))
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `arrest_date_time = mdy_hm(paste(arrest_date, arrest_time))`.
Caused by warning:
! All formats failed to parse. No formats found.
# A tibble: 22 × 3
   charge_group            mean    median 
   <chr>                   <drtn>  <drtn> 
 1 Against Family/Child    NA secs NA secs
 2 Aggravated Assault      NA secs NA secs
 3 Burglary                NA secs NA secs
 4 Disorderly Conduct      NA secs NA secs
 5 Disturbing the Peace    NA secs NA secs
 6 Driving Under Influence NA secs NA secs
 7 Forgery/Counterfeit     NA secs NA secs
 8 Fraud/Embezzlement      NA secs NA secs
 9 Gambling                NA secs NA secs
10 Homicide                NA secs NA secs
# ℹ 12 more rows

Notice that every value in both the mean and median columns is NA secs, so something is wrong in the calculation. But it’s not clear from the result where in this block of code the problem is. Next, look at the warning message, which indicates that there was an issue with the arrest_date_time column and that:

! All formats failed to parse. No formats found.

It might not be clear what this warning means, so you could search for that message in a search engine. But it also may help to narrow down where in the code the issue is arising. It seems to have something to do with the arrest_date_time column, so a good next step is to try to isolate code and data related to this column.

arrests |> 
  mutate(
    arrest_date = mdy(arrest_date),
    arrest_date_time = mdy_hm(paste(arrest_date, arrest_time)),
  ) |> 
  select(arrest_date, arrest_time, arrest_date_time)
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `arrest_date_time = mdy_hm(paste(arrest_date, arrest_time))`.
Caused by warning:
! All formats failed to parse. No formats found.
# A tibble: 1,000 × 3
   arrest_date arrest_time arrest_date_time
   <date>      <chr>       <dttm>          
 1 2020-02-04  14 10       NA              
 2 2022-06-28  20 30       NA              
 3 2023-05-01  10 00       NA              
 4 2021-10-19  17 00       NA              
 5 2021-11-30  00 30       NA              
 6 2022-01-08  20 30       NA              
 7 2022-05-28  22 00       NA              
 8 2021-04-27  03 15       NA              
 9 2020-01-14  23 50       NA              
10 2021-03-29  00 47       NA              
# ℹ 990 more rows

It seems like the process of creating the arrest_date_time column from arrest_date and arrest_time isn’t working correctly, since the results are all NA. This is really helpful, since you can eliminate the extra group_by() and summarize() code, which don’t seem germane to this issue.

Now that you’ve isolated the problematic section of the code, there’s one other thing missing from the reprex before you can post this on a coding help site: the data! You’re the only one who has the arrests dataset on your computer, and if you post this code without the data, it will be much harder for someone else to help you. One option would be to upload arrests.csv with the question, but that isn’t a good idea for a couple of reasons. First, for data privacy reasons you likely won’t be able to share the contents of your data. Second, someone online doesn’t need all the raw data to help solve your problem—a few rows would probably be enough.

To get around this, create a small sample dataset in your reprex that mirrors your real dataset. You can even make up fake dates and times, so there are no data security concerns. You can use the tribble() function to create a new data frame row by row. The first line inside tribble() lists the column names, each beginning with a tilde (~). Each line after that holds the values for one row, in the same order as the column names.

arrests_sample <- tribble(
  ~arrest_date, ~arrest_time,
  "8/26/2020", "13 08",
  "12/17/2020", "23 12",
  "7/7/2020", "00 34"
)

arrests_sample
# A tibble: 3 × 2
  arrest_date arrest_time
  <chr>       <chr>      
1 8/26/2020   13 08      
2 12/17/2020  23 12      
3 7/7/2020    00 34      
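
If you find it easier to think in columns than in rows, the same sample data can also be built with tibble(), which takes one vector per column. This is only an alternative sketch and not part of the lesson's workflow; the name arrests_sample_alt is made up for illustration.

```r
library(tibble)

# Same made-up values as above, supplied one column at a time
arrests_sample_alt <- tibble(
  arrest_date = c("8/26/2020", "12/17/2020", "7/7/2020"),
  arrest_time = c("13 08", "23 12", "00 34")
)

arrests_sample_alt
```

Either form works in a reprex. tribble() tends to be easier to read for small datasets because each line of code looks like a row of the data.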

Now, you can use this sample dataset to illustrate the issue you’re having.

arrests_sample |> 
  mutate(
    arrest_date = mdy(arrest_date),
    arrest_date_time = mdy_hm(paste(arrest_date, arrest_time)),
  )
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `arrest_date_time = mdy_hm(paste(arrest_date, arrest_time))`.
Caused by warning:
! All formats failed to parse. No formats found.
# A tibble: 3 × 3
  arrest_date arrest_time arrest_date_time
  <date>      <chr>       <dttm>          
1 2020-08-26  13 08       NA              
2 2020-12-17  23 12       NA              
3 2020-07-07  00 34       NA              

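To bring it all together, you can use the reprex() function from the reprex package to create a reproducible example to copy and paste into your coding website of choice. To use this function, copy the code you want to include to your clipboard and run the command below. The lines that follow the command show the rendered result that reprex produces: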

reprex::reprex()
arrests_sample <- tribble(
  ~arrest_date, ~arrest_time,
  "8/26/2020", "13 08",
  "12/17/2020", "23 12",
  "7/7/2020", "00 34"
)
#> Error in tribble(~arrest_date, ~arrest_time, "8/26/2020", "13 08", "12/17/2020", : could not find function "tribble"

arrests_sample |> 
  mutate(
    arrest_date = mdy(arrest_date),
    arrest_date_time = mdy_hm(paste(arrest_date, arrest_time)),
  )
#> Error in mutate(arrests_sample, arrest_date = mdy(arrest_date), arrest_date_time = mdy_hm(paste(arrest_date, : could not find function "mutate"

Wait, here’s an error caused by creating the reprex. The errors say could not find function "tribble" and could not find function "mutate", which is not the issue you’re trying to solve. The cause of this error is that you did not include the code that attaches packages to the reprex! To make an example truly reproducible, you need to include the code, the data, and the packages you used in the reprex. In this case, add library(tidyverse) to attach the tidyverse package before the rest of the code. If you copy all that code and run reprex, you now have a fully reproducible example that others can use to help troubleshoot the issue.

reprex::reprex()
library(tidyverse)

arrests_sample <- tribble(
  ~arrest_date, ~arrest_time,
  "8/26/2020", "13 08",
  "12/17/2020", "23 12",
  "7/7/2020", "00 34"
)

arrests_sample |> 
  mutate(
    arrest_date = mdy(arrest_date),
    arrest_date_time = mdy_hm(paste(arrest_date, arrest_time)),
  )
#> Warning: There was 1 warning in `mutate()`.
#> ℹ In argument: `arrest_date_time = mdy_hm(paste(arrest_date, arrest_time))`.
#> Caused by warning:
#> ! All formats failed to parse. No formats found.
#> # A tibble: 3 × 3
#>   arrest_date arrest_time arrest_date_time
#>   <date>      <chr>       <dttm>          
#> 1 2020-08-26  13 08       NA              
#> 2 2020-12-17  23 12       NA              
#> 3 2020-07-07  00 34       NA

<sup>Created on 2025-01-21 with [reprex v2.1.1](https://reprex.tidyverse.org)</sup>
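
Attaching the whole tidyverse, as above, is the simplest choice. As an optional variation (an assumption, not something this lesson requires), a reprex can attach only the packages it actually uses, which shows helpers exactly where each function comes from. Here that would be tibble for tribble(), dplyr for mutate(), and lubridate for mdy() and mdy_hm(). The name result is made up for illustration.

```r
# Attach only the packages this example needs
library(tibble)     # tribble()
library(dplyr)      # mutate()
library(lubridate)  # mdy(), mdy_hm()

arrests_sample <- tribble(
  ~arrest_date, ~arrest_time,
  "8/26/2020", "13 08",
  "12/17/2020", "23 12",
  "7/7/2020", "00 34"
)

# Same problem code as before; it still returns NA with a parse warning
result <- arrests_sample |>
  mutate(
    arrest_date = mdy(arrest_date),
    arrest_date_time = mdy_hm(paste(arrest_date, arrest_time))
  )

result
```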

Sometimes this process of preparing a reprex will lead you to the source of the problem and allow you to resolve the issue on your own. But if not, there are a number of places to ask your question and post your reprex in the hopes that someone will answer it.

Where to ask your question

If you’re a corrections analyst, the Corrections Analyst Community Forum is a great place to post your questions. If you’re not already a member, you will need to register before posting.

The best R-specific forum to ask and answer questions is the Posit Community. Generally, this is a welcoming community and has lots of contributors who are expert R users.

Stack Overflow is another community question-and-answer site where you can post your question. There is also a robust R community there, but you may find the users slightly less beginner-friendly than the other two forums mentioned here.

Generative AI

AI models like ChatGPT and Claude are also popular tools for getting help with coding. While the quality of the code they generate varies, they are often good at debugging or explaining code. For example, inputting the original code and warning from above into ChatGPT results in a relatively complete diagnosis of the possible issues, along with many suggestions for how to fix the code:

The issue you’re facing is likely due to a mismatch or incorrect format when trying to parse the arrest_date and arrest_time values using the mdy_hm() function in the mutate() step.

The warning message, “All formats failed to parse. No formats found,” indicates that the mdy_hm() function couldn’t interpret the arrest_date and arrest_time together as a valid datetime format.

Key points to check and address:

  1. Check the format of arrest_date and arrest_time:

    • arrest_date should be in mm/dd/yyyy format.
    • arrest_time should be in hh:mm AM/PM format (or HH:mm for 24-hour format).

    If either of these is in a different format, mdy_hm() may not be able to parse them correctly.

  2. Ensure the arrest_time column is in the correct format:

    • Sometimes the time column (arrest_time) might contain leading/trailing spaces or be in an unexpected format.
    • It may also be in 12-hour format with AM/PM, so if arrest_time is something like “3:30 PM”, you need to make sure that it is properly parsed. If needed, you could clean it before using it.
  3. Debugging the input: You can inspect a few values from arrest_date and arrest_time to check their format. Try printing the first few rows:

    head(arrests$arrest_date)
    head(arrests$arrest_time)
  4. Fixing the parsing issue: If the date or time columns have unexpected formats, you might need to clean them first. For instance, to fix possible issues with time formatting, you could standardize arrest_time as follows:

    arrests <- arrests %>%
      mutate(arrest_time = str_trim(arrest_time),  # Remove any extra spaces
             arrest_date_time = mdy_hm(paste(arrest_date, arrest_time)))
  5. Alternative parsing approach: If mdy_hm() still doesn’t work for parsing the arrest_date and arrest_time, you might want to split the operations into separate steps. For example:

    arrests <- arrests %>%
      mutate(
        arrest_date = mdy(arrest_date),
        arrest_time = hms(arrest_time),  # Assuming arrest_time is in 'hh:mm:ss' format
        arrest_date_time = arrest_date + arrest_time
      )
  6. Handle NA values properly: If there’s missing or malformed data, it might result in NA values for the entire column. Use summary() or table() to check for missing values:

    summary(arrests$arrest_date)
    summary(arrests$arrest_time)

By addressing the potential issues with date and time formatting, you should be able to resolve the NA values and get meaningful results in the summary.

These are all great suggestions, but ChatGPT is having the same issue as a friendly forum user would have: It doesn’t have access to the arrests dataset, so it can’t fully diagnose the problem.

So, there’s still value in making a reprex, even if you plan to ask a chatbot for help. Here is ChatGPT’s response when the complete reprex is included in the question:

The issue here is related to how the mdy_hm() function interprets the combined date and time format. Specifically, the arrest_date is being correctly converted to a Date format, but when you try to use mdy_hm() to parse the paste(arrest_date, arrest_time), it’s failing to interpret the time portion due to the lack of a time separator (e.g., a colon : between hours and minutes).

By default, mdy_hm() expects the time to be in a format like "13:08", with a colon separating the hour and minute.

To fix this, you can first adjust the arrest_time column to include the colon separator, and then apply mdy_hm():

library(tidyverse)

arrests_sample <- tribble(
  ~arrest_date, ~arrest_time,
  "8/26/2020", "13 08",
  "12/17/2020", "23 12",
  "7/7/2020", "00 34"
)

arrests_sample |> 
  mutate(
    arrest_date = mdy(arrest_date),
    arrest_time = gsub(" ", ":", arrest_time),  # Fix the time format by adding a colon
    arrest_date_time = mdy_hm(paste(arrest_date, arrest_time))
  )

This will produce the expected arrest_date_time column with the correct datetime values. Here’s what’s happening:

  1. gsub(" ", ":", arrest_time) replaces the space between the hour and minute with a colon (:).
  2. mdy_hm(paste(arrest_date, arrest_time)) now works correctly because the time format is what mdy_hm() expects.

This will give you the following output:

# A tibble: 3 × 3
  arrest_date arrest_time arrest_date_time      
  <date>      <chr>       <dttm>                
1 2020-08-26  13:08       2020-08-26 13:08:00  
2 2020-12-17  23:12       2020-12-17 23:12:00  
3 2020-07-07  00:34       2020-07-07 00:34:00  

This looks great—thanks ChatGPT!

arrests_sample <- tribble(
  ~arrest_date, ~arrest_time,
  "8/26/2020", "13 08",
  "12/17/2020", "23 12",
  "7/7/2020", "00 34"
  )

arrests_sample |> 
  mutate(
    arrest_date = mdy(arrest_date),
    arrest_time = gsub(" ", ":", arrest_time),  # Fix the time format by adding a colon
    arrest_date_time = mdy_hm(paste(arrest_date, arrest_time))
  )
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `arrest_date_time = mdy_hm(paste(arrest_date, arrest_time))`.
Caused by warning:
! All formats failed to parse. No formats found.
# A tibble: 3 × 3
  arrest_date arrest_time arrest_date_time
  <date>      <chr>       <dttm>          
1 2020-08-26  13:08       NA              
2 2020-12-17  23:12       NA              
3 2020-07-07  00:34       NA              

Actually, there are still NAs here, so ChatGPT didn’t figure it out either, even though it confidently reported how to fix the error.
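
Rather than scanning the printed rows, one way to check a suggested fix like this is to count the missing values in the new column. The sketch below reuses the same sample data; the names suggested and n_missing are made up for illustration, and suppressWarnings() is used only to keep the output short.

```r
library(tidyverse)

arrests_sample <- tribble(
  ~arrest_date, ~arrest_time,
  "8/26/2020", "13 08",
  "12/17/2020", "23 12",
  "7/7/2020", "00 34"
)

# Run the suggested fix, hiding the parse warning already seen above
suggested <- suppressWarnings(
  arrests_sample |>
    mutate(
      arrest_date = mdy(arrest_date),
      arrest_time = gsub(" ", ":", arrest_time),
      arrest_date_time = mdy_hm(paste(arrest_date, arrest_time))
    )
)

# Count how many date-times still failed to parse
n_missing <- sum(is.na(suggested$arrest_date_time))
n_missing
```

A count above zero means the fix did not fully work, no matter how convincing the explanation that came with it.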

The real solution has nothing to do with a missing colon. The function used to create arrest_date_time should be ymd_hm(paste(arrest_date, arrest_time)) instead of mdy_hm(paste(arrest_date, arrest_time)). The line of code above it, arrest_date = mdy(arrest_date), converts arrest_date from the month/day/year text in the original data into a date, and dates are written in year-month-day order when paste() turns them back into text. So, to parse the combined arrest_date and arrest_time values correctly, you need to use ymd_hm().

arrests_sample <- tribble(
  ~arrest_date, ~arrest_time,
  "8/26/2020", "13 08",
  "12/17/2020", "23 12",
  "7/7/2020", "00 34"
)

arrests_sample |> 
  mutate(
    arrest_date = mdy(arrest_date),
    arrest_date_time = ymd_hm(paste(arrest_date, arrest_time)),
  )
# A tibble: 3 × 3
  arrest_date arrest_time arrest_date_time   
  <date>      <chr>       <dttm>             
1 2020-08-26  13 08       2020-08-26 13:08:00
2 2020-12-17  23 12       2020-12-17 23:12:00
3 2020-07-07  00 34       2020-07-07 00:34:00
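
To see why the order matters, it can help to look at the exact text that paste() hands to the parsing function. This is a minimal sketch using one made-up value from the sample data; the name combined is made up for illustration.

```r
library(lubridate)

# mdy() parses the original month/day/year text into a Date
arrest_date <- mdy("8/26/2020")

# paste() turns the Date back into text, written in year-month-day order
combined <- paste(arrest_date, "13 08")
combined
#> [1] "2020-08-26 13 08"

# The text now starts with the year, so ymd_hm() is the matching parser
ymd_hm(combined)
```

Printing the intermediate value like this is often the quickest way to spot a mismatch between what a function expects and what it actually receives.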

ChatGPT was able to give some good ideas for how to solve the problem, but in the end, this required debugging from a human! Before you use ChatGPT or another generative AI model to help with coding problems, make sure you follow your agency’s policy on data privacy and security.

Resources