Check Data — check_data • rtemis

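Check a tabular dataset for common issues (constant features, duplicate cases, missing values) and print recommendations.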

Usage

check_data(
  x,
  name = NULL,
  get_duplicates = TRUE,
  get_na_case_pct = FALSE,
  get_na_feature_pct = FALSE
)

Arguments

x: Tabular data (e.g. a data.frame): Input to be checked.
name: Character: Name of dataset.
get_duplicates: Logical: If TRUE, check for duplicate cases.
get_na_case_pct: Logical: If TRUE, calculate the percentage of NA values in each case (row).
get_na_feature_pct: Logical: If TRUE, calculate the percentage of NA values in each feature (column).
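
To make the three optional checks concrete, here is a minimal base R sketch of the quantities they describe. It is an illustration only and not the rtemis implementation. The toy data frame and the variable names are made up for this example, and whether check_data stores these values on a 0-100 scale or as fractions is not stated on this page.

```r
# Illustration in base R only; not how check_data() is implemented.
# Hypothetical toy data with a few missing values.
x <- data.frame(
  a = c(1, NA, 3, 4),
  b = c("u", "v", NA, "x"),
  c = c(NA, NA, 1L, 2L)
)

# Percentage of NA values in each case (row)
na_case_pct <- rowMeans(is.na(x)) * 100

# Percentage of NA values in each feature (column)
na_feature_pct <- colMeans(is.na(x)) * 100

# Number of cases that repeat an earlier case
n_duplicates <- sum(duplicated(x))
```

The per-case and per-feature calculations both scan the whole table, which may be why they are off by default, although this page does not say so.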

Value

CheckData object.

Author

EDG

Examples

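library(rtemis)

n <- 1000
# 50 numeric features drawn from a normal distribution
x <- rnormmat(n, 50, return_df = TRUE)
# Two character features and one unordered factor
x$char1 <- sample(letters, n, TRUE)
x$char2 <- sample(letters, n, TRUE)
x$fct <- factor(sample(letters, n, TRUE))
# Append a copy of the first row to create a duplicate case
x <- rbind(x, x[1, ])
# Add a constant integer feature
x$const <- 99L
# Introduce NA values in two numeric features and in the factor
x[sample(nrow(x), 20), 3] <- NA
x[sample(nrow(x), 20), 10] <- NA
x$fct[30:35] <- NA
check_data(x)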
#>   x: A data.frame with 1001 rows and 54 columns.
#> 
#>   Data types
#>   * 50 numeric features
#>   * 1 integer feature
#>   * 1 factor, which is not ordered
#>   * 2 character features
#>   * 0 date features
#> 
#>   Issues
#>   * 1 constant feature
#>   * 1 duplicate case
#>   * 3 features include 'NA' values; 46 'NA' values total
#>     * 1 factor; 2 numeric
#> 
#>   Recommendations
#>   * Consider converting character features to factors or excluding them.
#>   * Remove the constant feature.
#>   * Consider removing the duplicate case.
#>   * Consider using algorithms that can handle missingness or imputing missing values.
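
The recommendations printed above could be acted on with base R along the following lines. This is a hedged sketch on a small hypothetical data frame (`dat` and its columns are invented for illustration). It does not use any rtemis preprocessing function, and the right choice for each step depends on the analysis.

```r
# Hypothetical data with the same kinds of issues: a character feature,
# a constant feature, a duplicate case (rows 1 and 4), and one NA.
dat <- data.frame(
  num = c(1.5, 2.5, NA, 1.5),
  chr = c("a", "b", "c", "a"),
  const = 99L,
  stringsAsFactors = FALSE
)

# Convert character features to factors
is_chr <- vapply(dat, is.character, logical(1))
dat[is_chr] <- lapply(dat[is_chr], factor)

# Drop features with at most one distinct non-NA value
is_const <- vapply(
  dat,
  function(col) length(unique(col[!is.na(col)])) <= 1,
  logical(1)
)
dat <- dat[, !is_const, drop = FALSE]

# Drop duplicate cases, keeping the first occurrence
dat <- dat[!duplicated(dat), , drop = FALSE]

# Count remaining NA values per feature before deciding whether to impute
# or to use an algorithm that tolerates missingness
na_counts <- colSums(is.na(dat))
```

Here the constant feature is dropped before de-duplication, so rows are compared on the informative features only.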