Calculate Statistics — calc

calc_stats provides an interface for calculating statistics/metrics on model predictions and/or observed data. Supported statistics include basic statistics on mean and standard deviation, Conditional Accuracy Functions (CAFs), Quantiles, Delta Functions, and fit statistics. Results can be aggregated across individuals.

Usage

calc_stats(object, type, ...)

# S3 method for class 'data.frame'
calc_stats(
  object,
  type,
  ...,
  conds = NULL,
  verbose = 0,
  average = FALSE,
  split_by_ID = TRUE,
  b_coding = NULL
)

# S3 method for class 'drift_dm'
calc_stats(object, type, ..., conds = NULL)

# S3 method for class 'fits_ids_dm'
calc_stats(object, type, ..., verbose = 1, average = FALSE)

# S3 method for class 'stats_dm'
print(
  x,
  ...,
  round_digits = drift_dm_default_rounding(),
  print_rows = 10,
  some = FALSE,
  show_header = TRUE,
  show_note = TRUE
)

# S3 method for class 'stats_dm_list'
print(x, ...)

Arguments

object: an object for which statistics are calculated. This can be a data.frame of observed data, a drift_dm object, or a fits_ids_dm object (see estimate_model_ids).
type: a character vector, specifying the statistics to calculate. Supported values include "basic_stats", "cafs", "quantiles", "delta_funs", and "fit_stats".
...: additional arguments passed to the respective method and the underlying calculation functions (see Details for mandatory arguments).
conds: optional character vector specifying conditions to include. Conditions must match those found in the object.
verbose: integer, indicating if information about the progress should be displayed. 0 -> no information, 1 -> a progress bar. Default is 0 for the data.frame method and 1 for the fits_ids_dm method.
average: logical. If TRUE, averages the statistics across individuals where applicable. Default is FALSE.
split_by_ID: logical. If TRUE, statistics are calculated separately for each individual ID in object (when object is a data.frame). Default is TRUE.
b_coding: a list for boundary coding (see b_coding). Only relevant when object is a data.frame. For other object types, the b_coding of the object is used.
x: an object of type stats_dm or stats_dm_list, as returned by the function calc_stats().
round_digits: integer, controls the number of digits shown. Default is 3.
print_rows: integer, controls the number of rows shown.
some: logical. If TRUE, a subset of randomly sampled rows is shown.
show_header: logical. If TRUE, a header specifying the type of statistic will be displayed.
show_note: logical. If TRUE, a footnote is displayed indicating that the underlying data.frame can be accessed as usual.

Value

If type is a single character string, then a subclass of data.frame is returned, containing the respective statistic. Objects of type sum_dist will have an additional attribute storing the boundary encoding (see also b_coding). The reason for returning subclasses of data.frame is to provide custom plot() methods (e.g., plot.cafs). To get rid of the subclass label and additional attributes (i.e., to get just the plain underlying data.frame), users can use unpack_obj().

If type contains multiple character strings (i.e., is a character vector of length greater than one), a subclass of list with the calculated statistics is returned. The list will be of type stats_dm_list, which makes it easy to create multiple panels using the respective plot.stats_dm_list() method.

The print methods print.stats_dm() and print.stats_dm_list() each invisibly return the supplied object x.

Details

calc_stats is a generic function to handle the calculation of different statistics/metrics for the supported object types. By default, it returns the requested statistics/metrics.

Basic Statistics

With "basic statistics", we refer to a summary of the mean and standard deviation of response times, including a proportion of response choices.

Conditional Accuracy Function (CAFs)

CAFs are a way to quantify response accuracy as a function of response speed. To calculate CAFs, RTs (whether correct or incorrect) are first binned, and then the proportion of correct responses per bin is calculated.

When calculating model-based CAFs, a joint CDF combining the pdfs of correct and incorrect responses is calculated. Afterwards, this CDF is separated into evenly spaced segments and the contribution of the pdf associated with a correct response relative to the joint CDF is calculated.

The number of bins can be controlled by passing the argument n_bins. The default is 5.
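The binning step for observed data can be sketched in base R as follows. This is only an illustration under assumptions: the toy vectors rt and error are made up, and splitting the trials into equally populated bins by rank is one common convention, not necessarily the exact procedure used internally by calc_stats.

```r
# made-up data: 10 trials, error = 1 marks an incorrect response
rt     <- c(0.31, 0.35, 0.38, 0.42, 0.45, 0.47, 0.52, 0.58, 0.63, 0.75)
error  <- c(1, 1, 0, 1, 0, 0, 0, 0, 0, 0)
n_bins <- 5

# assign each trial to one of n_bins equally populated bins (by RT rank)
bin <- cut(
  rank(rt, ties.method = "first"),
  breaks = seq(0, length(rt), length.out = n_bins + 1),
  labels = FALSE
)

# proportion of correct responses per bin
caf <- tapply(error == 0, bin, mean)
print(caf)
```

In this toy example, the fastest bin contains only errors and the slower bins are mostly correct, which is the typical pattern CAFs are meant to reveal.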

Quantiles

For observed response times, the function stats::quantile is used with default settings.

Which quantiles are calculated can be controlled by providing the probabilities, probs, with values in \([0, 1]\). Default is seq(0.1, 0.9, 0.1).
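For observed data, the computation boils down to a call like the one below. The vector rt is made up for illustration; only stats::quantile and the default probs are taken from the description above.

```r
# made-up response times (in seconds)
rt <- c(0.31, 0.35, 0.38, 0.42, 0.45, 0.47, 0.52, 0.58, 0.63, 0.75)

# default probabilities as described above
probs <- seq(0.1, 0.9, 0.1)

# stats::quantile with its default settings
q <- stats::quantile(rt, probs = probs)
print(q)
```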

Delta Functions

Delta functions calculate the difference between quantiles of two conditions against their mean:

\(\Delta_i = Q_{i,j} - Q_{i,k}\)
\(\mathrm{Avg}_i = 0.5 \cdot Q_{i,j} + 0.5 \cdot Q_{i,k}\)

Here, i indexes a quantile, and j and k denote two conditions.

To calculate delta functions, users have to specify:

minuends: character vector, specifying condition(s) j. Must be in conds(drift_dm_obj).
subtrahends: character vector, specifying condition(s) k. Must be in conds(drift_dm_obj).
dvs: character, indicating which quantile columns to use. Default is "Quant_<u_label>". If multiple dvs are provided, then minuends and subtrahends must have the same length, and matching occurs pairwise. In this case, if only one minuend/subtrahend is specified, minuend and subtrahend are recycled to the necessary length.
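The two formulas can be checked by hand with a few lines of base R. The quantile values below are copied from the first five rows of the printed output in Example 2 (minuend "incomp", subtrahend "comp"); the variable names are chosen for illustration only.

```r
# quantiles of correct responses, taken from the Example 2 output
probs    <- c(0.1, 0.2, 0.3, 0.4, 0.5)
q_incomp <- c(0.408, 0.436, 0.461, 0.486, 0.512) # minuend, condition j
q_comp   <- c(0.363, 0.382, 0.401, 0.421, 0.443) # subtrahend, condition k

# Delta_i = Q_{i,j} - Q_{i,k}
delta <- q_incomp - q_comp

# Avg_i = 0.5 * Q_{i,j} + 0.5 * Q_{i,k}
avg <- 0.5 * q_incomp + 0.5 * q_comp

print(data.frame(Prob = probs, Delta = delta, Avg = avg))
```

Because the inputs here are already rounded to three digits, the resulting averages can differ from the printed Avg_incomp_comp column in the last digit.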

Fit Statistics

Calculates the Akaike and Bayesian Information Criteria (AIC and BIC). Users can provide a k argument to penalize the AIC statistic (see stats::AIC and AIC.fits_ids_dm).
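As a reminder of what the k argument does, the sketch below applies the standard definitions documented in stats::AIC. All numbers (log-likelihood, number of parameters, number of observations) are hypothetical, and how the package counts parameters and observations is not shown here.

```r
# hypothetical values, for illustration only
log_like <- -120.5 # log-likelihood of a fitted model
n_par    <- 4      # number of free parameters
n_obs    <- 300    # number of observations
k        <- 2      # penalty per parameter (k = 2 gives the classical AIC)

# standard definitions, see ?stats::AIC
aic <- -2 * log_like + k * n_par
bic <- -2 * log_like + log(n_obs) * n_par

print(c(AIC = aic, BIC = bic))
```

Setting k to log(n_obs) in the AIC formula reproduces the BIC, which is why a larger k corresponds to a stronger penalty on model complexity.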

Note

When a model's predicted density function integrates to a value less than drift_dm_skip_if_contr_low(), means and quantiles are returned as NA. Users can alter this by explicitly passing the argument skip_if_contr_low when calling calc_stats() (e.g., calc_stats(..., skip_if_contr_low = -Inf)).

Examples

# Example 1: Calculate CAFs and Quantiles from a model ---------------------
# get a model for demonstration purpose
a_model <- ssp_dm(dx = .0025, dt = .0025, t_max = 2)
# and then calculate cafs and quantiles
some_stats <- calc_stats(a_model, type = c("cafs", "quantiles"))
print(some_stats)
#> Element 1, contains cafs
#> 
#>    Source   Cond Bin P_corr
#> 1    pred   comp   1  0.981
#> 2    pred   comp   2  0.981
#> 3    pred   comp   3  0.981
#> 4    pred   comp   4  0.981
#> 5    pred   comp   5  0.981
#> 6    pred incomp   1  0.672
#> 7    pred incomp   2  0.928
#> 8    pred incomp   3  0.960
#> 9    pred incomp   4  0.973
#> 10   pred incomp   5  0.979
#> 
#> 
#> Element 2, contains quantiles
#> 
#>    Source   Cond Prob Quant_corr Quant_err
#> 1    pred   comp  0.1      0.363     0.363
#> 2    pred   comp  0.2      0.382     0.382
#> 3    pred   comp  0.3      0.401     0.401
#> 4    pred   comp  0.4      0.421     0.421
#> 5    pred   comp  0.5      0.443     0.443
#> 6    pred   comp  0.6      0.469     0.469
#> 7    pred   comp  0.7      0.502     0.502
#> 8    pred   comp  0.8      0.547     0.547
#> 9    pred   comp  0.9      0.626     0.626
#> 10   pred incomp  0.1      0.408     0.347
#> ...
#> 
#> (extract the list's elements as usual, e.g., with $cafs)

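# Example 2: Calculate a Delta Function ------------------------------------
# get a data set for demonstration purpose
some_data <- ulrich_simon_data
conds(some_data) # relevant for minuends and subtrahends
#> [1] "incomp" "comp"  
# note: the call below is made on a_model, so the output shows predicted
# values (Source = pred); a data.frame such as some_data can be passed instead
some_stats <- calc_stats(
  a_model,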
  type = "delta_funs",
  minuends = "incomp",
  subtrahends = "comp"
)
print(some_stats, print_rows = 5)
#> Type of Statistic: delta_funs
#> 
#>   Source Prob Quant_corr_comp Quant_corr_incomp Delta_incomp_comp
#> 1   pred  0.1           0.363             0.408             0.045
#> 2   pred  0.2           0.382             0.436             0.054
#> 3   pred  0.3           0.401             0.461             0.060
#> 4   pred  0.4           0.421             0.486             0.065
#> 5   pred  0.5           0.443             0.512             0.069
#>   Avg_incomp_comp
#> 1           0.385
#> 2           0.409
#> 3           0.431
#> 4           0.453
#> 5           0.477
#> ...
#> 
#> (access the data.frame's columns/rows as usual)


# Example 3: Calculate Quantiles from a fits_ids_dm object -----------------
# get an auxiliary fits_ids_dm object
all_fits <- get_example_fits_ids()
some_stats <- calc_stats(all_fits, type = "quantiles")
print(some_stats, print_rows = 5) # note the ID column
#> Type of Statistic: quantiles
#> 
#>   ID Source Cond Prob Quant_corr Quant_err
#> 1  1    obs comp  0.1      0.335     0.361
#> 2  1    obs comp  0.2      0.368     0.388
#> 3  1    obs comp  0.3      0.385     0.415
#> 4  1    obs comp  0.4      0.385     0.441
#> 5  1    obs comp  0.5      0.401     0.468
#> ...
#> 
#> (access the data.frame's columns/rows as usual)

# one can also request that the statistics are averaged across individuals
print(
  calc_stats(all_fits, type = "quantiles", average = TRUE)
)
#> Type of Statistic: quantiles
#> 
#>    Source   Cond Prob Quant_corr Quant_err
#> 1     obs   comp  0.1      0.324     0.311
#> 2     obs   comp  0.2      0.346     0.330
#> 3     obs   comp  0.3      0.362     0.348
#> 4     obs   comp  0.4      0.368     0.365
#> 5     obs   comp  0.5      0.385     0.381
#> 6     obs   comp  0.6      0.396     0.383
#> 7     obs   comp  0.7      0.407     0.386
#> 8     obs   comp  0.8      0.424     0.391
#> 9     obs   comp  0.9      0.470     0.396
#> 10    obs incomp  0.1      0.351     0.311
#> ...
#> 
#> (access the data.frame's columns/rows as usual)