--- title: "Welcome to psychmeta!" subtitle: "An Introductory Tutorial" author: "Jeff Dahlke & Brenton Wiernik" output: html_document: df_print: paged vignette: > %\VignetteIndexEntry{Welcome to psychmeta!} %\VignetteEngine{knitr::rmarkdown} %\usepackage[utf8]{inputenc} csl: "apa.csl" bibliography: vignette.yaml --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) library(psychmeta) ``` **`psychmeta`** is a comprehensive and open-source **R** package for computing psychometric meta-analyses, conducting follow-up analyses of meta-analytic results (e.g., cumulative meta-analyses, leave-one-out meta-analyses, bootstrapped meta-analyses, and meta-regressions [with help from **`metafor`**]), simulating data impacted by psychometric artifacts, and much more! This tutorial is meant to introduce you to some of the most critical features of **`psychmeta`** and familiarize you with workflows within the package. We'll start with the basics (installation and data import) and work our way up to the more techncial topics. # Installing and loading **`psychmeta`** Install the official CRAN version of the package: ```{r, eval=FALSE} install.packages("psychmeta") ``` Or install the GitHub verson (with incremental updates between CRAN releases): ```{r, eval=FALSE} devtools::install_github("psychmeta/psychmeta") ``` Load the package library: ```{r, eval=FALSE} library(psychmeta) ``` # Data import Recommended packages for data import: * **`rio`** (convenient import of many formats; relies on `fread()` funtion from **`data.table`** and can have isues with date variables) * **`readr`** (many efficient functions for loading delimited datafiles; e.g., CSV) * **`haven`** (imports and exports SPSS, SAS, and Stata files) * **`readxl`** (for Excel files) **`psychmeta`** requires data to be in a "long" format. This means that each effect size is coded in its own row of a matrix. For example, a long format looks like this: ```{r, echo=FALSE} # First 10 observations from the the database of Gonzalez-Mulé, Mount, and Oh (2014; JAP) head(data_r_gonzalezmule_2014[,c("Study", "n", "rxyi", "Rating source", "rxxi", "ryyi", "ux")]) ``` \pagebreak # Data conversion and cleaning Not all data come in a long format, so data manipulation tools can be helpful to convert it to a long format. Recommended packages for data manipulation: * **`dplyr`** (flexible package for fast, intuitive data manipulation) * **`tidyr`** (functions to convert between wide and long data) **`psychmeta`** can also help with reshaping data! Sometimes it can be quick to code data in a journal table-like format: ```{r, include=FALSE} dat_matrix <- data.frame(var_names = c("X", "Y", "Z"), n = c(100, 100, 100), mean = c(4, 5, 3), sd = c(2.4, 2.6, 2), rel = c(.8, .7, .85), reshape_vec2mat(cov = c(.3, .4, .5), var_names = c("X", "Y", "Z"))) rownames(dat_matrix) <- NULL ``` ```{r, eval=TRUE} dat_matrix ``` This matrix-style format can be easily converted to the long-format data **`psychmeta`** requires with the `reshape_mat2dat()` function: ```{r, eval=TRUE} reshape_mat2dat( var_names = "var_names", # Column of variable names cor_data = c("X", "Y", "Z"), # Names of correlation columns common_data = "n", # Names of columns shared among relationships unique_data = c("mean", "sd", "rel"), # Names of columns unique to relationships data = dat_matrix) ``` \pagebreak Sometimes it can also be quick to code data with multiple correlations in one row (i.e., a wide format): ```{r, include=FALSE} dat_wide <- data.frame(sample_id = c(1, 2, 3), ni = c(66, 74, 93), rxyi_X_Y = c(-.29, -.16, -.34), rxyi_X_Z = c(.18, .18, .02), rxyi_Y_Z = c(.15, .00, .00), rel_X = c(.95, .93, .97), rel_Y = c(.85, .86, .82), rel_Z = c(.91, .90, .89)) ``` ```{r, eval=TRUE} dat_wide ``` This wide format can be easily converted to the long format **`psychmeta`** requires: ```{r, eval=TRUE} common_vars <- c("sample_id") # Column names for variables common to all # relationships var_names <- c("X", "Y", "Z") es_design = matrix(NA, 3, 3) # Matrix containing the column names es_design[lower.tri(es_design)] <- # for the intercorrelations among variables c("rxyi_X_Y", "rxyi_X_Z", "rxyi_Y_Z") # in the lower triangle of the matrix rownames(es_design) <- colnames(es_design) <- var_names n_design <- "ni" # Sample size column name or es_design-like # matrix other_design <- # Matrix with variable names as row names, cbind(rel = c("rel_X", # names of long-format variables as column names, "rel_Y", # and column names of dat_wide as elements "rel_Z")) rownames(other_design) <- var_names reshape_wide2long(common_vars = common_vars, es_design = es_design, n_design = n_design, other_design = other_design, es_name = "rxyi", # Type of effect size in dat_wide data = dat_wide) ``` \pagebreak The `convert_es()` function can be used to convert a wide variety of statistics to the _r_ or _d_ metric. ```{r, eval=FALSE} convert_es(es = 1, input_es = "d", output_es = "r", n1 = 50, n2 = 50) convert_es(es = -1.3, input_es = "t", output_es = "r", n1 = 100, n2 = 140) convert_es(es = 10.3, input_es = "F", output_es = "r", n1 = 100, n2 = 150) convert_es(es = 1.3, input_es = "chisq", output_es = "r", n1 = 100, n2 = 100) convert_es(es = .021, input_es = "p.chisq", output_es = "r", n1 = 100, n2 = 100) convert_es(es = 4.37, input_es = "or", output_es = "r", n1 = 100, n2 = 100) convert_es(es = 1.47, input_es = "lor", output_es = "r", n1 = 100, n2 = 100) convert_es(es = .2, input_es = "r", output_es = "d", n1 = 50, n2 = 50) convert_es(es = -1.3, input_es = "t", output_es = "d", n1 = 100, n2 = 140) convert_es(es = 10.3, input_es = "F", output_es = "d", n1 = 100, n2 = 150) convert_es(es = 1.3, input_es = "chisq", output_es = "d", n1 = 100, n2 = 100) convert_es(es = .021, input_es = "p.chisq", output_es = "d", n1 = 100, n2 = 100) convert_es(es = 4.37, input_es = "or", output_es = "d", n1 = 100, n2 = 100) convert_es(es = 1.47, input_es = "lor", output_es = "d", n1 = 100, n2 = 100) ``` To include converted effect sizes in a meta-analysis, merge output of `convert_es()` into your data frame, at least the the columns for the converted effect sizes (`r`, `d`, `AUC`), sample size (`n`), effective sample size (`n_eff`): ```{r, eval=TRUE} convert_es(es = c(.4, .3, .25), input_es = "r", output_es = "d", n1 = c(50, 110, 65), n2 = c(50, 70, 65) ) ``` Corrections for measurement error and range restriction can be applied automatically within meta-analyses. Other corrections can be made, but they need to be applied sporadically (i.e., applied to individual effect sizes before including them in a meta-analysis). ```{r, eval=FALSE} correct_r_dich(r = c(.3, .5), px = .5, py = .5, n = 100) correct_r_split(r = c(.3, .5), pi = .2, n = 100) ``` \pagebreak # Computing meta-analyses ## `ma_r()` is the general-purpose meta-analysis function for correlations The arguments to this function are: ```{r, eval=FALSE} ma_r( #### Commonly used arguments: #### # Which data set would you like to use? data = NULL, # Specify essential effect-size information. rxyi, n, n_adj = NULL, # Differentiate sources of data. sample_id = NULL, citekey = NULL, # Specify methodological parameters. ma_method = "bb", ad_type = "tsa", correction_method = "auto", # Specify constructs and measures of constructs. construct_x = NULL, construct_y = NULL, measure_x = NULL, measure_y = NULL, # Weighting method wt_type = "sample_size", # Correct for small-sample bias? correct_bias = TRUE, # Correct for measurement error? correct_rxx = TRUE, correct_ryy = TRUE, # Correct for range restriction? correct_rr_x = TRUE, correct_rr_y = TRUE, # Is the range restriction indirect (vs. direct) in nature? indirect_rr_x = TRUE, indirect_rr_y = TRUE, # What are your reliability coefficients? rxx = NULL, ryy = NULL, # Are your reliability coefficients range restricted? rxx_restricted = TRUE, ryy_restricted = TRUE, # What types of reliability estimates are you using? rxx_type = "alpha", ryy_type = "alpha", # What are your range-restriction u (SD) ratios? ux = NULL, uy = NULL, # Are your u ratios computed from observed (vs. true-score) SDs? ux_observed = TRUE, uy_observed = TRUE, # What are your moderators and which ones are categorical? moderators = NULL, cat_moderators = TRUE, # What type of moderator analysis do you want (simple vs. hierarchical)? moderator_type = "simple", # If correcting for bivariate indirect RR, how do constructs correlate # with selection mechanisms? (only need to specify the +1 or -1 sign of the relationships) sign_rxz = 1, sign_ryz = 1, # Do you have any artifact distributions to include that are not in your database? supplemental_ads = NULL, #### Other arguments to know about: #### # Specify the order in which constructs should be displayed in output. construct_order = NULL, # If analyzing multiple relationships, how should each constructs's measurement # error be handled? correct_rel = NULL, # If analyzing multiple relationships, how should each construct's # range restriction be handled? correct_rr = NULL, # If analyzing multiple relationships, which constructs are affected by indirect # range restriction? indirect_rr = NULL, # If analyzing multiple relationships, how does each construct correlate with # the selection mechanism? sign_rz = NULL, # Additional methological parameters can be modified using the "control" argument. control = control_psychmeta() ) ``` \pagebreak In addition to the arguments used with `ma_r()` directly, the `control` argument allows users to modify methological parameters of their analyses and interact with some of the more specialized features of **`psychmeta`**'s meta-analysis functions. The `control` argument in meta-analysis functions accepts the output of the `control_psychmeta()` function. ```{r, eval=FALSE} control_psychmeta( # Should the mean or the sample-specific effect sizes be used to estimate error variance? error_type = c("mean", "sample"), # What proportion of the distribution of means should be included in confidence intervals? conf_level = .95, # What proportion of the residual distribution of observations should be included in credibility intervals? cred_level = .8, # How should confidence and credibility intervals be computed? With the t or normal distribution? conf_method = c("t", "norm"), cred_method = c("t", "norm"), # Should weighted variance estimates be computed as unbiased (i.e., multiplied by k / [k-1])? var_unbiased = TRUE, # Should overall artifaction distributions be computed for each construct (pairwise_ads == FALSE) or should # artifact distributions be computed separately for each construct pair (pairwise_ads == TRUE)? pairwise_ads = FALSE, # Should artifact distributions be computed by collapsing across moderator # levels (moderated_ads == FALSE) or should artifact distributions be computed # separately for each moderator combination (moderated_ads == TRUE)? moderated_ads = FALSE, # Should artifact-distribution corrections be computed using artifact distributions # that have had sampling error removed (residual_ads == TRUE) or should the observed # distributions be used (residual_ads == TRUE)? residual_ads = TRUE, # Should dependent observations (i.e., multiple observations of the same relationship in a single sample) be # consolidated so that there is just one independent observation per sample? check_dependence = TRUE, # If check_dependence is TRUE, how should dependency be resolved? Options are to compute a composite # effect size (default), compute the average of the effect sizes, or to stop with an error message. collapse_method = c("composite", "average", "stop"), # The intercor argument uses the control_intercor() function to control how the intercorrelations # among variables are handled with collapse_method == "composite" intercor = control_intercor(), # Should artifact information be cleaned to resolve discrepancies among values recorded for multiple # relationsips involving a given construct in a given sample? clean_artifacts = TRUE, # Should missing artifact information be imputed? (For use with individual-correction method only) impute_artifacts = TRUE, # If impute_artifacts is TRUE, how should imputation be performed? See the documentation for the # control_psychmeta() function for descriptions of the available options. impute_method = c("bootstrap_mod", "bootstrap_full", "simulate_mod", "simulate_full", "wt_mean_mod", "wt_mean_full", "unwt_mean_mod", "unwt_mean_full", "replace_unity", "stop"), # What seed value should be set for the imputation process? (This makes the imputation reproducible) seed = 42, # Should artifact information from observations not included in meta-analyses be harvested from "data" # and included in artifact distributions? use_all_arts = TRUE, # For meta-analyses of d values, should the proportionality of membership in the unrestricted sample # be estimated from the range-restricted proportions and the range-restriction correction factor? estimate_pa = FALSE, # To what number of decimal places should interactive artifact distributions be rounded prior to use? # Rounding reduces the computational burden of creating multi-dimensional arrays of artifact information. decimals = 2, # Should the "Hunter-Schmidt" override settings be used? When TRUE, this will override settings for: # - wt_type will set to "sample_size" # - error_type will set to "mean" # - correct_bias will set to TRUE # - conf_method will set to "norm" # - cred_method will set to "norm" # - var_unbiased will set to FALSE # - residual_ads will be set to FALSE # - use_all_arts will set to FALSE hs_override = FALSE ) ``` \pagebreak ## Bare-bones meta-analyses At a minimum, simply specify correlations and sample sizes. You can specify vectors containing correlations, sample sizes, and other input data for each study in three different ways (and any combination of the three): 1. As vectors, with a vector manually passed to each argument. ```{r, eval=FALSE} ma_r(rxyi = data_r_meas_multi$rxyi, n = data_r_meas_multi$n) ``` 2. As string-format (quoted) column names corresponding to the columns of `data`. ```{r, eval=FALSE} ma_r(rxyi = "rxyi", n = "n", data = data_r_meas_multi) ``` 3. As unquoted column names corresponding to the columns of `data`. ```{r, eval=FALSE} ma_r(rxyi = rxyi, n = n, data = data_r_meas_multi) ``` ### Moderator analyses Moderator analyses are easy—just identify a column or matrix of moderator information. ```{r, eval=FALSE} ma_r(rxyi = rxyi, n = n, moderators = moderator, data = data_r_meas_multi) ``` To use multiple moderators, supply the column names as a string vector. ```{r, eval=FALSE} ma_r(rxyi = rxyi, n = n, moderators = c("Rating source", "Published", "Complexity"), data = data_r_gonzalezmule_2014) ``` ### Multi-construct analyses On top of moderators, **`psychmeta`** can also meta-analyze multiple construct relationships in one function call. ```{r, eval=FALSE} ma_r(rxyi = rxyi, n = n, construct_x = x_name, construct_y = y_name, data = data_r_meas_multi) ``` ```{r, eval=FALSE} ma_r(rxyi = rxyi, n = n, construct_x = x_name, construct_y = y_name, moderators = moderator, data = data_r_meas_multi) ``` \pagebreak ## Individual-correction meta-analyses **`psychmeta`** can automatically make corrections for reliability and range restriction as it computes meta-analyses. To do so, specify vectors of data for each artifact your want to correct. To apply corrections individually to each study, specify `ma_method = "ic"`. The `clean_artifacts` and `impute_artifacts` arguments are used to impute missing artifact data and check for various errors or inconsistencies in the artifacts. You should generally leave these set to `TRUE`; they are only `FALSE` here to speed up computations during this tutorial, as our data are clean. ```{r, eval=FALSE} ma_obj <- ma_r(ma_method = "ic", rxyi = rxyi, n = n, construct_x = x_name, construct_y = y_name, rxx = rxxi, ryy = ryyi, moderators = moderator, clean_artifacts = FALSE, impute_artifacts = FALSE, data = data_r_meas_multi) ``` ## Artifact-distribution meta-analyses To apply corrections to meta-analysis results overall, rather than to individual studies, specify `ma_method = "ad"`. Artifact distribution methods are useful when you have missing data (they make imputating missing values unnecessary). They can also be used to supplement observed artifacts with values from other sources (reducing artifact sampling error) and for applying corrections to artifact values to yield more accurate meta-analysis results [e.g., @DahlkeNotrestrictedselection2019]. **`psychmeta`** can apply a variety of artifact correction models. You can specify which method to use with the `correction_method` argument. The default is "auto", and **`psychmeta`** will choose the best method based on available artifact data and settings for these arguments: * `correct_bias` * `correct_rxx`, `correct_ryy` * `correct_rr_x`, `correct_rr_y` * `indirect_rr_x`, `indirect_rr_y` * `correct_rel`, `correct_rr`, `indirect_rr` ```{r, eval=FALSE} ma_r(ma_method = "ad", rxyi = rxyi, n = n, construct_x = x_name, construct_y = y_name, rxx = rxxi, ryy = ryyi, correct_rr_x = FALSE, correct_rr_y = FALSE, moderators = moderator, data = data_r_meas_multi) ``` You can also convert a meta-analysis previously calculated using barebones or individual-correction methods to an artifact-distribution meta-analysis: ```{r, eval=FALSE} ma_r_ad(ma_obj, correct_rr_x = FALSE, correct_rr_y = FALSE) ``` \pagebreak ## Handling dependency Psychometric meta-analysis assumes that each effect size for a particular construct relationship comes from an independent sample. If one sample provides multiple effect sizes for the same relationship, **`psychmeta`** can automatically consolidate them as either composites (recommended) or averages. ```{r, eval=FALSE} ma_r(rxyi = rxyi, n = n, sample_id = sample_id, collapse_method = "composite", data = data_r_meas_multi) ma_r(rxyi = rxyi, n = n, sample_id = sample_id, collapse_method = "average", data = data_r_meas_multi) ``` # Navigating meta-analysis output objects The results of **`psychmeta`**'s meta-analysis functions take the form of nested tibbles. A tibble is a type of tabular array that is a modernized version on the data frame (if you're new to tibbles, a vignette on the topic is available from the **`tibble`** package's [CRAN documentation](https://CRAN.R-project.org/package=tibble)). The "nesting" of results means that the some cells of the tibble contain complex objects, such as data frames and lists (i.e., objects that have their own indexing structures, hence the nesting). This nested structure allows a lot of information to be organized in a very concise way so that an overview of even a very complicated meta-analysis can fit in your console window. As an example, below are some results computed using the `data_r_gonzalezmule_2014` data set included with **`psychmeta`**, which is an updated version of the database reported by @Gonzalez-Mulemetaanalysisrelationshipgeneral2014 in their meta-analysis of the relationship between general mental ability (GMA) and organizational citizenship behaviors (i.e., prosocial behaviors exhibited in the workplace; OCBs). Gonzalez-Mulé et al. explored a number of moderators, but for this analysis we have chosen to use only two: `Rating source` (OCBs rated by self vs. supervisor) and `Type` (the target of the OCBs; OCB-I = targeted toward individuals, OCB-O = targeted toward the organization, OCB-CH = targeted toward change). To demonstrate the structure of meta-analysis objects, we have chosen to run this example using hierarchical moderator analyses (i.e., `moderator_type = "hierarchical"`), which means we will be able to see the main effects of each moderator as well as the results from individual combinations of the levels of the moderators. **Note**: This table of results is larger than can fit in most HTML viewers. Use the arrow buttons in the header to scroll horizonally through the content and use the page numbers in the lower-right corner of the object to scroll vertically. ```{r, eval=TRUE, echo=TRUE} (gonzalezmule <- ma_r(ma_method = "ic", rxyi = rxyi, n = n, construct_x = "GMA", construct_y = "OCB", rxx = rxxi, ryy = ryyi, ux = ux, indirect_rr_x = TRUE, moderators = c("Rating source", "Type", "Published"), moderator_type = "hierarchical", control = control_psychmeta(hs_override = TRUE), data = data_r_gonzalezmule_2014)) ``` From this tibble, we see that there are a total of 13 meta-analyses (each row of the tibble is a distinct analysis representing a unique set of construct pairings and/or combinations of moderator levels) and these meta-analyses are indexed by a unique number in the `analysis_id` column. We also see that the `pair_id` column indexes the individual construct pairs (in this example, there is only one contstruct pair: GMA and OCB), with the names of the constructs displayed in the `construct_x` and `construct_y` columns. In terms of moderator analyses, the `analysis_type` column tells us how moderator information was treated in each meta-analysis. Analyses labeled as "Overall" represent meta-analyses of all information, with all observations collapsed across moderator levels (when moderator levels are collapsed, it is indicated by "All Levels" in the columns bearing the moderator names). Analyses labeled as "Simple Moderator" represent meta-analyses of a particular level of a single moderator, collapsing across all of the other moderators (the moderator levels used in these analyses are indicated in the columns bearing the moderator names). Finally, analyses labled as "Fully Hierarchical Moderator" represent intersections of the categories of the moderators. Note that this database is smaller than what would be needed to do a good hierarchical moderator analysis and we have run this analysis for purposes of illustration only. When hierarchical moderator analyses are run with more than two moderators, results will also be diplayed for "Partial Hierarchical Moderator" analyses, which means that the results represent intersections among only a subset of the moderators while levels of the other moderators are collapsed. To the right of the moderator columns are columns that contain data and results from the meta-analyses. The `meta_tables` column contains the meta-analysis tables for each analysis, the `escalc` column contains the data from each meta-analysis (these data are in the format of "escalc" class objects that can be used with the **metafor** package's functions), and the `ad` column contains artifact distributions from psychometric meta-analyses (each cell contains artifact distributions from individual-correction analyses, if present, and from artifact-distribution analyses, if present). To see a summary view of the statistical results of the meta-analysis, use the `summary()` function. ```{r, eval=FALSE} summary(gonzalezmule) ``` To retrieve tables from all meta-analyses in the object, you can use the `get_metatab()` function (this information is also included in the output object from the `summary()` function). For example, below is the true-score correlation table from our meta-analysis of Gonzalez-Mulé et al.'s (2014) data. This table communicates the bare-bones and fully corrected results for all of the relationships in the meta-analysis object. ```{r, eval=TRUE} metatab_list <- get_metatab(gonzalezmule) metatab_list$individual_correction$true_score ``` Of course, it is also possible to directly pull the results of a particular analysis out of the meta-analysis object, but the tables retrieved with this method will not contain the `analysis_id`, `pair_id`, construct, or moderator information included in the tables from `get_metatab()`. ```{r, eval=TRUE} gonzalezmule$meta_tables$`analysis_id: 1`$individual_correction$true_score ``` # Follow-up analyses `ma_r()` will automatically compute the primary results that should be reported for meta-analyses, including total sample size (_N_), number of studies (_k_), average effect sizes (_mean_r_ and _mean_rho_), observed (_sd_r_ and _sd_r_c_) and residual (_sd_res_ and _sd_rho_) standard deviations, and confidence intervals and credibility intervals. These results are sufficient for evaluating average effect sizes, heterogeneity of effects, and precision or uncertainty in the results. **`psychmeta`** also provides a variety of follow-up analyses to supplement the primary meta-analysis results. ## `sensitivity()` `sensitivity()` computes several different types of sensitivity analyses for meta-analytic results: * `bootstrap` (calculate bootstrap confidence intervals on all meta-analysis results) * `cumulative` (calculate cumulative meta-analyses—a form of publication bias analysis) * `leave1out` (compute meta-anlayses leaving out 1 sample at a time—for outlier/influential point detection) You can either call all sensitivity analyses at once (`sensitivity()`) or call each type of analysis individually (`sensitivity_bootstrap()`, `sensitivity_cumulative()`, `sensitivity_leave1out()`): ```{r, eval=FALSE} ma_obj <- sensitivity(ma_obj, leave1out = TRUE, bootstrap = TRUE, cumulative = TRUE, sort_method = "weight", boot_iter = 100, boot_conf_level = 0.95, boot_ci_type = "norm") ``` \pagebreak ## `metareg()` `metareg()` computes meta-regression results using both categorical and continuous moderators via the `rma()` function from the **`metafor`** package. ```{r, eval=FALSE} ma_obj_gm <- ma_r(ma_method = "ic", rxyi = rxyi, n = n, rxx = rxxi, ryy = ryyi, moderators = c("Rating source", "Complexity"), data = data_r_gonzalezmule_2014) ma_obj_gm <- metareg(ma_obj_gm) get_metareg(ma_obj = ma_obj_gm)[[1]]$individual_correction$true_score ma_obj_mc <- ma_r(ma_method = "ic", rxyi = r, n = N, moderators = c("Parent", "Age"), cat_moderators = c(TRUE, FALSE), data = data_r_mcleod_2007) ma_obj_mc <- metareg(ma_obj_mc, formula_list = list("Parent*Age" = yi ~ Parent*Age)) get_metareg(ma_obj_mc)[[1]]$individual_correction$true_score ``` Note that the relevant sample size for meta-regression is the number of studies (_k_). Meta-regression results should be interpreted cautiously when _k_ is small or when there is low variability in moderator levels across studies [@SchmidtMetaRegression2017]. In these and other cases, hierarhical subgroup moderator analyses can yield more reliable results: ```{r, eval=FALSE} ma_obj_gm <- ma_r(ma_method = "ic", rxyi = rxyi, n = n, rxx = rxxi, ryy = ryyi, moderators = c("Rating source", "Complexity"), moderator_type = "hierarchical", data = data_r_gonzalezmule_2014) ma_obj_gm ``` ## `heterogeneity()` **`psychmeta`** automatically estimates residual standard deviations (_SD~res~_ and _SD~$\rho$~_) and credibility intervals. These statistics are the best way to examine heterogeneity in meta-anlayses [@WiernikEmpiricalBenchmarks2017]. The `heterogeneity()` function computes additional heterogeneity statistics (e.g., % Variance Accounted For, _Q_, _I_^2^) that may be requested by some journals or reviewers; we don't recommend these statistics for regular use. ```{r, eval=FALSE} ma_obj <- heterogeneity(ma_obj) ``` \pagebreak # Results presentation **`psychmeta`** has several functions to access meta-analysis results and quickly prepare them for publication. The most basic way to access results is to print the `ma_obj` results object: ```{r, eval=FALSE} ma_obj <- ma_r(ma_method = "ic", sample_id = sample_id, rxyi = rxyi, n = n, construct_x = x_name, construct_y = y_name, rxx = rxxi, ryy = ryyi, moderators = moderator, clean_artifacts = FALSE, impute_artifacts = FALSE, data = data_r_meas_multi) ma_obj ``` You can limit printing to specific constructs and/or construct pairs using the `filter_ma()` function: ```{r, eval=FALSE} filter_ma(ma_obj, analyses = list(construct = "Y") ) filter_ma(ma_obj, analyses = list(construct_pair = list(c("X", "Z"))) ) ``` Filtering on other analysis features, such as moderator levels or minimum _N_ and/or _k_, is also possible. **`psychmeta`** also includes a variety of `get_stuff()` functions to retrive specific results from an `ma_obj` object. For example: ```{r, eval=FALSE} # Heterogeneity statistics ma_obj <- heterogeneity(ma_obj) out_heterogeneity <- get_heterogeneity(ma_obj) out_heterogeneity$`analysis id: 1`$barebones # Sensitivity analyses ma_obj <- sensitivity(ma_obj, bootstrap = FALSE) out_leave1out <- get_leave1out(ma_obj) out_leave1out$`analysis id: 1`$barebones out_cumulative <- get_cumulative(ma_obj) out_cumulative$`analysis id: 1`$barebones # Meta-regression analyses ma_obj <- metareg(ma_obj) out_metareg <- get_metareg(ma_obj) out_metareg$`analysis id: 1`$barebones # A summary of artifact distributions get_ad(ma_obj, ma_method = "ic") ``` \pagebreak ## Writing formatted results tables to Word, PDF, Markdown, or other formats **`psychmeta`** can automatically produce publication-ready tables of meta-analysis results that can be copied into your word processor or inserted into an RMarkdown document. ```{r, eval=FALSE} # Write meta-analysis tables to a Word file called "Meta-analysis table.docx" metabulate(ma_obj, file = "Meta-analysis table.docx", output_format = "word", output_dir = tempdir()) ``` ## Generating plots **`psychmeta`** can produce both funnel and forest plots for visually presenting meta-analytic results. ```{r, eval=FALSE} # Create funnel and forest plots ma_obj <- plot_funnel(ma_obj = ma_obj) ma_obj <- plot_forest(ma_obj = ma_obj, ma_facetname = "MA") # Extract plots for viewing out_plots <- get_plots(ma_obj) out_plots$funnel$`analysis id: 1`$barebones out_plots$forest$`analysis id: 1`$moderated$barebones # The sensitivity_cumulative() and sensitivity_leave1out() functions also produce plots. out_plots$leave1out$`analysis id: 1`$barebones$plots out_plots$cumulative$`analysis id: 1`$barebones$plots ``` ```{r, eval=TRUE, echo=FALSE, results = "hide", fig.keep="high"} ma_obj_print <- ma_r(ma_method = "ic", sample_id = sample_id, rxyi = rxyi, n = n, construct_x = x_name, construct_y = y_name, rxx = rxxi, ryy = ryyi, moderators = moderator, clean_artifacts = FALSE, impute_artifacts = FALSE, data = data_r_meas_multi) ma_obj_print <- plot_funnel(ma_obj = ma_obj_print) ma_obj_print$funnel$`analysis id: 1`$barebones ``` ## Generating reference lists **`psychmeta`** can automatically generate a formatted reference list using a BibTeX or Zotero library. First, specify the `citekey` argument when running `ma_r()`. Then use the `generate_bib()` function: ```{r, eval=FALSE} ma_obj <- ma_r(ma_method = "ic", rxyi = rxyi, n = n, construct_x = x_name, construct_y = y_name, rxx = rxxi, ryy = ryyi, moderators = moderator, clean_artifacts = FALSE, impute_artifacts = FALSE, citekey = citekey, data = data_r_meas_multi) generate_bib(ma_obj, # The location of your .bib file (normally should be in the same directory # as the analysis script). bib = system.file("templates/sample_bibliography.bib", package="psychmeta"), # Your citation style. Must be the style ID for a style hosted at # http://zotero.org/styles/ style = "apa", # What format to write output as (options are: "word", "html", "pdf", "text", # "Rmd", "biblatex", "citekeys")? output_format = "word", # What filename to output to (use "console" to print to the R console). file = "sources.docx" ) ``` You can also generate a bibliography containing only sources contributing to a specific subset of the meta-analyses: ```{r, eval=FALSE} generate_bib(ma_obj, bib = system.file("templates/sample_bibliography.bib", package="psychmeta"), analyses = list(construct_pair = list(c("X", "Y"))) ) ``` \pagebreak # Supplemental tools ## Computing meta-analyses with pre-existing artifact distributions Even if no studies in your meta-analytic database provide artifact information, pre-specified artifact distributions from previous meta-analyses can still be used in **`psychmeta`**! There are several ways to use pre-specified distributions, ranging from using them on their on their own to combining them with observed artifact information. Below are four potential methods, but **`psychmeta`** allows many other ways to use artifact distributions. 1. You can use the `create_ad()` function to create artifact distribution objects, which can then be used with the `ma_r_ad()` and `ma_d_ad()` functions to correct bare-bones meta-analyses. ```{r, eval=FALSE} # Create artifact-distribution objects for X, Y, and Z. ad_x <- create_ad(mean_qxi = 0.8927818, var_qxi = 0.0008095520, k_qxi = 40, mean_n_qxi = 11927 / 40, qxi_dist_type = "alpha") ad_y <- create_ad(mean_qxi = 0.8927818, var_qxi = 0.0008095520, k_qxi = 40, mean_n_qxi = 11927 / 40, qxi_dist_type = "alpha") ad_z <- create_ad(mean_qxi = 0.8962108, var_qxi = 0.0007840593, k_qxi = 40, mean_n_qxi = 11927 / 40, qxi_dist_type = "alpha") # Compute a meta-analysis of X and Y. ma_bb_xy <- ma_r_bb(r = rxyi, n = n, data = filter(data_r_meas_multi, x_name == "X", y_name == "Y")) # Correct the meta-analysis for measurement error. ma_r_ad(ma_obj = ma_bb_xy, ad_obj_x = ad_x, ad_obj_y = ad_y, correct_rr_x = FALSE, correct_rr_y = FALSE) ``` 2. Rather than applying a two-step process (first compute bare-bones, then apply corrections), you can use the `ma_r()` function to automate your workflow! Just supply the artifact distributions as a list with the `supplemental_ads` argument. ```{r, eval=FALSE} ma_r(ma_method = "ad", rxyi = rxyi, n = n, correct_rr_x = FALSE, correct_rr_y = FALSE, construct_x = x_name, construct_y = y_name, sample_id = sample_id, clean_artifacts = FALSE, impute_artifacts = FALSE, moderators = moderator, data = data_r_meas_multi, # The "supplemental_ads" argument can take a list of artifact-distribution objects. supplemental_ads = list(X = ad_x, Y = ad_y, Z = ad_z)) ``` \pagebreak 3. You can bypass the manual creation of artifact distribution objects and have **`psychmeta`** do it for you. Instead of running `create_ad()`, you can create a list of artifact information to use as the `supplemental_ads` argument in `ma_r()` or `ma_d()`. This list contains artifact information for each construct you want to supply information for. This is a "list of lists" - each construct for which you provide information gets its own list labeled with the construct name. Each construct's list should contain entries that are named like the arguments to the `create_ad()` function. ```{r, eval=FALSE} ma_r(ma_method = "ad", rxyi = rxyi, n = n, correct_rr_x = FALSE, correct_rr_y = FALSE, construct_x = x_name, construct_y = y_name, sample_id = sample_id, clean_artifacts = FALSE, impute_artifacts = FALSE, moderators = moderator, data = data_r_meas_multi, supplemental_ads = # The "supplemental_ads" argument can also take raw artifact values. # (These values are unrounded so our examples all produce identical results.) list(X = list(mean_qxi = 0.8927818, var_qxi = 0.0008095520, k_qxi = 40, mean_n_qxi = 11927 / 40, qxi_dist_type = "alpha"), Y = list(mean_qxi = 0.8941266, var_qxi = 0.0009367234, k_qxi = 40, mean_n_qxi = 11927 / 40, qxi_dist_type = "alpha"), Z = list(mean_qxi = 0.8962108, var_qxi = 0.0007840593, k_qxi = 40, mean_n_qxi = 11927 / 40, qxi_dist_type = "alpha"))) ``` 4. You can also create a list of artifact distributions using the `create_ad_list()` function. This list can then be passed to `ma_r()` as before. ```{r, eval=FALSE} ad_list <- create_ad_list(n = n, rxx = rxxi, ryy = ryyi, construct_x = x_name, construct_y = y_name, sample_id = sample_id, data = data_r_meas_multi) ma_r(ma_method = "ad", rxyi = rxyi, n = n, correct_rr_x = FALSE, correct_rr_y = FALSE, construct_x = x_name, construct_y = y_name, sample_id = sample_id, clean_artifacts = FALSE, impute_artifacts = FALSE, moderators = moderator, data = data_r_meas_multi, # The "supplemental_ads" argument can take the output of create_ad_list(). supplemental_ads = ad_list) ``` 5. It's possible to mix-and-match any of the above methods, as well as use artifacts from your database. ```{r, eval=FALSE} # For purposes of illustration, delete all reliability values not associated with "Z": dat_zarts <- data_r_meas_multi dat_zarts[,"rxxi"] <- dat_zarts[dat_zarts$y_name != "Z","ryyi"] <- NA # Compute a meta-analysis using three different types of artifact formats: ma_r(ma_method = "ad", rxyi = rxyi, n = n, # Observed artifacts can be provided along with the "supplemental_ads" argument. ryy = ryyi, correct_rr_x = FALSE, correct_rr_y = FALSE, construct_x = x_name, construct_y = y_name, sample_id = sample_id, clean_artifacts = FALSE, impute_artifacts = FALSE, moderators = moderator, data = dat_zarts, supplemental_ads = list(X = ad_x, Y = list(mean_qxi = 0.8941266, var_qxi = 0.0009367234, k_qxi = 40, mean_n_qxi = 11927 / 40, qxi_dist_type = "alpha"))) ``` # Simulations In addition to its collection of tools for computing meta-analyses, **`psychmeta`** also boasts a robust suite of simulation functions. These functions are intended to support high-quality meta-analytic simulations by correctly introducing sampling error, measurement error, and range restriction into a set of parameter values. These functions can be valuable for both research and pedagogical applications. ## Psychometric data sets You can generate data impacted by psychometric artifacts for a single sample by using the `simulate_psych()` function. This function reports three datasets per sample: Observed scores, true scores, and error scores. It can also perform selection and indicate (1) which cases would be selected on the basis of observed scores (i.e., actual selection), (2) which cases would be selected on the basis of true scores (i.e., error-free selection), and (3) which cases would be selected on the basis of error scores (i.e., random selection). ```{r, eval=FALSE} sim_dat <- simulate_psych(n = 1000, rho_mat = reshape_vec2mat(.5), sigma_vec = c(1, 1), sr_vec = c(1, .5), rel_vec = c(.8, .8), var_names = c("Y", "X")) ``` ```{r, eval=FALSE, echo=FALSE} sim_dat cor(sim_dat$observed[,1:2]) cor(sim_dat$true[,1:2]) cor(sim_dat$error[,1:2]) ``` ## Psychometric correlations You can automate the process of analyzing a simulated sample when you are interested in effect sizes and artifact estimates. For example, you can generate correlations impacted by psychometric artifacts for a single sample of data using the `simulate_r_sample()` function. This function generates sample statistics when the `n` argument is a finite value, but it can also analyze parameters when `n = Inf`! ```{r, eval=FALSE} simulate_r_sample(n = 1000, # Correlation parameter matrix rho_mat = reshape_vec2mat(.5, order = 5), # Reliability parameter vector rel_vec = rep(.8, 5), # Selection ratio vector sr_vec = c(1, 1, 1, 1, .5), # Number of items in each scale k_items_vec = 1:5, # Matrix of weights to use in creating composites wt_mat = cbind(c(0, 0, 0, .3, 1), c(1, .3, 0, 0, 0)), # Selection ratios for composites sr_composites = c(.7, .5)) simulate_r_sample(n = Inf, rho_mat = reshape_vec2mat(.5, order = 5), rel_vec = rep(.8, 5), sr_vec = c(1, 1, 1, 1, .5), k_items_vec = 1:5, wt_mat = cbind(c(0, 0, 0, .3, 1), c(1, .3, 0, 0, 0)), sr_composites = c(.7, .5)) ``` \pagebreak One of the most powerful simulation functions in **`psychmeta`** is `simulate_r_database()`, which can generate entire databases of correlations for use in meta-analysis simulations. This function is a wrapper for `simulate_r_sample()` and it can sample parameter values from whichever distributions you specify. ```{r, eval=FALSE} # Note the varying methods for defining parameters: simulate_r_database(k = 10, # Sample-size parameters # Parameters can be defined as functions. n_params = function(n) rgamma(n, shape = 100), # Correlation parameters (one parameter distribution per correlation) # Parameters can also be defined as vectors... rho_params = list(c(.1, .3, .5), # ...as a mean + a standard deviation... c(mean = .3, sd = .05), # ...or as a matrix of values and weights. rbind(value = c(.1, .3, .5), weight = c(1, 2, 1))), # Reliability parameters rel_params = list(c(.7, .8, .9), c(mean = .8, sd = .05), rbind(value = c(.7, .8, .9), weight = c(1, 2, 1))), # Selection-ratio parameters sr_params = list(1, 1, c(.5, .7)), # Measure-length parameters k_items_params = list(5, 8, 10), # Composite weight parameters wt_params = list(list(1, 1, 0)), # Selection-ratio parameters for composites sr_composite_params = list(1), # Variable names var_names = c("X", "Y", "Z")) ``` \pagebreak ## Psychometric *d* values **`psychmeta`** includes *d*-value simulation functions that parallel the correlation simulation functions described above. These functions automate the simulation of subgroup comparisons and can introduce sampling error, measurement error, and range-restriction artifacts into *d*-value parameters. To simulate a single sample of results, use the `simulate_d_sample()` function. This function can generate data for any number of variables with any number of groups and it can form composite variables and perform selection. ```{r simulate_d_sample stats, eval=FALSE} ## Simulate statistics by providing integers as "n_vec": simulate_d_sample(n_vec = c(200, 100), # List of rho matrices - one per group rho_mat_list = list(reshape_vec2mat(.5), reshape_vec2mat(.4)), # Matrix of group means (groups on rows, variables on columns) mu_mat = rbind(c(1, .5), c(0, 0)), # Matrix of group SDs (groups on rows, variables on columns) sigma_mat = rbind(c(1, 1), c(1, 1)), # Matrix of group reliabilities (groups on rows, variables on columns) rel_mat = rbind(c(.8, .7), c(.7, .7)), # Vector of selection ratios sr_vec = c(1, .5), # Number of items in each scale k_items_vec = c(5, 10), # Group names group_names = c("A", "B")) ``` Like `simulate_r_sample()`, `simulate_d_sample()` can also analyze parameters. To get parameter estimates, simply define the sample sizes as proportions (with the `n_vec` argument) rather than as integers. ```{r simulate_d_sample params, eval=FALSE} simulate_d_sample(n_vec = c(2/3, 1/3), rho_mat_list = list(reshape_vec2mat(.5), reshape_vec2mat(.4)), mu_mat = rbind(c(1, .5), c(0, 0)), sigma_mat = rbind(c(1, 1), c(1, 1)), rel_mat = rbind(c(.8, .7), c(.7, .7)), sr_vec = c(1, .5), k_items_vec = c(5, 10), group_names = c("A", "B")) ``` \pagebreak The `simulate_d_database()` function is the *d*-value counterpart to `simulate_r_database()`. However, each parameter argument now requires a set of parameters for each group (e.g., a distribution of correlations, means, reliabilities, etc. must be specified for each group). ```{r simulate_d_database, eval=FALSE} simulate_d_database(k = 5, # "Group1" and "Group2" labels are not required for parameter arguments: # They are shown only for enhanced interpretability. # Sample-size parameter distributions n_params = list(Group1 = c(mean = 200, sd = 20), Group2 = c(mean = 100, sd = 20)), # Correlation parameter distributions rho_params = list(Group1 = list(c(.3, .4, .5)), Group2 = list(c(.3, .4, .5))), # Mean parameter distributions mu_params = list(Group1 = list(c(mean = .5, sd = .5), c(-.5, 0, .5)), Group2 = list(c(mean = 0, sd = .5), c(-.2, 0, .2))), # Reliability parameter distributions rel_params = list(Group1 = list(.8, .8), Group2 = list(c(.7, .8, .8), c(.7, .8, .8))), # Group names group_names = c("Group1", "Group2"), # Variable names var_names = c("Y", "Z")) ``` # Multivariate range-restriction corrections In addition to its array of univariate and bivariate range-restriction corrections, **`psychmeta`** includes functions to correct data for multivariate range restriction. The `correct_matrix_mvrr()` corrects covariance matrices for range restriction and the `correct_means_mvrr()` function corrects vectors of means for range restriction. ```{r correct mvrr, eval=FALSE} # Create example matrices with different variances and covariances: Sigma_i <- reshape_vec2mat(cov = .2, var = .8, order = 4) Sigma_xx_a <- reshape_vec2mat(cov = .5, order = 2) # Correct "Sigma_i" for range restriction in variables 1 & 2: correct_matrix_mvrr(Sigma_i = Sigma_i, Sigma_xx_a = Sigma_xx_a, x_col = 1:2) # Correct the means of variables 1 & 2 for range restriction: correct_means_mvrr(Sigma = Sigma_i, means_i = c(2, 2, 1, 1), means_x_a = c(1, 1), x_col = 1:2) ``` \pagebreak # Composites ## Spearman-Brown reliability estimates The Spearman-Brown formula can be used to estimate the reliability of a measure that has been lengthened (or shortened) *k* times. This formula is implemented in the `estimate_rel_sb()` function and its inverse (which estimates the number of times a measure needs to be lengthened to achieve a desired reliability) is implemented in the `estimate_length_sb()` function. ```{r spearman-brown, eval=FALSE} estimate_rel_sb(rel_initial = .5, k = 4) estimate_length_sb(rel_initial = .5, rel_desired = .8) ``` ## Correlations between a variable and a composite or between two composites The `composite_r_matrix()` and `composite_r_scalar()` functions can compute composite correlations using matrix and scalar inputs, respectively. Both functions are capable of computing correlations between a composite and a single variable or between two composites. ```{r single composite r, eval=FALSE} # Correlation between a variable and a composite composite_r_matrix(r_mat = reshape_vec2mat(.4, order = 5), x_col = 2:5, y_col = 1) composite_r_scalar(mean_rxy = .4, k_vars_x = 4, mean_intercor_x = .4) # Correlation between two composites composite_r_matrix(r_mat = reshape_vec2mat(.3, order = 5), x_col = 1:3, y_col = 4:5) composite_r_scalar(mean_rxy = .3, k_vars_x = 3, mean_intercor_x = .3, k_vars_y = 2, mean_intercor_y = .3) ``` ## *d* values of composites The `composite_d_matrix()` and `composite_d_scalar()` functions can compute composite *d* values using matrix and scalar inputs, respectively. ```{r, eval=FALSE} composite_d_matrix(d_vec = c(1, 1), r_mat = reshape_vec2mat(.7), wt_vec = c(1, 1), p = .5) composite_d_scalar(mean_d = 1, mean_intercor = .7, k_vars = 2, p = .5) ``` ## Reliability of composites The `composite_rel_matrix()` and `composite_rel_scalar()` functions can compute composite reliabilities (i.e., Mosier reliabilities) using matrix and scalar inputs, respectively. ```{r, eval=FALSE} composite_rel_matrix(rel_vec = c(.8, .8), r_mat = reshape_vec2mat(.4), sd_vec = c(1, 1)) composite_rel_scalar(mean_rel = .8, mean_intercor = .4, k_vars = 2) ``` \pagebreak # Matrix regression The `lm_mat()` function mimics the structure and appearance of the `lm()` function from the **`stats`** package, but uses correlation/covariance matrices to compute regression models instead of raw data sets. ```{r, eval=FALSE} # For purposes of demonstration, generate a dataset from the following matrix: S <- reshape_vec2mat(cov = c(.3 * 2 * 3, .4 * 2 * 4, .5 * 3 * 4), var = c(2, 3, 4)^2, var_names = c("X", "Y", "Z")) mean_vec <- setNames(c(1, 2, 3), colnames(S)) dat <- data.frame(MASS::mvrnorm(n = 100, mu = mean_vec, Sigma = S, empirical = TRUE)) ``` The same formulas can be used with `lm_mat()` as with `lm()`, you just have to provide `cov_mat` and `n` arguments (and an optional `mean_vec` argument) instead of `data`. However, `lm_mat()` can only compute moderated regressions if the product term is already in `cov_mat`. ```{r, eval=FALSE} # Compute regression models using one predictor: lm_out1 <- lm(formula = Y ~ X, data = dat) lm_mat_out1 <- lm_mat(formula = Y ~ X, cov_mat = S, mean_vec = mean_vec, n = nrow(dat)) # Compute regression models using two predictors: lm_out2 <- lm(formula = Y ~ X + Z, data = dat) lm_mat_out2 <- lm_mat(formula = Y ~ X + Z, cov_mat = S, mean_vec = mean_vec, n = nrow(dat)) ``` The `summary()` method works for both `lm()` and `lm_mat()` results and the printouts look identical. ```{r, eval=FALSE} summary(lm_out1) summary(lm_mat_out1) ``` `lm_mat()` results can also be used with functions such as `anova()`, `predict()`, and `confint()` to yield output comparable to what one would get with `lm()` results. ```{r, eval=FALSE} anova(lm_out1, lm_out2) anova(lm_mat_out1, lm_mat_out2) ``` # Matrix regression applied to meta-analytic correlations The `get_matrix()` function can extract correlation matrices from meta-analysis objects for use in subsequent multivariate analyses. ```{r, eval=FALSE} mat_array <- get_matrix(ma_obj = ma_obj) R <- mat_array$individual_correction$`moderator_comb: 1`$true_score$mean_rho n <- mat_array$individual_correction$`moderator_comb: 1`$true_score$N[1,3] ``` The `lm_mat()` function can be used to compute regressions from meta-analytic matrices, using the mean *N* or harmonic mean *N* as a sample size. ```{r, eval=FALSE} lm_x <- lm_mat(formula = Y ~ X, cov_mat = R, n = n) lm_xz <- lm_mat(formula = Y ~ X + Z, cov_mat = R, n = n) ``` Meta-analytic incremental validity analyses are simple with the `anova()` method applied to `lm_mat()` results. ```{r, eval=FALSE} summary(lm_x) summary(lm_xz) anova(lm_x, lm_xz) ``` \pagebreak # Best practices for publishing with **`psychmeta`**: 1. Include your **`R`** script as an appendix or online supplement. 2. Include your data table as a CSV in an online supplement. 3. Cite **`psychmeta`**. - To cite **`psychmeta`**, please reference our in-press software announcement in *Applied Psychological Measurement* [@psychmetaAPM, see the bibliography below for reference information] or run `citation("psychmeta")`. # Bibliography