---
title: "Choosing the right test (and mixed models)"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Choosing the right test (and mixed models)}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

# The mixed-model fits and their reporters need packages that live in
# Suggests. Guard the dependent chunks so the vignette still builds without
# them; the classify_outcome()/recommend_test() chunks need none of these.
has_mixed <- requireNamespace("lme4", quietly = TRUE) &&
  requireNamespace("ordinal", quietly = TRUE) &&
  requireNamespace("parameters", quietly = TRUE)
```

```{r}
library(colleyRstats)
```

Picking a statistical test is not a matter of taste: it follows from properties
of the data. `colleyRstats` makes that reasoning explicit. This vignette walks
through the decision helpers and then shows how to fit and report the models
they recommend.

## The idea: scale x dependence x assumptions

A principled model choice is the product of three questions, in order:

1. **What is the outcome's measurement scale?** This fixes the model *family*
   (Gaussian, binomial, Poisson, cumulative-link) before anything else.
2. **Are the observations independent or clustered?** Repeated measures or any
   grouped structure needs random effects, i.e. a *mixed* model.
3. **For a continuous outcome, do the parametric assumptions hold?** Group-wise
   normality (and, between subjects, homogeneity of variance) decides between a
   parametric and a rank-based method.
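
The first two questions can be sketched as a small lookup. This is an illustration only: `sketch_family()` is a hypothetical helper, not part of `colleyRstats`, and its pairings for cases this vignette does not demonstrate are assumptions rather than `recommend_test()` output. The third question (assumption checks) is deliberately left out.

```{r}
# Hypothetical lookup from (scale, clustering) to a model family and a
# fitting function. Not the package's logic; the unclustered and count
# pairings are assumptions made for this sketch.
sketch_family <- function(scale, clustered) {
  family <- switch(scale,
    continuous = "gaussian",
    binary     = "binomial",
    count      = "poisson",
    ordinal    = "cumulative-link",
    stop("no family sketched for scale: ", scale)
  )
  fitter <- if (scale == "ordinal") {
    if (clustered) "ordinal::clmm" else "ordinal::clm"
  } else if (clustered) {
    if (scale == "continuous") "lme4::lmer" else "lme4::glmer"
  } else {
    if (scale == "continuous") "lm" else "glm"
  }
  c(family = family, fitter = fitter)
}

sketch_family("ordinal", clustered = TRUE)
sketch_family("binary", clustered = FALSE)
```

The scale picks the family and the clustering picks between a mixed and an ordinary fit, which is why the questions are asked in that order.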

`classify_outcome()` answers the first question. It maps a variable to one of
`"continuous"`, `"ordinal"`, `"binary"`, `"count"`, or `"nominal"` using simple,
transparent rules (ordered factor -> ordinal; two distinct values -> binary; a
few-valued integer -> ordinal/Likert; a non-negative integer with more values ->
count; anything else numeric -> continuous).

```{r}
set.seed(1)
n_id <- 24
d <- data.frame(
  id   = factor(rep(seq_len(n_id), each = 3)),
  cond = factor(rep(c("A", "B", "C"), times = n_id))
)

# Give condition a genuine effect so the example models are well identified.
step <- c(A = 0, B = 1.3, C = 2.4)[as.character(d$cond)]
d$score   <- as.numeric(step + rnorm(nrow(d)))
d$rating  <- ordered(pmin(5L, pmax(1L, round(step + rnorm(nrow(d), sd = 0.7) + 2))))
d$correct <- rbinom(nrow(d), 1, plogis(step - 1))

classify_outcome(d$score)    # continuous
classify_outcome(d$rating)   # ordinal (ordered factor)
classify_outcome(d$correct)  # binary (two distinct values)
```
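
To make the rules concrete, here is a minimal base-R sketch of how such a classifier could be written. `sketch_classify()` is a hypothetical function, not the package's source; the cut-off for a "few-valued" integer (`max_likert_levels = 7`) and the treatment of unordered factors, characters, and logicals as nominal are assumptions of this sketch.

```{r}
# Hypothetical re-implementation of the rules listed above.
# `max_likert_levels` is an assumed threshold, not a documented default.
sketch_classify <- function(x, max_likert_levels = 7L) {
  x <- x[!is.na(x)]
  n_distinct <- length(unique(x))
  if (is.ordered(x)) return("ordinal")            # ordered factor
  if (n_distinct == 2L) return("binary")          # two distinct values
  if (is.factor(x) || is.character(x) || is.logical(x)) return("nominal")
  is_whole <- is.numeric(x) && all(x == round(x))
  if (is_whole && n_distinct <= max_likert_levels) return("ordinal")  # Likert-like
  if (is_whole && all(x >= 0)) return("count")    # many non-negative integers
  "continuous"                                    # anything else numeric
}

sketch_classify(c(2.3, 4.1, 5.7))
sketch_classify(c(1L, 3L, 5L, 2L, 4L))
sketch_classify(0:30)
```

The order of the checks matters: the ordered-factor and two-value rules fire before the integer rules, so a 0/1 column is never mistaken for a short Likert item.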

Getting the scale right matters because it, not the analyst's habit, dictates
the family: a 1-5 rating is not an interval score, and a 0/1 accuracy is not
Gaussian. When a heuristic is genuinely ambiguous (a wide Likert item versus a
small count) you can override it via the `outcome_type` argument of
`recommend_test()`.

## `recommend_test()` as the decision helper

`recommend_test()` runs all three questions and returns a
`"colley_recommendation"` object. It carries the fields that let you act on the
advice: `recommendation` (a human-readable label), `model_function` (the R
function to call), `reporter` (the matching colleyRstats reporter), `fit_call`
(a ready-to-edit call), `rationale` (why), and `methods_text` (an APA-style
sentence). A `print` method summarises it.

The same design routes to different models depending on the outcome's scale and
on whether the observations are clustered. An **ordinal** outcome measured
repeatedly within `id` gives a cumulative link mixed model (CLMM):

```{r}
rec_clmm <- recommend_test(d, outcome = "rating", predictors = "cond", cluster = "id")
rec_clmm
```

A **binary** outcome with the same clustering gives a binomial generalized
linear mixed model (GLMM):

```{r}
rec_glmm <- recommend_test(d, outcome = "correct", predictors = "cond", cluster = "id")
rec_glmm
```

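A **continuous** outcome compared **between** subjects (no `cluster`) triggers
the assumption checks and lands on ANOVA or its rank-based fallback. The
simulated `score` values are in fact repeated within `id`, so `cluster` is
omitted here only to illustrate the between-subjects branch: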

```{r}
rec_anova <- recommend_test(d, outcome = "score", predictors = "cond")
rec_anova
```
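
The assumption step can be sketched by hand in base R. The example below assumes Shapiro-Wilk for group-wise normality and Bartlett's test for homogeneity of variance, with a 0.05 threshold; these are stand-ins chosen for illustration, and the checks `recommend_test()` actually runs may differ. The data frame `toy_bs` is simulated for this sketch only.

```{r}
# Simulated between-subjects data, used only for this sketch.
set.seed(7)
toy_bs <- data.frame(
  grp = factor(rep(c("A", "B", "C"), each = 20)),
  y   = rnorm(60, mean = rep(c(0, 1, 2), each = 20))
)

# Question 3: group-wise normality and homogeneity of variance.
normal_p   <- tapply(toy_bs$y, toy_bs$grp, function(v) shapiro.test(v)$p.value)
variance_p <- bartlett.test(y ~ grp, data = toy_bs)$p.value
assumptions_ok <- all(normal_p > 0.05) && variance_p > 0.05

# Parametric route if both checks pass, rank-based route otherwise.
toy_test <- if (assumptions_ok) {
  summary(aov(y ~ grp, data = toy_bs))
} else {
  kruskal.test(y ~ grp, data = toy_bs)
}
toy_test
```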

Each recommendation also exposes machine-usable fields and a paste-ready methods
sentence:

```{r}
rec_clmm$model_function
rec_clmm$reporter
rec_clmm$fit_call
cat(rec_glmm$methods_text)
```

## Fitting and reporting the recommended models

Once a model is fitted, the reporters turn it into manuscript-ready LaTeX/APA
sentences, one per fixed-effect term, with the effect size, its confidence
interval, the test statistic, and the p-value. `reportGLMM()` handles
`lme4::lmer` / `lme4::glmer` / `glmmTMB::glmmTMB` and plain `lm`/`glm`;
`reportCLMM()` handles `ordinal::clmm` / `ordinal::clm`. The reporters pick the
effect-size scale from the family: **odds ratios** for binomial and
cumulative-link models, **incidence-rate ratios** for counts, and raw
coefficients (`b`) with a `t`/`z` statistic for Gaussian fits.
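
As a reminder of what the odds-ratio scale means, the following base-R sketch exponentiates the coefficients of an ordinary logistic regression by hand. It uses simulated data (`toy`) and Wald intervals from `confint.default()`; the reporters may compute their intervals differently, so treat this as an illustration of the scale rather than of their internals.

```{r}
# Simulated two-group binary outcome, used only for this sketch.
set.seed(42)
toy <- data.frame(group = factor(rep(c("ctrl", "treat"), each = 100)))
toy$success <- rbinom(nrow(toy), 1, ifelse(toy$group == "treat", 0.7, 0.4))
toy_fit <- glm(success ~ group, data = toy, family = binomial)

log_odds <- coef(toy_fit)             # coefficients on the log-odds scale
wald_ci  <- confint.default(toy_fit)  # Wald intervals, also log-odds

# Exponentiating moves estimates and interval bounds to the odds-ratio scale.
odds_ratios <- cbind(OR = exp(log_odds), exp(wald_ci))
odds_ratios
```

With a single two-level predictor, the exponentiated slope equals the odds ratio computed directly from the 2 x 2 table of `group` by `success`.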

The fits below require **lme4**, **ordinal**, and **parameters** (all in
Suggests), so each fitting chunk is guarded by `has_mixed`; the vignette still
builds without them.

For the **ordinal** recommendation, fit the CLMM and report it with
`reportCLMM()`. Because the family is cumulative-link, the effects are reported
as odds ratios (the multiplicative change in the odds of a higher rating):

```{r, eval = has_mixed}
m_clmm <- ordinal::clmm(rating ~ cond + (1 | id), data = d)
reportCLMM(m_clmm, dv = "rating")
```

For the **binary** recommendation, fit the binomial GLMM and report it with
`reportGLMM()`; the binomial coefficients are likewise exponentiated to odds
ratios, with a `z` statistic:

```{r, eval = has_mixed}
m_glmm <- lme4::glmer(correct ~ cond + (1 | id), data = d, family = binomial)
reportGLMM(m_glmm, dv = "accuracy")
```

`reportGLMM()` also covers **Gaussian** mixed and ordinary models. Here a linear
mixed model is reported on the raw coefficient scale, with `b` and a `t(df)`
statistic rather than an odds ratio; the reporter adapts the wording to the
family automatically:

```{r, eval = has_mixed}
m_lmm <- lme4::lmer(score ~ cond + (1 | id), data = d)
reportGLMM(m_lmm, dv = "score")
```

Both reporters return the sentences invisibly (and emit them via `message()`),
and can optionally copy them to the clipboard (`write_to_clipboard = TRUE`) or
write a `.tex` file that a manuscript can pull in with `\input{}` (`sink_to =`).
The LaTeX uses the `\p` / `\pminor` macros from `latex_preamble()`.
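
The "message plus invisible return" pattern is easy to miss, so here is a minimal base-R sketch of it. `sketch_report()` is a hypothetical stand-in, not the package's implementation; that the sentences come back as a character vector, and that `sink_to` takes a file path, are assumptions of this sketch.

```{r}
# Hypothetical stand-in for a reporter: emits via message(), optionally
# writes a file, and returns its sentences invisibly.
sketch_report <- function(sentences, sink_to = NULL) {
  message(paste(sentences, collapse = "\n"))
  if (!is.null(sink_to)) writeLines(sentences, sink_to)
  invisible(sentences)  # nothing is auto-printed; assign the result to keep it
}

tex_file <- tempfile(fileext = ".tex")
kept <- sketch_report(
  c("Sentence for term one.", "Sentence for term two."),
  sink_to = tex_file
)
length(kept)
readLines(tex_file)
```

Because the value is invisible, calling such a function at the top level shows only the message; assigning the result is what makes the sentences available for further processing.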

## Next steps

- Override an ambiguous scale with `recommend_test(..., outcome_type = ...)`.
- For non-parametric routes surfaced by the continuous branch, see
  `reportDunnTest()`, `reportArtCon()`, and `reportNparLD()`.
- Use `rec$methods_text` as a first draft of the Methods paragraph.