Numerical core¶

These modules implement the numerical primitives behind RaschModel. Most users will not need to call them directly.

CML estimator¶

pyfies.core.cml ¶

Weighted Conditional Maximum Likelihood estimator for the dichotomous Rasch model.

Estimates item severity parameters :math:\beta_1, \ldots, \beta_k by maximizing the conditional likelihood of the response patterns given the raw scores. The conditional likelihood is invariant under a uniform shift of all severities; we resolve this with a sum-to-zero identification constraint (the same convention used by FAO's RM.weights and the global standard itself). Rows with any missing item response are dropped before fitting.

The negative conditional log-likelihood is convex in :math:\beta, so a quasi-Newton method (L-BFGS-B with analytic gradient) converges to the unique MLE.

References

Cafiero, C., Viviani, S., & Nord, M. (2018). Food security measurement in a global context: The Food Insecurity Experience Scale. Measurement, 116, 146-152.

CMLFit `dataclass` ¶

Result of a weighted CML fit.

Attributes:

Name	Type	Description
`beta`	`NDArray[float64]`	Item severities (sum-to-zero), shape `(k,)`.
`se_beta`	`NDArray[float64]`	Asymptotic standard errors of `beta`, shape `(k,)`.
`n_complete`	`int`	Number of complete cases used for estimation (rows with no missing responses, any raw score).
`n_complete_non_extreme`	`int`	Number of complete cases with non-extreme raw scores (1 <= r <= k-1). Only these contribute to the CML log-likelihood; matches the `n.compl` field reported by RM.weights.
`n_total`	`int`	Total rows in the input.
`weighted_raw_score_counts`	`NDArray[float64]`	Weighted count :math:`N_r` for each raw score r = 0, ..., k, shape `(k+1,)`.
`weighted_item_totals`	`NDArray[float64]`	Weighted endorsement count :math:`T_i` for each item, shape `(k,)`. Computed over complete cases only.
`loglik`	`float`	Final conditional log-likelihood at the MLE.
`converged`	`bool`	Whether the optimizer reported convergence.
`n_iter`	`int`	Number of optimizer iterations.

Source code in src/pyfies/core/cml.py

@dataclass
class CMLFit:
    """Result of a weighted CML fit.

    Attributes:
        beta: Item severities (sum-to-zero), shape ``(k,)``.
        se_beta: Asymptotic standard errors of ``beta``, shape ``(k,)``.
        n_complete: Number of complete cases used for estimation
            (rows with no missing responses, any raw score).
        n_complete_non_extreme: Number of complete cases with non-extreme
            raw scores (1 <= r <= k-1). Only these contribute to the CML
            log-likelihood; matches the ``n.compl`` field reported by
            RM.weights.
        n_total: Total rows in the input.
        weighted_raw_score_counts: Weighted count :math:`N_r` for each
            raw score *r* = 0, ..., *k*, shape ``(k+1,)``.
        weighted_item_totals: Weighted endorsement count :math:`T_i` for each
            item, shape ``(k,)``. Computed over complete cases only.
        loglik: Final conditional log-likelihood at the MLE.
        converged: Whether the optimizer reported convergence.
        n_iter: Number of optimizer iterations.
    """

    beta: NDArray[np.float64]
    se_beta: NDArray[np.float64]
    n_complete: int
    n_complete_non_extreme: int
    n_total: int
    weighted_raw_score_counts: NDArray[np.float64]
    weighted_item_totals: NDArray[np.float64]
    loglik: float
    converged: bool
    n_iter: int

fit_cml ¶

fit_cml(data: NDArray[int_], weights: NDArray[float64] | None = None, max_iter: int = 100, tol: float = 1e-08) -> CMLFit

Fit the dichotomous Rasch model by weighted CML.

Parameters:

Name	Type	Description	Default
`data`	`NDArray[int_]`	Response matrix of shape `(n, k)` with values in {0, 1} or NaN for missing. Affirmative responses are coded 1.	required
`weights`	`NDArray[float64] \| None`	Optional sampling weights of shape `(n,)`. If omitted, equal weights are used. Weights are renormalized so they sum to n.	`None`
`max_iter`	`int`	Maximum optimizer iterations.	`100`
`tol`	`float`	Optimizer convergence tolerance on the gradient infinity norm.	`1e-08`

Returns:

Name	Type	Description
`A`	`CMLFit`	class:`CMLFit` with item severities, standard errors, and summary
	`CMLFit`	statistics.

Raises:

Type	Description
`ValueError`	If the matrix has fewer than 2 items, no complete cases with non-extreme raw scores, or invalid weights.

Source code in src/pyfies/core/cml.py

def fit_cml(
    data: NDArray[np.int_],
    weights: NDArray[np.float64] | None = None,
    max_iter: int = 100,
    tol: float = 1e-8,
) -> CMLFit:
    """Fit the dichotomous Rasch model by weighted CML.

    Args:
        data: Response matrix of shape ``(n, k)`` with values in {0, 1} or NaN
            for missing. Affirmative responses are coded 1.
        weights: Optional sampling weights of shape ``(n,)``. If omitted, equal
            weights are used. Weights are renormalized so they sum to *n*.
        max_iter: Maximum optimizer iterations.
        tol: Optimizer convergence tolerance on the gradient infinity norm.

    Returns:
        A :class:`CMLFit` with item severities, standard errors, and summary
        statistics.

    Raises:
        ValueError: If the matrix has fewer than 2 items, no complete cases
            with non-extreme raw scores, or invalid weights.
    """
    arr = np.asarray(data, dtype=np.float64)
    if arr.ndim != 2:
        raise ValueError("data must be a 2-D array")
    n, k = arr.shape
    if k < 2:
        raise ValueError("need at least 2 items to identify a Rasch model")

    w = _normalize_weights(weights, n)

    complete = ~np.isnan(arr).any(axis=1)
    n_complete = int(complete.sum())
    if n_complete == 0:
        raise ValueError("no complete cases; cannot fit Rasch model")

    X = arr[complete].astype(np.int8)
    w_complete = w[complete]
    raw = X.sum(axis=1)

    # Weighted item totals T_i and weighted raw-score counts N_r.
    item_totals = (X.T @ w_complete).astype(np.float64)
    raw_score_counts = np.zeros(k + 1, dtype=np.float64)
    for r in range(k + 1):
        raw_score_counts[r] = float(w_complete[raw == r].sum())

    # Extreme raw scores (0 and k) carry no information about item severities
    # under CML; they enter the prevalence calculation later but not the fit.
    informative = raw_score_counts.copy()
    informative[0] = 0.0
    informative[k] = 0.0

    # Initial guess: mild informative prior centered at zero.
    p_endorse = np.clip(item_totals / max(w_complete.sum(), 1.0), 1e-3, 1.0 - 1e-3)
    init_beta = -np.log(p_endorse / (1.0 - p_endorse))
    init_beta -= init_beta.mean()  # enforce sum-to-zero
    init_free = init_beta[:-1].copy()

    result = minimize(
        _negative_loglik_and_grad,
        init_free,
        args=(item_totals, informative),
        method="L-BFGS-B",
        jac=True,
        options={"maxiter": max_iter, "gtol": tol},
    )

    beta = _expand_to_full(result.x)

    # Standard errors: invert the Hessian on the (k-1)-dim free parametrization,
    # then map back. The Hessian on the full β is rank k-1 (singular by the
    # sum-to-zero constraint); we project it.
    H_full = _hessian_full(beta, informative)
    # Apply sum-to-zero constraint via the (k, k-1) Jacobian J where the i-th
    # free param maps to β_i and the last full param is -sum.
    J = np.vstack([np.eye(k - 1), -np.ones((1, k - 1))])  # (k, k-1)
    H_free = J.T @ H_full @ J
    cov_free = np.linalg.inv(H_free)
    cov_full = J @ cov_free @ J.T
    se_beta = np.sqrt(np.maximum(np.diag(cov_full), 0.0))

    n_non_extreme = int(((raw >= 1) & (raw <= k - 1)).sum())

    return CMLFit(
        beta=beta,
        se_beta=se_beta,
        n_complete=n_complete,
        n_complete_non_extreme=n_non_extreme,
        n_total=n,
        weighted_raw_score_counts=raw_score_counts,
        weighted_item_totals=item_totals,
        loglik=-float(result.fun),
        converged=bool(result.success),
        n_iter=int(result.nit),
    )

Elementary symmetric functions¶

pyfies.core.gamma ¶

Elementary symmetric functions of item easinesses, in log-space.

The Conditional Maximum Likelihood (CML) estimator for the Rasch model relies on the elementary symmetric functions

.. math:: \gamma_r(\varepsilon) = \sum_{|J|=r} \prod_{j \in J} \varepsilon_j

where :math:\varepsilon_i = \exp(-\beta_i) is the easiness of item i parametrized via its severity :math:\beta_i. The CML likelihood is

.. math:: L(\beta) = - \sum_i T_i \beta_i - \sum_r N_r \log \gamma_r(\varepsilon),

where :math:T_i is the (weighted) number of affirmative responses to item i and :math:N_r is the (weighted) number of respondents with raw score r.

For numerical stability we always compute :math:\log \gamma_r via the Andersen / Verhelst recursion combined with logaddexp.

References

Andersen, E. B. (1972). The numerical solution of a set of conditional estimation equations. J. Roy. Statist. Soc. B, 34, 42-54.

Verhelst, N. D., Glas, C. A. W., & van der Sluis, A. (1984). Estimation problems in the Rasch model: The basic symmetric functions. Computational Statistics Quarterly, 1, 245-262.

log_gamma ¶

log_gamma(beta: NDArray[float64]) -> NDArray[np.float64]

Compute :math:\log \gamma_r for r = 0, ..., k.

Parameters:

Name	Type	Description	Default
`beta`	`NDArray[float64]`	Item severity parameters, shape `(k,)`.	required

Returns:

Type	Description
`NDArray[float64]`	Array of shape `(k+1,)` whose r-th entry is
`NDArray[float64]`	math:`\log \gamma_r(\exp(-\beta))`.

Source code in src/pyfies/core/gamma.py

def log_gamma(beta: NDArray[np.float64]) -> NDArray[np.float64]:
    """Compute :math:`\\log \\gamma_r` for *r* = 0, ..., *k*.

    Args:
        beta: Item severity parameters, shape ``(k,)``.

    Returns:
        Array of shape ``(k+1,)`` whose *r*-th entry is
        :math:`\\log \\gamma_r(\\exp(-\\beta))`.
    """
    k = beta.shape[0]
    log_eps = -beta.astype(np.float64, copy=False)
    g = np.full(k + 1, -np.inf, dtype=np.float64)
    g[0] = 0.0
    for j in range(k):
        # Update in reverse so that g[r-1] is still the j-1 value when used.
        for r in range(j + 1, 0, -1):
            g[r] = np.logaddexp(g[r], log_eps[j] + g[r - 1])
    return g

log_gamma_minus_one ¶

log_gamma_minus_one(beta: NDArray[float64]) -> NDArray[np.float64]

Compute :math:\log \gamma_r^{(-i)} for every item i and order r.

Returns the elementary symmetric functions of the easinesses with item i excluded. Used to evaluate the CML score and the conditional probability that a respondent with raw score r endorses item i:

.. math:: P(X_i = 1 \mid R = r) = \varepsilon_i \, \gamma_{r-1}^{(-i)} / \gamma_r .

Parameters:

Name	Type	Description	Default
`beta`	`NDArray[float64]`	Item severity parameters, shape `(k,)`.	required

Returns:

Type	Description
`NDArray[float64]`	Array of shape `(k, k+1)`. Entry `[i, r]` is
`NDArray[float64]`	math:`\log \gamma_r^{(-i)}`. Entry `[i, k]` is :math:`-\infty`
`NDArray[float64]`	(cannot form a subset of size k from k-1 items).

Source code in src/pyfies/core/gamma.py

def log_gamma_minus_one(beta: NDArray[np.float64]) -> NDArray[np.float64]:
    """Compute :math:`\\log \\gamma_r^{(-i)}` for every item *i* and order *r*.

    Returns the elementary symmetric functions of the easinesses with item *i*
    excluded. Used to evaluate the CML score and the conditional probability
    that a respondent with raw score *r* endorses item *i*:

    .. math::
        P(X_i = 1 \\mid R = r) = \\varepsilon_i \\,
            \\gamma_{r-1}^{(-i)} / \\gamma_r .

    Args:
        beta: Item severity parameters, shape ``(k,)``.

    Returns:
        Array of shape ``(k, k+1)``. Entry ``[i, r]`` is
        :math:`\\log \\gamma_r^{(-i)}`. Entry ``[i, k]`` is :math:`-\\infty`
        (cannot form a subset of size *k* from *k-1* items).
    """
    k = beta.shape[0]
    log_eps = -beta.astype(np.float64, copy=False)
    out = np.full((k, k + 1), -np.inf, dtype=np.float64)
    for i in range(k):
        g = np.full(k + 1, -np.inf, dtype=np.float64)
        g[0] = 0.0
        count = 0  # how many items have been folded in so far
        for j in range(k):
            if j == i:
                continue
            count += 1
            for r in range(count, 0, -1):
                g[r] = np.logaddexp(g[r], log_eps[j] + g[r - 1])
        out[i, :] = g
    return out

conditional_endorsement_prob ¶

conditional_endorsement_prob(beta: NDArray[float64]) -> NDArray[np.float64]

Probability of endorsing each item conditional on each raw score.

Returns pi[r, i] = P(X_i = 1 | R = r) for raw scores r = 0, ..., k and items i = 0, ..., k - 1.

By construction pi[0, :] = 0 and pi[k, :] = 1: a respondent with raw score 0 endorses no items, one with raw score k endorses all of them.

Source code in src/pyfies/core/gamma.py

def conditional_endorsement_prob(beta: NDArray[np.float64]) -> NDArray[np.float64]:
    """Probability of endorsing each item conditional on each raw score.

    Returns ``pi[r, i] = P(X_i = 1 | R = r)`` for raw scores
    *r* = 0, ..., *k* and items *i* = 0, ..., *k - 1*.

    By construction ``pi[0, :] = 0`` and ``pi[k, :] = 1``: a respondent with
    raw score 0 endorses no items, one with raw score *k* endorses all of them.
    """
    k = beta.shape[0]
    log_eps = -beta.astype(np.float64, copy=False)
    log_g = log_gamma(beta)  # (k+1,)
    log_g_minus = log_gamma_minus_one(beta)  # (k, k+1)
    pi = np.zeros((k + 1, k), dtype=np.float64)
    # Raw scores 1..k-1: pi[r, i] = exp(log_eps[i] + log_g_minus[i, r-1] - log_g[r])
    # log_g[r] is finite for 0 <= r <= k.
    for r in range(1, k):
        for i in range(k):
            log_pi = log_eps[i] + log_g_minus[i, r - 1] - log_g[r]
            pi[r, i] = float(np.exp(log_pi))
    pi[k, :] = 1.0
    return pi

Person parameters¶

pyfies.core.person ¶

Post-hoc maximum-likelihood estimation of person parameters.

Once item severities :math:\beta are estimated by CML, person parameters :math:\theta_r for each raw score r = 1, ..., k - 1 are obtained by solving the marginal score equation

.. math:: r = \sum_{i=1}^{k} \frac{1}{1 + \exp(\beta_i - \theta_r)} ,

i.e. the value of :math:\theta at which the expected raw score under the Rasch model equals the observed raw score r. The corresponding measurement error is

.. math:: \mathrm{se}(\theta_r) = \Big(\sum_{i=1}^{k} p_i (1 - p_i)\Big)^{-1/2}, \quad p_i = \frac{1}{1 + \exp(\beta_i - \theta_r)} .

Extreme raw scores (r = 0 and r = k) are undefined under standard MLE. Following RM.weights we estimate them by solving for pseudo-raw-scores :math:d_0 \in (0, 1) and :math:d_k \in (k - 1, k) (defaults: 0.5 and k - 0.5).

PersonParameters `dataclass` ¶

Person severity per raw score.

Attributes:

Name	Type	Description
`theta`	`NDArray[float64]`	Estimated person severity for each raw score r = 0, ..., k, shape `(k+1,)`. Entries 0 and k use the pseudo-raw-score assumptions in `pseudo_extreme`.
`se_theta`	`NDArray[float64]`	Measurement errors for `theta`, shape `(k+1,)`.
`pseudo_extreme`	`tuple[float, float]`	The pseudo raw scores `(d0, dk)` used for the two extreme entries.

Source code in src/pyfies/core/person.py

@dataclass
class PersonParameters:
    """Person severity per raw score.

    Attributes:
        theta: Estimated person severity for each raw score *r* = 0, ..., *k*,
            shape ``(k+1,)``. Entries 0 and *k* use the pseudo-raw-score
            assumptions in ``pseudo_extreme``.
        se_theta: Measurement errors for ``theta``, shape ``(k+1,)``.
        pseudo_extreme: The pseudo raw scores ``(d0, dk)`` used for the two
            extreme entries.
    """

    theta: NDArray[np.float64]
    se_theta: NDArray[np.float64]
    pseudo_extreme: tuple[float, float]

fit_person_parameters ¶

fit_person_parameters(beta: NDArray[float64], pseudo_extreme: tuple[float, float] | None = None, bracket: tuple[float, float] = (-20.0, 20.0), xtol: float = 1e-10) -> PersonParameters

Estimate :math:\theta_r for every raw score given item severities.

Parameters:

Name	Type	Description	Default
`beta`	`NDArray[float64]`	Item severities, shape `(k,)`.	required
`pseudo_extreme`	`tuple[float, float] \| None`	Pseudo raw scores `(d0, dk)` for the two extreme scores. Defaults to `(0.5, k - 0.5)`.	`None`
`bracket`	`tuple[float, float]`	Bracket passed to `scipy.optimize.brentq` when inverting the score equation. Wide enough for any plausible FIES fit.	`(-20.0, 20.0)`
`xtol`	`float`	Absolute tolerance on theta in the root finder.	`1e-10`

Returns:

Type	Description
`PersonParameters`	class:`PersonParameters` with `theta` and `se_theta` of shape
`PersonParameters`	`(k + 1,)`.

Source code in src/pyfies/core/person.py

def fit_person_parameters(
    beta: NDArray[np.float64],
    pseudo_extreme: tuple[float, float] | None = None,
    bracket: tuple[float, float] = (-20.0, 20.0),
    xtol: float = 1e-10,
) -> PersonParameters:
    """Estimate :math:`\\theta_r` for every raw score given item severities.

    Args:
        beta: Item severities, shape ``(k,)``.
        pseudo_extreme: Pseudo raw scores ``(d0, dk)`` for the two extreme
            scores. Defaults to ``(0.5, k - 0.5)``.
        bracket: Bracket passed to ``scipy.optimize.brentq`` when inverting
            the score equation. Wide enough for any plausible FIES fit.
        xtol: Absolute tolerance on theta in the root finder.

    Returns:
        :class:`PersonParameters` with ``theta`` and ``se_theta`` of shape
        ``(k + 1,)``.
    """
    beta = np.asarray(beta, dtype=np.float64)
    k = beta.shape[0]
    if pseudo_extreme is None:
        d0, dk = 0.5, k - 0.5
    else:
        d0, dk = pseudo_extreme
        if not (0.0 < d0 < 1.0):
            raise ValueError("pseudo extreme d0 must be in (0, 1)")
        if not (k - 1.0 < dk < k):
            raise ValueError(f"pseudo extreme dk must be in ({k - 1}, {k})")

    theta = np.zeros(k + 1, dtype=np.float64)
    se = np.zeros(k + 1, dtype=np.float64)

    # Interior raw scores: solve E[R | theta] = r exactly.
    for r in range(1, k):
        theta_r = brentq(
            lambda t, target=float(r): _expected_raw_score(t, beta) - target,
            bracket[0],
            bracket[1],
            xtol=xtol,
        )
        theta[r] = theta_r
        se[r] = _se_theta(theta_r, beta)

    # Extreme raw scores via pseudo-raw-score assumption.
    for r, target in ((0, d0), (k, dk)):
        theta_r = brentq(
            lambda t, t0=target: _expected_raw_score(t, beta) - t0,
            bracket[0],
            bracket[1],
            xtol=xtol,
        )
        theta[r] = theta_r
        se[r] = _se_theta(theta_r, beta)

    return PersonParameters(
        theta=theta, se_theta=se, pseudo_extreme=(float(d0), float(dk))
    )

Equating¶

pyfies.core.equating ¶

Equating a country FIES scale to a reference standard.

Implements the iterative scale-and-shift procedure used by RM.weights to calibrate item severities across measurement contexts. The algorithm:

Standardize the country's item severities to match the reference's mean and standard deviation on a candidate set of common items (initially all items).
Compute the absolute discrepancy of each candidate common item from the reference. If the largest exceeds tol, flag that item as unique (i.e. measuring something different in this context) and re-estimate scale and shift from the remaining common items. Repeat.
Stop when no further items would be flagged, or when the number of uniques reaches max_unique.

Final scale and shift are recomputed in one shot from the raw country severities and the converged common-items mask:

.. math:: \text{scale} = \sigma(\beta_{\text{ref}}[\text{common}]) / \sigma(\beta[\text{common}]), \quad \text{shift} = \mu(\beta_{\text{ref}}[\text{common}]) - \mu(\beta[\text{common}]) \cdot \text{scale}.

The country severities map onto the reference metric as beta_on_reference = shift + scale * beta.

EquatingResult `dataclass` ¶

Output of :func:equate.

Attributes:

Name	Type	Description
`scale`	`float`	Multiplicative factor mapping country β to the reference metric.
`shift`	`float`	Additive offset mapping country β to the reference metric.
`common`	`NDArray[bool_]`	Boolean mask of items judged common (True) vs unique (False).
`adj_thresholds`	`NDArray[float64]`	Reference thresholds mapped back onto the country metric, so prevalence can be computed without rescaling β.
`correlation`	`float`	Pearson correlation of common items between the country (after equating) and the reference scale.
`equated_beta`	`NDArray[float64]`	Country severities transformed onto the reference metric, shape `(k,)`.
`n_iter`	`int`	Number of iterations actually performed.

Source code in src/pyfies/core/equating.py

@dataclass
class EquatingResult:
    """Output of :func:`equate`.

    Attributes:
        scale: Multiplicative factor mapping country β to the reference metric.
        shift: Additive offset mapping country β to the reference metric.
        common: Boolean mask of items judged common (True) vs unique (False).
        adj_thresholds: Reference thresholds mapped back onto the country
            metric, so prevalence can be computed without rescaling β.
        correlation: Pearson correlation of common items between the country
            (after equating) and the reference scale.
        equated_beta: Country severities transformed onto the reference metric,
            shape ``(k,)``.
        n_iter: Number of iterations actually performed.
    """

    scale: float
    shift: float
    common: NDArray[np.bool_]
    adj_thresholds: NDArray[np.float64]
    correlation: float
    equated_beta: NDArray[np.float64]
    n_iter: int

equate ¶

equate(beta: NDArray[float64], reference_beta: NDArray[float64], reference_thresholds: NDArray[float64], tol: float = 0.35, max_unique: int = 3) -> EquatingResult

Equate beta to the metric defined by reference_beta.

Parameters:

Name	Type	Description	Default
`beta`	`NDArray[float64]`	Country item severity parameters, shape `(k,)`.	required
`reference_beta`	`NDArray[float64]`	Reference item severities (same item ordering), shape `(k,)`.	required
`reference_thresholds`	`NDArray[float64]`	Latent-trait thresholds on the reference metric (e.g. severities of items 5 and 8 of the FAO global standard), shape `(t,)`.	required
`tol`	`float`	Absolute discrepancy above which an item is flagged as unique.	`0.35`
`max_unique`	`int`	Maximum number of items that may be flagged as unique (also a hard cap on iterations).	`3`

Returns:

Type	Description
`EquatingResult`	class:`EquatingResult`.

Source code in src/pyfies/core/equating.py

def equate(
    beta: NDArray[np.float64],
    reference_beta: NDArray[np.float64],
    reference_thresholds: NDArray[np.float64],
    tol: float = 0.35,
    max_unique: int = 3,
) -> EquatingResult:
    """Equate ``beta`` to the metric defined by ``reference_beta``.

    Args:
        beta: Country item severity parameters, shape ``(k,)``.
        reference_beta: Reference item severities (same item ordering),
            shape ``(k,)``.
        reference_thresholds: Latent-trait thresholds on the *reference*
            metric (e.g. severities of items 5 and 8 of the FAO global
            standard), shape ``(t,)``.
        tol: Absolute discrepancy above which an item is flagged as unique.
        max_unique: Maximum number of items that may be flagged as unique
            (also a hard cap on iterations).

    Returns:
        :class:`EquatingResult`.
    """
    b1 = np.asarray(beta, dtype=np.float64)
    b_ref = np.asarray(reference_beta, dtype=np.float64)
    thres_ref = np.asarray(reference_thresholds, dtype=np.float64)
    if b1.shape != b_ref.shape:
        raise ValueError(
            f"beta has {b1.shape} but reference_beta has {b_ref.shape}"
        )
    k = b1.shape[0]
    if k < 3:
        raise ValueError("equating requires at least 3 items")

    # All items start as candidate common.
    common = np.ones(k, dtype=bool)

    # Initial standardization to common items' moments on the reference scale.
    b_st = _standardize(b1, b_ref, common)

    n_iter = 0
    for _ in range(max_unique + 1):
        n_iter += 1
        diff = np.where(common, np.abs(b_st - b_ref), -np.inf)
        worst = int(np.argmax(diff))
        if not np.isfinite(diff[worst]) or diff[worst] < tol:
            break
        if int(common.sum()) - 1 < 2:
            # Need at least 2 common items to estimate scale/shift.
            break
        common[worst] = False
        if int((~common).sum()) > max_unique:
            common[worst] = True  # revert: would exceed cap
            break
        # Re-equate using only remaining common items.
        scale1 = float(b_ref[common].std(ddof=1) / b_st[common].std(ddof=1))
        shift1 = float(b_ref[common].mean() - b_st[common].mean() * scale1)
        b_st = shift1 + scale1 * b_st

    # Final scale and shift recomputed from RAW β and the converged mask.
    scale = float(b_ref[common].std(ddof=1) / b1[common].std(ddof=1))
    shift = float(b_ref[common].mean() - b1[common].mean() * scale)
    equated_beta = shift + scale * b1
    adj_thresholds = (thres_ref - shift) / scale

    if int(common.sum()) >= 2:
        correlation = float(
            np.corrcoef(equated_beta[common], b_ref[common])[0, 1]
        )
    else:
        correlation = float("nan")

    return EquatingResult(
        scale=scale,
        shift=shift,
        common=common,
        adj_thresholds=adj_thresholds,
        correlation=correlation,
        equated_beta=equated_beta,
        n_iter=n_iter,
    )

Prevalence¶

pyfies.core.prevalence ¶

Probabilistic prevalence assignment along the latent FI trait.

Implements the Gaussian-mixture prevalence formula used by RM.weights::prob.assign: for each latent threshold t and raw score r, assume the posterior severity for respondents at raw score r is Gaussian with mean :math:\theta_r and standard deviation :math:\mathrm{se}(\theta_r). The marginal prevalence beyond t is

.. math:: P(\text{severity} > t) = \sum_{r=1}^{k} \big[ 1 - \Phi \big( (t - \theta_r) / \mathrm{se}(\theta_r) \big) \big] \cdot f_r,

where :math:f_r is the weighted proportion of respondents at raw score r, normalized over all raw scores 0, ..., k. Raw score 0 is excluded from the sum (respondents with no affirmative responses can't be food insecure beyond a threshold above the lowest item).

Defaults to :math:f_r computed from the model's own raw-score distribution, matching RM.weights' default behavior.

PrevalenceTable `dataclass` ¶

Prevalence rates beyond each latent threshold.

Attributes:

Name	Type	Description
`thresholds`	`NDArray[float64]`	Latent-trait thresholds at which prevalence was evaluated, shape `(t,)`.
`prevalence`	`NDArray[float64]`	Prevalence (in [0, 1]) beyond each threshold, shape `(t,)`. Population proportion.
`prob_per_raw_score`	`NDArray[float64]`	Conditional probability of being beyond each threshold at each raw score, shape `(k+1, t)`. Row 0 is forced to 0 to match the RM.weights convention.
`raw_score_freq`	`NDArray[float64]`	Weighted frequency :math:`f_r` of each raw score, normalized over all raw scores including 0, shape `(k+1,)`.

Source code in src/pyfies/core/prevalence.py

@dataclass
class PrevalenceTable:
    """Prevalence rates beyond each latent threshold.

    Attributes:
        thresholds: Latent-trait thresholds at which prevalence was evaluated,
            shape ``(t,)``.
        prevalence: Prevalence (in [0, 1]) beyond each threshold,
            shape ``(t,)``. Population proportion.
        prob_per_raw_score: Conditional probability of being beyond each
            threshold at each raw score, shape ``(k+1, t)``. Row 0 is forced
            to 0 to match the RM.weights convention.
        raw_score_freq: Weighted frequency :math:`f_r` of each raw score,
            normalized over all raw scores including 0, shape ``(k+1,)``.
    """

    thresholds: NDArray[np.float64]
    prevalence: NDArray[np.float64]
    prob_per_raw_score: NDArray[np.float64]
    raw_score_freq: NDArray[np.float64]

assign_prevalence ¶

assign_prevalence(theta: NDArray[float64], se_theta: NDArray[float64], raw_score_freq: NDArray[float64], thresholds: NDArray[float64]) -> PrevalenceTable

Compute population prevalence beyond each latent-trait threshold.

Parameters:

Name	Type	Description	Default
`theta`	`NDArray[float64]`	Person parameter for each raw score r = 0, ..., k, shape `(k+1,)`.	required
`se_theta`	`NDArray[float64]`	Measurement error for each `theta`, shape `(k+1,)`.	required
`raw_score_freq`	`NDArray[float64]`	Weighted relative frequency of each raw score r = 0, ..., k, shape `(k+1,)`. Should sum to (approximately) 1; will be used as-is. The entry for raw score 0 is treated as 0 contribution (population fraction at r = 0 is still subtracted from the denominator implicitly because they cannot exceed any threshold above the lowest possible severity).	required
`thresholds`	`NDArray[float64]`	Latent-trait thresholds, shape `(t,)`.	required

Returns:

Type	Description
`PrevalenceTable`	class:`PrevalenceTable`.

Source code in src/pyfies/core/prevalence.py

def assign_prevalence(
    theta: NDArray[np.float64],
    se_theta: NDArray[np.float64],
    raw_score_freq: NDArray[np.float64],
    thresholds: NDArray[np.float64],
) -> PrevalenceTable:
    """Compute population prevalence beyond each latent-trait threshold.

    Args:
        theta: Person parameter for each raw score *r* = 0, ..., *k*,
            shape ``(k+1,)``.
        se_theta: Measurement error for each ``theta``, shape ``(k+1,)``.
        raw_score_freq: Weighted relative frequency of each raw score
            *r* = 0, ..., *k*, shape ``(k+1,)``. Should sum to (approximately)
            1; will be used as-is. The entry for raw score 0 is treated as
            0 contribution (population fraction at r = 0 is still subtracted
            from the denominator implicitly because they cannot exceed any
            threshold above the lowest possible severity).
        thresholds: Latent-trait thresholds, shape ``(t,)``.

    Returns:
        :class:`PrevalenceTable`.
    """
    theta = np.asarray(theta, dtype=np.float64)
    se = np.asarray(se_theta, dtype=np.float64)
    f = np.asarray(raw_score_freq, dtype=np.float64).copy()
    thr = np.asarray(thresholds, dtype=np.float64)

    if theta.shape != se.shape:
        raise ValueError("theta and se_theta must have the same shape")
    if theta.shape != f.shape:
        raise ValueError("raw_score_freq must align with theta (length k+1)")
    if np.any(se <= 0):
        raise ValueError("se_theta must be positive")

    # Per-raw-score conditional probability of being beyond each threshold.
    # Shape: (k+1, t)
    z = (thr[None, :] - theta[:, None]) / se[:, None]
    cond_prob = 1.0 - norm.cdf(z)
    # Raw score 0 always contributes 0 (matches RM.weights f_j[1] = 0).
    cond_prob[0, :] = 0.0

    f[0] = 0.0
    prevalence = (cond_prob * f[:, None]).sum(axis=0)

    return PrevalenceTable(
        thresholds=thr,
        prevalence=prevalence,
        prob_per_raw_score=cond_prob,
        raw_score_freq=np.asarray(raw_score_freq, dtype=np.float64),
    )

Numerical core¶

CML estimator¶

pyfies.core.cml ¶

CMLFit dataclass ¶

fit_cml ¶

Elementary symmetric functions¶

pyfies.core.gamma ¶

log_gamma ¶

log_gamma_minus_one ¶

conditional_endorsement_prob ¶

Person parameters¶

pyfies.core.person ¶

PersonParameters dataclass ¶

fit_person_parameters ¶

Equating¶

pyfies.core.equating ¶

EquatingResult dataclass ¶

equate ¶

Prevalence¶

pyfies.core.prevalence ¶

PrevalenceTable dataclass ¶

assign_prevalence ¶

CMLFit `dataclass` ¶

PersonParameters `dataclass` ¶

EquatingResult `dataclass` ¶

PrevalenceTable `dataclass` ¶