
Numerical core

These modules implement the numerical primitives behind RaschModel. Most users will not need to call them directly.

CML estimator

pyfies.core.cml

Weighted Conditional Maximum Likelihood estimator for the dichotomous Rasch model.

Estimates item severity parameters :math:`\beta_1, \ldots, \beta_k` by maximizing the conditional likelihood of the response patterns given the raw scores. The conditional likelihood is invariant under a uniform shift of all severities; we resolve this with a sum-to-zero identification constraint (the same convention used by FAO's RM.weights and the global standard itself). Rows with any missing item response are dropped before fitting.

The negative conditional log-likelihood is convex in :math:`\beta`, so a quasi-Newton method (L-BFGS-B with an analytic gradient) converges to the unique MLE.
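A uniform shift of all severities leaves the conditional likelihood unchanged, which is exactly why the identification constraint is needed. A self-contained sketch, re-implementing the log-gamma recursion from pyfies.core.gamma in plain NumPy with made-up (but internally consistent) counts:

```python
import numpy as np

def log_gamma(beta):
    # Elementary symmetric functions of exp(-beta) in log space
    # (the Andersen/Verhelst recursion used by pyfies.core.gamma).
    k = beta.shape[0]
    g = np.full(k + 1, -np.inf)
    g[0] = 0.0
    for j in range(k):
        for r in range(j + 1, 0, -1):
            g[r] = np.logaddexp(g[r], -beta[j] + g[r - 1])
    return g

def cond_loglik(beta, item_totals, raw_score_counts):
    # Conditional log-likelihood: -sum_i T_i beta_i - sum_r N_r log gamma_r.
    return -item_totals @ beta - raw_score_counts @ log_gamma(beta)

rng = np.random.default_rng(0)
k = 8
beta = rng.normal(size=k)
# Raw-score counts with zeros at the extremes (r = 0 and r = k carry no
# information under CML) ...
N = np.zeros(k + 1)
N[1:k] = rng.integers(1, 50, size=k - 1)
# ... and item totals consistent with them: sum_i T_i = sum_r r * N_r.
T = rng.dirichlet(np.ones(k)) * (np.arange(k + 1) @ N)

shifted = cond_loglik(beta + 3.7, T, N)  # same constant added to every beta_i
print(abs(cond_loglik(beta, T, N) - shifted))  # ~0: the shift is not identified
```

Any constraint that pins down this one degree of freedom would do; sum-to-zero is chosen to match RM.weights.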

References

Cafiero, C., Viviani, S., & Nord, M. (2018). Food security measurement in a global context: The Food Insecurity Experience Scale. Measurement, 116, 146-152.

CMLFit dataclass

Result of a weighted CML fit.

Attributes:

beta (NDArray[float64]):
    Item severities (sum-to-zero), shape (k,).

se_beta (NDArray[float64]):
    Asymptotic standard errors of beta, shape (k,).

n_complete (int):
    Number of complete cases used for estimation (rows with no missing responses, any raw score).

n_complete_non_extreme (int):
    Number of complete cases with non-extreme raw scores (1 <= r <= k-1). Only these contribute to the CML log-likelihood; matches the n.compl field reported by RM.weights.

n_total (int):
    Total rows in the input.

weighted_raw_score_counts (NDArray[float64]):
    Weighted count :math:`N_r` for each raw score r = 0, ..., k, shape (k+1,).

weighted_item_totals (NDArray[float64]):
    Weighted endorsement count :math:`T_i` for each item, shape (k,). Computed over complete cases only.

loglik (float):
    Final conditional log-likelihood at the MLE.

converged (bool):
    Whether the optimizer reported convergence.

n_iter (int):
    Number of optimizer iterations.

Source code in src/pyfies/core/cml.py
@dataclass
class CMLFit:
    """Result of a weighted CML fit.

    Attributes:
        beta: Item severities (sum-to-zero), shape ``(k,)``.
        se_beta: Asymptotic standard errors of ``beta``, shape ``(k,)``.
        n_complete: Number of complete cases used for estimation
            (rows with no missing responses, any raw score).
        n_complete_non_extreme: Number of complete cases with non-extreme
            raw scores (1 <= r <= k-1). Only these contribute to the CML
            log-likelihood; matches the ``n.compl`` field reported by
            RM.weights.
        n_total: Total rows in the input.
        weighted_raw_score_counts: Weighted count :math:`N_r` for each
            raw score *r* = 0, ..., *k*, shape ``(k+1,)``.
        weighted_item_totals: Weighted endorsement count :math:`T_i` for each
            item, shape ``(k,)``. Computed over complete cases only.
        loglik: Final conditional log-likelihood at the MLE.
        converged: Whether the optimizer reported convergence.
        n_iter: Number of optimizer iterations.
    """

    beta: NDArray[np.float64]
    se_beta: NDArray[np.float64]
    n_complete: int
    n_complete_non_extreme: int
    n_total: int
    weighted_raw_score_counts: NDArray[np.float64]
    weighted_item_totals: NDArray[np.float64]
    loglik: float
    converged: bool
    n_iter: int

fit_cml

fit_cml(data: NDArray[int_], weights: NDArray[float64] | None = None, max_iter: int = 100, tol: float = 1e-08) -> CMLFit

Fit the dichotomous Rasch model by weighted CML.

Parameters:

data (NDArray[int_], required):
    Response matrix of shape (n, k) with values in {0, 1} or NaN for missing. Affirmative responses are coded 1.

weights (NDArray[float64] | None, default None):
    Optional sampling weights of shape (n,). If omitted, equal weights are used. Weights are renormalized so they sum to n.

max_iter (int, default 100):
    Maximum optimizer iterations.

tol (float, default 1e-08):
    Optimizer convergence tolerance on the gradient infinity norm.

Returns:

CMLFit:
    A :class:`CMLFit` with item severities, standard errors, and summary statistics.

Raises:

ValueError:
    If the matrix has fewer than 2 items, no complete cases with non-extreme raw scores, or invalid weights.

Source code in src/pyfies/core/cml.py
def fit_cml(
    data: NDArray[np.int_],
    weights: NDArray[np.float64] | None = None,
    max_iter: int = 100,
    tol: float = 1e-8,
) -> CMLFit:
    """Fit the dichotomous Rasch model by weighted CML.

    Args:
        data: Response matrix of shape ``(n, k)`` with values in {0, 1} or NaN
            for missing. Affirmative responses are coded 1.
        weights: Optional sampling weights of shape ``(n,)``. If omitted, equal
            weights are used. Weights are renormalized so they sum to *n*.
        max_iter: Maximum optimizer iterations.
        tol: Optimizer convergence tolerance on the gradient infinity norm.

    Returns:
        A :class:`CMLFit` with item severities, standard errors, and summary
        statistics.

    Raises:
        ValueError: If the matrix has fewer than 2 items, no complete cases
            with non-extreme raw scores, or invalid weights.
    """
    arr = np.asarray(data, dtype=np.float64)
    if arr.ndim != 2:
        raise ValueError("data must be a 2-D array")
    n, k = arr.shape
    if k < 2:
        raise ValueError("need at least 2 items to identify a Rasch model")

    w = _normalize_weights(weights, n)

    complete = ~np.isnan(arr).any(axis=1)
    n_complete = int(complete.sum())
    if n_complete == 0:
        raise ValueError("no complete cases; cannot fit Rasch model")

    X = arr[complete].astype(np.int8)
    w_complete = w[complete]
    raw = X.sum(axis=1)

    # Weighted item totals T_i and weighted raw-score counts N_r.
    item_totals = (X.T @ w_complete).astype(np.float64)
    raw_score_counts = np.zeros(k + 1, dtype=np.float64)
    for r in range(k + 1):
        raw_score_counts[r] = float(w_complete[raw == r].sum())

    # Extreme raw scores (0 and k) carry no information about item severities
    # under CML; they enter the prevalence calculation later but not the fit.
    informative = raw_score_counts.copy()
    informative[0] = 0.0
    informative[k] = 0.0

    # Initial guess: mild informative prior centered at zero.
    p_endorse = np.clip(item_totals / max(w_complete.sum(), 1.0), 1e-3, 1.0 - 1e-3)
    init_beta = -np.log(p_endorse / (1.0 - p_endorse))
    init_beta -= init_beta.mean()  # enforce sum-to-zero
    init_free = init_beta[:-1].copy()

    result = minimize(
        _negative_loglik_and_grad,
        init_free,
        args=(item_totals, informative),
        method="L-BFGS-B",
        jac=True,
        options={"maxiter": max_iter, "gtol": tol},
    )

    beta = _expand_to_full(result.x)

    # Standard errors: invert the Hessian on the (k-1)-dim free parametrization,
    # then map back. The Hessian on the full β is rank k-1 (singular by the
    # sum-to-zero constraint); we project it.
    H_full = _hessian_full(beta, informative)
    # Apply sum-to-zero constraint via the (k, k-1) Jacobian J where the i-th
    # free param maps to β_i and the last full param is -sum.
    J = np.vstack([np.eye(k - 1), -np.ones((1, k - 1))])  # (k, k-1)
    H_free = J.T @ H_full @ J
    cov_free = np.linalg.inv(H_free)
    cov_full = J @ cov_free @ J.T
    se_beta = np.sqrt(np.maximum(np.diag(cov_full), 0.0))

    n_non_extreme = int(((raw >= 1) & (raw <= k - 1)).sum())

    return CMLFit(
        beta=beta,
        se_beta=se_beta,
        n_complete=n_complete,
        n_complete_non_extreme=n_non_extreme,
        n_total=n,
        weighted_raw_score_counts=raw_score_counts,
        weighted_item_totals=item_totals,
        loglik=-float(result.fun),
        converged=bool(result.success),
        n_iter=int(result.nit),
    )

Elementary symmetric functions

pyfies.core.gamma

Elementary symmetric functions of item easinesses, in log-space.

The Conditional Maximum Likelihood (CML) estimator for the Rasch model relies on the elementary symmetric functions

.. math:: \gamma_r(\varepsilon) = \sum_{|J|=r} \prod_{j \in J} \varepsilon_j

where :math:`\varepsilon_i = \exp(-\beta_i)` is the easiness of item i parametrized via its severity :math:`\beta_i`. The conditional log-likelihood is

.. math:: \ell(\beta) = - \sum_i T_i \beta_i - \sum_r N_r \log \gamma_r(\varepsilon),

where :math:`T_i` is the (weighted) number of affirmative responses to item i and :math:`N_r` is the (weighted) number of respondents with raw score r.

For numerical stability we always compute :math:`\log \gamma_r` via the Andersen/Verhelst recursion combined with logaddexp.

References

Andersen, E. B. (1972). The numerical solution of a set of conditional estimation equations. J. Roy. Statist. Soc. B, 34, 42-54.

Verhelst, N. D., Glas, C. A. W., & van der Sluis, A. (1984). Estimation problems in the Rasch model: The basic symmetric functions. Computational Statistics Quarterly, 1, 245-262.

log_gamma

log_gamma(beta: NDArray[float64]) -> NDArray[np.float64]

Compute :math:`\log \gamma_r` for r = 0, ..., k.

Parameters:

beta (NDArray[float64], required):
    Item severity parameters, shape (k,).

Returns:

NDArray[float64]:
    Array of shape (k+1,) whose r-th entry is :math:`\log \gamma_r(\exp(-\beta))`.

Source code in src/pyfies/core/gamma.py
def log_gamma(beta: NDArray[np.float64]) -> NDArray[np.float64]:
    """Compute :math:`\\log \\gamma_r` for *r* = 0, ..., *k*.

    Args:
        beta: Item severity parameters, shape ``(k,)``.

    Returns:
        Array of shape ``(k+1,)`` whose *r*-th entry is
        :math:`\\log \\gamma_r(\\exp(-\\beta))`.
    """
    k = beta.shape[0]
    log_eps = -beta.astype(np.float64, copy=False)
    g = np.full(k + 1, -np.inf, dtype=np.float64)
    g[0] = 0.0
    for j in range(k):
        # Update in reverse so that g[r-1] is still the j-1 value when used.
        for r in range(j + 1, 0, -1):
            g[r] = np.logaddexp(g[r], log_eps[j] + g[r - 1])
    return g
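A quick way to validate the recursion is against the definition itself, which is tractable by brute force for small k. A standalone sketch (the function is reproduced here so the snippet runs on its own):

```python
import itertools
import math
import numpy as np

def log_gamma_ref(beta):
    # Same Andersen/Verhelst recursion as log_gamma above,
    # reproduced so this snippet is self-contained.
    k = beta.shape[0]
    g = np.full(k + 1, -np.inf)
    g[0] = 0.0
    for j in range(k):
        for r in range(j + 1, 0, -1):
            g[r] = np.logaddexp(g[r], -beta[j] + g[r - 1])
    return g

beta = np.array([-1.2, -0.4, 0.1, 0.6, 0.9])
eps = np.exp(-beta)
k = beta.size

# Brute force from the definition: gamma_r sums the easiness products
# over every size-r subset of items.
brute = np.array([
    sum(math.prod(eps[j] for j in J) for J in itertools.combinations(range(k), r))
    for r in range(k + 1)
])
print(np.max(np.abs(np.exp(log_gamma_ref(beta)) - brute)))  # ~0
```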

log_gamma_minus_one

log_gamma_minus_one(beta: NDArray[float64]) -> NDArray[np.float64]

Compute :math:`\log \gamma_r^{(-i)}` for every item i and order r.

Returns the elementary symmetric functions of the easinesses with item i excluded. Used to evaluate the CML score and the conditional probability that a respondent with raw score r endorses item i:

.. math:: P(X_i = 1 \mid R = r) = \varepsilon_i \, \gamma_{r-1}^{(-i)} / \gamma_r .

Parameters:

beta (NDArray[float64], required):
    Item severity parameters, shape (k,).

Returns:

NDArray[float64]:
    Array of shape (k, k+1). Entry [i, r] is :math:`\log \gamma_r^{(-i)}`. Entry [i, k] is :math:`-\infty` (a subset of size k cannot be formed from k - 1 items).

Source code in src/pyfies/core/gamma.py
def log_gamma_minus_one(beta: NDArray[np.float64]) -> NDArray[np.float64]:
    """Compute :math:`\\log \\gamma_r^{(-i)}` for every item *i* and order *r*.

    Returns the elementary symmetric functions of the easinesses with item *i*
    excluded. Used to evaluate the CML score and the conditional probability
    that a respondent with raw score *r* endorses item *i*:

    .. math::
        P(X_i = 1 \\mid R = r) = \\varepsilon_i \\,
            \\gamma_{r-1}^{(-i)} / \\gamma_r .

    Args:
        beta: Item severity parameters, shape ``(k,)``.

    Returns:
        Array of shape ``(k, k+1)``. Entry ``[i, r]`` is
        :math:`\\log \\gamma_r^{(-i)}`. Entry ``[i, k]`` is :math:`-\\infty`
        (cannot form a subset of size *k* from *k-1* items).
    """
    k = beta.shape[0]
    log_eps = -beta.astype(np.float64, copy=False)
    out = np.full((k, k + 1), -np.inf, dtype=np.float64)
    for i in range(k):
        g = np.full(k + 1, -np.inf, dtype=np.float64)
        g[0] = 0.0
        count = 0  # how many items have been folded in so far
        for j in range(k):
            if j == i:
                continue
            count += 1
            for r in range(count, 0, -1):
                g[r] = np.logaddexp(g[r], log_eps[j] + g[r - 1])
        out[i, :] = g
    return out
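The excluded-item functions obey a simple decomposition that makes a good sanity check: every size-r subset either contains item i or it does not, so :math:`\gamma_r = \gamma_r^{(-i)} + \varepsilon_i \gamma_{r-1}^{(-i)}`. A standalone sketch using the polynomial representation of the elementary symmetric functions (linear space is fine at this small scale; this is not the library code):

```python
import numpy as np

def gammas(eps):
    # gamma_0 .. gamma_k as coefficients of prod_j (1 + eps_j * x).
    c = np.array([1.0])
    for e in eps:
        c = np.convolve(c, [1.0, e])
    return c

beta = np.array([-1.0, -0.3, 0.2, 1.1])
eps = np.exp(-beta)
g = gammas(eps)

# Check gamma_r = gamma_r^{(-i)} + eps_i * gamma_{r-1}^{(-i)} for every i.
max_err = 0.0
for i in range(eps.size):
    g_minus = np.append(gammas(np.delete(eps, i)), 0.0)  # gamma_k^{(-i)} = 0
    recon = g_minus[1:] + eps[i] * g_minus[:-1]          # orders 1 .. k
    max_err = max(max_err, float(np.max(np.abs(recon - g[1:]))))
print(max_err)  # ~0
```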

conditional_endorsement_prob

conditional_endorsement_prob(beta: NDArray[float64]) -> NDArray[np.float64]

Probability of endorsing each item conditional on each raw score.

Returns pi[r, i] = P(X_i = 1 | R = r) for raw scores r = 0, ..., k and items i = 0, ..., k - 1.

By construction pi[0, :] = 0 and pi[k, :] = 1: a respondent with raw score 0 endorses no items, one with raw score k endorses all of them.

Source code in src/pyfies/core/gamma.py
def conditional_endorsement_prob(beta: NDArray[np.float64]) -> NDArray[np.float64]:
    """Probability of endorsing each item conditional on each raw score.

    Returns ``pi[r, i] = P(X_i = 1 | R = r)`` for raw scores
    *r* = 0, ..., *k* and items *i* = 0, ..., *k - 1*.

    By construction ``pi[0, :] = 0`` and ``pi[k, :] = 1``: a respondent with
    raw score 0 endorses no items, one with raw score *k* endorses all of them.
    """
    k = beta.shape[0]
    log_eps = -beta.astype(np.float64, copy=False)
    log_g = log_gamma(beta)  # (k+1,)
    log_g_minus = log_gamma_minus_one(beta)  # (k, k+1)
    pi = np.zeros((k + 1, k), dtype=np.float64)
    # Raw scores 1..k-1: pi[r, i] = exp(log_eps[i] + log_g_minus[i, r-1] - log_g[r])
    # log_g[r] is finite for 0 <= r <= k.
    for r in range(1, k):
        for i in range(k):
            log_pi = log_eps[i] + log_g_minus[i, r - 1] - log_g[r]
            pi[r, i] = float(np.exp(log_pi))
    pi[k, :] = 1.0
    return pi
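A useful invariant of this table: conditional on raw score r, the endorsement probabilities across items must sum to r, because a respondent with raw score r endorses exactly r items. A standalone sketch (computing the gammas in linear space via their polynomial representation, adequate for small k; this is not the library code):

```python
import numpy as np

def gammas(eps):
    # gamma_0 .. gamma_k as coefficients of prod_j (1 + eps_j * x).
    c = np.array([1.0])
    for e in eps:
        c = np.convolve(c, [1.0, e])
    return c

beta = np.array([-0.8, -0.2, 0.3, 0.7])
eps = np.exp(-beta)
k = beta.size
g = gammas(eps)

# pi[r, i] = eps_i * gamma_{r-1}^{(-i)} / gamma_r for 1 <= r <= k-1.
pi = np.zeros((k + 1, k))
for i in range(k):
    g_minus = gammas(np.delete(eps, i))
    for r in range(1, k):
        pi[r, i] = eps[i] * g_minus[r - 1] / g[r]
pi[k, :] = 1.0

# Each row sums to its raw score r.
print(pi.sum(axis=1))
```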

Person parameters

pyfies.core.person

Post-hoc maximum-likelihood estimation of person parameters.

Once item severities :math:`\beta` are estimated by CML, person parameters :math:`\theta_r` for each raw score r = 1, ..., k - 1 are obtained by solving the marginal score equation

.. math:: r = \sum_{i=1}^{k} \frac{1}{1 + \exp(\beta_i - \theta_r)} ,

i.e. the value of :math:`\theta` at which the expected raw score under the Rasch model equals the observed raw score r. The corresponding measurement error is

.. math:: \mathrm{se}(\theta_r) = \Big(\sum_{i=1}^{k} p_i (1 - p_i)\Big)^{-1/2}, \quad p_i = \frac{1}{1 + \exp(\beta_i - \theta_r)} .

Extreme raw scores (r = 0 and r = k) are undefined under standard MLE. Following RM.weights we estimate them by solving for pseudo-raw-scores :math:`d_0 \in (0, 1)` and :math:`d_k \in (k - 1, k)` (defaults: 0.5 and k - 0.5).
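The procedure above can be sketched with scipy.optimize.brentq; the beta values below are made up for illustration:

```python
import numpy as np
from scipy.optimize import brentq

def expected_raw_score(t, beta):
    # E[R | theta] under the Rasch model.
    return float(np.sum(1.0 / (1.0 + np.exp(beta - t))))

beta = np.array([-1.5, -0.5, 0.0, 0.5, 1.5])  # illustrative severities, k = 5
k = beta.size

# Interior raw scores: invert the score equation E[R | theta] = r ...
theta = {r: brentq(lambda t: expected_raw_score(t, beta) - r, -20.0, 20.0)
         for r in range(1, k)}
# ... and the extremes via the pseudo-raw-score defaults 0.5 and k - 0.5.
theta[0] = brentq(lambda t: expected_raw_score(t, beta) - 0.5, -20.0, 20.0)
theta[k] = brentq(lambda t: expected_raw_score(t, beta) - (k - 0.5), -20.0, 20.0)

# Measurement error at raw score 2 from the Fisher information sum p_i (1 - p_i).
p = 1.0 / (1.0 + np.exp(beta - theta[2]))
se_2 = 1.0 / np.sqrt(np.sum(p * (1.0 - p)))
print(sorted(theta.items()), se_2)
```

The expected raw score is strictly increasing in theta, so each root is unique inside the bracket.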

PersonParameters dataclass

Person severity per raw score.

Attributes:

theta (NDArray[float64]):
    Estimated person severity for each raw score r = 0, ..., k, shape (k+1,). Entries 0 and k use the pseudo-raw-score assumptions in pseudo_extreme.

se_theta (NDArray[float64]):
    Measurement errors for theta, shape (k+1,).

pseudo_extreme (tuple[float, float]):
    The pseudo raw scores (d0, dk) used for the two extreme entries.

Source code in src/pyfies/core/person.py
@dataclass
class PersonParameters:
    """Person severity per raw score.

    Attributes:
        theta: Estimated person severity for each raw score *r* = 0, ..., *k*,
            shape ``(k+1,)``. Entries 0 and *k* use the pseudo-raw-score
            assumptions in ``pseudo_extreme``.
        se_theta: Measurement errors for ``theta``, shape ``(k+1,)``.
        pseudo_extreme: The pseudo raw scores ``(d0, dk)`` used for the two
            extreme entries.
    """

    theta: NDArray[np.float64]
    se_theta: NDArray[np.float64]
    pseudo_extreme: tuple[float, float]

fit_person_parameters

fit_person_parameters(beta: NDArray[float64], pseudo_extreme: tuple[float, float] | None = None, bracket: tuple[float, float] = (-20.0, 20.0), xtol: float = 1e-10) -> PersonParameters

Estimate :math:`\theta_r` for every raw score given item severities.

Parameters:

beta (NDArray[float64], required):
    Item severities, shape (k,).

pseudo_extreme (tuple[float, float] | None, default None):
    Pseudo raw scores (d0, dk) for the two extreme scores. Defaults to (0.5, k - 0.5).

bracket (tuple[float, float], default (-20.0, 20.0)):
    Bracket passed to scipy.optimize.brentq when inverting the score equation. Wide enough for any plausible FIES fit.

xtol (float, default 1e-10):
    Absolute tolerance on theta in the root finder.

Returns:

PersonParameters:
    A :class:`PersonParameters` with theta and se_theta of shape (k + 1,).

Source code in src/pyfies/core/person.py
def fit_person_parameters(
    beta: NDArray[np.float64],
    pseudo_extreme: tuple[float, float] | None = None,
    bracket: tuple[float, float] = (-20.0, 20.0),
    xtol: float = 1e-10,
) -> PersonParameters:
    """Estimate :math:`\\theta_r` for every raw score given item severities.

    Args:
        beta: Item severities, shape ``(k,)``.
        pseudo_extreme: Pseudo raw scores ``(d0, dk)`` for the two extreme
            scores. Defaults to ``(0.5, k - 0.5)``.
        bracket: Bracket passed to ``scipy.optimize.brentq`` when inverting
            the score equation. Wide enough for any plausible FIES fit.
        xtol: Absolute tolerance on theta in the root finder.

    Returns:
        :class:`PersonParameters` with ``theta`` and ``se_theta`` of shape
        ``(k + 1,)``.
    """
    beta = np.asarray(beta, dtype=np.float64)
    k = beta.shape[0]
    if pseudo_extreme is None:
        d0, dk = 0.5, k - 0.5
    else:
        d0, dk = pseudo_extreme
        if not (0.0 < d0 < 1.0):
            raise ValueError("pseudo extreme d0 must be in (0, 1)")
        if not (k - 1.0 < dk < k):
            raise ValueError(f"pseudo extreme dk must be in ({k - 1}, {k})")

    theta = np.zeros(k + 1, dtype=np.float64)
    se = np.zeros(k + 1, dtype=np.float64)

    # Interior raw scores: solve E[R | theta] = r exactly.
    for r in range(1, k):
        theta_r = brentq(
            lambda t, target=float(r): _expected_raw_score(t, beta) - target,
            bracket[0],
            bracket[1],
            xtol=xtol,
        )
        theta[r] = theta_r
        se[r] = _se_theta(theta_r, beta)

    # Extreme raw scores via pseudo-raw-score assumption.
    for r, target in ((0, d0), (k, dk)):
        theta_r = brentq(
            lambda t, t0=target: _expected_raw_score(t, beta) - t0,
            bracket[0],
            bracket[1],
            xtol=xtol,
        )
        theta[r] = theta_r
        se[r] = _se_theta(theta_r, beta)

    return PersonParameters(
        theta=theta, se_theta=se, pseudo_extreme=(float(d0), float(dk))
    )

Equating

pyfies.core.equating

Equating a country FIES scale to a reference standard.

Implements the iterative scale-and-shift procedure used by RM.weights to calibrate item severities across measurement contexts. The algorithm:

  1. Standardize the country's item severities to match the reference's mean and standard deviation on a candidate set of common items (initially all items).
  2. Compute the absolute discrepancy of each candidate common item from the reference. If the largest exceeds tol, flag that item as unique (i.e. measuring something different in this context) and re-estimate scale and shift from the remaining common items. Repeat.
  3. Stop when no further items would be flagged, or when the number of uniques reaches max_unique.

Final scale and shift are recomputed in one shot from the raw country severities and the converged common-items mask:

.. math:: \text{scale} = \sigma(\beta_{\text{ref}}[\text{common}]) / \sigma(\beta[\text{common}]), \quad \text{shift} = \mu(\beta_{\text{ref}}[\text{common}]) - \mu(\beta[\text{common}]) \cdot \text{scale}.

The country severities map onto the reference metric as beta_on_reference = shift + scale * beta.
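The final one-shot mapping can be sketched on made-up severities. Here the hypothetical country scale is an affine copy of the reference on the common items, so the recovered scale and shift reproduce the reference there exactly:

```python
import numpy as np

# Illustrative severities: the "country" scale is a stretched, shifted copy of
# the reference on the common items; item 3 behaves differently (unique).
b_ref = np.array([-2.0, -1.0, -0.3, 0.4, 1.1, 1.8])
common = np.array([True, True, True, False, True, True])
b_country = (b_ref - 0.5) / 0.8
b_country[3] += 1.2  # the unique item drifts off the line

# One-shot scale and shift from the common items (ddof=1, as in the source):
scale = b_ref[common].std(ddof=1) / b_country[common].std(ddof=1)
shift = b_ref[common].mean() - b_country[common].mean() * scale
equated = shift + scale * b_country

print(scale, shift)  # recovers the 0.8 stretch and 0.5 offset
print(np.max(np.abs(equated[common] - b_ref[common])))  # ~0 on common items
```

Thresholds travel the other way: adj_thresholds = (reference_thresholds - shift) / scale maps them onto the country metric.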

EquatingResult dataclass

Output of :func:equate.

Attributes:

scale (float):
    Multiplicative factor mapping country β to the reference metric.

shift (float):
    Additive offset mapping country β to the reference metric.

common (NDArray[bool_]):
    Boolean mask of items judged common (True) vs unique (False).

adj_thresholds (NDArray[float64]):
    Reference thresholds mapped back onto the country metric, so prevalence can be computed without rescaling β.

correlation (float):
    Pearson correlation of common items between the country (after equating) and the reference scale.

equated_beta (NDArray[float64]):
    Country severities transformed onto the reference metric, shape (k,).

n_iter (int):
    Number of iterations actually performed.

Source code in src/pyfies/core/equating.py
@dataclass
class EquatingResult:
    """Output of :func:`equate`.

    Attributes:
        scale: Multiplicative factor mapping country β to the reference metric.
        shift: Additive offset mapping country β to the reference metric.
        common: Boolean mask of items judged common (True) vs unique (False).
        adj_thresholds: Reference thresholds mapped back onto the country
            metric, so prevalence can be computed without rescaling β.
        correlation: Pearson correlation of common items between the country
            (after equating) and the reference scale.
        equated_beta: Country severities transformed onto the reference metric,
            shape ``(k,)``.
        n_iter: Number of iterations actually performed.
    """

    scale: float
    shift: float
    common: NDArray[np.bool_]
    adj_thresholds: NDArray[np.float64]
    correlation: float
    equated_beta: NDArray[np.float64]
    n_iter: int

equate

equate(beta: NDArray[float64], reference_beta: NDArray[float64], reference_thresholds: NDArray[float64], tol: float = 0.35, max_unique: int = 3) -> EquatingResult

Equate beta to the metric defined by reference_beta.

Parameters:

beta (NDArray[float64], required):
    Country item severity parameters, shape (k,).

reference_beta (NDArray[float64], required):
    Reference item severities (same item ordering), shape (k,).

reference_thresholds (NDArray[float64], required):
    Latent-trait thresholds on the reference metric (e.g. severities of items 5 and 8 of the FAO global standard), shape (t,).

tol (float, default 0.35):
    Absolute discrepancy above which an item is flagged as unique.

max_unique (int, default 3):
    Maximum number of items that may be flagged as unique (also a hard cap on iterations).

Returns:

EquatingResult:
    An :class:`EquatingResult`.

Source code in src/pyfies/core/equating.py
def equate(
    beta: NDArray[np.float64],
    reference_beta: NDArray[np.float64],
    reference_thresholds: NDArray[np.float64],
    tol: float = 0.35,
    max_unique: int = 3,
) -> EquatingResult:
    """Equate ``beta`` to the metric defined by ``reference_beta``.

    Args:
        beta: Country item severity parameters, shape ``(k,)``.
        reference_beta: Reference item severities (same item ordering),
            shape ``(k,)``.
        reference_thresholds: Latent-trait thresholds on the *reference*
            metric (e.g. severities of items 5 and 8 of the FAO global
            standard), shape ``(t,)``.
        tol: Absolute discrepancy above which an item is flagged as unique.
        max_unique: Maximum number of items that may be flagged as unique
            (also a hard cap on iterations).

    Returns:
        :class:`EquatingResult`.
    """
    b1 = np.asarray(beta, dtype=np.float64)
    b_ref = np.asarray(reference_beta, dtype=np.float64)
    thres_ref = np.asarray(reference_thresholds, dtype=np.float64)
    if b1.shape != b_ref.shape:
        raise ValueError(
            f"beta has {b1.shape} but reference_beta has {b_ref.shape}"
        )
    k = b1.shape[0]
    if k < 3:
        raise ValueError("equating requires at least 3 items")

    # All items start as candidate common.
    common = np.ones(k, dtype=bool)

    # Initial standardization to common items' moments on the reference scale.
    b_st = _standardize(b1, b_ref, common)

    n_iter = 0
    for _ in range(max_unique + 1):
        n_iter += 1
        diff = np.where(common, np.abs(b_st - b_ref), -np.inf)
        worst = int(np.argmax(diff))
        if not np.isfinite(diff[worst]) or diff[worst] < tol:
            break
        if int(common.sum()) - 1 < 2:
            # Need at least 2 common items to estimate scale/shift.
            break
        common[worst] = False
        if int((~common).sum()) > max_unique:
            common[worst] = True  # revert: would exceed cap
            break
        # Re-equate using only remaining common items.
        scale1 = float(b_ref[common].std(ddof=1) / b_st[common].std(ddof=1))
        shift1 = float(b_ref[common].mean() - b_st[common].mean() * scale1)
        b_st = shift1 + scale1 * b_st

    # Final scale and shift recomputed from RAW β and the converged mask.
    scale = float(b_ref[common].std(ddof=1) / b1[common].std(ddof=1))
    shift = float(b_ref[common].mean() - b1[common].mean() * scale)
    equated_beta = shift + scale * b1
    adj_thresholds = (thres_ref - shift) / scale

    if int(common.sum()) >= 2:
        correlation = float(
            np.corrcoef(equated_beta[common], b_ref[common])[0, 1]
        )
    else:
        correlation = float("nan")

    return EquatingResult(
        scale=scale,
        shift=shift,
        common=common,
        adj_thresholds=adj_thresholds,
        correlation=correlation,
        equated_beta=equated_beta,
        n_iter=n_iter,
    )

Prevalence

pyfies.core.prevalence

Probabilistic prevalence assignment along the latent FI trait.

Implements the Gaussian-mixture prevalence formula used by RM.weights::prob.assign: for each latent threshold t and raw score r, assume the posterior severity for respondents at raw score r is Gaussian with mean :math:`\theta_r` and standard deviation :math:`\mathrm{se}(\theta_r)`. The marginal prevalence beyond t is

.. math:: P(\text{severity} > t) = \sum_{r=1}^{k} \big[ 1 - \Phi \big( (t - \theta_r) / \mathrm{se}(\theta_r) \big) \big] \cdot f_r,

where :math:`f_r` is the weighted proportion of respondents at raw score r, normalized over all raw scores 0, ..., k. Raw score 0 is excluded from the sum: respondents with no affirmative responses cannot be food insecure beyond a threshold above the lowest item.

By default :math:`f_r` is computed from the model's own raw-score distribution, matching the default behavior of RM.weights.
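The formula can be sketched directly with illustrative inputs (the theta, se, and frequency values below are made up, for k = 5 items):

```python
import numpy as np
from scipy.stats import norm

# Illustrative person parameters for raw scores 0..5.
theta = np.array([-3.0, -1.8, -0.8, 0.1, 1.0, 2.4])
se_theta = np.array([1.2, 0.9, 0.8, 0.8, 0.9, 1.2])
f = np.array([0.30, 0.25, 0.18, 0.12, 0.10, 0.05])  # weighted raw-score freqs
thresholds = np.array([0.0, 1.5])

# P(severity > t) = sum_r [1 - Phi((t - theta_r) / se_r)] * f_r, r = 0 excluded.
z = (thresholds[None, :] - theta[:, None]) / se_theta[:, None]
cond = 1.0 - norm.cdf(z)   # shape (k+1, t)
cond[0, :] = 0.0           # raw score 0 contributes nothing
prevalence = cond.T @ f
print(prevalence)
```

Because the conditional exceedance probability falls as the threshold rises, prevalence is monotonically decreasing across the thresholds.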

PrevalenceTable dataclass

Prevalence rates beyond each latent threshold.

Attributes:

thresholds (NDArray[float64]):
    Latent-trait thresholds at which prevalence was evaluated, shape (t,).

prevalence (NDArray[float64]):
    Prevalence (in [0, 1]) beyond each threshold, shape (t,). Population proportion.

prob_per_raw_score (NDArray[float64]):
    Conditional probability of being beyond each threshold at each raw score, shape (k+1, t). Row 0 is forced to 0 to match the RM.weights convention.

raw_score_freq (NDArray[float64]):
    Weighted frequency :math:`f_r` of each raw score, normalized over all raw scores including 0, shape (k+1,).

Source code in src/pyfies/core/prevalence.py
@dataclass
class PrevalenceTable:
    """Prevalence rates beyond each latent threshold.

    Attributes:
        thresholds: Latent-trait thresholds at which prevalence was evaluated,
            shape ``(t,)``.
        prevalence: Prevalence (in [0, 1]) beyond each threshold,
            shape ``(t,)``. Population proportion.
        prob_per_raw_score: Conditional probability of being beyond each
            threshold at each raw score, shape ``(k+1, t)``. Row 0 is forced
            to 0 to match the RM.weights convention.
        raw_score_freq: Weighted frequency :math:`f_r` of each raw score,
            normalized over all raw scores including 0, shape ``(k+1,)``.
    """

    thresholds: NDArray[np.float64]
    prevalence: NDArray[np.float64]
    prob_per_raw_score: NDArray[np.float64]
    raw_score_freq: NDArray[np.float64]

assign_prevalence

assign_prevalence(theta: NDArray[float64], se_theta: NDArray[float64], raw_score_freq: NDArray[float64], thresholds: NDArray[float64]) -> PrevalenceTable

Compute population prevalence beyond each latent-trait threshold.

Parameters:

theta (NDArray[float64], required):
    Person parameter for each raw score r = 0, ..., k, shape (k+1,).

se_theta (NDArray[float64], required):
    Measurement error for each theta, shape (k+1,).

raw_score_freq (NDArray[float64], required):
    Weighted relative frequency of each raw score r = 0, ..., k, shape (k+1,). Should sum to (approximately) 1; used as-is. The entry for raw score 0 is zeroed internally: respondents at r = 0 still count in the population denominator but cannot be beyond any threshold above the lowest possible severity.

thresholds (NDArray[float64], required):
    Latent-trait thresholds, shape (t,).

Returns:

PrevalenceTable:
    A :class:`PrevalenceTable`.

Source code in src/pyfies/core/prevalence.py
def assign_prevalence(
    theta: NDArray[np.float64],
    se_theta: NDArray[np.float64],
    raw_score_freq: NDArray[np.float64],
    thresholds: NDArray[np.float64],
) -> PrevalenceTable:
    """Compute population prevalence beyond each latent-trait threshold.

    Args:
        theta: Person parameter for each raw score *r* = 0, ..., *k*,
            shape ``(k+1,)``.
        se_theta: Measurement error for each ``theta``, shape ``(k+1,)``.
        raw_score_freq: Weighted relative frequency of each raw score
            *r* = 0, ..., *k*, shape ``(k+1,)``. Should sum to (approximately)
            1; will be used as-is. The entry for raw score 0 is treated as
            0 contribution (population fraction at r = 0 is still subtracted
            from the denominator implicitly because they cannot exceed any
            threshold above the lowest possible severity).
        thresholds: Latent-trait thresholds, shape ``(t,)``.

    Returns:
        :class:`PrevalenceTable`.
    """
    theta = np.asarray(theta, dtype=np.float64)
    se = np.asarray(se_theta, dtype=np.float64)
    f = np.asarray(raw_score_freq, dtype=np.float64).copy()
    thr = np.asarray(thresholds, dtype=np.float64)

    if theta.shape != se.shape:
        raise ValueError("theta and se_theta must have the same shape")
    if theta.shape != f.shape:
        raise ValueError("raw_score_freq must align with theta (length k+1)")
    if np.any(se <= 0):
        raise ValueError("se_theta must be positive")

    # Per-raw-score conditional probability of being beyond each threshold.
    # Shape: (k+1, t)
    z = (thr[None, :] - theta[:, None]) / se[:, None]
    cond_prob = 1.0 - norm.cdf(z)
    # Raw score 0 always contributes 0 (matches RM.weights f_j[1] = 0).
    cond_prob[0, :] = 0.0

    f[0] = 0.0
    prevalence = (cond_prob * f[:, None]).sum(axis=0)

    return PrevalenceTable(
        thresholds=thr,
        prevalence=prevalence,
        prob_per_raw_score=cond_prob,
        raw_score_freq=np.asarray(raw_score_freq, dtype=np.float64),
    )