Numerical core
These modules implement the numerical primitives behind RaschModel.
Most users will not need to call them directly.
CML estimator
pyfies.core.cml
Weighted Conditional Maximum Likelihood (CML) estimator for the dichotomous Rasch model.
Estimates item severity parameters :math:`\beta_1, \ldots, \beta_k` by
maximizing the conditional likelihood of the response patterns given the raw
scores. The conditional likelihood is invariant under a uniform shift of all
severities; we resolve this with a sum-to-zero identification constraint
(the same convention used by FAO's RM.weights and by the global standard
itself). Rows with any missing item response are dropped before fitting.
The negative conditional log-likelihood is convex in :math:`\beta`, and strictly
convex once the sum-to-zero constraint removes the flat shift direction, so a
quasi-Newton method (L-BFGS-B with analytic gradient) converges to the MLE,
which is unique whenever it exists.
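The shift invariance and the sum-to-zero convention can be checked numerically. The following is a minimal stdlib-only sketch, not the pyfies implementation: the helper names (`elementary_symmetric`, `cond_loglik`, `center`) and the toy response matrix are hypothetical, and the log-likelihood used is the standard Rasch conditional form, -Σ T_i β_i - Σ N_r log γ_r, accumulated row by row over non-extreme raw scores.

```python
import math

def elementary_symmetric(eps):
    """Return [gamma_0, ..., gamma_k] of the easinesses eps."""
    gamma = [1.0] + [0.0] * len(eps)
    for j, e in enumerate(eps, start=1):
        # Go from high to low order so gamma[r - 1] still holds the old value.
        for r in range(j, 0, -1):
            gamma[r] += e * gamma[r - 1]
    return gamma

def cond_loglik(beta, rows, weights):
    """Weighted conditional log-likelihood over non-extreme rows."""
    k = len(beta)
    gamma = elementary_symmetric([math.exp(-b) for b in beta])
    total = 0.0
    for row, w in zip(rows, weights):
        r = sum(row)
        if 1 <= r <= k - 1:  # extreme raw scores do not contribute
            linear = -sum(b * x for b, x in zip(beta, row))
            total += w * (linear - math.log(gamma[r]))
    return total

def center(beta):
    """Apply the sum-to-zero identification constraint."""
    mean = sum(beta) / len(beta)
    return [b - mean for b in beta]

# Hypothetical toy data: 6 respondents, 3 items, arbitrary weights.
rows = [(1, 0, 0), (1, 1, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1), (0, 0, 0)]
weights = [1.0, 2.0, 0.5, 1.5, 1.0, 1.0]
beta = [-0.8, 0.1, 1.3]
shifted = [b + 2.5 for b in beta]

# Same likelihood before and after the shift; same severities after centering.
print(cond_loglik(beta, rows, weights), cond_loglik(shifted, rows, weights))
print(center(beta), center(shifted))
```

Because every shifted copy of β gives the same likelihood, an optimizer needs the constraint (or an equivalent reparametrization) to pick a single representative.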
References
Cafiero, C., Viviani, S., & Nord, M. (2018). Food security measurement in a global context: The Food Insecurity Experience Scale. Measurement, 116, 146-152.
CMLFit dataclass
Result of a weighted CML fit.
Attributes:
| Name | Type | Description |
|---|---|---|
| beta | NDArray[float64] | Item severities (sum-to-zero), shape |
| se_beta | NDArray[float64] | Asymptotic standard errors of |
| n_complete | int | Number of complete cases used for estimation (rows with no missing responses, any raw score). |
| n_complete_non_extreme | int | Number of complete cases with non-extreme raw scores (1 <= r <= k-1). Only these contribute to the CML log-likelihood; matches the |
| n_total | int | Total rows in the input. |
| weighted_raw_score_counts | NDArray[float64] | Weighted count :math: |
| weighted_item_totals | NDArray[float64] | Weighted endorsement count :math: |
| loglik | float | Final conditional log-likelihood at the MLE. |
| converged | bool | Whether the optimizer reported convergence. |
| n_iter | int | Number of optimizer iterations. |
Source code in src/pyfies/core/cml.py
fit_cml
fit_cml(data: NDArray[int_], weights: NDArray[float64] | None = None, max_iter: int = 100, tol: float = 1e-08) -> CMLFit
Fit the dichotomous Rasch model by weighted CML.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | NDArray[int_] | Response matrix of shape | required |
| weights | NDArray[float64] \| None | Optional sampling weights of shape | None |
| max_iter | int | Maximum optimizer iterations. | 100 |
| tol | float | Optimizer convergence tolerance on the gradient infinity norm. | 1e-08 |
Returns:
| Name | Type | Description |
|---|---|---|
| A | CMLFit | class: |
| | CMLFit | statistics. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the matrix has fewer than 2 items, no complete cases with non-extreme raw scores, or invalid weights. |
Source code in src/pyfies/core/cml.py
Elementary symmetric functions
pyfies.core.gamma
Elementary symmetric functions of item easinesses, in log-space.
The Conditional Maximum Likelihood (CML) estimator for the Rasch model relies on the elementary symmetric functions
.. math:: \gamma_r(\varepsilon) = \sum_{|J|=r} \prod_{j \in J} \varepsilon_j
where the sum runs over all item subsets J of size r and
:math:`\varepsilon_i = \exp(-\beta_i)` is the easiness of item i,
parametrized via its severity :math:`\beta_i`. The conditional log-likelihood is
.. math:: \log L(\beta) = - \sum_i T_i \beta_i - \sum_r N_r \log \gamma_r(\varepsilon),
where :math:`T_i` is the (weighted) number of affirmative responses to item i
and :math:`N_r` is the (weighted) number of respondents with raw score r.
For numerical stability we always compute :math:`\log \gamma_r` via the
Andersen / Verhelst recursion combined with logaddexp.
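A minimal sketch of such a log-space recursion, using only the standard library. This is not the pyfies source: `logaddexp` here is a hand-written stand-in for the NumPy function of the same name, and `log_gamma_sketch` is a hypothetical name. Items are added one at a time, and each order r is updated as γ_r ← γ_r + ε_j γ_{r-1}, carried out on logarithms.

```python
import math

def logaddexp(a, b):
    """log(exp(a) + exp(b)) without overflow; accepts -inf for log(0)."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    hi, lo = (a, b) if a >= b else (b, a)
    return hi + math.log1p(math.exp(lo - hi))

def log_gamma_sketch(beta):
    """Return [log gamma_0, ..., log gamma_k] for item severities beta."""
    k = len(beta)
    log_g = [0.0] + [-math.inf] * k  # gamma_0 = 1, higher orders start at 0
    for j, b in enumerate(beta, start=1):
        log_eps = -b  # log of the easiness exp(-beta_j)
        # Descend in r so log_g[r - 1] is still the value without item j.
        for r in range(j, 0, -1):
            log_g[r] = logaddexp(log_g[r], log_eps + log_g[r - 1])
    return log_g

# Three items with zero severity: gamma = (1, 3, 3, 1).
print(log_gamma_sketch([0.0, 0.0, 0.0]))

# Severities far outside the range where exp() is representable.
print(log_gamma_sketch([800.0, -800.0]))
```

The second call shows why the log-space form matters: `math.exp(800.0)` would raise `OverflowError`, while the recursion only ever exponentiates non-positive differences.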
References
Andersen, E. B. (1972). The numerical solution of a set of conditional estimation equations. J. Roy. Statist. Soc. B, 34, 42-54.
Verhelst, N. D., Glas, C. A. W., & van der Sluis, A. (1984). Estimation problems in the Rasch model: The basic symmetric functions. Computational Statistics Quarterly, 1, 245-262.
log_gamma
Compute :math:`\log \gamma_r` for r = 0, ..., k.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| beta | NDArray[float64] | Item severity parameters, shape | required |
Returns:
| Type | Description |
|---|---|
| NDArray[float64] | Array of shape |
| NDArray[float64] | math: |
Source code in src/pyfies/core/gamma.py
log_gamma_minus_one
Compute :math:`\log \gamma_r^{(-i)}` for every item i and order r.
Returns the elementary symmetric functions of the easinesses with item i excluded. Used to evaluate the CML score and the conditional probability that a respondent with raw score r endorses item i:
.. math:: P(X_i = 1 \mid R = r) = \varepsilon_i \, \gamma_{r-1}^{(-i)} / \gamma_r .
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| beta | NDArray[float64] | Item severity parameters, shape | required |
Returns:
| Type | Description |
|---|---|
| NDArray[float64] | Array of shape |
| NDArray[float64] | math: |
| NDArray[float64] | (cannot form a subset of size k from k-1 items). |
Source code in src/pyfies/core/gamma.py
conditional_endorsement_prob
Probability of endorsing each item conditional on each raw score.
Returns pi[r, i] = P(X_i = 1 | R = r) for raw scores
r = 0, ..., k and items i = 0, ..., k - 1.
By construction pi[0, :] = 0 and pi[k, :] = 1: a respondent with
raw score 0 endorses no items, one with raw score k endorses all of them.
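As an illustration, the table pi[r, i] can be built directly from the formula P(X_i = 1 | R = r) = ε_i γ_{r-1}^{(-i)} / γ_r. The sketch below is an assumption-laden stand-in rather than the pyfies code: it works in plain (not log) space for brevity, uses nested lists instead of arrays, and the names `esf` and `endorsement_prob` are hypothetical.

```python
import math

def esf(eps):
    """Elementary symmetric functions gamma_0..gamma_n of eps."""
    gamma = [1.0] + [0.0] * len(eps)
    for j, e in enumerate(eps, start=1):
        for r in range(j, 0, -1):
            gamma[r] += e * gamma[r - 1]
    return gamma

def endorsement_prob(beta):
    """pi[r][i] = P(X_i = 1 | R = r) for r = 0..k and i = 0..k-1."""
    k = len(beta)
    eps = [math.exp(-b) for b in beta]
    gamma = esf(eps)
    pi = [[0.0] * k for _ in range(k + 1)]  # row r = 0 stays all zeros
    for i in range(k):
        # Symmetric functions of the other k - 1 items (orders 0..k-1).
        gamma_without_i = esf(eps[:i] + eps[i + 1:])
        for r in range(1, k + 1):
            pi[r][i] = eps[i] * gamma_without_i[r - 1] / gamma[r]
    return pi

pi = endorsement_prob([-1.0, 0.0, 0.5, 1.2])
for r, row in enumerate(pi):
    # Each row sums to r: the expected number of endorsements equals the score.
    print(r, [round(p, 4) for p in row], round(sum(row), 10))
```

Each row summing to its raw score is a convenient sanity check, and within a non-extreme row the less severe items receive the higher probabilities.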
Source code in src/pyfies/core/gamma.py
Person parameters
pyfies.core.person
Post-hoc maximum-likelihood estimation of person parameters.
Once item severities :math:`\beta` are estimated by CML, person parameters
:math:`\theta_r` for each raw score r = 1, ..., k - 1 are obtained by
solving the marginal score equation
.. math:: r = \sum_{i=1}^{k} \frac{1}{1 + \exp(\beta_i - \theta_r)} ,
i.e. the value of :math:`\theta` at which the expected raw score under the
Rasch model equals the observed raw score r. The corresponding measurement
error is
.. math:: \mathrm{se}(\theta_r) = \Big(\sum_{i=1}^{k} p_i (1 - p_i)\Big)^{-1/2}, \quad p_i = \frac{1}{1 + \exp(\beta_i - \theta_r)} .
For extreme raw scores (r = 0 and r = k) the MLE is undefined, because the
score equation has no finite solution. Following RM.weights we instead solve
the equation at pseudo raw scores :math:`d_0 \in (0, 1)` and
:math:`d_k \in (k - 1, k)` (defaults: 0.5 and k - 0.5).
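The score equation and the measurement-error formula can be sketched as follows. This is a stdlib-only illustration under stated assumptions: plain bisection stands in for whatever root finder pyfies actually calls, and the function names are hypothetical. The bracket (-20, 20), the tolerance 1e-10 and the pseudo raw scores 0.5 and k - 0.5 mirror the documented defaults.

```python
import math

def expected_score(theta, beta):
    """Expected raw score under the Rasch model at person severity theta."""
    return sum(1.0 / (1.0 + math.exp(b - theta)) for b in beta)

def solve_theta(target, beta, lo=-20.0, hi=20.0, xtol=1e-10):
    """Bisection on theta -> expected_score, which is strictly increasing."""
    while hi - lo > xtol:
        mid = 0.5 * (lo + hi)
        if expected_score(mid, beta) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def person_parameters_sketch(beta, pseudo_extreme=None):
    """Return (theta, se_theta), both indexed by raw score r = 0..k."""
    k = len(beta)
    d0, dk = pseudo_extreme if pseudo_extreme is not None else (0.5, k - 0.5)
    # Extreme scores are replaced by pseudo raw scores strictly inside (0, k).
    targets = [d0] + [float(r) for r in range(1, k)] + [dk]
    theta = [solve_theta(t, beta) for t in targets]
    se_theta = []
    for th in theta:
        p = [1.0 / (1.0 + math.exp(b - th)) for b in beta]
        se_theta.append(sum(q * (1.0 - q) for q in p) ** -0.5)
    return theta, se_theta

theta, se_theta = person_parameters_sketch([0.0, 0.0, 0.0, 0.0])
print(theta)
print(se_theta)
```

With four items of equal severity 0, the middle raw score r = 2 maps to θ = 0 with standard error 1, and the two pseudo-extreme estimates are symmetric around it.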
PersonParameters dataclass
Person severity per raw score.
Attributes:
| Name | Type | Description |
|---|---|---|
| theta | NDArray[float64] | Estimated person severity for each raw score r = 0, ..., k, shape |
| se_theta | NDArray[float64] | Measurement errors for |
| pseudo_extreme | tuple[float, float] | The pseudo raw scores |
Source code in src/pyfies/core/person.py
fit_person_parameters
fit_person_parameters(beta: NDArray[float64], pseudo_extreme: tuple[float, float] | None = None, bracket: tuple[float, float] = (-20.0, 20.0), xtol: float = 1e-10) -> PersonParameters
Estimate :math:`\theta_r` for every raw score given item severities.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| beta | NDArray[float64] | Item severities, shape | required |
| pseudo_extreme | tuple[float, float] \| None | Pseudo raw scores | None |
| bracket | tuple[float, float] | Bracket passed to | (-20.0, 20.0) |
| xtol | float | Absolute tolerance on theta in the root finder. | 1e-10 |
Returns:
| Type | Description |
|---|---|
| PersonParameters | class: |
| PersonParameters | |
Source code in src/pyfies/core/person.py
Equating
pyfies.core.equating
Equating a country FIES scale to a reference standard.
Implements the iterative scale-and-shift procedure used by RM.weights to calibrate item severities across measurement contexts. The algorithm:
- Standardize the country's item severities to match the reference's mean and standard deviation on a candidate set of common items (initially all items).
- Compute the absolute discrepancy of each candidate common item from the reference. If the largest exceeds tol, flag that item as unique (i.e. measuring something different in this context) and re-estimate scale and shift from the remaining common items. Repeat.
- Stop when no further items would be flagged, or when the number of uniques reaches max_unique.

Final scale and shift are recomputed in one shot from the raw
country severities and the converged common-items mask:
.. math:: \text{scale} = \sigma(\beta_{\text{ref}}[\text{common}]) / \sigma(\beta[\text{common}]), \quad \text{shift} = \mu(\beta_{\text{ref}}[\text{common}]) - \mu(\beta[\text{common}]) \cdot \text{scale}.
The country severities map onto the reference metric as
beta_on_reference = shift + scale * beta.
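The loop described above can be sketched in a few lines. This is an illustration of the prose, not the pyfies or RM.weights source: `equate_sketch`, the 8-item reference vector and the perturbed country vector are all hypothetical, and the population standard deviation is used on both sides (the choice cancels in the ratio).

```python
import statistics

def equate_sketch(beta, ref, tol=0.35, max_unique=3):
    """Return (scale, shift, common_mask, n_iter) for country severities beta."""
    k = len(beta)
    common = [True] * k
    n_iter = 0
    while True:
        n_iter += 1
        b = [x for x, c in zip(beta, common) if c]
        g = [y for y, c in zip(ref, common) if c]
        # Scale and shift always come from the raw severities and current mask.
        scale = statistics.pstdev(g) / statistics.pstdev(b)
        shift = statistics.mean(g) - statistics.mean(b) * scale
        # Discrepancy on the reference metric; unique items are ignored.
        disc = [abs(shift + scale * x - y) if c else 0.0
                for x, y, c in zip(beta, ref, common)]
        worst = max(range(k), key=disc.__getitem__)
        n_unique = k - sum(common)
        if disc[worst] <= tol or n_unique >= max_unique:
            return scale, shift, common, n_iter
        common[worst] = False  # flag the worst item and re-estimate

# Hypothetical reference severities and a country scale where item 3 differs.
ref = [-1.75, -1.25, -0.75, -0.25, 0.25, 0.75, 1.25, 1.75]
country = list(ref)
country[3] += 1.5

scale, shift, common, n_iter = equate_sketch(country, ref)
print(scale, shift, common, n_iter)
```

In this toy case the perturbed item is flagged in the first pass and the remaining seven items reproduce the reference exactly on the second. Inverting the mapping suggests that a reference threshold t corresponds to (t - shift) / scale on the country metric, which is presumably how adj_thresholds is obtained, though that is an inference from the formula rather than something stated here.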
EquatingResult dataclass
Output of :func:`equate`.
Attributes:
| Name | Type | Description |
|---|---|---|
| scale | float | Multiplicative factor mapping country β to the reference metric. |
| shift | float | Additive offset mapping country β to the reference metric. |
| common | NDArray[bool_] | Boolean mask of items judged common (True) vs unique (False). |
| adj_thresholds | NDArray[float64] | Reference thresholds mapped back onto the country metric, so prevalence can be computed without rescaling β. |
| correlation | float | Pearson correlation of common items between the country (after equating) and the reference scale. |
| equated_beta | NDArray[float64] | Country severities transformed onto the reference metric, shape |
| n_iter | int | Number of iterations actually performed. |
Source code in src/pyfies/core/equating.py
equate
equate(beta: NDArray[float64], reference_beta: NDArray[float64], reference_thresholds: NDArray[float64], tol: float = 0.35, max_unique: int = 3) -> EquatingResult
Equate beta to the metric defined by reference_beta.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| beta | NDArray[float64] | Country item severity parameters, shape | required |
| reference_beta | NDArray[float64] | Reference item severities (same item ordering), shape | required |
| reference_thresholds | NDArray[float64] | Latent-trait thresholds on the reference metric (e.g. severities of items 5 and 8 of the FAO global standard), shape | required |
| tol | float | Absolute discrepancy above which an item is flagged as unique. | 0.35 |
| max_unique | int | Maximum number of items that may be flagged as unique (also a hard cap on iterations). | 3 |
Returns:
| Type | Description |
|---|---|
| EquatingResult | class: |
Source code in src/pyfies/core/equating.py
Prevalence
pyfies.core.prevalence
Probabilistic prevalence assignment along the latent food-insecurity (FI) trait.
Implements the Gaussian-mixture prevalence formula used by
RM.weights::prob.assign: for each latent threshold t and raw score
r, assume the posterior severity for respondents at raw score r is
Gaussian with mean :math:`\theta_r` and standard deviation
:math:`\mathrm{se}(\theta_r)`. The marginal prevalence beyond t is
.. math:: P(\text{severity} > t) = \sum_{r=1}^{k} \big[ 1 - \Phi \big( (t - \theta_r) / \mathrm{se}(\theta_r) \big) \big] \cdot f_r,
where :math:`f_r` is the weighted proportion of respondents at raw score
r, normalized over all raw scores 0, ..., k. Raw score 0 is
excluded from the sum (respondents with no affirmative responses cannot be
food insecure beyond a threshold above the lowest item).
By default, :math:`f_r` is computed from the model's own raw-score distribution,
matching RM.weights' default behavior.
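The prevalence formula translates almost line for line into code. The sketch below is a stdlib-only illustration with hypothetical names (`normal_cdf`, `prevalence_sketch`) and made-up inputs; it builds the standard normal CDF from `math.erf` and is not the pyfies implementation.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prevalence_sketch(theta, se_theta, raw_score_freq, thresholds):
    """Prevalence beyond each threshold; inputs are indexed by r = 0..k."""
    k = len(theta) - 1
    result = []
    for t in thresholds:
        total = 0.0
        for r in range(1, k + 1):  # raw score 0 is excluded from the sum
            tail = 1.0 - normal_cdf((t - theta[r]) / se_theta[r])
            total += tail * raw_score_freq[r]
        result.append(total)
    return result

# Hypothetical inputs for a 3-item scale (raw scores 0..3).
theta = [-2.0, -1.0, 0.0, 1.0]
se_theta = [1.5, 1.0, 1.0, 1.5]
raw_score_freq = [0.4, 0.3, 0.2, 0.1]  # sums to 1 over all raw scores

prev = prevalence_sketch(theta, se_theta, raw_score_freq, [-100.0, 0.0, 100.0])
print(prev)
```

Because f_0 is normalized in but never summed, the prevalence is bounded above by 1 - f_0 (0.6 here) no matter how low the threshold is set.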
PrevalenceTable dataclass
Prevalence rates beyond each latent threshold.
Attributes:
| Name | Type | Description |
|---|---|---|
| thresholds | NDArray[float64] | Latent-trait thresholds at which prevalence was evaluated, shape |
| prevalence | NDArray[float64] | Prevalence (in [0, 1]) beyond each threshold, shape |
| prob_per_raw_score | NDArray[float64] | Conditional probability of being beyond each threshold at each raw score, shape |
| raw_score_freq | NDArray[float64] | Weighted frequency :math: |
Source code in src/pyfies/core/prevalence.py
assign_prevalence
assign_prevalence(theta: NDArray[float64], se_theta: NDArray[float64], raw_score_freq: NDArray[float64], thresholds: NDArray[float64]) -> PrevalenceTable
Compute population prevalence beyond each latent-trait threshold.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| theta | NDArray[float64] | Person parameter for each raw score r = 0, ..., k, shape | required |
| se_theta | NDArray[float64] | Measurement error for each | required |
| raw_score_freq | NDArray[float64] | Weighted relative frequency of each raw score r = 0, ..., k, shape | required |
| thresholds | NDArray[float64] | Latent-trait thresholds, shape | required |
Returns:
| Type | Description |
|---|---|
| PrevalenceTable | class: |