# Parity with RM.weights

pyFIES is a clean-room reimplementation of FAO's R package RM.weights. Numerical agreement is verified against the four FAO sample country datasets shipped with RM.weights (data.FAO_country1 through data.FAO_country4).

## How parity is enforced

The reference outputs come from a single-shot R run:

```sh
make fixtures   # runs scripts/generate_r_fixtures.R
```

This installs RM.weights if needed, runs RM.w, prob.assign, and equating.fun on each sample country, and dumps the numerical outputs as JSON under tests/fixtures/r_reference/. The Python parity tests (tests/test_parity_r.py) reload the input data from those fixtures, re-fit in pyFIES, and assert agreement to fixed tolerances.
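The comparison step can be sketched as follows. This is an illustrative helper, not the actual code in `tests/test_parity_r.py`, and the fixture key name (`"beta"`) is an assumption about the JSON layout:

```python
import json

import numpy as np


def check_beta_parity(fixture_path, py_beta, atol=2e-4):
    """Assert pyFIES item severities agree with an R reference fixture.

    Sketch only: the "beta" key is an assumed fixture layout, not
    necessarily what lives under tests/fixtures/r_reference/.
    """
    with open(fixture_path) as f:
        ref = json.load(f)
    r_beta = np.asarray(ref["beta"], dtype=float)
    # Raises AssertionError if any element differs by more than atol.
    np.testing.assert_allclose(np.asarray(py_beta), r_beta, atol=atol)
```

The same pattern applies to the other quantities, each with its own tolerance.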

R is not required to run pyFIES, only to regenerate fixtures.

## Tolerances and what to expect

| Quantity | Tolerance | Notes |
| --- | --- | --- |
| Item severities \(\beta\) | \(2 \times 10^{-4}\) | Limited by finite-precision \(\gamma_r\) summation; worst case is country 4 (\(\beta\) spread ≈ 6 units). |
| Person parameters \(\theta\) | \(10^{-2}\) | \(\beta\) noise propagates through the score-equation inversion. |
| Equating scale & shift | \(10^{-3}\) | |
| Common-items mask | exact match | All 4 countries flag the same items as unique. |
| Adjusted thresholds | \(5 \times 10^{-4}\) | |
| Prevalence rates | \(5 \times 10^{-3}\) (= 0.5 pp) | Empirically ≤ 0.3 pp on all 4 countries. |
| `n.compl` (R) ↔ `n_complete_non_extreme` (pyFIES) | exact | |

## Identification convention

RM.weights does not strictly enforce a sum-to-zero identification on \(\beta\) — the iterative algorithm leaves a residual offset of order \(10^{-5}\). pyFIES enforces sum-to-zero exactly. The conditional likelihood is invariant under a uniform \(\beta\) shift, so both answers are the same MLE up to the identification constant. The parity tests subtract mean(R β) from R's reported \(\beta\) before comparison.
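The effect of centering can be shown with made-up numbers (the offset below is exaggerated to \(5 \times 10^{-4}\) so it would matter at the \(\beta\) tolerance; RM.weights' real residual is of order \(10^{-5}\)):

```python
import numpy as np

# Illustrative severities only: a sum-to-zero vector plus a uniform
# identification offset of the kind RM.weights can leave behind.
base = np.array([-2.1, -0.7, 0.4, 2.4])
r_beta = base + 5e-4          # R-style fit: small residual offset
py_beta = base.copy()         # pyFIES: sum-to-zero enforced exactly

# The conditional likelihood is invariant under a uniform beta shift,
# so the parity tests center R's beta before comparing:
r_centered = r_beta - r_beta.mean()
assert np.max(np.abs(r_centered - py_beta)) < 1e-12
```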

## When parity could break

- Item severities with very wide spread (say > 6 units between the most and least severe items) push the elementary symmetric function recursion closer to floating-point precision limits. The 2e-4 tolerance accommodates this; if a real dataset shows a wider spread, expect somewhat larger pyFIES-vs-R differences (still well within research reporting precision).
- Optimizer convergence: pyFIES uses SciPy's L-BFGS-B with gtol=1e-8. Tightening or loosening RaschModel(tol=...) can move results modestly closer to, or further from, R's.
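The recursion in question computes the elementary symmetric functions \(\gamma_r\) of \(\epsilon_i = e^{-\beta_i}\), which conditional maximum likelihood needs. A minimal sketch (not pyFIES's internal implementation) makes the precision concern concrete:

```python
import numpy as np


def elementary_symmetric(beta):
    """Elementary symmetric functions gamma_r of eps_i = exp(-beta_i),
    via the standard summation recursion. Sketch only, not pyFIES code."""
    eps = np.exp(-np.asarray(beta, dtype=float))
    gamma = np.zeros(len(eps) + 1)
    gamma[0] = 1.0
    for e in eps:
        # Update from the top so gamma[r - 1] still holds the previous value.
        for r in range(len(gamma) - 1, 0, -1):
            gamma[r] += e * gamma[r - 1]
    return gamma
```

A 6-unit spread in \(\beta\) means the \(\epsilon_i\) span a factor of \(e^6 \approx 400\), so the sums mix terms of very different magnitudes; that mixing is where the \(2 \times 10^{-4}\) headroom goes.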

## Sample dataset summary

| Country | n | Complete non-extreme | Unique items (vs. global standard) |
| --- | --- | --- | --- |
| 1 | 1000 | 423 | WORRIED, HEALTHY |
| 2 | 1000 | 505 | FEWFOOD, SKIPPED |
| 3 | 1008 | 734 | HUNGRY |
| 4 | 1000 | 597 | WORRIED |

These are anonymized Gallup World Poll datasets from the Voices of the Hungry project, distributed inside RM.weights and used here only as reference numerical fixtures.