The AI Governance Gap Is a Measurement Problem

Every governance framework assumes we can measure what we're governing. We can't.

Most AI governance proposals share a hidden assumption: that we can observe what we need to regulate. Capability thresholds, safety benchmarks, deployment scope — all of these presuppose measurement.

But model capability is not a scalar. It is context-dependent, evaluator-dependent, and adversarially fragile: the same model that scores poorly on one benchmark can score well on a slight rephrasing of it.
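A toy sketch of the fragility claim (the model and questions here are invented for illustration, not drawn from any real benchmark): a model that keys on surface phrasing rather than meaning gets a different "capability" score depending on which paraphrase the benchmark happened to include.

```python
def toy_model(prompt: str) -> str:
    # Hypothetical model that pattern-matches on one phrasing only.
    if "capital of France" in prompt:
        return "Paris"
    return "I don't know"

# Three paraphrases of the same underlying question.
variants = [
    "What is the capital of France?",          # canonical benchmark item
    "Which city serves as France's capital?",  # slight rephrasing
    "France's capital city is?",               # another rephrasing
]

scores = [toy_model(v) == "Paris" for v in variants]
print(scores)  # → [True, False, False]
```

Same fact, three measured outcomes. Any single-number "accuracy" for this model is an artifact of which phrasings the benchmark sampled, which is exactly what makes capability thresholds hard to anchor.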

Until we take the measurement problem seriously, governance frameworks will remain aspirational. The hard work is definitional, not political.