Research update — The Great Homecoming

What has been built

This is already a substantial body of work. There is a complete framework: a documented ontology in which any system is read as three coupled capacities (integration, interaction and bonding), each with a level and an orientation. There is a working engine that implements it. The engine scores from each system's own stated-and-enacted purpose and cross-checks structure against that meaning at every step, so the model is built to audit itself rather than confirm its assumptions. Behind both sits a real case base: a corpus of 200+ historical and contemporary cases held for leakage-free analysis, spanning civilisations and states, organisations (among them a globally systemic bank), and the major economic crises of the last century; a growing set modelled in depth as multi-tick trajectories; fifteen full structural reads; and a 600-plus-check test battery. Alongside it stands a sealed register of dated forecasts on live systems. Most of this is engine-supported today; none of it is yet validated against data the model has not seen.

What the evidence shows (engine-supported)

These results were produced on cases built from the historical record. They are real evidence that the mechanism is coherent and carries information:

Transferable dynamics — historically opposite cases (one consolidating under stress, one hollowing and collapsing) reproduced from the same dynamics, with no per-case tuning.
An outcome-blind pair-sort — two systems, one terminal and one recoverable, sorted correctly on internal condition alone, without being told which was which. It is the load-bearing engine result — the others are consistency checks — and it remains engine-supported: not yet repeated on data the model has never seen.
Strand discrimination, sealed and pre-registered — the prediction was written down and time-stamped before the run: the model reads the strand each decline actually runs on (a loss-of-meaning case as a meaning decline; the 2008 crisis as an information decline). On a sealed run it did so — an early, engine-supported indication, not yet an out-of-sample result.
Tested with the misses kept — a first run that failed across the board was reworked and re-run; both the failure and the result are on the record.

Independently reproduced (independent scorers · not yet out-of-sample)

A different kind of evidence, added June 2026 — not whether the model is right about the future, but whether its readings are reproducible and its categories sound. We handed the written method to independent scorers who had no contact with us, or with each other, and checked whether they arrived where the method says they should.

The reading is reproducible, and it discriminates — eight independent AI systems, given only the written scoring rules and the pre-event facts, scored a battery of past cases blind. They agreed on the reading and separated every failure from every survivor, with no overlap between the two — the score is a property of the method, not of one analyst's hand. (The cases are drawn from the historical record and so recognisable, and the scorers are AI: this shows reproducibility and discrimination, not out-of-sample prediction.)
The categories pass a pre-registered, adversarial test — the framework reads every system on a small set of base categories. Those categories were put to a sealed test: thirty-eight short statements, including deliberately misleading look-alikes built to point the wrong way, with the answer key hashed and recorded before any run. Independent scorers applied the categories consistently (agreement κ 0.72–0.83, “substantial”) and saw through 88% of the traps — evidence the categories carve cleanly and are not arbitrary labels. (The scorers share training priors; a panel of human analysts is the next rung above this.)
The four-way decline reading holds up, and a planted counter-case sharpened it: the framework sorts any declining system into one of four paths (re-develop, held-descent, capture, dissolution). Seven independent scorers classified thirty-one historical cases blind, with substantial agreement (κ rising 0.66→0.72 as the rules were sharpened), and no case forced a fifth category. That is evidence the four paths are exhaustive and distinct, not a convenient list. A case deliberately planted to break an over-strong claim did break it: re-development turned out to be possible on more kinds of foundation than first stated, so the claim was narrowed to a sharper, testable one and the miss kept on the record. (Scorers are AI and the case briefs were written in-house; human scorers and independent briefs are the next rung.)
The instrument is frozen and tamper-evident — the scoring rules are sealed by cryptographic hash before any case is scored, so a result can never be the product of rules quietly adjusted to fit.
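
The agreement figures above are reported as κ. The source does not say which kappa statistic was used; for a panel of more than two scorers a common choice is Fleiss' kappa, and the sketch below assumes that. The labels reuse the four decline paths named above, but the example ratings are invented for illustration and are not project data.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a panel of scorers.

    ratings: one inner list per case, holding one category label per scorer.
    Every case must be scored by the same number of scorers (at least two).
    """
    n_cases = len(ratings)
    n_scorers = len(ratings[0])
    if n_scorers < 2 or any(len(case) != n_scorers for case in ratings):
        raise ValueError("each case needs the same number (>= 2) of scores")

    counts = [Counter(case) for case in ratings]

    # Observed agreement: share of scorer pairs that agree, averaged over cases.
    per_case = [
        (sum(c * c for c in tally.values()) - n_scorers)
        / (n_scorers * (n_scorers - 1))
        for tally in counts
    ]
    observed = sum(per_case) / n_cases

    # Chance agreement: from the overall share of labels in each category.
    totals = Counter()
    for tally in counts:
        totals.update(tally)
    expected = sum((n / (n_cases * n_scorers)) ** 2 for n in totals.values())

    if expected == 1.0:
        # Only one category was ever used, so kappa is undefined.
        raise ValueError("kappa is undefined when one category takes every label")
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings: four cases, three scorers each (not project data).
example = [
    ["capture", "capture", "capture"],
    ["dissolution", "dissolution", "held-descent"],
    ["re-develop", "re-develop", "re-develop"],
    ["held-descent", "held-descent", "held-descent"],
]
kappa = fleiss_kappa(example)
print(f"kappa = {kappa:.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is a stricter figure than a simple percentage of matching labels. If the project used pairwise Cohen's kappa instead, the numbers would be computed differently and would not be directly comparable with this sketch.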

The case record (engine-supported / structural)

The same instrument has been run across very different systems, each read held to the same sealed discipline — a spread that matters as much as any single result:

Historical structural reads — among them an early community that consolidated under extreme external pressure, the Córdoba caliphate, the Abbasid era and the Soviet collapse, each read as a multi-tick trajectory and checked against the historical record.
A blind localisation pilot — on one historical case, three independent builds located the failing groups before the historian's key was revealed, matching four of five named groups. The blind protocol held; the magnitudes did not — modeller choices moved them widely — so only the localisation is treated as load-bearing.
A sealed cross-era language test: registered before scoring, it asked whether organisations' stated purpose had measurably homogenised across thirty years. It had, but only partly; the test failed its pre-registered threshold on weak power, and the word-based reading was retired in favour of conduct. A negative kept openly on the record.
An organisational read — a globally systemic bank and the 2008 ratings failure read on the same dynamics: the outward form intact while the function hollowed.

One limit holds across the record, stated once rather than case by case: these are consistency checks and blind-localisation results on cases built from the historical record. They are genuinely informative, but they are not a positive result on data the model has never seen. The historical cases also carry the ordinary cautions (sources written later; the risk of a clean arc drawn over a messy record), and where a reading agrees with the secondary literature that is a consistency check, not independent confirmation. Closing that gap is what the forward register of dated forecasts is for.

What we are measuring next

We are explicit about the limits of what the reports surface today. The current read leans most heavily on integration and correction. The model defines more than the report yet shows, and the additions are named and sequenced. A deeper diagnostic layer will surface what the engine already computes but the headline omits: per-strand topology and opacity, friction read as a three-part vector rather than one number, and life-cycle phase. Three genuinely new measurements are also in development: a system's external interaction (how it actually couples with and responds to its environment, the dimension our environmental read is still missing); its output (the real, self-report-proof effect it imposes on the world, the check that exposes a system looking strong while quietly extracting); and its legacy (what it transmits to whatever comes next). These extend the instrument beneath the four-pillar headline, not on top of it, and each is admitted only if it carries information the current measures cannot. The output measure is also what ESG and SDG “impact” reporting reaches for and rarely captures: a system's real, conduct-grounded effect on the world. When it lands, it becomes the rigorous core of the reporting offering described on the main site.

The discipline (the method is half the project)

A mechanism only counts if it emerges in the engine without being coded in, under sealed pre-registration: claims, thresholds and falsifiers are hashed and recorded before any run, scored exactly as sealed, with misses kept permanently. The order is simulate first, then matched-shock historical tests (condition coded blind from pre-shock sources), then live forward calls — with conduct, not words, as the standard of evidence. Findings enter canon only through a formal ratification sitting. And the apparatus is distrusted by default: five broken instruments were caught before producing a single false result, and two of our own over-claims were killed by fresh-configuration tests.
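
The sealing step described above can be sketched in a few lines. This is a minimal illustration, assuming SHA-256 over a canonical JSON serialisation; the source does not state which hash function or record format the project uses, and the function names, field names and values below are all hypothetical.

```python
import hashlib
import hmac
import json


def seal(record):
    """Return a SHA-256 digest of a pre-registration record.

    The record is serialised canonically (sorted keys, fixed separators)
    so that the same content always yields the same digest.
    """
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def matches_seal(record, sealed_digest):
    """True only if the record is identical to the one that was sealed."""
    return hmac.compare_digest(seal(record), sealed_digest)


# Hypothetical pre-registration; every field and value is illustrative.
registration = {
    "claim": "example claim text",
    "threshold": 0.70,
    "falsifier": "example falsifier text",
}
digest_before_run = seal(registration)  # recorded before any case is scored

# After the run: the untouched record verifies, an adjusted one does not.
adjusted = dict(registration, threshold=0.60)
print(matches_seal(registration, digest_before_run))  # True
print(matches_seal(adjusted, digest_before_run))      # False
```

A digest only proves that the record has not changed since the digest was made; it says nothing about when that was, so the digest itself has to be lodged somewhere dated and outside the modeller's control. A short answer key is also guessable by brute force from its bare hash, so a real scheme might mix in a random nonce that is disclosed together with the key.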

What we are still earning

One thing, stated plainly because the honesty is part of the method: a positive out-of-sample result. Two leakage-free out-of-sample tests have been run and came back null — and we treat that as information, not embarrassment. They taught us two things we now build on. First, that no single early-warning parameter works: what carries signal is the arc — the trajectory across several coupled readings over time — not any lone metric. Second, where the instrument's lane actually lies: systems read in depth and in motion, not sparse aggregates read from a distance. Both lessons fed directly into the current design. Convergence with the framework is real evidence, but it is not yet validation against the world; the mechanism continues to be strengthened, and phase-dependent results are held provisional until that review closes. None of this is hidden — it is how a serious programme builds trust, and it is exactly the gap the forward register above is designed to close.