Keep AI a Tool — govern the steering, not the engine

The Great Homecoming · Applied Case · Track C · June 2026

Keep AI a Tool

Govern the steering, not the engine — a diagnosis and a way forward

Illustrative research draft · structural diagnosis + matched intervention · proof of concept

The danger of AI is not that it gets too smart. It is that we quietly hand it the job of deciding what matters — until the machine becomes the centre everything bends around. This report is less an analysis than a way forward: it reads AI governance through the TGH lens to get the diagnosis right, and then matches the diagnosis to the kind of intervention the framework says actually heals — in the order that matters. A tool can carry purpose but cannot originate it; so the real risk is abdication, the right things to measure are legibility and orientation, and the one rule that governs everything else is: repair the steering before you scale the engine.

Diagnosis: a false centre

the tool quietly becoming the centre (abdication)

Stage: early — reachable now

the correction loop still works; the window is closing

Rule: steering before engine

govern orientation before scaling capability

How to read: a structural diagnosis (a hand read on the framework's canvas, not an engine run) paired with a matched intervention. Two confidence tiers are marked below: the diagnosis and the two governance targets are argued; the repair set and its order are the framework's bet, under forward test.

Status: proof of concept · consistency ≠ validation · factual claims sourced · © The Great Homecoming Project

The diagnosis — what kind of problem this is

The lens makes one distinction that reorganises the whole policy question. In its terms, a system runs on two registers at once: orientation (what it is actually for) and interaction (its capacity to coordinate and act). The first originates direction; the second amplifies whatever direction it is given. AI is almost pure interaction — immense capacity, almost no native orientation — and the framework's core law is that interaction amplifies whatever orientation it serves. Point a world-class amplifier at a hollowed or finite orientation (engagement, a metric, quarterly profit) and you do not get excellence; you get sophisticated misalignment at scale and at speed.

That the handover is already under way is visible in the labs' own disclosures: by May 2026 Anthropic reported that more than 80% of the code merged into its own production codebase was written by its AI (up from low single digits a year earlier), with the model also reviewing that code and increasingly choosing which experiments to run — a year after its CEO predicted AI would write 90% of code within months. Be precise about what that shows and what it does not: code written to a human-set objective is the tool working as a tool, not yet abdication. The line is crossed when the deciding migrates too — when what to build, and whether it was worth building, quietly stop being human calls. The disclosures show the doing has largely moved; the warning is about the step after.

The failure mode, nameda false centre / abdication

The risk is not a hostile superintelligence. It is abdication: step by step, because the machine is faster and frictionless, people stop doing the deciding — until the carrier quietly becomes the centre every decision routes through. The framework calls this a false centre: a finite instrument promoted to the place where the orienting purpose should sit. History names what happens when the thing built to serve becomes the thing everything serves — whether the centre is a king, a party, or a market. A machine is no different, only faster. (This is close kin to what others have called gradual disempowerment — incremental loss of human control through delegation; what the lens adds is the false-centre diagnosis and the ordered repair that follows from it.) structural

What to measure — and why not capability

Most AI policy measures capability: how smart, how fast, how large. That is both the hardest thing to verify (a training run is easier to hide than a missile) and the wrong thing to fear. Two other things are more measurable and more decisive, and both are framework-native:

Legibility — the correction loopcan we still see and steer?

Can humans still understand what the system is doing, and why? This is the correction loop applied to a human-AI system: the moment honest signal about what the system is really doing can no longer reach a human who can act on it, control is gone — regardless of capability. structural

Orientation — whose purpose?still pointed at human ends?

Is the human-and-AI system still pointed at human purposes, or has it re-centred on the machine, the metric, or the money? Orientation is read from conduct, not from mission statements. structural

The trajectory — and why the moment matters

The framework reads problems as trajectories, not snapshots, and it tracks how reachable a system still is. Abdication runs a recognisable course: convenience → delegation → dependence → the point where the human capacity to judge has atrophied so far that it can no longer be exercised even when someone wants to. The crucial property is that reachability falls as the course runs: early, the correction loop still works and the fix is cheap; late, the faculty needed to govern is the very faculty that has been lost, and no rule can install it from outside.

Where we are, on this reading: early. The loop still works — humans can still see, still object, still take the wheel back. That is the good news and the whole urgency: intervention is cheap and effective now, and gets structurally more expensive the longer abdication is allowed to harden. (The reachability gradient rests on the life-cycle part of the method still under review, so this urgency claim is held provisional — directionally argued, not proven.)

The way forward — the matched intervention

Two confidence tiers, kept apart. Read what follows at two levels. The diagnosis, the two governance targets (legibility and orientation — checkable where capability is not), and the mismatch warnings (named ways a plausible-looking intervention recurs) stand close to on their own: they are arguments and falsifiable predictions. The repair set and its order — that these are roughly the right repairs, that the set is reasonably complete, and that the sequence is fixed — is the framework's structured bet, offered under forward test, not established fact. Trust the first tier further than the second; the report marks them so you can.

The ordering rule (tier 2): reset the steering before you scale the engine. Repairing or growing capacity under a corrupted orientation does not help — it builds a more efficient version of the problem. Every step below is in this order on purpose; doing the later steps first is, on this reading, the classic failure.

Reset

Keep AI a tool — reset the false centre (first, always).

Hold AI explicitly in the position of a means that serves human judgment, never the end it serves. Concretely: a legibility floor (you may not ship what you cannot explain) and an orientation test on deployment. The floor is a rising one, not a switch — full interpretability of large models is an open research problem, so the standard is “as legible as the science currently allows, and the bar rises as the science does,” not an immediate ban.

Mismatch warning: capability caps alone govern the engine, not the steering — the framework predicts that re-grows the problem behind the cap.

Unmask

Make the system legible and keep it so.

Interpretability is the brake pedal, not a research nicety. Sustained, independent exposure of what the system is actually doing prevents the convenience-surplus from masking the slow handover of judgment.

Mismatch warning: capability that outruns legibility is the mask forming — speed bought by giving up the ability to see.

Audit

An independent audit office — accountability from a level above.

A continuous inspectorate for AI-integrated institutions, checking the two reads (still legible? still human-steered?) with real authority, like a financial regulator — the external accountability that can force correction when an institution will not correct itself.

Mismatch warning: an advisory panel without authority is the form of accountability without its function.

Incentive

Re-align the incentives — out of pure market capture.

Keep critical capability, and the institutions that form human judgment, fundable as public goods — the way we try (imperfectly, and the shield already leaks) to keep courts, universities and central banks from the highest bidder. A tool optimised only for engagement or profit is already pointed the wrong way; the incentive is the orientation.

Mismatch warning: rules layered on an unchanged profit-for-engagement incentive are re-routed around; the incentive wins.

Rebuild

Rebuild the human side — renew the judgment that anchors all the rest.

The labs admit human judgment is already slipping — not because AI got better at it, but because we practise it less. Invest in the schools, professions and habits that keep people able to set direction. This is the most forgotten step and the most important: an abdicated judgment cannot be governed back by any rule. If the human anchor is not renewed, every other repair sits on nothing.

Mismatch warning: governing AI while letting the human judgment faculty wither is treating the symptom while removing the thing the cure depends on.

Coordinate

Coordinate internationally on what can actually be checked.

A one-country pause hands the lead to the least careful. But legibility and orientation are far easier to verify than a secret training run — so make those the shared standard: the AI-era equivalent of arms inspection, built on showing your system is understandable and human-steered, not on proving you stopped.

Mismatch warning: coordinating on capability limits (unverifiable) instead of on legibility/orientation (verifiable) is a treaty that cannot be checked.

The whole way forward in one line: none of this slows AI down, and it shouldn't. The aim is narrower and more durable — that however capable the tool becomes, the hand on the wheel, and the judgment about where to go, stays human. Steering before engine; the verifiable targets (legibility, orientation) over the unverifiable one (capability).

How this reading was produced

Be clear about the evidence. The diagnosis is a structural read — an analyst applying the lens's questions to AI, by hand, the same method used for the USSR case; it is not an engine run. The factual claims (the share of lab code now AI-written) are public and sourced. The way forward is matched from the framework's standing set of repair kinds (reset a false centre, unmask, accountability from above, re-align incentives, renew the human anchor) and an ordering rule (steering before engine). Two things are held openly as the framework's bet, not as proven: that this small set is the right and roughly complete one, and that the order is fixed; the further claim that a matched repair actually resolves rather than recurs is the explicit forward test. By contrast, the verifiability point (legibility and orientation are checkable where capability is not) and the mismatch warnings are arguments and falsifiable predictions that stand on their own. Consistency ≠ validation.

Objections, and limits

Objection	Response
“Capability is the risk — a sufficiently capable system defeats ‘steering’ by design (deceptive alignment, instrumental sub-goals).”	The most serious objection, from the technical-safety camp, and partly CONCEDED: a system capable enough to defeat legibility by design is outside what this frame governs, and that failure mode is real. The claim here is narrower — that for the systems we have and are building, the steering (orientation + legibility) is what governance can actually check and what decides most outcomes, and that capability-first governance chases the one thing it cannot verify. The two views are complementary, not exclusive; this one does not replace alignment research, it governs the institutional layer alignment research cannot.
“This imports a value judgement — maybe the machine should decide.”	ACCEPTED as a real disagreement, stated openly: the reading rests on the commitment that originating purpose is a human responsibility a carrier cannot hold. A reader who rejects that will discount the orientation pillar; the legibility argument (you cannot govern what you cannot see) still stands on its own.
“Fund critical AI as a public good? The shield over courts and universities is already a sieve.”	PARTLY CONCEDED: market and political capture of those institutions is real, so this is a direction to fight for, not a mechanism to assume. The defensible core is the diagnosis it rests on — that an engagement/profit incentive is a mis-orientation — which holds regardless of how hard the remedy proves.
“The repair set and its order are asserted — why these, why closed, why this sequence?”	ACCEPTED and marked: their existence, completeness and ordering are the framework's bet (tier 2 above), under forward test, not established fact. The diagnosis, the verifiable targets, and the mismatch warnings (the most defensible content) do not depend on that bet.

Limits. Structural diagnosis, not an engine run and not out-of-sample validated; the reachability/urgency reading is provisional (the life-cycle model is under review); inputs are framework-defined and qualitative. The contribution is the diagnosis, the two verifiable governance targets, and the named anti-patterns — not a forecast and not a guarantee that any step will work. Consistency ≠ validation. © The Great Homecoming Project.

Read this first

The diagnosis — what kind of problem this is

What to measure — and why not capability

The trajectory — and why the moment matters

The way forward — the matched intervention

How this reading was produced

Objections, and limits