
Authority Hierarchies

Why "treat all documents equally" is the wrong default in regulated environments

A Base2ML white paper. Sixth in a series. See also The Information Paradox, Why Conflict Detection Is Harder Than It Looks, The Half-Life of Documentation, Citation Discipline, and What "I Don't Know" Looks Like.


A retrieval that doesn't know its place

A standard retrieval system, asked a question, returns relevant documents in order of topical match. The local procedure document and the binding state regulation, both relevant, are returned and ranked by how closely they match the question text. The system has no built-in sense that one of these documents constrains the other. They sit side by side in the retrieval set as if they were equivalent inputs to the synthesis.

This is fine for unregulated knowledge work. A marketing team's brand guidelines and the project-management runbook can sit at the same authority level; neither constrains the other; the user just wants to find what's relevant.

It is not fine in any environment where some documents have authority over others. A borough's local ordinance is constrained by state law. A hospital's procedural SOP operates within CMS regulations. A contractor's safety manual sits inside OSHA. Each of these settings has an authority hierarchy — a structure of which documents bind which others — and a system that treats all documents as equivalently authoritative is, in those settings, producing answers that quietly miss the constraints that should be visible.

This is requirement #5 from our first paper: respect authority hierarchies. Of the seven requirements we laid out, this one is the most domain-specific. Organizations in unregulated knowledge work don't notice when systems get this wrong because they don't have a hierarchy to violate. Organizations in regulated work notice immediately, often by getting an answer that's confidently wrong about a binding constraint.


What an authority hierarchy actually is

A useful authority hierarchy has at least two dimensions.

Jurisdictional level captures where in a structure of authorities the document sits. In American government, the natural levels are: federal, state, county, municipal, departmental, team. In healthcare: federal regulation, state health code, hospital policy, departmental procedure, unit-level practice. In professional services: regulatory body, firm policy, practice-area procedure, engagement-level convention. The exact labels vary; the structure is recognizable.

Regulatory force captures how authoritative the document is within its level. A statute is binding. A regulatory guidance document is interpretive but not binding in the same way. An internal SOP is operationally instructive but not legally enforceable in the same sense. A reference document or summary is informational. These categories shape how the document should be weighted in any answer that touches it.

A document is positioned by both. The PA Sunshine Act is a state-level binding regulation. A borough's local meeting procedures are a municipal-level internal document. A summary of best practices from a state association is state-level guidance. An employee handbook is departmental internal. The combination of level and force places the document in the hierarchy and tells the system how to treat it relative to others.

This metadata has to come from somewhere. The cleanest approach is automatic classification at ingest time: regex and keyword heuristics for path-based and content-based signals (a file under state-law/ whose first paragraph contains "Pennsylvania Statute 65 Pa.C.S." is high-confidence state-level binding); LLM classifiers as a second pass for cases the heuristics miss; operator-driven manual overrides as the authoritative final layer when the automatic classifications get something wrong. We've discussed each of these in earlier papers; what's specific to authority is that the combination of level and force, applied per-document, is what enables the hierarchical reasoning at retrieval time.
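The layering described above can be sketched in a few lines. This is a minimal illustration, not Base2ML's implementation: the rule table, file paths, and label tuples are hypothetical, and a real classifier would carry confidence scores and a review queue for documents nothing matches.

```python
import re

# Hypothetical rule table: (path pattern, content pattern, (level, force)).
# A None content pattern means the path alone is high-confidence.
HEURISTICS = [
    (r"state-law/", r"Pa\.C\.S\.", ("state", "binding")),
    (r"borough/procedures/", None, ("municipal", "internal")),
]

def classify_by_heuristics(path: str, text: str):
    """Return (level, force) if a high-confidence rule fires, else None."""
    for path_pat, content_pat, label in HEURISTICS:
        if re.search(path_pat, path) and (
            content_pat is None or re.search(content_pat, text[:2000])
        ):
            return label
    return None

def classify(path: str, text: str, overrides: dict, llm_label=None):
    """Layered classification: operator override has final say,
    then path/content heuristics, then an LLM second-pass label."""
    if path in overrides:        # manual override is authoritative
        return overrides[path]
    hit = classify_by_heuristics(path, text)
    if hit:
        return hit
    return llm_label             # may be None: flag for human review
```

The ordering encodes the paper's layering: heuristics do the cheap bulk work, the LLM label fills gaps the heuristics miss, and the operator override wins whenever it exists.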


The retrieval consequence

Authority hierarchy isn't just a label on the source card. It changes what the retrieval pipeline has to do.

A standard retrieval surfaces the top N most-relevant passages and stops. An authority-aware retrieval has to do more: after surfacing the most-relevant passages, it has to separately identify whether any binding higher-authority documents address the same topic, and surface those alongside the primary results.

The case worth thinking about is when the user asks a question whose answer at the local level is straightforward, but whose binding higher-authority constraint is the actually-important answer. The local borough zoning ordinance addresses variance procedures explicitly. A standard retrieval surfaces the local ordinance and produces a clean answer. The PA Sunshine Act's open-meetings constraint, which may modify the answer, isn't surfaced because it's not topically about zoning variances directly — it's about meetings. Topical similarity isn't the right relevance criterion when authority precedence is in play.

The fix is a second retrieval pass. After the primary retrieval surfaces the local content, a separate pass scans the corpus specifically for binding higher-authority documents that touch on the topic of the user's question, even if topical similarity scores are lower than the local content. The combined retrieval surfaces both: the local procedure that directly answers the question, and the binding state-level constraint the user needs to know operates over it.
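The two-pass shape can be sketched as follows. This is an assumption-laden outline, not a production pipeline: `primary_search` and `topic_match` stand in for whatever similarity machinery the system uses, the level ordering is illustrative, and documents are plain dicts.

```python
def authority_aware_retrieve(query, corpus, primary_search, topic_match,
                             top_n=5):
    """Two-pass retrieval sketch: a primary topical pass, then a scan
    for binding higher-authority documents on the same topic."""
    primary = primary_search(query, corpus)[:top_n]
    primary_ids = {d["id"] for d in primary}

    # Illustrative jurisdictional ordering, most authoritative first.
    ORDER = ["federal", "state", "county", "municipal", "departmental"]
    # Most local level present in the primary results.
    local_rank = max(ORDER.index(d["level"]) for d in primary)

    # Second pass: binding documents at a higher authority level that
    # touch the topic, even if their topical score was too low to make
    # the primary cut.
    constraints = [
        d for d in corpus
        if d["id"] not in primary_ids
        and d["force"] == "binding"
        and ORDER.index(d["level"]) < local_rank
        and topic_match(query, d)
    ]
    return {"primary": primary, "binding_authority": constraints}
```

A toy run with a borough zoning ordinance and a state open-meetings statute returns the ordinance in `primary` and the statute in `binding_authority`, even though the statute's topical score alone would never have surfaced it.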

This is technically achievable, but it requires explicit machinery, and it is the capability that produces the strongest user reaction in regulated-environment demos. The moment a borough manager sees their local procedure displayed under a "Binding state authority" header showing the constraint they hadn't been thinking about — that is the moment the system stops being a search interface and becomes an instrument that improves their decision-making.


When authority differs from currency

Authority hierarchy and currency (covered in our third paper) are sometimes confused. They're related but distinct.

A 2017 state regulation is current (still in force) and binding (state-level, regulatory force). A 2024 internal departmental memo is current and not binding (departmental-level, internal force). The 2017 regulation and the 2024 memo can both be current; they differ in authority, not in currency.

Conversely, a 2010 borough ordinance superseded in 2023 is legacy (a currency fact) and municipal-level (an authority fact); the supersession changed its currency status, not its authority level. While in force it was a municipal-level document; superseded, it remains a municipal-level document, just no longer authoritative because it is no longer in force.

A useful system tracks both independently. A document can be current-and-binding, current-and-internal, legacy-and-binding (which matters for time-scoped queries about what was in force on a specific date), and so on. Conflating them produces systems that handle one well and the other badly.
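Tracking the two dimensions independently is mostly a data-modeling decision. A minimal sketch, with illustrative field names — a real system would also record what superseded the document and why:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class DocMeta:
    """Authority and currency as independent fields (illustrative)."""
    level: str                        # e.g. "state", "municipal"
    force: str                        # "binding", "guidance", "internal"
    effective: date
    superseded: Optional[date] = None  # None = still in force

    def is_current(self, as_of: Optional[date] = None) -> bool:
        as_of = as_of or date.today()
        return self.effective <= as_of and (
            self.superseded is None or as_of < self.superseded
        )

# A legacy-and-binding document still answers time-scoped queries:
old_ordinance = DocMeta("municipal", "binding",
                        effective=date(2010, 1, 1),
                        superseded=date(2023, 6, 1))
old_ordinance.is_current(date(2015, 3, 1))  # True: in force on that date
old_ordinance.is_current(date(2024, 1, 1))  # False: superseded by then
```

Note that nothing about `level` or `force` changes when the document is superseded — exactly the independence the paragraph above describes.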


What authority awareness looks like in the answer

The presentation matters. The right rendering does several things at once.

It separates the binding higher-authority sources from the local-procedure sources visually. A user reading the answer should be able to see at a glance: here are the local procedures that directly answer your question; here are the higher-authority constraints that operate over them.

It labels each source with its position in the hierarchy. Not just "source: pa-borough-code-section-1006.txt" but "source: pa-borough-code-section-1006.txt — state-level, binding." The user, reading the source card, has the precedence information in front of them.

It surfaces hierarchical conflicts as a distinct class of conflict. When local procedure and binding higher-authority disagree, that's not the same kind of disagreement as when two local-procedure documents disagree. The first requires deference to the binding authority; the second requires the user's judgment about which local procedure is operationally in force. Different presentation, different user mental models.

It doesn't paper over the answer. If the answer is "the local procedure says X but state law says Y," the system shows both, with their respective citations and authority labels, and refuses to silently pick one. The user makes the call; the system makes the call possible.
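The four presentation rules above can be condensed into one rendering sketch. The grouping, the label format, and the conflict flag mirror the rules as described; the field names and text-only output are hypothetical stand-ins for a real interface.

```python
def render_sources(answer_sources):
    """Group sources by authority, label each with level and force,
    and flag hierarchical conflicts as their own class (sketch)."""
    binding = [s for s in answer_sources if s["force"] == "binding"]
    local = [s for s in answer_sources if s["force"] != "binding"]
    lines = []
    if binding:
        lines.append("Binding higher authority:")
        lines += [f"  {s['file']} — {s['level']}-level, {s['force']}"
                  for s in binding]
    lines.append("Local procedures:")
    lines += [f"  {s['file']} — {s['level']}-level, {s['force']}"
              for s in local]
    # A hierarchical conflict is rendered distinctly from an ordinary
    # local-vs-local disagreement; both sources stay visible.
    if binding and any(s.get("conflicts_with_local") for s in binding):
        lines.append("NOTE: Hierarchical conflict — local procedure "
                     "differs from binding authority; both shown above.")
    return "\n".join(lines)
```

The point of the sketch is the separation of concerns: the answer body never silently picks a winner, because the rendering layer always shows both tiers with their labels.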

These are presentation choices. They sound minor and they are not. The presentation is the substrate for the institutional learning that authority awareness produces over time — operators stop being surprised by the binding constraints, because the system has been surfacing them every time the topic came up.


The cost of getting it wrong

The cost of authority blindness is, like several other costs in this series, asymmetric and concentrated.

A system that ignores authority is fine on most queries. Most operational questions don't involve a binding higher-authority constraint that contradicts the local procedure. The system produces clean answers, the user proceeds, no harm done.

The minority of queries that do involve a higher-authority constraint are where the cost shows up. The user gets a confidently-wrong answer based on local procedure that they should have been told is constrained. They act on it. Sometime later — possibly months — the missed constraint surfaces in a context that matters: a regulatory inquiry, a legal challenge, an audit, a Right-to-Know request, a grievance. The cost is concentrated, traceable, and visible to the people whose opinion of the organization shapes its future.

Authority blindness is not a feature deficiency you discover during normal use. It's a feature deficiency you discover during the worst use case. By the time the deficiency is visible, the cost has been paid.

This is the argument for building authority awareness in from the start in regulated environments — not as a feature added later, not as a configuration option, but as a substrate of the retrieval pipeline. Adding it later is harder than building on it; the pipeline architecture changes when authority hierarchy is a first-class consideration versus a label.


What to look for

If authority awareness matters in your environment — government, healthcare, legal, contractor work, anywhere documents have hierarchical precedence — the questions worth asking of any system you evaluate are concrete. Does the system tag documents with an explicit authority level, or does it treat all documents as equivalently authoritative? Does it distinguish regulatory force (binding versus guidance versus internal) within a level? When the user asks a question about local procedure, does the system surface binding higher-authority documents on the same topic alongside the local content, or only documents that match the question topically? When local procedure and higher authority disagree, does the system render that as a hierarchical conflict — a distinct class — or just as another document conflict?

A specific test: ask the system a question about a local procedure that you know operates within a binding higher-authority context. The local procedure document should be cited; the higher-authority document should also be cited; and the relationship between them should be visible in the answer rendering. If the higher-authority document doesn't appear, that's a signal the system isn't authority-aware — and any answer it produces in that domain is missing a constraint that should have been visible.
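That probe is easy to automate if the system under evaluation exposes its citations. A hedged harness sketch — `ask` is a stand-in for whatever interface returns the cited source identifiers, and the three checks are exactly the ones the paragraph above names:

```python
def check_authority_awareness(ask, query, local_id, authority_id):
    """Smoke test: given a query about a local procedure known to sit
    inside a binding higher-authority context, both documents should
    appear among the cited sources."""
    cited = set(ask(query))
    return {
        "local_cited": local_id in cited,
        "authority_cited": authority_id in cited,
        # Authority-aware only if both are surfaced together.
        "authority_aware": {local_id, authority_id} <= cited,
    }
```

A failing `authority_cited` on a query you have hand-verified is the concrete signal the paragraph describes: the system is retrieving by topical match alone.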

If you're working through these tradeoffs and want a sounding board — diagnostic, not pitch — we'd welcome the conversation.


About Base2ML. Base2ML is a Pittsburgh-based company building knowledge-access tools for organizations that need to find what they already have. We work in the specific space where retrieval, authority hierarchy, and conflict surfacing meet operational reality.

Contact. Base2ML · chris@base2ml.com · base2ml.com · docs.base2ml.com

Numbers and percentages are deliberately not invented. Where industry research provided a credible figure we cite it; where it didn't, we say so rather than fabricating one.