Citation Discipline¶
The difference between an answer and a liability¶
A Base2ML white paper. Fourth in a series following The Information Paradox, Why Conflict Detection Is Harder Than It Looks, and The Half-Life of Documentation.
A confident-sounding answer is not a useful one¶
The defining failure mode of generic AI tools, when pointed at organizational knowledge, isn't that they refuse to answer questions. It's that they answer questions confidently when their grounding is partial or invented.
A user asks a question about overtime policy. The system responds with a fluent paragraph that paraphrases what the policy "appears to say." The user reads it. It sounds reasonable. They proceed.
What the user doesn't see: the system retrieved three policy documents, two of which were tangentially related and one of which was on a different subject entirely. The fluent paragraph wove together fragments from all three, plus some reasoning the LLM produced from its general knowledge of how organizations typically structure overtime policies. The result reads as authoritative because LLMs are good at producing prose that reads as authoritative. The result is, in practice, not traceable to any specific document — and may not be operationally accurate.
This is the situation that makes "AI for organizational knowledge" a liability rather than an asset. The user can't tell the difference between an answer that's grounded and one that isn't. The system has produced output the user has no rational basis for trusting, but no signal that they shouldn't.
Citation discipline is the practice of producing answers that the user can verify, in seconds, against the underlying documents. It's the single most important property a knowledge system can have, and most systems treat it as a feature rather than as the substrate. We argue it's the substrate — and that systems without rigorous citation discipline are not safer for being polished.
Three things that get called "citations"¶
The term is overloaded. It's worth distinguishing what's happening underneath.
Documents the system used. The system retrieved a set of documents and synthesized an answer from them. The "citations" displayed to the user are the names of those documents. The user, reading the answer, may or may not be able to trace any specific claim in the answer back to a specific passage in any specific document. The citation tells them only "these are some documents we looked at."
This is the weakest form of citation. It's better than nothing — it gives the user something to look up — but it puts all the work on the user. They have to read each cited document to figure out which passage actually grounds the claim they want to verify. In a long document, that's a non-trivial reading task. The user, in practice, rarely does it. The "citation" becomes plausibility theater.
Passages the system retrieved. The system shows not just which documents it used, but the specific passages from each document that the answer drew on. A click expands the passage; the user can see the exact text the system was working with. The verification task collapses from "read this whole document and figure out which paragraph supports the claim" to "read this paragraph and judge whether it supports the claim." This is a different order of magnitude of usefulness.
Inline citations grounded to specific claims. Each claim in the answer is tagged with the specific passage it came from. Claim A came from passage 1; claim B came from passage 2; claim C is a synthesis of passages 1 and 3. The user can trace each claim individually, not just the answer in aggregate. This is what citation discipline looks like when it's done right.
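What claim-level grounding implies for the shape of the answer can be made concrete with a small sketch. The structure below is illustrative, not a specification of any particular system, and the field names are hypothetical. The point is that the answer is a list of claims, each carrying its own passage references, rather than a prose blob with a bibliography appended.

```python
from dataclasses import dataclass, field

@dataclass
class PassageRef:
    """A pointer to one retrieved passage (hypothetical field names)."""
    doc_id: str      # stable identifier for the source document
    passage_id: str  # identifier for the passage within that document
    text: str        # the exact passage text the user sees on click-through

@dataclass
class Claim:
    """One claim in the answer, tagged with the passages that ground it."""
    text: str
    sources: list[PassageRef] = field(default_factory=list)

@dataclass
class Answer:
    """Claim-level grounding: the answer is a list of claims, not a blob."""
    claims: list[Claim]
```

A synthesized claim simply carries more than one `PassageRef`, which is how the "claim C is a synthesis of passages 1 and 3" case stays traceable.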
These three forms differ enormously in how much verification work they leave to the user, and therefore in how much trust they can earn. Most systems implement the first form, claim to implement the third, and produce something closer to the second on a good day. The gap between what's claimed and what's delivered is where trust failures originate.
The hallucinated citation problem¶
There's a failure mode worth being precise about because it's not always recognized for what it is.
A generic LLM, asked to produce an answer with citations, will sometimes invent the citations. Not just the answer — the citations themselves. A reference to "Section 4.2 of the Borough Code" that doesn't exist. A page number higher than the document's page count. A quoted sentence, set off in quotation marks, that appears nowhere in the actual document. The LLM has produced the textual form of a citation without having access to any specific source it's drawing on.
This isn't a hypothetical. It's a regular failure of off-the-shelf AI tools when used for tasks that require grounded answers. Engineers building these tools sometimes describe the behavior as "the model is being creative." The user, in operational settings, doesn't experience it as creativity. They experience it as reading a confident answer with what looks like a real citation, acting on it, and discovering later that the citation doesn't exist.
The architectural fix is retrieval-augmented generation done with discipline. The LLM is given access only to passages the retrieval layer actually fetched. The LLM's prompt explicitly constrains it to cite by reference to those passages, with the passage text immediately available in context. The system enforces, at the prompt level, that any claim must be supported by a retrieved passage. The output is post-processed to verify that claimed citations point to passages that were actually in the retrieval set.
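The post-processing step is the easiest piece of that discipline to sketch. Assuming the prompt instructs the model to cite retrieved passages inline with a tag like [P3] (the tag syntax and function name here are illustrative, not prescriptive), the check rejects any answer that cites a passage the retrieval layer never supplied:

```python
import re

# Illustrative citation syntax: the prompt asks the model to cite
# retrieved passages inline as [P1], [P2], etc.
CITATION_PATTERN = re.compile(r"\[P(\d+)\]")

def verify_citations(answer: str, retrieved_ids: set[int]) -> list[int]:
    """Return the passage ids the answer cites that were NOT retrieved.

    An empty list means every claimed citation points at a passage the
    retrieval layer actually supplied; anything else is a hallucinated
    citation and the answer should be rejected or regenerated.
    """
    cited = {int(m) for m in CITATION_PATTERN.findall(answer)}
    return sorted(cited - retrieved_ids)

# Usage: fail closed if any cited passage wasn't in the retrieval set.
bad = verify_citations(answer="Overtime accrues after 40 hours [P2][P9].",
                       retrieved_ids={1, 2, 3})
if bad:
    print(f"Hallucinated citation(s): {bad}")  # -> [9]
```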
This is non-trivial engineering. It's also the difference between a system that hallucinates citations (and is therefore a liability in any setting where the citation matters) and a system that doesn't.
What citation discipline costs¶
Done properly, citation discipline imposes design constraints that not every organization is willing to accept.
The system has to refuse to answer questions whose grounding it can't establish. A user asks a question; retrieval surfaces nothing relevant; the system says "I don't have documents that cover this" rather than producing a plausible-sounding answer from general knowledge. This is the right behavior. It's also the behavior that makes the system feel "less impressive" to a casual demo audience, because the alternative — confidently wrong answers — is more verbally satisfying in the moment.
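A minimal version of that refusal gate might look like the sketch below. The threshold value and function names are placeholders; real systems tune the relevance bar against their own retrieval scores.

```python
REFUSAL = "I don't have documents that cover this."
MIN_RELEVANCE = 0.45  # placeholder; tuned per retrieval stack

def answer_or_refuse(question: str, retrieve, generate) -> str:
    """Refuse when grounding can't be established, rather than improvising.

    `retrieve` returns (passage, score) pairs; `generate` is only ever
    called with passages that cleared the relevance bar.
    """
    passages = [(p, s) for p, s in retrieve(question) if s >= MIN_RELEVANCE]
    if not passages:
        return REFUSAL
    return generate(question, [p for p, _ in passages])
```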
The system has to constrain its synthesis. An LLM can produce a more elaborate answer if it's allowed to draw on its training data; constraining it to only what's in the retrieved passages produces shorter, more conservative answers. The shorter answers are more useful in operational settings because they're verifiable. The longer answers are more impressive in demos because they sound more "expert." Picking the verifiable version is the right call and the unfashionable one.
The system has to invest in retrieval quality. If retrieval surfaces the wrong passages, citation discipline can't fix that — the system will still answer based on what was retrieved, with citations to passages that don't actually contain the information the user wanted. Citation discipline is necessary but not sufficient; retrieval quality is the input on which citation discipline operates.
The system has to handle edge cases. A user asks a question that genuinely doesn't have a single-source answer; instead, the answer requires synthesizing across three passages from two documents. The system should produce that synthesis with citations to all three passages, in a way the user can follow. This is harder than single-source citation; it's the case that distinguishes good citation systems from passable ones.
These constraints add up to a system that's harder to build, slower to demo, and more honest. The honesty is the value proposition.
The audit trail dimension¶
In any regulated environment — government, healthcare, finance, legal — the citation isn't just for the immediate user. It's a record the organization may need to produce later.
A right-to-know request comes in for "all queries the system answered about topic X." A subpoena arrives asking for "the basis on which the system advised the borough manager to take action Y." An auditor reviews the organization's decision-making process and asks "show me how this conclusion was supported by the underlying documentation."
In each case, the value of the citation isn't the citation itself. It's the record that this citation existed at this moment, was presented to this user as the basis for an answer, and that the user acted on the basis of this specific grounding. The audit trail is the artifact.
Building this requires more than displaying citations to the user. It requires:
Persistent query log. Every query is recorded with the timestamp, the user, the question, the retrieved passages, the answer produced, and the citations attached to that answer. This is a structured record, not a free-text log file.
Stable references to passages. A citation that points at "passage 3 of document X" must remain valid even after the document is re-ingested or the index is rebuilt. The reference has to be content-anchored, not position-anchored. This sounds obvious; many implementations get it wrong.
Source document preservation. The cited passage has to be retrievable in its original form, not just paraphrased. If the citation points at section 4.2 of the 2018 ordinance, the audit reader has to be able to retrieve the actual text of section 4.2 of the 2018 ordinance — even six months later, even if the document has been superseded in the meantime.
Linkage between override decisions and the queries they affected. When an operator marks a document as legacy, that override should be visible in the audit trail of every query that retrieved the document. The reader of the audit can see: the system was relying on this document at this moment, and here's the override that was applied to it.
This is the audit-trail substrate. Most knowledge systems don't build it because they're optimized for the immediate user's experience, not for the third-party reader who shows up six months later asking how the organization knew what it knew.
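To make the first two requirements above concrete, here is a minimal sketch of a query-log record with content-anchored passage references. Field names are illustrative; the design point is that a cited passage is identified by a hash of its exact text plus the document version it came from, not by its position in an index that may later be rebuilt.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class PassageAnchor:
    """Content-anchored reference: survives re-ingestion and index rebuilds."""
    doc_id: str
    doc_version: str      # version or hash of the source document as ingested
    passage_sha256: str   # hash of the exact passage text that was cited
    passage_text: str     # preserved verbatim for later audit readers

    @classmethod
    def from_text(cls, doc_id: str, doc_version: str, text: str) -> "PassageAnchor":
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        return cls(doc_id, doc_version, digest, text)

@dataclass
class QueryRecord:
    """One structured audit-trail entry per answered query."""
    timestamp: str
    user: str
    question: str
    answer: str
    citations: list[PassageAnchor]

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)

# Usage: one record written per answered query, with the cited text preserved.
record = QueryRecord(
    timestamp=datetime.now(timezone.utc).isoformat(),
    user="borough-manager",
    question="When does overtime start to accrue?",
    answer="Overtime accrues after 40 hours in a workweek. [P1]",
    citations=[PassageAnchor.from_text(
        "personnel-policy-2018", "v3",
        "Overtime is paid for hours worked beyond 40 in a workweek.")],
)
```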
What unverifiable answers cost¶
The cost of unverifiable answers is asymmetric across operational contexts. It's worth thinking about explicitly because it tells you how much citation discipline is worth in your specific setting.
In a casual operational context — a manager looking up a routine policy detail to confirm what they already mostly remember — an unverified answer is fine. The cost of being slightly wrong is bounded. The user has the contextual knowledge to spot a wrong answer if it appears.
In a stakes-bearing context — a borough manager fielding a public-records request, a litigator preparing a deposition, an HR director responding to a grievance, a compliance officer authoring a regulatory filing — an unverified answer is a liability. If the answer is wrong, the cost is concentrated and traceable. If the answer is right but the user can't show how it was grounded, the answer is functionally useless even when accurate. In these contexts, citation discipline isn't a feature — it's the difference between a system that can be used and one that can't.
The uncomfortable truth is that the same system serves both contexts. The user who looks up routine details casually is the same user who, on a different day, has to defend a decision to an auditor. Optimizing the system for the casual case at the expense of the stakes-bearing case is a trade most organizations don't realize they're making until they need the audit trail and find it isn't there.
What to look for¶
If citation discipline matters in your environment — and in any setting where the cost of an unverifiable answer is non-trivial, it does — the questions worth asking of any system you evaluate are concrete. When the system displays citations, can the user click through to the specific passage the answer drew on, or only to the document name? Are claims tagged to specific passages individually, or only to the answer as a whole? When the system is asked a question whose grounding the corpus doesn't support, does it produce a fluent answer with plausible-looking citations, or does it admit it doesn't know? Six months from now, when an external reader needs to reconstruct what the system told a user and why, is the audit trail structured enough to support that reconstruction?
There's a specific test worth running on any system before you trust it for stakes-bearing work: ask it a question whose answer is in the corpus, then check whether the cited passage actually supports the answer it produced. Then ask it a question whose answer isn't in the corpus and check whether it admits it doesn't know — or whether it produces a fluent answer with citations to passages that don't actually contain what the answer claims they contain. The second test is more revealing than the first.
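That two-part check is easy to automate once you have a handful of known-in-corpus and known-out-of-corpus questions. A rough sketch follows; the `ask` function, the expected-phrase checks, and the refusal string are stand-ins for whatever the system under test actually exposes.

```python
# Hypothetical harness: `ask(question)` returns (answer_text, cited_passage_texts).
def run_citation_tests(ask, in_corpus, out_of_corpus,
                       refusal="I don't have documents"):
    failures = []
    # Test 1: answers to in-corpus questions must be supported by a cited passage.
    for question, expected_phrase in in_corpus:
        _, passages = ask(question)
        if not any(expected_phrase.lower() in p.lower() for p in passages):
            failures.append(f"unsupported citation: {question!r}")
    # Test 2 (the more revealing one): out-of-corpus questions must get a
    # refusal, not a fluent answer with plausible-looking citations.
    for question in out_of_corpus:
        answer, _ = ask(question)
        if refusal.lower() not in answer.lower():
            failures.append(f"no refusal for out-of-corpus question: {question!r}")
    return failures
```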
If you're working through these tradeoffs and want a sounding board — diagnostic, not pitch — we'd welcome the conversation.
About Base2ML. Base2ML is a Pittsburgh-based company building knowledge-access tools for organizations that need to find what they already have. We work in the specific space where retrieval, authority hierarchy, and conflict surfacing meet operational reality.
Contact. Base2ML · chris@base2ml.com · base2ml.com · docs.base2ml.com
Numbers and percentages are deliberately not invented. Where industry research provided a credible figure we cite it; where it didn't, we say so rather than fabricating one.