> bits_and_friends _

$ cat /blog/2026-05-27-quellenbezug-und-nachvollziehbarkeit-bei-ki-antworten.en.md

Source attribution and traceability — why every AI answer must leave a trail


Imagine an AI assistant in accounting answering the question “How do we treat reverse charge with a Belgian supplier?” in three cleanly written paragraphs. It sounds convincing. But is the answer correct? Which source does it rest on? What year is that source from? Who approved it?

Without answers to these questions, the AI answer is not a usable asset — it is a guess in pretty language. In a corporate context that is not acceptable.

What source attribution concretely means

A source-bound AI answer contains at least four things:

  • The actual answer. Clear, concise, in business language.
  • The sources used. Which concrete documents, wiki pages, tickets, contracts were drawn upon?
  • The snippet used per source. Which concrete passage from the document was relevant?
  • Metadata of the source. When was the document last updated? Who approved it? In what scope does it apply?

With these four components an answer can be verified at any time — by the requester, by a reviewer, by an auditor.
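
As a minimal sketch, the four components could be modelled as a small data structure. The class and field names (`SourceRef`, `SourcedAnswer`, `is_verifiable`) and all sample values are assumptions made for illustration, not part of any specific product.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass(frozen=True)
class SourceRef:
    """One cited source: the passage used plus its metadata."""
    source_id: str               # hypothetical identifier of the document
    title: str
    snippet: str                 # the concrete passage the answer relies on
    last_updated: date
    approved_by: Optional[str]   # None means nobody has signed off
    scope: str                   # where the document applies


@dataclass(frozen=True)
class SourcedAnswer:
    """The answer text together with everything needed to verify it."""
    text: str
    sources: Tuple[SourceRef, ...] = ()

    def is_verifiable(self) -> bool:
        # An answer with no source, or with an empty snippet, cannot be checked.
        return bool(self.sources) and all(s.snippet.strip() for s in self.sources)


# Invented sample data; the placeholder texts stand in for real content.
answer = SourcedAnswer(
    text="(answer text in business language)",
    sources=(
        SourceRef(
            source_id="tax-guideline-2026-q1",
            title="Tax guideline of corporate compliance, status Q1 2026",
            snippet="(the passage the answer relies on)",
            last_updated=date(2026, 1, 15),
            approved_by="Corporate Compliance",
            scope="EU suppliers",
        ),
    ),
)

print(answer.is_verifiable())                          # True
print(SourcedAnswer(text="no trail").is_verifiable())  # False
```

Keeping the snippet and the metadata on the source reference, rather than on the answer, means a requester, reviewer or auditor can check each source independently.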

Why this is more than nice-to-have

Three reasons make source attribution a minimum requirement, not a premium feature:

  • Hallucination detection. When the AI makes a statement that none of its cited sources supports, source attribution makes the mismatch visible. Without citations a hallucination can go undetected indefinitely.
  • Currency check. If the source dates from 2019 and the legal situation has changed since, the date makes that visible. Without a date every answer looks fresh.
  • Authority check. A source called “meeting notes of 12 March” carries different weight than “tax guideline of corporate compliance, status Q1 2026”. The source’s authority determines the answer’s authority.

Without these three checks, an AI answer can be well worded and still wrong, in an area where wrong answers have concrete consequences.
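
The three checks can be roughly approximated in code. The sketch below is deliberately crude: it treats a quoted snippet as supported only if it occurs verbatim in the source text, and it uses an arbitrary age threshold. The names `CitedSource` and `review_flags`, the two-year default and the sample documents are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class CitedSource:
    source_id: str
    full_text: str
    quoted_snippet: str          # what the answer claims the source says
    last_updated: date
    approved_by: Optional[str]   # None means no formal approval


def review_flags(src: CitedSource, today: date, max_age_days: int = 730) -> List[str]:
    """Return the reasons why a citation deserves a second look."""
    flags = []
    # Hallucination check (crude): the quoted passage must literally occur in the source.
    if not src.quoted_snippet or src.quoted_snippet not in src.full_text:
        flags.append("snippet_not_in_source")
    # Currency check: flag sources older than the assumed threshold.
    if (today - src.last_updated).days > max_age_days:
        flags.append("stale")
    # Authority check: flag sources nobody has approved.
    if src.approved_by is None:
        flags.append("unapproved")
    return flags


today = date(2026, 5, 27)

meeting_notes = CitedSource(
    source_id="notes-2019-03-12",
    full_text="Invented meeting notes. We discussed supplier invoices.",
    quoted_snippet="We discussed supplier invoices.",
    last_updated=date(2019, 3, 12),
    approved_by=None,
)
guideline = CitedSource(
    source_id="tax-guideline-2026-q1",
    full_text="Invented guideline text about supplier invoices.",
    quoted_snippet="A sentence that is not in the guideline.",
    last_updated=date(2026, 2, 1),
    approved_by="Corporate Compliance",
)

print(review_flags(meeting_notes, today))  # ['stale', 'unapproved']
print(review_flags(guideline, today))      # ['snippet_not_in_source']
```

A verbatim match will miss paraphrased claims, so in a real system this check would more likely be a signal for human review than a verdict.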

How source attribution is implemented technically

When building a RAG (retrieval-augmented generation) system, three points make source attribution robust:

  • Carry metadata at index time. Each chunk in the vector index gets not only the text but also a unique source ID, a date, a scope and an approval marker. If this metadata is omitted at index time, it is not available later.
  • Design prompts so sources are required. The language model is explicitly instructed to attribute every statement to a source ID. Without that instruction the model will typically draw on the sources without citing them.
  • Frontend display of sources. Source citations must be visible and verifiable for users — ideally as links directly into the source document, with the relevant passage highlighted.

When these three points align, the user gets an answer they can trust — because they can check it.
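
The first two points, plus a check that feeds the third, can be sketched as follows. No vector store or model API is called here: the retrieved chunks and the model output are hard-coded, and `Chunk`, `build_prompt`, `check_citations` and the bracketed citation format are assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Chunk:
    """A retrieved chunk that still carries the metadata stored at index time."""
    source_id: str
    text: str
    last_updated: str   # ISO date string
    scope: str
    approved: bool


def build_prompt(question: str, chunks: List[Chunk]) -> str:
    """Build a prompt that makes citing a source ID mandatory."""
    context = "\n".join(
        f"[{c.source_id}] (updated {c.last_updated}, scope: {c.scope}, "
        f"approved: {c.approved})\n{c.text}"
        for c in chunks
    )
    return (
        "Answer using only the sources below. "
        "End every sentence with the ID of the source it rests on, in square brackets. "
        "If no source covers the question, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )


# Assumed citation format: a source ID in square brackets, e.g. [wiki-12].
CITATION = re.compile(r"\[([A-Za-z0-9_-]+)\]")


def check_citations(answer: str, chunks: List[Chunk]) -> Dict[str, Set[str]]:
    """Split the IDs cited in the answer into known and unknown sources."""
    known = {c.source_id for c in chunks}
    cited = set(CITATION.findall(answer))
    return {"valid": cited & known, "unknown": cited - known}


chunks = [
    Chunk(
        source_id="tax-guideline-2026-q1",
        text="(invented guideline passage)",
        last_updated="2026-01-15",
        scope="EU suppliers",
        approved=True,
    )
]
prompt = build_prompt("How do we treat reverse charge with a Belgian supplier?", chunks)

# Hard-coded stand-in for a model response, so the sketch runs offline.
model_output = "First statement [tax-guideline-2026-q1]. Second statement [wiki-999]."
result = check_citations(model_output, chunks)
print(result["valid"])    # {'tax-guideline-2026-q1'}
print(result["unknown"])  # {'wiki-999'}
```

Only the validated IDs would be rendered as links in the frontend; an ID the retriever never supplied, like `wiki-999` here, is a sign that the model invented a citation.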

What additional benefits emerge

Source attribution is not only a safeguard but also a productive tool:

  • Knowledge maintenance becomes visible. When a source is used frequently but the last update is four years old, the responsible person knows an update is due.
  • Knowledge gaps become visible. When the AI cannot answer many questions because no source exists for them, that is a concrete gap in the knowledge inventory. It can be filled deliberately.
  • Training and onboarding benefit. New staff see not only the answer but also the original document, and learn as a side effect which sources count as authoritative in the company.
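
The first two benefits fall out of a simple log of which sources each answer cited. The sketch below assumes such a log exists as a list of (question, cited source IDs) pairs; the function name `maintenance_report`, the thresholds and the sample entries are invented for illustration.

```python
from collections import Counter
from datetime import date
from typing import Dict, List, Tuple

QueryLog = List[Tuple[str, List[str]]]   # (question, IDs of the sources cited)


def maintenance_report(
    query_log: QueryLog,
    last_updated: Dict[str, date],
    today: date,
    min_uses: int = 2,
    max_age_days: int = 3 * 365,
) -> Tuple[List[str], List[str]]:
    """Return (frequently used but stale sources, questions without any source)."""
    usage = Counter(sid for _, cited in query_log for sid in cited)
    stale = sorted(
        sid
        for sid, uses in usage.items()
        if uses >= min_uses and (today - last_updated[sid]).days > max_age_days
    )
    gaps = [question for question, cited in query_log if not cited]
    return stale, gaps


# Invented sample data.
log: QueryLog = [
    ("How are travel costs reimbursed?", ["travel-policy"]),
    ("Which VAT rules apply to travel costs?", ["travel-policy", "tax-guideline"]),
    ("How do we archive customs documents?", []),
    ("How do we book an EU supplier invoice?", ["tax-guideline"]),
]
updated = {
    "travel-policy": date(2021, 1, 15),
    "tax-guideline": date(2026, 2, 1),
}

stale, gaps = maintenance_report(log, updated, today=date(2026, 5, 27))
print(stale)  # ['travel-policy']
print(gaps)   # ['How do we archive customs documents?']
```

The stale list goes to the document owners, and the gap list shows where the knowledge inventory has nothing to offer yet.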

Where source attribution hits limits

There are cases where the source trail is not as clear as it should be. In practice we see three:

  • Synthesis answers. When an answer combines five sources, “the source is document X” is not accurate; the answer is a mixture. Here all sources must be listed, and the answer should make visible what comes from where.
  • Implicit knowledge. Some answers rest on knowledge the model brings from its general training, e.g. basic accounting logic. If that knowledge is not in an explicit source, attributing it to one is a stretch. Honesty helps: the answer flags what stems from sources and what does not.
  • Conflicting sources. When two sources answer the same question differently, the answer must say so. A smooth answer citing only one of the two hides the problem.
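
One way to handle all three cases is to attribute per statement instead of per answer. The sketch below assumes the statements have already been split and attributed, and that conflicts were detected upstream; it only shows how the result could be rendered. `Statement`, `render` and the sample texts are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Statement:
    text: str
    source_ids: Tuple[str, ...] = ()   # empty means no explicit source


def render(statements: List[Statement], conflicts: List[Tuple[str, str]]) -> str:
    """Render an answer so the reader sees what comes from where."""
    lines = []
    for s in statements:
        # Synthesis: list every source; implicit knowledge: say so openly.
        tag = ", ".join(s.source_ids) if s.source_ids else "no source: general model knowledge"
        lines.append(f"{s.text} [{tag}]")
    for a, b in conflicts:
        # Conflicting sources are surfaced instead of silently picking one.
        lines.append(f"Note: {a} and {b} answer this question differently.")
    return "\n".join(lines)


statements = [
    Statement("Step one, as described in two documents.", ("wiki-12", "guideline-3")),
    Statement("General background the model supplied itself."),
]
rendered = render(statements, conflicts=[("wiki-12", "guideline-3")])
print(rendered)
```

The unsourced tag is the important part: it keeps general model knowledge from borrowing the authority of documents that never stated it.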

What remains in the end

An AI system with consistent source attribution is slower to implement but faster to gain acceptance. Staff trust it because they can verify its answers. Compliance officers accept it because it has an audit trail. External auditors accept it because that trail holds up in disputes.

Trust in AI answers does not come from eloquent wording — it comes from the possibility of checking them. Source attribution is the technical form of that trust.