> bits_and_friends _

$ cat /blog/2026-05-27-datenqualitaet-durch-ki-gestuetzte-standardisierung.en.md

Data quality through AI-supported standardisation — small effects, big impact

In every ERP report, every BI dashboard, every analysis, the same question eventually comes up: “Can we trust these numbers?” The usual answer: “Only as far as we can trust the master data behind them.” And in most mid-sized companies, master data is exactly what gets the least maintenance, because maintaining it takes time and nobody sees a direct benefit.

AI changes the economics of that maintenance.

Where data quality is made, and where it breaks down

Data quality does not come from a one-off cleanup but from continuous care with every new transaction. If the “industry” field of a new customer is left blank or set to “other”, any later analysis by industry is worthless. If the same supplier appears three times in the system under slightly different spellings, a “top-10 suppliers” list is misleading.

These small errors come not from carelessness but from haste. Nobody enters bad data on purpose, but under time pressure people cut the entry process short, and the missing field stays empty.

How AI compensates for haste

AI can step in at three points, with little effort, and improve quality:

  • At creation: when a new customer, supplier or item is created, the AI suggests values for missing fields based on the information already available (name, address, web domain): for example the industry from the company name, the VAT ID from the public register, or the contact person from the signature of the first email. The user reviews and accepts.
  • At recognition: when an incoming document belongs to an existing supplier but spells the name slightly differently, it is matched to the correct master record instead of triggering a new one. Duplicates are never created in the first place.
  • At refresh: master data is periodically reconciled against external sources such as the commercial register, the VAT-ID database or the postal code register. Outdated or incorrect entries are flagged with a proposed update.
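For illustration, the matching step at recognition can be approximated with plain string similarity instead of a learned model. The Python sketch below is an assumption, not the method used in the projects described: it normalises names (accents, punctuation and a hypothetical list of legal-form suffixes) and compares them with `difflib.SequenceMatcher` from the standard library, using a hypothetical threshold of 0.85 below which the decision goes to a human.

```python
import re
import unicodedata
from difflib import SequenceMatcher

# Hypothetical, deliberately short list of legal-form tokens ignored when
# comparing names; a real list would be longer and country-specific.
LEGAL_FORMS = {"gmbh", "mbh", "ag", "kg", "co", "se", "ug", "ltd", "inc"}


def normalise(name):
    """Lower-case, strip accents and punctuation, drop legal-form tokens."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(t for t in tokens if t not in LEGAL_FORMS)


def best_match(incoming, suppliers, threshold=0.85):
    """Return (supplier_id, score) for the closest existing supplier, or
    (None, score) if no candidate reaches the threshold and a human should decide."""
    key = normalise(incoming)
    best_id, best_score = None, 0.0
    for supplier_id, name in suppliers.items():
        score = SequenceMatcher(None, key, normalise(name)).ratio()
        if score > best_score:
            best_id, best_score = supplier_id, score
    return (best_id, best_score) if best_score >= threshold else (None, best_score)


suppliers = {"S-001": "Müller Metallbau GmbH", "S-002": "Schneider Logistik AG"}
print(best_match("Mueller Metallbau", suppliers))  # matched to S-001
print(best_match("Baeckerei Schulz", suppliers))   # no match: goes to review
```

A real system would compare more than the name (for example VAT ID or address) and tune the threshold on its own data; the sketch only shows that a near-match lands on the existing record instead of creating a new one.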

In all three cases the AI proposes and a human decides. What disappears is the work of preparing the proposal, which today usually falls on the user.
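“The AI proposes, a human decides” translates directly into a data structure: reconciliation produces proposals and never writes to the master record. A minimal Python sketch, assuming records are plain dicts and the external reference (for example a commercial-register lookup) has already been fetched; `Proposal`, `propose_updates`, `decide` and the field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    """A suggested change to one field; it is never written automatically."""
    record_id: str
    field_name: str
    current: object
    proposed: object
    source: str
    status: str = "pending"  # set to "accepted" or "rejected" by a human


def propose_updates(record_id, master, external, source):
    """Compare a master record with an external reference and return one
    Proposal per differing field. The master record itself is not modified."""
    proposals = []
    for field_name, ext_value in external.items():
        if ext_value in (None, ""):
            continue  # an empty external value is no reason to change anything
        if master.get(field_name) != ext_value:
            proposals.append(
                Proposal(record_id, field_name, master.get(field_name), ext_value, source)
            )
    return proposals


def decide(proposal, accept):
    """Record the human decision; applying accepted changes is a separate step."""
    proposal.status = "accepted" if accept else "rejected"
    return proposal


master = {"name": "Schneider Logistik AG", "zip": "80331", "city": "München"}
external = {"name": "Schneider Logistik AG", "zip": "80333", "city": "München", "vat_id": ""}
for p in propose_updates("S-002", master, external, source="commercial register"):
    print(p.field_name, p.current, "->", p.proposed, p.status)  # zip 80331 -> 80333 pending
```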

What can actually be measured

In projects we have supported, the following metrics change measurably within the first six months:

  • Share of fully populated master records: typically rises from 50–60% to over 90%.
  • Share of duplicates in the customer and supplier master: falls from 5–15% to under 1%.
  • Share of correctly classified documents: rises from about 70% to over 95%.
  • Effort for “data correction” when preparing the month-end and quarter-end close: reduced by 50–70%.
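How the first two shares might be computed is shown below as a rough Python sketch. The required fields, the placeholder values that count as empty (such as “other”) and the crude duplicate key are all assumptions for illustration; real duplicate detection needs tolerant matching of spelling variants, not an exact key.

```python
import re
from collections import Counter

# Hypothetical required fields and placeholder values that count as "not filled".
REQUIRED = ("name", "industry", "vat_id", "country")
PLACEHOLDERS = {"", "other", "n/a", "-"}


def is_filled(value):
    return str(value or "").strip().lower() not in PLACEHOLDERS


def completeness_share(records):
    """Share of records in which every required field holds a real value."""
    if not records:
        return 0.0
    full = sum(all(is_filled(r.get(f)) for f in REQUIRED) for r in records)
    return full / len(records)


def duplicate_share(records):
    """Share of records that are extra copies of an entity already present.
    Crude key: the name reduced to lower-case letters and digits."""
    if not records:
        return 0.0
    keys = Counter(re.sub(r"[^a-z0-9]", "", str(r.get("name") or "").lower()) for r in records)
    extra = sum(count - 1 for count in keys.values())
    return extra / len(records)


records = [
    {"name": "Schneider Logistik AG", "industry": "logistics", "vat_id": "DE000000001", "country": "DE"},
    {"name": "Schneider-Logistik AG", "industry": "other", "vat_id": "DE000000001", "country": "DE"},
    {"name": "Weber Elektro GmbH", "industry": "electrical", "vat_id": None, "country": "DE"},
    {"name": "Huber Druck KG", "industry": "printing", "vat_id": "DE000000002", "country": "DE"},
]
print(completeness_share(records))  # 0.5
print(duplicate_share(records))     # 0.25
```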

The last two numbers are the ones the business actually feels: preparing the close becomes calmer, reporting becomes more reliable, and nobody spends whole days clearing inconsistencies out of the data.

Which preconditions are really necessary

The honest list of preconditions is shorter than often assumed:

  • Access to the data sources. The AI must be allowed to read from and write to master data — via official interfaces, not UI automation.
  • Defined target fields. Which fields really have to be filled? Which may stay empty? This has to be decided once, and usually it never has been.
  • Responsibility for approvals. Who decides on disputed proposals, e.g. whether this really is the same supplier or a different one? The role often already exists; it just has never been named explicitly.
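Answering the target-field and approval questions once can be as simple as writing the answers down in a form every check reads. A hypothetical Python sketch: the entity types, field lists and role names are invented for illustration, and `approver_for` fails loudly when a kind of decision has no named owner.

```python
# Hypothetical target-field definition, decided once and read by every check.
# Optional fields may stay empty and are never flagged.
TARGET_FIELDS = {
    "customer": {"required": ["name", "industry", "vat_id", "country"],
                 "optional": ["website", "phone"]},
    "supplier": {"required": ["name", "vat_id", "country", "payment_terms"],
                 "optional": ["industry", "website"]},
}

# Hypothetical named owners for contentious decisions.
APPROVERS = {
    "merge_duplicates": "master-data owner, purchasing",
    "external_update": "account owner",
}


def missing_required(entity_type, record):
    """List the required fields of this entity type that are empty in the record."""
    required = TARGET_FIELDS[entity_type]["required"]
    return [f for f in required if not str(record.get(f) or "").strip()]


def approver_for(decision_kind):
    """Return the named approver; fail loudly if nobody has been named."""
    try:
        return APPROVERS[decision_kind]
    except KeyError:
        raise LookupError(f"no approver named for {decision_kind!r}") from None


print(missing_required("supplier", {"name": "Huber Druck KG", "vat_id": "", "country": "DE"}))
```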

What is not necessary: a giant master-data management project with its own software. AI-supported data quality builds on existing systems and makes them better — it does not replace them.

Where caution is appropriate

For all the enthusiasm, three points call for human judgement:

  • Automatic corrections without a trace. If the AI overwrites data without an audit trail, nobody can later tell what it changed. Every change must be documented.
  • Changes without versioning. Master data has a history: changes of address, name or legal form. The AI must not overwrite that history; it must keep the earlier version.
  • External data sources as the sole source of truth. If an external database service holds a different address than the master record, the external service is not necessarily right; sometimes our own data is more current. Proposing an update is fine, but adopt it only after review.
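These three cautions can be enforced in one structure: an append-only list of versions plus an audit trail in which every change carries its source and a named reviewer, so that nothing, including data from an external service, is written without review. A minimal in-memory Python sketch; `MasterRecord`, `Change` and `apply` are hypothetical names, and a real system would persist this in the ERP or a database rather than in Python lists.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Change:
    """One audit-trail entry: which field, old and new value, source, reviewer, time."""
    field_name: str
    old: object
    new: object
    source: str
    approved_by: str
    at: datetime


@dataclass
class MasterRecord:
    record_id: str
    versions: list = field(default_factory=list)  # full snapshots, oldest first
    audit: list = field(default_factory=list)     # Change entries, append-only

    def current(self):
        """Return a copy of the latest version (empty dict if none yet)."""
        return dict(self.versions[-1]) if self.versions else {}

    def apply(self, field_name, new_value, source, approved_by):
        """Store a reviewed change as a new version; older versions stay untouched."""
        if not approved_by:
            raise PermissionError("every change needs a named reviewer")
        snapshot = self.current()
        old = snapshot.get(field_name)
        snapshot[field_name] = new_value
        self.versions.append(snapshot)
        self.audit.append(
            Change(field_name, old, new_value, source, approved_by,
                   datetime.now(timezone.utc))
        )
        return snapshot


rec = MasterRecord("S-002", versions=[{"name": "Schneider Logistik AG", "city": "Munich"}])
rec.apply("city", "Augsburg", source="external database", approved_by="j.doe")
print(rec.current()["city"], rec.versions[0]["city"])  # Augsburg Munich
```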

What becomes visible in the end

Data quality is one of the few topics where the investment pays off disproportionately over time. Every good analysis, every good decision and every good AI application in the company builds on it. Companies that become consistent early will stand on a very different foundation two years later, and getting there costs them less effort, not more.

AI turns a laborious, often neglected background chore into an almost invisible but effective companion to daily work. That is not spectacular, but it is sustainable.