AI Agents for Business
Language-model-driven agents that take over recurring tasks in your workflow, with clearly defined tools, a documented audit trail, an eval suite, and human-in-the-loop review. Not chatbot toys, but tools that hold up in production.
Chatbot vs. AI agent
A chatbot answers inquiries with text. An AI agent completes a concrete task within a process: it reads input, classifies it, calls tools, writes data back, and escalates when needed. Chatbots live in the user-facing front end; agents live in your business process. We build agents.
Typical use cases
- Ticket classification and smart routing (KIX, OTRS, Jira, Zammad)
- Documentation assistants with RAG over Confluence, SharePoint, and file systems
- Code review agents for PR triage, style checks, onboarding hints
- Customer support pre-qualification with clean hand-off to humans
- Data enrichment — structured fields extracted from free text
- Compliance pre-check of documents against policy sets
- Automated report generation from heterogeneous sources
- Operations agents for DevOps workflows (build triage, incident pre-classification)
How we work
- Use-case analysis — what decision is made manually today, how often, with what error tolerance? What is the measurable target?
- Model selection — on-premise (Llama, Gemma, Mistral via vLLM/llama.cpp) or API (OpenAI, Anthropic), depending on data class and latency budget.
- Tool definition — what external actions may the agent perform? With which permissions? What are the fallback paths?
- RAG integration — vector store setup, knowledge sources, source enforcement in answers.
- Eval loop — golden test set, automated evaluation, regression tests on every prompt refactor.
- Operations — monitoring (latency, token use, eval drift), alerting, human-in-the-loop intervention, audit log.
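The eval loop from the list above can be sketched as a small regression gate. This is a minimal illustration under assumptions: exact-match accuracy is only one possible metric (real evals often need graded or rubric-based scoring), and `GoldCase`, `toy_agent`, the gold examples, and the `0.02` tolerance are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GoldCase:
    """One hand-labelled example from the golden test set."""
    text: str
    expected: str


def evaluate(agent: Callable[[str], str], gold: list[GoldCase]) -> float:
    """Return the share of gold cases where the agent output matches exactly."""
    if not gold:
        raise ValueError("gold set must not be empty")
    hits = sum(1 for case in gold if agent(case.text) == case.expected)
    return hits / len(gold)


def regression_gate(score: float, baseline: float, tolerance: float = 0.02) -> bool:
    """Pass only if the score has not dropped more than `tolerance` below baseline."""
    return score >= baseline - tolerance


# Hypothetical gold set and toy agent, for illustration only.
GOLD = [
    GoldCase("Invoice 4711 shows the wrong amount", "billing"),
    GoldCase("Please reset my password", "access"),
    GoldCase("VPN drops every ten minutes", "network"),
]


def toy_agent(text: str) -> str:
    lowered = text.lower()
    if "invoice" in lowered:
        return "billing"
    if "password" in lowered:
        return "access"
    return "other"
```

Here `evaluate(toy_agent, GOLD)` yields 2/3, so `regression_gate(score, baseline=0.9)` returns `False`; a CI step wrapping this check would fail the build and block the prompt change until the score recovers.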
Deliverables
- Agent code in your repository, split into modules for prompts, tools, eval, and observability
- Eval suite with gold set, CI integration, regression gate
- Monitoring dashboard for latency, token cost, eval score, tool-use rate
- Audit log schema (who asked what, what did the agent answer, which human approved)
- Operations runbook, including a rollback path for model or prompt drift
- Training for your team in prompt maintenance, tool extension, and eval updates
Customer benefit
- Employees noticeably relieved of the routine work the agent takes over
- Less backlog, higher throughput on standard cases
- Consistent quality — the agent handles "routine case A" the same way every time
- Full traceability via audit log and eval reports
- Scalability without linear staff growth
Compliance & security
- Data classification before model selection — no VS-NfD data to cloud APIs
- On-premise option for regulated industries (KRITIS, banking, health, public sector)
- EU AI Act assessment of the use case (risk tier, transparency duties)
- ISO/IEC 42001 as the management system standard for AI governance, where relevant
- Audit trail of all agent decisions incl. model version and prompt hash
- Red-teaming against prompt injection and tool abuse
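The rule "data classification before model selection" can be enforced in code before any model call. The sketch below assumes a hypothetical four-level scheme and a hypothetical policy that cloud APIs receive at most internal data; the real levels and ceilings come from your own classification scheme and DPA situation.

```python
from enum import Enum, IntEnum


class DataClass(IntEnum):
    """Hypothetical classification levels, ordered from least to most sensitive."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    VS_NFD = 4  # German restricted classification (VS-NfD)


class Target(Enum):
    CLOUD_API = "cloud_api"
    ON_PREM = "on_prem"


# Hypothetical policy: the most sensitive data class each target may receive.
MAX_CLASS = {
    Target.CLOUD_API: DataClass.INTERNAL,
    Target.ON_PREM: DataClass.VS_NFD,
}


def allowed_targets(data_class: DataClass) -> list[Target]:
    """All deployment targets cleared for this data class."""
    return [target for target, ceiling in MAX_CLASS.items() if data_class <= ceiling]


def check_route(data_class: DataClass, target: Target) -> None:
    """Raise before any model call if the data class may not go to this target."""
    if target not in allowed_targets(data_class):
        raise PermissionError(f"{data_class.name} data must not be sent to {target.value}")
```

With this policy, `allowed_targets(DataClass.VS_NFD)` returns only `[Target.ON_PREM]`, and `check_route(DataClass.VS_NFD, Target.CLOUD_API)` raises `PermissionError` before any request leaves the network.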
FAQ
Do you build agents from scratch or use frameworks?
Both. We use LangChain, LangGraph, or LlamaIndex as a toolkit, but write agent-specific logic ourselves wherever framework wrappers would cost performance or maintainability. The goal is always an agent your team still understands in five years.
What about hallucinations and security?
Two layers: (1) In design: strict tool definitions, enforced RAG source citations, answer validation against a schema, and confidence thresholds. (2) In operation: human-in-the-loop review of every externally facing decision, a gold-set eval as regression gate, and drift monitoring. Our AI security audit adds red-teaming on top.
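The design-layer checks can be combined into a single acceptance function that runs after every model answer. This is a minimal sketch assuming a hypothetical answer contract (a JSON object with `label`, `confidence`, and `sources`) and a `0.8` threshold; in practice a library such as `jsonschema` can handle the schema part. Anything that fails a check is not discarded but handed to a human.

```python
import json


# Hypothetical answer contract: the model must return a JSON object like
# {"label": "...", "confidence": 0.0-1.0, "sources": ["doc-id", ...]}.
def validate_answer(raw: str, retrieved_ids: set[str],
                    threshold: float = 0.8) -> tuple[bool, str]:
    """Return (accepted, reason); anything not accepted goes to a human."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False, "not valid JSON"
    if not isinstance(data, dict):
        return False, "not a JSON object"
    # Schema check: required fields with the expected types.
    if not isinstance(data.get("label"), str):
        return False, "missing or invalid field: label"
    conf = data.get("confidence")
    if isinstance(conf, bool) or not isinstance(conf, (int, float)):
        return False, "missing or invalid field: confidence"
    sources = data.get("sources")
    if not isinstance(sources, list) or not sources:
        return False, "no sources cited"
    # Source enforcement: only documents the retrieval step returned may be cited.
    unknown = {str(s) for s in sources} - retrieved_ids
    if unknown:
        return False, f"cites sources that were not retrieved: {sorted(unknown)}"
    # Confidence threshold: low-confidence answers are routed to a human.
    if conf < threshold:
        return False, f"confidence {conf} below threshold {threshold}"
    return True, "ok"
```

An answer citing a document the retriever never returned is rejected even when the model reports high confidence, which catches a common form of hallucinated sourcing.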
Does a local model suffice or must it be GPT/Claude?
It depends. For classification, structured extraction, and code triage, a 7–27B-parameter on-premise model (Gemma, Llama, Mistral) often suffices. For free-form conversation with tool use at low latency, API models are usually the first choice. Model size is a means, not an end.
How quickly do we see results?
2–3 weeks for a working prototype with a real eval on your use case. Production readiness typically takes 6–12 weeks, depending on compliance requirements and integration breadth.
What happens to our data?
It stays with you; we host nothing permanently. Development runs, at your choice, in our GDPR-compliant DACH cloud (Hetzner), in your environment, or air-gapped. For API models, we clarify DPA status, data locality, and training opt-out beforehand.
Discuss an agent use case
Which repetitive task eats up hours every day? Describe it, and we will respond with a first assessment of feasibility, model choice, and effort.
> Start AI Readiness Check