How We Built an Enterprise RAG System Managing 1,400+ Document Chunks at Zero Cloud Cost

The Problem: Scattered Corporate Knowledge Costs More Than You Think

Every company with more than five people has the same invisible problem: critical knowledge is fragmented across emails, documents, chats, CRMs, databases, and people's heads.

McKinsey estimates that knowledge workers spend 19.8% of their time searching for internal information. For a 20-person company with an average salary of €40K, that's roughly €158,000/year (20 × €40,000 × 0.198) burned on fruitless searches, repeated questions, and decisions made without full context.

Traditional systems — corporate wikis, SharePoint, Confluence — don't solve the problem. They require someone to actively keep documentation up to date. Nobody does.

The Solution: Hybrid RAG with a Knowledge Graph

We built a Retrieval-Augmented Generation system that automatically indexes all corporate knowledge and makes it queryable in natural language. Not a generic chatbot — a system that actually knows your business.

High-Level Architecture

The system is built on three layers:

1. Ingestion Layer — Automatically processes corporate documents (strategy, marketing, prospects, infrastructure, integrations, pricing, competitive analysis) and segments them into semantically coherent chunks. It doesn't just cut text: it recognizes document structure (headings, sections, code blocks) and preserves context throughout.

2. Hybrid Search Layer — Combines semantic search (understands the meaning of a query) with keyword search (finds exact matches). A 60/40 weighting between the two reduces both the false positives of purely semantic search and the rigidity of keyword-only matching.

3. Knowledge Graph Layer — A graph of entities and relationships mapping people, technologies, projects, skills, and verticals. When you ask "who owns project X?", the system doesn't just find similar documents — it navigates the relationship graph.
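The structure-aware chunking described for the ingestion layer can be sketched roughly as follows. This is a minimal illustration, not the production code: it assumes Markdown-style `#` headings and triple-backtick code fences, and the `Chunk` type, the function name, and the `max_chars` and `overlap` defaults are all hypothetical. Long sections are windowed by character count here, which a real system would likely replace with sentence- or token-aware splitting.

```python
import re
from dataclasses import dataclass

FENCE = "`" * 3  # a Markdown code-fence marker (three backticks)
HEADING = re.compile(r"^#{1,6}\s+(.*)$")  # assumption: Markdown-style headings


@dataclass
class Chunk:
    heading: str  # nearest heading above the text, kept as context
    text: str


def chunk_by_structure(doc: str, max_chars: int = 800, overlap: int = 100) -> list[Chunk]:
    """Split on headings first, then window long sections with overlap."""
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")

    sections: list[tuple[str, list[str]]] = [("", [])]
    in_code = False
    for line in doc.splitlines():
        if line.strip().startswith(FENCE):
            in_code = not in_code  # '#' inside a code block is not a heading
        match = None if in_code else HEADING.match(line)
        if match:
            sections.append((match.group(1).strip(), []))
        else:
            sections[-1][1].append(line)

    chunks: list[Chunk] = []
    step = max_chars - overlap
    for heading, lines in sections:
        body = "\n".join(lines).strip()
        if not body:
            continue
        for start in range(0, len(body), step):
            chunks.append(Chunk(heading, body[start:start + max_chars]))
            if start + max_chars >= len(body):
                break  # the last window already reached the end of the section
    return chunks
```

Each chunk carries its section heading, so a retrieved fragment can still be traced back to the part of the document it came from.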

Real Numbers

Metric	Value
Chunks indexed	1,400+ (from 59 source files)
Document types	Strategy, marketing, prospects, infrastructure, security, competitive intel
Knowledge graph entities	71 (people, technologies, projects, skills, companies)
Mapped relationships	45+ typed connections
Average search latency	50–150ms
Additional cloud cost	€0/month
Updates	Continuous, automatic
Languages supported	6 (IT, EN, ES, PT, DE, FR)

How Search Works: Vectors Alone Are Not Enough

Most RAG systems on the market rely solely on vector search (embeddings). That works well for vague queries ("tell me something about marketing strategy") but falls apart for precise ones ("what's the enterprise tier price for facility management?").

Our hybrid approach handles both:

Semantic search (60%) — Converts the query into a high-dimensional vector and retrieves documents with the closest meaning. It uses asymmetric embeddings: the way it encodes a question differs from the way it encodes a document, because a short question and a long paragraph have fundamentally different linguistic structures.

BM25 search (40%) — A probabilistic ranking function that scores exact term matches by term frequency, term rarity across the index, and document length. If you search for "Vacchelli €797", the system finds exactly that figure in exactly those documents, even if the query isn't semantically "close" to anything in the index.

Quality filter — Only results above a minimum relevance threshold are returned. No results is better than wrong results: in an enterprise context, incorrect information is worse than no information at all.
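To make the 60/40 fusion and the quality filter concrete, here is a small self-contained sketch. It is an illustration under stated assumptions, not the production scoring code: the BM25 parameters (`k1`, `b`), the scaling of BM25 scores by the top score, the `min_score` threshold of 0.3, and all function names are hypothetical. Embeddings are plain lists of floats standing in for real model output.

```python
import math
from collections import Counter


def bm25_scores(query: list[str], docs: list[list[str]],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each tokenized document for the query terms."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_search(query_vec, query_terms, doc_vecs, doc_terms,
                  w_sem: float = 0.6, w_kw: float = 0.4,
                  min_score: float = 0.3) -> list[tuple[int, float]]:
    """Return (doc index, fused score) pairs above the relevance threshold."""
    sem = [max(0.0, cosine(query_vec, v)) for v in doc_vecs]
    kw = bm25_scores(query_terms, doc_terms)
    top = max(kw) or 1.0  # assumption: scale BM25 into [0, 1] by the best score
    fused = [w_sem * s + w_kw * (k / top) for s, k in zip(sem, kw)]
    ranked = sorted(enumerate(fused), key=lambda pair: pair[1], reverse=True)
    return [(i, round(score, 3)) for i, score in ranked if score >= min_score]
```

With this scheme, a document that matches the exact terms can clear the threshold on its keyword score alone, while a document that matches neither the meaning nor the terms is dropped rather than returned as a weak guess.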

The Knowledge Graph: Relationships, Not Just Documents

Documents contain facts. But companies run on relationships.

"Alessandro manages SCALA" is not a fact you find in a document — it's a relationship between a person entity and a project entity. The knowledge graph captures these relationships and enables structural queries:

"Which technologies does project X use?" → graph traversal
"Who has expertise in AI Strategy?" → entity search by type
"Which projects serve the hospitality vertical?" → multi-hop traversal

The graph supports 8 relationship types (uses, builds, serves, requires, competes_with, part_of, manages, has_skill) and 8 entity types. Each entity has its own vector embedding, so it can be found both by meaning and by structure.
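The three query styles above (direct traversal, entity search by type, N-hop neighborhood) can be illustrated with a tiny in-memory graph. This is a sketch only: the `KnowledgeGraph` class and its method names are hypothetical, and apart from "Alessandro manages SCALA" the sample triples are invented for illustration. The real system stores entities with embeddings in the database, which is not shown here.

```python
from collections import defaultdict, deque


class KnowledgeGraph:
    def __init__(self) -> None:
        self.types: dict[str, str] = {}                 # entity name -> entity type
        self.triples: list[tuple[str, str, str]] = []   # (source, relation, target)

    def add_entity(self, name: str, etype: str) -> None:
        self.types[name] = etype

    def relate(self, source: str, relation: str, target: str) -> None:
        self.triples.append((source, relation, target))

    def by_type(self, etype: str) -> list[str]:
        """Entity search by type."""
        return sorted(n for n, t in self.types.items() if t == etype)

    def targets(self, source: str, relation: str) -> list[str]:
        """Follow one typed edge forward, e.g. what a project uses."""
        return [t for s, r, t in self.triples if s == source and r == relation]

    def sources(self, relation: str, target: str) -> list[str]:
        """Follow one typed edge backward, e.g. who serves a vertical."""
        return [s for s, r, t in self.triples if r == relation and t == target]

    def neighborhood(self, start: str, hops: int) -> set[str]:
        """All entities within `hops` edges of `start`, ignoring direction."""
        adjacency: dict[str, set[str]] = defaultdict(set)
        for s, _, t in self.triples:
            adjacency[s].add(t)
            adjacency[t].add(s)
        seen = {start}
        queue = deque([(start, 0)])
        while queue:
            node, depth = queue.popleft()
            if depth == hops:
                continue
            for nxt in adjacency[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, depth + 1))
        return seen - {start}


# Illustrative data: only the first relation comes from the text above.
g = KnowledgeGraph()
for name, etype in [("Alessandro", "person"), ("SCALA", "project"),
                    ("PostgreSQL", "technology"), ("AI Strategy", "skill"),
                    ("hospitality", "vertical")]:
    g.add_entity(name, etype)
g.relate("Alessandro", "manages", "SCALA")
g.relate("Alessandro", "has_skill", "AI Strategy")
g.relate("SCALA", "uses", "PostgreSQL")
g.relate("SCALA", "serves", "hospitality")
```

Here `g.sources("manages", "SCALA")` answers "who owns project X?" by structure alone, with no similarity scoring involved.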

Intelligent Deduplication: Never Re-Process the Same Document Twice

An underrated problem in RAG systems is re-indexing. If one line out of 200 changes in a document, a naive system re-processes the whole file: new chunking, new embeddings, new API costs.

Our system computes a cryptographic hash of every chunk. If the content hasn't changed, the chunk is skipped entirely — zero API calls, zero database writes. If a file has fewer chunks than its previous version (because it was shortened), orphaned chunks are removed automatically.

The result: a full re-index of 1,400+ chunks takes under 30 seconds when nothing has changed.
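The skip-or-update decision can be sketched in a few lines. The text only says "cryptographic hash", so SHA-256 is an assumption here, as are the function names and the choice to key stored hashes by chunk position within a file.

```python
import hashlib


def chunk_hash(text: str) -> str:
    # Assumption: SHA-256 over the UTF-8 bytes of the chunk text.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(stored: dict[int, str], new_chunks: list[str]) -> dict[str, list[int]]:
    """Compare one file's new chunks with the hashes stored on the last run.

    `stored` maps chunk index -> hash. Returns which indexes to skip,
    which to embed and write again, and which orphans to delete.
    """
    plan: dict[str, list[int]] = {"skip": [], "upsert": [], "delete": []}
    for index, text in enumerate(new_chunks):
        if stored.get(index) == chunk_hash(text):
            plan["skip"].append(index)     # unchanged: no API call, no write
        else:
            plan["upsert"].append(index)   # new or modified chunk
    # The file got shorter: indexes past the new end are orphans.
    plan["delete"] = sorted(i for i in stored if i >= len(new_chunks))
    return plan
```

Only the indexes under "upsert" would reach the embedding step, which is why a run over unchanged files costs little more than reading and hashing them.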

Native Integration with the AI Assistant

The system is not a standalone application with a UI to maintain. It integrates directly into the AI assistant via the Model Context Protocol (MCP), exposing 8 operations, grouped below into six areas (graph navigation covers three of them):

Search — Natural language queries with optional filters
Retrieval — All chunks from a specific document
Catalog — Complete list of indexed sources
Ingestion — Adding new documents on demand
Graph navigation — Entity search, relationships, N-hop neighborhood
Statistics — Real-time system status

When the AI assistant receives a question, it automatically consults the RAG before responding. It doesn't "invent" answers — it retrieves verified facts from the corporate knowledge base.

Why Not a SaaS?

Managed vector database services (Pinecone, Weaviate Cloud, Zilliz) can cost €200–2,000/month at enterprise volumes. They make sense for organizations without infrastructure expertise.

For anyone already running PostgreSQL in production, adding pgvector means enabling an extension, not operating a new service. The marginal infrastructure cost is effectively zero: same servers, same database, same backups.
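As a rough idea of what "enabling an extension" looks like, the sketch below holds the SQL as Python strings. The table name `chunks`, its columns, the 768-dimension embedding size, and the psycopg-style named placeholders are all assumptions for illustration. The `vector` type, the `<=>` cosine-distance operator, and the `hnsw` index method with `vector_cosine_ops` come from pgvector itself. Only the vector side is shown, not the keyword side.

```python
EMBEDDING_DIM = 768  # assumption: depends on the embedding model in use

SCHEMA_SQL = f"""
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS chunks (
    id           bigserial PRIMARY KEY,
    source_file  text NOT NULL,
    chunk_index  integer NOT NULL,
    content      text NOT NULL,
    content_hash text NOT NULL,
    embedding    vector({EMBEDDING_DIM}),
    UNIQUE (source_file, chunk_index)
);

CREATE INDEX IF NOT EXISTS chunks_embedding_idx
    ON chunks USING hnsw (embedding vector_cosine_ops);
"""

# <=> is cosine distance, so 1 - distance is cosine similarity.
NEAREST_SQL = """
SELECT id, content, 1 - (embedding <=> %(query)s::vector) AS similarity
FROM chunks
ORDER BY embedding <=> %(query)s::vector
LIMIT %(limit)s;
"""


def to_vector_literal(values: list[float]) -> str:
    """Format a Python list as pgvector's text input, such as '[0.1,0.25]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"
```

The statements would be run through whatever PostgreSQL driver is already in use, passing `to_vector_literal(query_embedding)` as the `query` parameter.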

But the real advantage isn't the savings: it's control. Corporate data never leaves your infrastructure. No vendor lock-in, no pricing surprises, no dependence on APIs that can change their terms of service.

Lessons Learned

1. Hybrid beats pure vector. Semantic-only search produces too many false positives in enterprise contexts where precision matters more than recall. BM25 at 40% adds the grounding that keeps results reliable.

2. The knowledge graph is not a luxury. For companies with complex organizational structures, the relationship graph answers questions no vector system can. "Who is responsible for what?" is a graph question, not a similarity question.

3. Deduplication saves more than you expect. In a system that re-indexes periodically, hash-based dedup reduces API consumption by 90%+ on every run after the first.

4. Smart chunking is 50% of the result. Chunks that are too small lose context. Too large, and they dilute relevance. Segmenting by document structure (headers, functions, sections) with overlap preserves both context and precision.

5. Zero-cost doesn't mean zero-effort. Data engineering, NLP, and infrastructure expertise are required. But once built, the operating cost is essentially zero.

Who This Is Right For

This approach makes sense for:

Companies with 10+ employees accumulating knowledge across scattered documents
Technical teams already running PostgreSQL in production
Multi-vertical organizations with heterogeneous knowledge bases
Anyone spending >€200/month on knowledge management tools

It doesn't make sense for:

Startups with 2–3 people (all the knowledge fits in people's heads)
Companies without internal technical expertise (a SaaS is the better option)
Use cases with fewer than 100 documents (overkill)

This system is part of SCALA AI OS, the AI operating system for multi-vertical companies. Learn more or request a demo.