Building a Permission-Aware Search System: Architecture, Design, and Lessons Learned

Search in enterprise applications is deceptively hard. The mechanics of finding records are straightforward, but doing it safely — respecting who should see what, across multi-tenant data, with natural language queries — is a different problem entirely. This post walks through the design of a pluggable, permission-aware search system built for internal support operations on sensitive orders and customers data.

The full source is available on GitHub.

The Core Problem

Internal support teams deal with a chaotic mix of query types: sometimes an agent pastes a raw order number, sometimes they type “show me open orders from this week for aster@example.com”, and sometimes they have no idea what they’re looking for and just want something to come back. The system needs to handle all of these without ever leaking data to an agent who doesn’t have permission to see it.

The design has five core goals:

Low-latency search suitable for real-time support tooling
Strict permission enforcement using both RBAC and ABAC, with deny precedence
Redaction-safe UX — unauthorized matches surface as ID-only placeholders, not hard errors
Pluggable internals — every major capability (rewrite engine, semantic layer, data store, policy engine) can be swapped without changing the public API contract
Contract-governed fields — a schema registry controls which fields are accessible per API version and intent

Request Flow: From Natural Language to Authorized Results

Every query goes through a five-stage pipeline before a response is returned:

Identifier resolution — If the query contains a structured token (order number, tracking ID, customer email, etc.), the system short-circuits to a fast path and skips NL processing entirely.
Query rewriting — Otherwise, a rewrite engine converts the natural language query into a structured DSL (filters, sort, pagination, intent category).
Semantic retrieval — A semantic provider (backed by embedding-based top-K over SQLite, or an external Superlinked service) refines filters and returns candidate IDs.
Authorization — Each candidate is evaluated by the policy engine. ACL + ABAC rules determine allow or deny per document.
Redaction — Authorized documents are returned in full. Unauthorized matches produce ID-only placeholders with a reasonCode and requestAccessToken — no sensitive fields leak.

The Identifier Fast Path

One of the most impactful design decisions is treating identifier-heavy queries differently from natural language queries. A support agent pasting ORD-001234 doesn’t need semantic understanding — they need the record immediately.

The system classifies every incoming query into a queryShape before doing anything else:

identifier_token — looks like a structured ID; route to fast path
contact_lookup — email or phone; route to contact fast path
typeahead_prefix — short prefix; route to typeahead fast path
sentence_nl — full natural language; route to intent/semantic path
unsupported_domain — outside scope; return early with a clarification prompt

Detected identifiers are normalized (e.g., legacy field aliases like order_number are mapped to the canonical order.number contract field) and looked up in parallel across Orders and Customers.

The response includes a resultReasonCode that makes no-result states transparent:

VISIBLE_RESULTS
MATCHES_EXIST_BUT_NOT_VISIBLE
NO_MATCH_IN_TENANT
CLARIFICATION_REQUIRED

This gives the UI enough information to show the agent a helpful message instead of a blank result.

Query Rewriting with Enhanced SLM Prompting

For natural language queries, the system uses a small language model (Qwen 2.5 7B or Llama 3.1 8B via Ollama) to rewrite the user’s message into a structured query DSL. Getting this right at scale required moving beyond naive prompting.

Schema-Aware, Few-Shot Prompting

The rewrite engine now uses three composable components that build each prompt at request time:

Schema Provider — supplies complete field definitions including types, operators, enum values, and which intents can access each field
Example Provider — injects 18 curated few-shot examples across 4 intent categories (WISMO, CRM, Returns/Refunds, Default)
Prompt Builder — assembles a ~2,362 token prompt with schema tables, examples, operator documentation, and explicit rules about what the model must not do

The improvement in accuracy after this change was substantial:

Metric	Before	After	Delta
SLM overall accuracy	60%	92%	+32%
Filter field correctness	70%	95%	+25%
Repair success rate	40%	80%	+40%

Intent Categories

The rewrite engine classifies every query into one of four intent categories, which control which fields are available in the resulting DSL:

wismo (Where Is My Order) — order state, shipment state, tracking ID, date ranges
crm_profile — customer email/ID, VIP tier, email verification status
returns_refunds — return eligibility, return status, refund status
default — minimal filters, low confidence; may trigger a clarification response

Fields not in the allowlist for a given intent are blocked with a FIELD_NOT_ALLOWED error before any retrieval happens. This keeps the authorization surface small and auditable.

The Schema Registry: Single Source of Truth

One of the most important architectural decisions is that all data-shape knowledge lives in a single place: internal/schema/ecommerce.go. No other file contains raw field names, table names, enum string literals, or identifier patterns as hardcoded constants.

The registry owns:

Resource and table names (order → orders_docs)
Canonical field names and their native SQL columns
Enum values and enum role bindings
Identifier patterns (e.g., ORD-\d{6} → (order, order.number))
Intent-scope allowlists (which fields are visible per intent)
Sort and filter metadata

The registry is built once at startup and injected into the contract validator, the identifier resolver, the semantic parser, and the SQLite adapter. Adding a new field means editing one struct — the adapter, validator, and SLM prompt all pick it up automatically.

Pluggable Architecture

The system is designed around interface boundaries that allow each layer to be swapped independently. The Search Orchestrator wires together six adapters:

AuthAdapter — validates the calling user’s identity
RewriteEngine — converts NL to DSL; currently routes between a deterministic rule-based parser, Qwen, and Llama with a configurable fallback chain
SemanticProvider — performs embedding-based top-K retrieval; can point at the local SQLite mock or a real Superlinked service
DataStore — retrieves candidate documents; SQLite adapter ships by default, MongoDB adapter is stubbed
PolicyEngine — evaluates ACL + ABAC rules per document with an in-memory grants cache
RedactionBuilder — constructs the authorized response and hidden placeholders

The Superlinked semantic provider supports three serving modes via config:

shadow — calls the provider but ignores its results (safe for initial rollout)
gated — serves provider candidates only when confidence and latency thresholds pass
off — disables the external provider entirely

Permission and Redaction Model

Authorization uses an ACL + ABAC model with deny precedence. If a user doesn’t have access to a document, the response never includes any field values — not even for “safe” fields like timestamps or states.

Unauthorized matches return only:

resourceId
reasonCode
requestAccessToken

This means the UI can accurately tell the agent “there are 3 more results you don’t have access to” without exposing anything from those records. Direct detail endpoints (GET /api/orders/{id}, GET /api/customers/{id}) always return 403 for unauthorized access — no partial data.

Running It Locally

The stack ships as three Docker images behind a single Compose file. The prerequisite is a local Ollama instance with the required models pulled:

			
ollama serve
ollama pull llama3.1:8b-instruct
ollama pull qwen2.5:7b-instruct
ollama pull nomic-embed-text

Then bring everything up:

make up-build

The UI runs on localhost:3000, the Go API on localhost:8080, and the Superlinked adapter on localhost:8081.

To smoke test the full pipeline:

			
curl -sS -X POST http://127.0.0.1:8080/api/query/interpret \
  -H 'Content-Type: application/json' \
  -H 'X-User-Id: alice' \
  -H 'X-Tenant-Id: tenant-a' \
  -d '{"message":"show open orders this week","provider":"slm-superlinked","contractVersion":"v2","debug":true}'

		

Debug Mode and Observability

The system ships with a first-class debug mode. When debug: true is set on /api/query/interpret, the response includes a full trace of the pipeline:

debug.traceId (also returned in the X-Trace-Id header)
debug.rewrite — the original message, generated DSL, detected intent, and resource type
debug.flow[] — a stage-by-stage timeline from ingress through to response
debug.filterSource[] — per-filter attribution showing whether it came from the SLM, Superlinked, or both
Retrieval latency, fallback chain, and gate reason for the semantic provider

The Next.js UI surfaces all of this in a debug panel, making it practical to diagnose query quality issues without touching logs.

What I’d Do Differently

A few things that took longer than expected:

Prompt engineering takes real iteration. The jump from 60% to 92% SLM accuracy came entirely from building the schema-aware few-shot prompting infrastructure, not from switching models. Putting effort into the prompt builder early would have saved time.

The schema registry should have been the first thing built. Starting with hardcoded field names scattered across the codebase created a lot of refactoring work later. Centralizing into a single registry from the start pays for itself immediately.

The identifier fast path is worth the complexity. It might seem like an optimization, but it’s actually a correctness feature — support agents using structured IDs expect deterministic behavior, and routing them through an NL pipeline introduces unnecessary latency and failure modes.

What’s Next

A few areas worth exploring from here:

A real-time streaming response mode for long-running queries
Expanding the MongoDB adapter from stub to production-ready
Automated replay benchmarking against real query CSVs to catch regressions in SLM accuracy
Tighter integration between the policy engine and the SLM prompt — currently the model doesn’t know what the caller is allowed to see, which means it can generate filters for fields the user can’t access (caught at validation, but wasteful)

The full source, architecture notes, and implementation docs are on GitHub. Feedback welcome.

Everyday Reflections

Leave a comment Cancel reply