Building a Permission-Aware Search System: Architecture, Design, and Lessons Learned

Search in enterprise applications is deceptively hard. The mechanics of finding records are straightforward, but doing it safely (respecting who should see what, across multi-tenant data, with natural language queries) is a different problem entirely. This post walks through the design of a pluggable, permission-aware search system built for internal support operations on sensitive order and customer data.

The full source is available on GitHub.


The Core Problem

Internal support teams deal with a chaotic mix of query types: sometimes an agent pastes a raw order number, sometimes they type “show me open orders from this week for aster@example.com”, and sometimes they have no idea what they’re looking for and just want something to come back. The system needs to handle all of these without ever leaking data to an agent who doesn’t have permission to see it.

The design has five core goals:

  • Low-latency search suitable for real-time support tooling
  • Strict permission enforcement using both ACL and ABAC rules, with deny precedence
  • Redaction-safe UX — unauthorized matches surface as ID-only placeholders, not hard errors
  • Pluggable internals — every major capability (rewrite engine, semantic layer, data store, policy engine) can be swapped without changing the public API contract
  • Contract-governed fields — a schema registry controls which fields are accessible per API version and intent

Request Flow: From Natural Language to Authorized Results

Every query goes through a five-stage pipeline before a response is returned:

  1. Identifier resolution — If the query contains a structured token (order number, tracking ID, customer email, etc.), the system short-circuits to a fast path and skips NL processing entirely.
  2. Query rewriting — Otherwise, a rewrite engine converts the natural language query into a structured DSL (filters, sort, pagination, intent category).
  3. Semantic retrieval — A semantic provider (backed by embedding-based top-K over SQLite, or an external Superlinked service) refines filters and returns candidate IDs.
  4. Authorization — Each candidate is evaluated by the policy engine. ACL + ABAC rules determine allow or deny per document.
  5. Redaction — Authorized documents are returned in full. Unauthorized matches produce ID-only placeholders with a reasonCode and requestAccessToken — no sensitive fields leak.
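
The five stages above can be sketched as a single function with one hook per stage. This is a minimal illustration, not the project's orchestrator: the Stages struct, the Run function, and every signature in it are assumptions made for the example. The point it shows is the control flow, where an identifier match skips rewriting and retrieval but never skips authorization.

```go
package main

import "errors"

// Stages holds one function per pipeline stage. All signatures are
// illustrative assumptions, not the project's real interfaces.
type Stages struct {
	ResolveIdentifier func(msg string) (ids []string, matched bool)
	Rewrite           func(msg string) (dsl string, err error)
	Retrieve          func(dsl string) (ids []string, err error)
	Authorize         func(id string) bool
}

// PipelineResult records which stages ran and how candidates were split.
type PipelineResult struct {
	Trace   []string
	Visible []string // would be returned in full
	Hidden  []string // would be returned as ID-only placeholders
}

// Run executes the stages in order. An identifier match short-circuits
// past rewriting and retrieval; authorization and redaction always run.
func Run(s Stages, msg string) (PipelineResult, error) {
	if s.ResolveIdentifier == nil || s.Rewrite == nil || s.Retrieve == nil || s.Authorize == nil {
		return PipelineResult{}, errors.New("pipeline: all four stage functions are required")
	}
	out := PipelineResult{Trace: []string{"identifier_resolution"}}
	ids, matched := s.ResolveIdentifier(msg)
	if !matched {
		out.Trace = append(out.Trace, "query_rewriting")
		dsl, err := s.Rewrite(msg)
		if err != nil {
			return out, err
		}
		out.Trace = append(out.Trace, "semantic_retrieval")
		if ids, err = s.Retrieve(dsl); err != nil {
			return out, err
		}
	}
	out.Trace = append(out.Trace, "authorization", "redaction")
	for _, id := range ids {
		if s.Authorize(id) {
			out.Visible = append(out.Visible, id)
		} else {
			out.Hidden = append(out.Hidden, id)
		}
	}
	return out, nil
}
```

Keeping authorization outside the branch means there is no code path, fast or slow, that returns a candidate without a policy decision.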

The Identifier Fast Path

One of the most impactful design decisions is treating identifier-heavy queries differently from natural language queries. A support agent pasting ORD-001234 doesn’t need semantic understanding — they need the record immediately.

The system classifies every incoming query into a queryShape before doing anything else:

  • identifier_token — looks like a structured ID; route to fast path
  • contact_lookup — email or phone; route to contact fast path
  • typeahead_prefix — short prefix; route to typeahead fast path
  • sentence_nl — full natural language; route to intent/semantic path
  • unsupported_domain — outside scope; return early with a clarification prompt
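
A classifier for these shapes might look roughly like the sketch below. Only the shape names and the ORD-\d{6} identifier pattern come from this post; the email and phone patterns, the prefix length limit, and the domainWords vocabulary are assumptions chosen to keep the example small.

```go
package main

import (
	"regexp"
	"strings"
)

// QueryShape mirrors the shape names listed above.
type QueryShape string

const (
	ShapeIdentifierToken   QueryShape = "identifier_token"
	ShapeContactLookup     QueryShape = "contact_lookup"
	ShapeTypeaheadPrefix   QueryShape = "typeahead_prefix"
	ShapeSentenceNL        QueryShape = "sentence_nl"
	ShapeUnsupportedDomain QueryShape = "unsupported_domain"
)

// The email and phone patterns are illustrative assumptions.
var (
	identifierRe = regexp.MustCompile(`^ORD-\d{6}$`)
	emailRe      = regexp.MustCompile(`^[^@\s]+@[^@\s]+\.[^@\s]+$`)
	phoneRe      = regexp.MustCompile(`^\+?[0-9][0-9\-\s]{6,}$`)
)

// maxPrefixLen and domainWords are hypothetical tuning knobs.
const maxPrefixLen = 5

var domainWords = []string{"order", "customer", "shipment", "refund", "return", "tracking"}

// ClassifyQuery picks a shape using cheap checks first, so structured
// tokens never reach the NL path.
func ClassifyQuery(raw string) QueryShape {
	q := strings.TrimSpace(raw)
	switch {
	case identifierRe.MatchString(strings.ToUpper(q)):
		return ShapeIdentifierToken
	case emailRe.MatchString(q), phoneRe.MatchString(q):
		return ShapeContactLookup
	case q != "" && len(q) <= maxPrefixLen && !strings.ContainsAny(q, " \t"):
		return ShapeTypeaheadPrefix
	}
	lower := strings.ToLower(q)
	for _, w := range domainWords {
		if strings.Contains(lower, w) {
			return ShapeSentenceNL
		}
	}
	return ShapeUnsupportedDomain
}
```

With these assumed patterns, ORD-001234 classifies as identifier_token, aster@example.com as contact_lookup, and a full sentence that happens to contain an email still goes down the NL path because the contact patterns are anchored to the whole query.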

Detected identifiers are normalized (e.g., legacy field aliases like order_number are mapped to the canonical order.number contract field) and looked up in parallel across Orders and Customers.

The response includes a resultReasonCode that makes no-result states transparent:

  • VISIBLE_RESULTS
  • MATCHES_EXIST_BUT_NOT_VISIBLE
  • NO_MATCH_IN_TENANT
  • CLARIFICATION_REQUIRED

This gives the UI enough information to show the agent a helpful message instead of a blank result.
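
One way to derive the code is from counts the pipeline already has at the end of a request. The four code values are the ones listed above; the SearchOutcome struct and the precedence order (clarification first, then visible, then hidden) are assumptions for this sketch.

```go
package main

// ResultReasonCode mirrors the values listed above.
type ResultReasonCode string

const (
	VisibleResults            ResultReasonCode = "VISIBLE_RESULTS"
	MatchesExistButNotVisible ResultReasonCode = "MATCHES_EXIST_BUT_NOT_VISIBLE"
	NoMatchInTenant           ResultReasonCode = "NO_MATCH_IN_TENANT"
	ClarificationRequired     ResultReasonCode = "CLARIFICATION_REQUIRED"
)

// SearchOutcome is a hypothetical summary of one request.
type SearchOutcome struct {
	NeedsClarification bool // the query could not be interpreted confidently
	Visible            int  // matches the caller is allowed to see
	Hidden             int  // matches that exist but were redacted
}

// ReasonFor maps an outcome to a single code. The precedence order here
// is an assumption, not documented behavior.
func ReasonFor(o SearchOutcome) ResultReasonCode {
	switch {
	case o.NeedsClarification:
		return ClarificationRequired
	case o.Visible > 0:
		return VisibleResults
	case o.Hidden > 0:
		return MatchesExistButNotVisible
	default:
		return NoMatchInTenant
	}
}
```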


Query Rewriting with Enhanced SLM Prompting

For natural language queries, the system uses a small language model (Qwen 2.5 7B or Llama 3.1 8B via Ollama) to rewrite the user’s message into a structured query DSL. Getting this right at scale required moving beyond naive prompting.

Schema-Aware, Few-Shot Prompting

The rewrite engine now uses three composable components that build each prompt at request time:

  • Schema Provider — supplies complete field definitions including types, operators, enum values, and which intents can access each field
  • Example Provider — injects 18 curated few-shot examples across 4 intent categories (WISMO, CRM, Returns/Refunds, Default)
  • Prompt Builder — assembles a ~2,362 token prompt with schema tables, examples, operator documentation, and explicit rules about what the model must not do

The improvement in accuracy after this change was substantial:

Metric                      Before   After   Delta
SLM overall accuracy        60%      92%     +32 pts
Filter field correctness    70%      95%     +25 pts
Repair success rate         40%      80%     +40 pts

Intent Categories

The rewrite engine classifies every query into one of four intent categories, which control which fields are available in the resulting DSL:

  • wismo (Where Is My Order) — order state, shipment state, tracking ID, date ranges
  • crm_profile — customer email/ID, VIP tier, email verification status
  • returns_refunds — return eligibility, return status, refund status
  • default — minimal filters, low confidence; may trigger a clarification response

Fields not in the allowlist for a given intent are blocked with a FIELD_NOT_ALLOWED error before any retrieval happens. This keeps the authorization surface small and auditable.
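
The allowlist check itself can be very small. The sketch below assumes a Filter type and an in-code allowlist map; the intent names and the FIELD_NOT_ALLOWED code come from this post, while the individual field names (other than order.number) are made up for illustration.

```go
package main

import "fmt"

// Intent mirrors the four categories above.
type Intent string

const (
	IntentWISMO          Intent = "wismo"
	IntentCRMProfile     Intent = "crm_profile"
	IntentReturnsRefunds Intent = "returns_refunds"
	IntentDefault        Intent = "default"
)

// intentAllowlist uses hypothetical field names for illustration.
var intentAllowlist = map[Intent]map[string]bool{
	IntentWISMO:          {"order.number": true, "order.state": true, "shipment.state": true},
	IntentCRMProfile:     {"customer.email": true, "customer.vip_tier": true},
	IntentReturnsRefunds: {"order.number": true, "return.status": true, "refund.status": true},
	IntentDefault:        {"order.number": true},
}

// Filter is one clause of the rewritten DSL (assumed shape).
type Filter struct {
	Field, Op, Value string
}

// FieldNotAllowedError reports the first filter outside the allowlist.
type FieldNotAllowedError struct {
	Intent Intent
	Field  string
}

func (e *FieldNotAllowedError) Error() string {
	return fmt.Sprintf("FIELD_NOT_ALLOWED: %q is not available for intent %q", e.Field, e.Intent)
}

// ValidateFilters runs before retrieval. Unknown intents have an empty
// allowlist, so every field is rejected for them.
func ValidateFilters(intent Intent, filters []Filter) error {
	allowed := intentAllowlist[intent]
	for _, f := range filters {
		if !allowed[f.Field] {
			return &FieldNotAllowedError{Intent: intent, Field: f.Field}
		}
	}
	return nil
}
```

Failing closed on unknown intents is a deliberate choice in this sketch: a typo in an intent name blocks the query instead of silently widening access.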


The Schema Registry: Single Source of Truth

One of the most important architectural decisions is that all data-shape knowledge lives in a single place: internal/schema/ecommerce.go. No other file contains raw field names, table names, enum string literals, or identifier patterns as hardcoded constants.

The registry owns:

  • Resource and table names (order → orders_docs)
  • Canonical field names and their native SQL columns
  • Enum values and enum role bindings
  • Identifier patterns (e.g., ORD-\d{6} → (order, order.number))
  • Intent-scope allowlists (which fields are visible per intent)
  • Sort and filter metadata

The registry is built once at startup and injected into the contract validator, the identifier resolver, the semantic parser, and the SQLite adapter. Adding a new field means editing one struct — the adapter, validator, and SLM prompt all pick it up automatically.
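
To make the idea concrete, here is a much-reduced sketch of what such a registry could look like. It is not the contents of internal/schema/ecommerce.go: the type names, method names, and the order_number column are assumptions, and only the order.number field, the order_number alias, the orders_docs table, and the ORD-\d{6} pattern are taken from this post.

```go
package main

import (
	"fmt"
	"regexp"
)

// FieldDef ties a canonical contract field to its storage column and
// any legacy aliases.
type FieldDef struct {
	Canonical string
	Column    string // assumed column name
	Aliases   []string
}

// IdentifierPattern maps a token shape to a (resource, field) pair.
type IdentifierPattern struct {
	Pattern  *regexp.Regexp
	Resource string
	Field    string
}

// Registry is built once and then only read.
type Registry struct {
	tables      map[string]string
	byName      map[string]FieldDef
	identifiers []IdentifierPattern
}

// NewRegistry indexes fields by canonical name and alias, rejecting
// duplicates so two fields can never claim the same name.
func NewRegistry(tables map[string]string, fields []FieldDef, ids []IdentifierPattern) (*Registry, error) {
	r := &Registry{tables: tables, byName: map[string]FieldDef{}, identifiers: ids}
	for _, f := range fields {
		for _, name := range append([]string{f.Canonical}, f.Aliases...) {
			if _, dup := r.byName[name]; dup {
				return nil, fmt.Errorf("schema: duplicate field name %q", name)
			}
			r.byName[name] = f
		}
	}
	return r, nil
}

// Canonical resolves a canonical name or legacy alias.
func (r *Registry) Canonical(name string) (string, bool) {
	f, ok := r.byName[name]
	return f.Canonical, ok
}

// Table returns the backing table for a resource.
func (r *Registry) Table(resource string) (string, bool) {
	t, ok := r.tables[resource]
	return t, ok
}

// ResolveIdentifier returns the first pattern that matches the token.
func (r *Registry) ResolveIdentifier(token string) (resource, field string, ok bool) {
	for _, p := range r.identifiers {
		if p.Pattern.MatchString(token) {
			return p.Resource, p.Field, true
		}
	}
	return "", "", false
}

// SampleRegistry builds a one-field registry from the examples above.
func SampleRegistry() (*Registry, error) {
	return NewRegistry(
		map[string]string{"order": "orders_docs"},
		[]FieldDef{{Canonical: "order.number", Column: "order_number", Aliases: []string{"order_number"}}},
		[]IdentifierPattern{{Pattern: regexp.MustCompile(`^ORD-\d{6}$`), Resource: "order", Field: "order.number"}},
	)
}
```

Because aliases and identifier patterns are resolved through the same object, the identifier fast path and the contract validator cannot drift apart on what order_number means.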


Pluggable Architecture

The system is designed around interface boundaries that allow each layer to be swapped independently. The Search Orchestrator wires together six adapters:

  • AuthAdapter — validates the calling user’s identity
  • RewriteEngine — converts NL to DSL; currently routes between a deterministic rule-based parser, Qwen, and Llama with a configurable fallback chain
  • SemanticProvider — performs embedding-based top-K retrieval; can point at the local SQLite mock or a real Superlinked service
  • DataStore — retrieves candidate documents; SQLite adapter ships by default, MongoDB adapter is stubbed
  • PolicyEngine — evaluates ACL + ABAC rules per document with an in-memory grants cache
  • RedactionBuilder — constructs the authorized response and hidden placeholders
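
As an illustration of what one of these boundaries could look like, the sketch below models the RewriteEngine with a configurable fallback chain. The interface shape, the DSL struct, and the FuncEngine and StaticEngine helpers are all assumptions for this example; the post only establishes that the engine routes between a rule-based parser, Qwen, and Llama.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// DSL is a heavily simplified stand-in for the structured query.
type DSL struct {
	Intent  string
	Filters map[string]string
}

// RewriteEngine is an assumed shape for the NL-to-DSL boundary.
type RewriteEngine interface {
	Name() string
	Rewrite(ctx context.Context, message string) (DSL, error)
}

// FallbackChain tries each engine in order and returns the first success.
type FallbackChain struct {
	Engines []RewriteEngine
}

func (c FallbackChain) Name() string { return "fallback-chain" }

func (c FallbackChain) Rewrite(ctx context.Context, message string) (DSL, error) {
	if len(c.Engines) == 0 {
		return DSL{}, errors.New("rewrite: no engines configured")
	}
	var errs []error
	for _, e := range c.Engines {
		dsl, err := e.Rewrite(ctx, message)
		if err == nil {
			return dsl, nil
		}
		errs = append(errs, fmt.Errorf("%s: %w", e.Name(), err))
	}
	// Every engine failed; report all failures together.
	return DSL{}, errors.Join(errs...)
}

// RewriteMessage is a convenience for callers without a request context.
func (c FallbackChain) RewriteMessage(message string) (DSL, error) {
	return c.Rewrite(context.Background(), message)
}

// FuncEngine adapts a plain function to the interface.
type FuncEngine struct {
	ID string
	Fn func(ctx context.Context, message string) (DSL, error)
}

func (f FuncEngine) Name() string { return f.ID }

func (f FuncEngine) Rewrite(ctx context.Context, m string) (DSL, error) { return f.Fn(ctx, m) }

// StaticEngine returns a fixed DSL, or fails when failure is non-empty.
// It exists only to exercise the chain in examples and tests.
func StaticEngine(id string, dsl DSL, failure string) FuncEngine {
	return FuncEngine{ID: id, Fn: func(context.Context, string) (DSL, error) {
		if failure != "" {
			return DSL{}, errors.New(failure)
		}
		return dsl, nil
	}}
}
```

Since FallbackChain satisfies RewriteEngine itself, the orchestrator does not need to know whether it holds one engine or several. Note that errors.Join requires Go 1.20 or later.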

The Superlinked semantic provider supports three serving modes via config:

  • shadow — calls the provider but ignores its results (safe for initial rollout)
  • gated — serves provider candidates only when confidence and latency thresholds pass
  • off — disables the external provider entirely
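
The three modes reduce to a small decision function. In this sketch the mode names come from the list above, while the Gate and ProviderResult types, the threshold fields, and the gate reason strings are assumptions. The local slice stands for whatever candidates the built-in path produced.

```go
package main

import "time"

// ServingMode mirrors the three configured modes.
type ServingMode string

const (
	ModeShadow ServingMode = "shadow"
	ModeGated  ServingMode = "gated"
	ModeOff    ServingMode = "off"
)

// ProviderResult is an assumed shape for the external provider's answer.
type ProviderResult struct {
	CandidateIDs []string
	Confidence   float64
	Latency      time.Duration
}

// Gate holds the mode and the thresholds used in gated mode.
type Gate struct {
	Mode          ServingMode
	MinConfidence float64
	MaxLatency    time.Duration
}

// Decide returns the candidates to serve plus a gate reason for the
// debug trace. The provider is never called when the mode is off or
// unrecognized.
func (g Gate) Decide(local []string, call func() (ProviderResult, error)) ([]string, string) {
	if g.Mode != ModeShadow && g.Mode != ModeGated {
		return local, "provider_off"
	}
	res, err := call()
	switch {
	case err != nil:
		return local, "provider_error"
	case g.Mode == ModeShadow:
		return local, "shadow_ignored" // called for comparison only
	case res.Confidence < g.MinConfidence:
		return local, "low_confidence"
	case res.Latency > g.MaxLatency:
		return local, "too_slow"
	default:
		return res.CandidateIDs, "gate_passed"
	}
}
```

Every branch except the last falls back to the local candidates, so a misbehaving provider can degrade relevance but cannot take search down.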

Permission and Redaction Model

Authorization uses an ACL + ABAC model with deny precedence. If a user doesn’t have access to a document, the response never includes any field values — not even for “safe” fields like timestamps or states.

Unauthorized matches return only:

  • resourceId
  • reasonCode
  • requestAccessToken

This means the UI can accurately tell the agent “there are 3 more results you don’t have access to” without exposing anything from those records. Direct detail endpoints (GET /api/orders/{id}, GET /api/customers/{id}) always return 403 for unauthorized access — no partial data.
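
A compact way to picture deny precedence and ID-only placeholders is shown below. The three placeholder field names are the ones listed above; the Rule and Effect types, the NOT_AUTHORIZED reason value, and the mintToken callback are assumptions for the sketch, not the project's policy engine.

```go
package main

// Effect is the outcome of a single rule.
type Effect int

const (
	Abstain Effect = iota
	AllowEffect
	DenyEffect
)

// Rule is an assumed shape covering both ACL and ABAC style checks.
type Rule func(userID string, doc Document) Effect

// Document is a candidate with its full field set.
type Document struct {
	ID     string
	Fields map[string]any
}

// Placeholder carries no field values from the hidden document.
type Placeholder struct {
	ResourceID         string `json:"resourceId"`
	ReasonCode         string `json:"reasonCode"`
	RequestAccessToken string `json:"requestAccessToken"`
}

// Response separates fully visible documents from placeholders.
type Response struct {
	Results []Document
	Hidden  []Placeholder
}

// Authorized applies deny precedence: any deny wins, and with no deny
// at least one allow is required.
func Authorized(rules []Rule, userID string, doc Document) bool {
	allowed := false
	for _, rule := range rules {
		switch rule(userID, doc) {
		case DenyEffect:
			return false
		case AllowEffect:
			allowed = true
		}
	}
	return allowed
}

// BuildResponse redacts unauthorized documents down to placeholders.
// mintToken is a hypothetical hook that issues a request-access token.
func BuildResponse(rules []Rule, userID string, docs []Document, mintToken func(docID string) string) Response {
	var resp Response
	for _, d := range docs {
		if Authorized(rules, userID, d) {
			resp.Results = append(resp.Results, d)
			continue
		}
		resp.Hidden = append(resp.Hidden, Placeholder{
			ResourceID:         d.ID,
			ReasonCode:         "NOT_AUTHORIZED", // assumed value
			RequestAccessToken: mintToken(d.ID),
		})
	}
	return resp
}
```

Because Placeholder has no field for document data, a redaction bug would have to change the type, not just a conditional, before sensitive values could leak.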


Running It Locally

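The stack ships as three Docker images behind a single Compose file. The only prerequisite is a local Ollama instance with the required models pulled. Note that ollama serve stays in the foreground, so run the pull commands from a second terminal: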

ollama serve
ollama pull llama3.1:8b-instruct
ollama pull qwen2.5:7b-instruct
ollama pull nomic-embed-text

Then bring everything up:

make up-build

The UI runs on localhost:3000, the Go API on localhost:8080, and the Superlinked adapter on localhost:8081.

To smoke test the full pipeline:

curl -sS -X POST http://127.0.0.1:8080/api/query/interpret \
-H 'Content-Type: application/json' \
-H 'X-User-Id: alice' \
-H 'X-Tenant-Id: tenant-a' \
-d '{"message":"show open orders this week","provider":"slm-superlinked","contractVersion":"v2","debug":true}'
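
The same smoke test can be issued from Go, which is handy for scripted checks. The endpoint path, headers, and JSON keys below are copied from the curl call above; the InterpretRequest type and NewInterpretRequest helper are names invented for this sketch.

```go
package main

import (
	"bytes"
	"encoding/json"
	"net/http"
)

// InterpretRequest mirrors the JSON body of the curl call above.
type InterpretRequest struct {
	Message         string `json:"message"`
	Provider        string `json:"provider"`
	ContractVersion string `json:"contractVersion"`
	Debug           bool   `json:"debug"`
}

// NewInterpretRequest builds the POST request without sending it, so
// the caller chooses the client, timeout, and context.
func NewInterpretRequest(baseURL, userID, tenantID string, body InterpretRequest) (*http.Request, error) {
	payload, err := json.Marshal(body)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/query/interpret", bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("X-User-Id", userID)
	req.Header.Set("X-Tenant-Id", tenantID)
	return req, nil
}
```

Passing the result to http.DefaultClient.Do (or a client with a timeout) against http://127.0.0.1:8080 sends the same request as the curl command.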

Debug Mode and Observability

The system ships with a first-class debug mode. When debug: true is set on /api/query/interpret, the response includes a full trace of the pipeline:

  • debug.traceId (also returned in the X-Trace-Id header)
  • debug.rewrite — the original message, generated DSL, detected intent, and resource type
  • debug.flow[] — a stage-by-stage timeline from ingress through to response
  • debug.filterSource[] — per-filter attribution showing whether it came from the SLM, Superlinked, or both
  • Retrieval latency, fallback chain, and gate reason for the semantic provider
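
Per-filter attribution is simple to compute once both filter sets are in hand. The sketch below assumes each side produces a field-to-value map and that an attribution entry is a field plus a source label; the real shape of debug.filterSource[] may differ, and the field names used in any example call are hypothetical.

```go
package main

import "sort"

// FilterSource is an assumed shape for one debug.filterSource[] entry.
type FilterSource struct {
	Field  string `json:"field"`
	Source string `json:"source"` // "slm", "superlinked", or "both"
}

// AttributeFilters labels every filter field by which stage produced it.
// Output is sorted by field name so traces are stable across runs.
func AttributeFilters(slm, superlinked map[string]string) []FilterSource {
	sources := make(map[string]string, len(slm)+len(superlinked))
	for field := range slm {
		sources[field] = "slm"
	}
	for field := range superlinked {
		if _, fromSLM := sources[field]; fromSLM {
			sources[field] = "both"
		} else {
			sources[field] = "superlinked"
		}
	}
	out := make([]FilterSource, 0, len(sources))
	for field, source := range sources {
		out = append(out, FilterSource{Field: field, Source: source})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Field < out[j].Field })
	return out
}
```

Sorting matters more than it looks: Go map iteration order is randomized, and an unstable trace makes two debug responses for the same query hard to diff.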

The Next.js UI surfaces all of this in a debug panel, making it practical to diagnose query quality issues without touching logs.


What I’d Do Differently

A few things that took longer than expected:

Prompt engineering takes real iteration. The jump from 60% to 92% SLM accuracy came entirely from building the schema-aware few-shot prompting infrastructure, not from switching models. Putting effort into the prompt builder early would have saved time.

The schema registry should have been the first thing built. Starting with hardcoded field names scattered across the codebase created a lot of refactoring work later. Centralizing into a single registry from the start pays for itself immediately.

The identifier fast path is worth the complexity. It might seem like an optimization, but it’s actually a correctness feature — support agents using structured IDs expect deterministic behavior, and routing them through an NL pipeline introduces unnecessary latency and failure modes.


What’s Next

A few areas worth exploring from here:

  • A real-time streaming response mode for long-running queries
  • Expanding the MongoDB adapter from stub to production-ready
  • Automated replay benchmarking against real query CSVs to catch regressions in SLM accuracy
  • Tighter integration between the policy engine and the SLM prompt — currently the model doesn’t know what the caller is allowed to see, which means it can generate filters for fields the user can’t access (caught at validation, but wasteful)

The full source, architecture notes, and implementation docs are on GitHub. Feedback welcome.

