04 June 2026 11:00 - 11:30
Don't route what You can't redact: Sensitivity-aware LLM routing
Routing between LLMs often sacrifices data privacy by prioritizing context over sensitivity when balancing local and SaaS endpoints. For instance, a simple query like "What is Christopher Nuland's salary?" can leak sensitive context to an external provider.
We introduce a reference architecture that elevates sensitivity as a primary routing metric, establishing a two-dimensional decision matrix with distinct enforcement paths: direct SaaS egress, redact-then-SaaS, and local-only isolation. Our system uses Microsoft Presidio with custom PII recognizers to pseudonymize sensitive entities (e.g., "Chris Nuland" becomes <PERSON_1>) before egress. Integrating NeMo Guardrails with vLLM-hosted small language models to form a secure egress point. Preventing unauthorized data leakage when managing shared context between models. All managed within Red Hatās OpenShift AI platform.
This session provides a demo of this stack, showing how to enforce semantic routing with robust data isolation and redaction controls.