Sovrn Guide

A production-ready, AI-powered customer support service built on multi-agent orchestration and Retrieval-Augmented Generation (RAG). The platform uses an “agents as tools” pattern: a central orchestrator delegates to specialized agents, each focused on a distinct capability such as knowledge retrieval or account operations, to deliver accurate, context-aware responses to customer queries.

Role: Lead Architect and Primary Developer

The Challenge

Customer support at scale requires balancing response quality with operational costs. Traditional approaches—either expensive fine-tuned models or brittle keyword-based systems—couldn’t meet Sovrn’s needs for accurate, up-to-date responses that could evolve with product changes.

The project required navigating technologies new to the organization: Python backend development, AWS Bedrock Knowledge Bases, the Strands Agents framework, and the Model Context Protocol (MCP). Each decision needed to balance capability against long-term maintainability.

Architecture

The system follows a layered architecture separating API concerns, agent orchestration, external service clients, and tool implementations:

┌─────────────────────────────────────────────────────────────┐
│                    API Layer (FastAPI)                      │
│  REST endpoints • JWT authentication • Request validation   │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                  Agent Layer (Strands)                      │
│  Orchestrator • Specialized agents • Tool routing           │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                     Client Layer                            │
│  Bedrock KB • Redis • S3/Athena Analytics                   │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                      Tool Layer                             │
│  RAG queries • URL encoding • MCP tools (extensible)        │
└─────────────────────────────────────────────────────────────┘

Technical Decisions

RAG Over Fine-Tuning

RAG brings up-to-date knowledge into responses without expensive model retraining. When documentation changes, the knowledge base updates automatically, so no model retraining or redeployment is required.
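The retrieve-then-generate flow can be illustrated with a toy sketch. This is not the production pipeline, which uses AWS Bedrock Knowledge Bases; the word-overlap scoring, the function names, and the sample documents are all placeholders chosen so the example runs without external services.

```python
def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by how many query words they share (toy scoring)."""
    terms = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(terms & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query: str, passages: list[str]) -> str:
    """Put retrieved passages in front of the question for the model."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Placeholder documentation snippets, not real product facts.
docs = [
    "payouts are issued monthly",
    "reports refresh every hour",
]

passages = retrieve("when are payouts issued", docs, top_k=1)
prompt = build_prompt("when are payouts issued", passages)
```

Changing an entry in `docs` changes what the model sees on the next query, which is the property the section relies on: the knowledge changes while the model stays the same.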

Multi-Model Strategy

I designed a cost-optimized approach using two models for different purposes:

Model	Purpose	Rationale
Claude 3.5 Sonnet	Agent reasoning and response generation	Superior reasoning for complex multi-turn conversations
Amazon Nova Pro	Knowledge base retrieval	Balanced capability and cost for high-volume RAG queries

Multi-Agent Orchestration

The system implements an “agents as tools” pattern where a central orchestrator delegates to specialized agents based on user intent. Each agent is scoped to a specific domain—knowledge retrieval, account operations, or other capabilities—allowing the orchestrator to compose complex workflows while keeping individual agents focused and maintainable.
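A minimal plain-Python sketch of the pattern follows. It does not use the Strands Agents API; the `Agent` and `Orchestrator` classes, the two agent functions, and `keyword_classifier` are hypothetical names. In particular, the keyword check stands in for the orchestrator model's own tool-selection step.

```python
from dataclasses import dataclass
from typing import Callable

# A specialized agent is exposed to the orchestrator as a plain callable.
Handler = Callable[[str], str]


@dataclass(frozen=True)
class Agent:
    name: str
    description: str
    handle: Handler


def knowledge_agent(query: str) -> str:
    return f"[knowledge] answer for: {query}"


def account_agent(query: str) -> str:
    return f"[account] result for: {query}"


def keyword_classifier(query: str) -> str:
    # Stand-in for the LLM deciding which agent (tool) to invoke.
    return "account" if "account" in query.lower() else "knowledge"


class Orchestrator:
    def __init__(self, agents: list[Agent], classify: Callable[[str], str]) -> None:
        self._agents = {agent.name: agent for agent in agents}
        self._classify = classify

    def respond(self, query: str) -> str:
        agent = self._agents.get(self._classify(query))
        if agent is None:
            return "No agent is available for this request."
        return agent.handle(query)


orchestrator = Orchestrator(
    [
        Agent("knowledge", "Answers questions from documentation", knowledge_agent),
        Agent("account", "Performs account operations", account_agent),
    ],
    keyword_classifier,
)
```

Because each agent is just a named callable with a description, adding a new domain means registering one more `Agent` rather than changing the orchestrator.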

MCP Integration

The system integrates with a separate MCP server for extensible tool capabilities. New tools can be added without modifying the core service—agents dynamically discover and use available tools at runtime.
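Runtime discovery can be sketched without an MCP client library. MCP does define list and call operations for tools, but everything below is an in-memory stand-in: `ToolServer`, `InMemoryToolServer`, `run_tool`, and the `echo` tool are hypothetical names, not the service's or the protocol SDK's API.

```python
from typing import Callable, Protocol

ToolFn = Callable[[dict[str, str]], str]


class ToolServer(Protocol):
    def list_tools(self) -> list[str]: ...
    def call_tool(self, name: str, arguments: dict[str, str]) -> str: ...


class InMemoryToolServer:
    """Stand-in for a separate tool server; tools can be added while running."""

    def __init__(self) -> None:
        self._tools: dict[str, ToolFn] = {}

    def register(self, name: str, fn: ToolFn) -> None:
        self._tools[name] = fn

    def list_tools(self) -> list[str]:
        return sorted(self._tools)

    def call_tool(self, name: str, arguments: dict[str, str]) -> str:
        return self._tools[name](arguments)


def run_tool(server: ToolServer, name: str, arguments: dict[str, str]) -> str:
    # Ask the server what exists at call time instead of caching at startup,
    # so newly added tools are usable without changing this code.
    if name not in server.list_tools():
        return f"tool not available: {name}"
    return server.call_tool(name, arguments)


server = InMemoryToolServer()
before = run_tool(server, "echo", {"text": "hi"})
server.register("echo", lambda args: args["text"])
after = run_tool(server, "echo", {"text": "hi"})
```

The caller code is identical before and after the tool is registered; only the server's tool list changed.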

Functional Error Handling

I implemented a Rust-inspired Result[T, E] type that makes error handling explicit throughout the codebase, eliminating silent failures and making error paths visible in the type system.

Tech Stack

Category	Technologies
Backend	Python 3.12, FastAPI, Hypercorn (ASGI)
AI/ML	AWS Bedrock (Knowledge Bases, Guardrails), Strands Agents, MCP
LLMs	Anthropic Claude 3.5 Sonnet, Amazon Nova Pro
State	Redis (indefinite persistence, 20-message sliding window)
Analytics	AWS S3 (Parquet), Athena
Auth	JWT with JWKS validation
Infrastructure	Kubernetes, Helm, Docker
CI/CD	GitHub Actions, Semantic Release
SDK	Auto-generated TypeScript client from OpenAPI
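The 20-message sliding window listed under State can be illustrated in memory. This sketch uses a bounded `deque` rather than Redis so it runs standalone; the `ConversationWindow` class and the message shape are assumptions. With Redis, a similar effect is commonly achieved by appending to a list and trimming it with `LTRIM`, though whether the service does this is not stated here.

```python
from collections import deque

WINDOW_SIZE = 20  # matches the 20-message window in the table above


class ConversationWindow:
    """Keeps only the most recent messages of a conversation."""

    def __init__(self, size: int = WINDOW_SIZE) -> None:
        # A deque with maxlen drops the oldest item when a new one is appended.
        self._messages: deque[dict[str, str]] = deque(maxlen=size)

    def append(self, role: str, content: str) -> None:
        self._messages.append({"role": role, "content": content})

    def history(self) -> list[dict[str, str]]:
        return list(self._messages)


window = ConversationWindow()
for i in range(25):
    window.append("user", f"message {i}")
```

After 25 appends, `window.history()` holds messages 5 through 24: the five oldest have been dropped.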

Observability

The service integrates OpenTelemetry with Datadog APM to trace the entire request flow, from API ingestion through agent reasoning to knowledge base retrieval and response generation. Every span is correlated, making it straightforward to diagnose latency issues or debug unexpected agent behavior in production.

Security and Compliance

Mandatory guardrails prevent inappropriate content in public-facing responses, and fail-fast validation ensures they cannot be accidentally disabled. All queries and responses are logged to S3/Athena for audit and analysis.
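Fail-fast validation of this kind can be sketched as a configuration object that refuses to exist without guardrail settings. The sketch assumes validation happens when configuration is loaded; `GuardrailConfig`, `ConfigError`, and the `GUARDRAIL_ID` and `GUARDRAIL_VERSION` keys are hypothetical names.

```python
from dataclasses import dataclass


class ConfigError(RuntimeError):
    """Raised when required configuration is missing."""


@dataclass(frozen=True)
class GuardrailConfig:
    guardrail_id: str
    guardrail_version: str

    def __post_init__(self) -> None:
        # Fail immediately rather than run without guardrails.
        if not self.guardrail_id.strip():
            raise ConfigError("guardrail_id is required")
        if not self.guardrail_version.strip():
            raise ConfigError("guardrail_version is required")


def load_guardrail_config(env: dict[str, str]) -> GuardrailConfig:
    # Missing keys become empty strings, which the dataclass rejects.
    return GuardrailConfig(
        guardrail_id=env.get("GUARDRAIL_ID", ""),
        guardrail_version=env.get("GUARDRAIL_VERSION", ""),
    )
```

With this shape there is no "disabled" state to fall into by accident: an omitted or blank setting stops the load with an error instead of producing a service that answers without guardrails.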

Role and Impact

I owned every aspect of this project from architecture design through production deployment—100% of the codebase, DevOps configuration, and documentation.

Outcomes:

Zero critical incidents since launch
70%+ test coverage enforced via CI
TypeScript SDK consumed by frontend applications
Patterns (Result type, client abstraction) adopted by other teams
Server-agnostic design enabling other teams to build support solutions using the same infrastructure
