Photo-Driven Router Setup Assistant on Gemini Enterprise

AI/ML

About the Task

Built a Gemini Enterprise assistant that reads a router photo — model, ports, cabling, LEDs — and returns grounded fixes

results

Setup time cut from 25–40 min to 10–15 min; self-install completion up from 60–70% to 80–90%

Support calls down from 180–250 to 80–120 per 1,000 installs; first-contact resolution up to 75–85%

Services used

Build Strategy

Build Product

Web Development

Overview

Setting up a new router is one of the few tasks customers are expected to handle entirely on their own — often while their internet is down and their patience is running out. They face a box of cables, a device covered in look-alike ports, and a row of blinking lights that mean nothing to most people. The usual help falls short: a printed manual assumes you know the terminology, and a chat window can't see the device in front of you. The customer is left describing a visual problem in words, and the agent is left guessing.

This use case describes a different approach, built on Gemini Enterprise: instead of asking customers to explain what they see, the assistant lets them show it. The result is a faster setup for the customer and a lighter support operation for the operator — and the sections that follow walk through how it works end to end.

Business Challenge

Router setup is fundamentally a visual task, but the tools built to support it are text-first — and that mismatch is the root cause of the friction.

When a customer runs into trouble, they have to translate what they see — port assignments, label markings, LED colors and blink patterns, which cable goes where — into words they often don't have. Most people don't know a WAN port from a LAN port, or that a slow blink means something different from a steady light. So the description that reaches a manual, a search box, or a chatbot is vague and often wrong, and the guidance that comes back is generic. Text-only support makes this worse because it is blind to device state: it can only respond to what the customer types, which is exactly the part they struggle to get right.

The cost shows up across the whole setup funnel:

Setup time: 25–40 minutes for a first-time install.
Unaided completion: only 60–70% of customers finish without contacting support.
Support load: 180–250 calls per 1,000 installs, many of them repeat contacts.
Truck rolls: a non-trivial share of installs escalates to an on-site technician visit, the most expensive outcome, often for trivial faults such as a WAN cable plugged into the wrong port.

Each is a consequence of the same gap: the customer can see the problem but can't describe it, and the support channel can't see anything at all. Closing that gap means letting the assistant observe the device directly.

Solution

Photo in, guided fix out: that is the whole interaction model. Built on Gemini Enterprise, the assistant uses multimodal reasoning to interpret an ordinary phone photo the way a skilled technician would. From a single image it extracts the device model, port layout, label markings, cabling, and LED states. It then grounds that analysis in the operator's own knowledge (product manuals, SOPs, firmware notes, the model-specific LED matrix, and a known-issues database) and validates the device against the operator's registry. The output is not a generic article but step-by-step instructions tailored to the exact device and problem in the photo.

The flow follows a consistent pattern:

Capture — the customer consents, then photographs the router (back panel for setup, front panel for connectivity issues).
Identify — Gemini recognizes the model, ports, cabling, and labels, with a confidence score.
Ground — the assistant retrieves operator-specific guides and validates against the device registry.
Guide — it generates clear, personalized setup or troubleshooting steps.
Escalate when needed — if confidence is low or the issue can't be resolved, it requests a clearer photo or hands a structured case to a human agent.

Two design choices make this dependable rather than just clever. First, the assistant checks its own confidence: a blurry or badly angled photo triggers a re-capture request instead of a wrong answer. Second, it never guesses beyond what it can support — low confidence becomes a clarifying question or a human handoff, not a fabricated fix.
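
Taken together, the two design choices reduce to a small routing rule. A minimal Python sketch, assuming the identification step reports one confidence score between 0 and 1 plus an image-quality flag; the threshold, the retry limit, and every name below are hypothetical, not values from the deployed system.

```python
from dataclasses import dataclass
from enum import Enum


class NextStep(Enum):
    GUIDE = "guide"          # confident enough to return a fix
    RECAPTURE = "recapture"  # ask the customer for a clearer photo
    HANDOFF = "handoff"      # pass a structured case to a human agent


@dataclass(frozen=True)
class PhotoAnalysis:
    confidence: float        # 0.0-1.0, as reported by the identification step
    image_quality_ok: bool   # hypothetical flag: blur, lighting, and angle checks passed


# Hypothetical values; real thresholds would be tuned per model and operator.
GUIDE_THRESHOLD = 0.85
MAX_RECAPTURES = 2


def next_step(analysis: PhotoAnalysis, recaptures_so_far: int) -> NextStep:
    """Choose the next step so that a low-confidence fix is never returned."""
    if analysis.image_quality_ok and analysis.confidence >= GUIDE_THRESHOLD:
        return NextStep.GUIDE
    # Poor photo or low confidence: retry a bounded number of times,
    # then defer to a human instead of guessing.
    if recaptures_so_far < MAX_RECAPTURES:
        return NextStep.RECAPTURE
    return NextStep.HANDOFF
```

Bounding the number of re-capture requests keeps a customer with a poorly lit cabinet from looping indefinitely: once the limit is reached, the session becomes a human handoff.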

User Journey

From the customer's side, it feels like a guided conversation, not a technical procedure — the assistant does the interpretation in the background.

Starting out. The customer opens the app, selects "Set up my router," and grants explicit consent for camera use and diagnostic processing.
Taking the photo. They photograph the router's back panel; the image uploads securely and Gemini identifies the model, port labels, cables, and markings.
Confidence check. If the photo is blurry, dark, or badly angled, the assistant asks for a clearer shot instead of guessing.
Personalized guidance. Once confident, it validates the device against the registry, pulls the operator's setup and activation flow for that exact model, and generates step-by-step instructions tailored to what it sees.
Connectivity issues. If the internet still isn't working, it requests a photo or short video of the front-panel lights, then reads the LED state and blink pattern against the model's LED matrix and firmware notes to pinpoint the cause.
Resolution or handoff. It either confirms success and closes the session, or opens a ticket with a structured diagnostic summary so a human inherits full context instead of starting from scratch.
Closing the loop. Every outcome is logged as anonymized metadata that feeds product and operational improvement.

The throughline is graceful escalation: at each stage the assistant advances the customer, asks for exactly what it needs, or hands off cleanly — it never fails silently.

Technology Architecture

This section describes the platform end to end — how a request enters, how the system is partitioned, how the agents reason, how the two main flows run, and how data is protected and deployed.

Request & Intake Path

Before any reasoning happens, every request passes through a layered intake path that handles where it came from, who the customer is, and how the image reaches the platform.

Channels. Requests enter from the telecom mobile app, a web support portal, messaging channels (WhatsApp, KakaoTalk, RCS), and an internal support-agent console — all feeding the same backend.
Edge and gateway. Traffic hits a hardened public edge (Cloud Load Balancing with a CDN, protected by Cloud Armor acting as a WAF), then passes through Apigee / Cloud API Gateway, which enforces OAuth2 / OIDC and API-key checks.
Identity and consent capture. Authentication resolves against the operator's existing identity providers (Firebase Auth, telecom SSO, or customer IAM), and a dedicated Consent Capture Service records authorization as a discrete, auditable event.
Image upload. Images never flow through the general request path: a Cloud Run service issues a short-lived, single-use signed upload URL, and the device writes the photo directly to a dedicated Cloud Storage bucket for temporary router images.

Together, these layers turn a request from any channel into a clean, authenticated, consented session with an image in a known location.
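
The signed-URL step separates permission to upload from the upload itself. The sketch below illustrates the idea with Python's standard library only: an HMAC-signed token bound to one session, an expiry time, and a one-time nonce. It is a conceptual stand-in, not Cloud Storage's V4 signing scheme. Cloud Storage signed URLs are time-limited, so the single-use property is assumed here to be enforced by the issuing service, modeled with an in-memory set. All names and the 300-second default are hypothetical.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

# Hypothetical signing key; a real service would load it from a secret store.
SIGNING_KEY = secrets.token_bytes(32)
# In-memory stand-in for a shared store of already-redeemed nonces.
_redeemed_nonces: set = set()


def _sign(payload: str) -> str:
    return hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()


def issue_upload_token(session_id: str, ttl_seconds: int = 300,
                       now: Optional[float] = None) -> str:
    """Bind one upload to one session and an expiry time."""
    issued = time.time() if now is None else now
    expires = int(issued) + ttl_seconds
    nonce = secrets.token_hex(8)
    payload = f"{session_id}:{expires}:{nonce}"
    return f"{payload}:{_sign(payload)}"


def redeem_upload_token(token: str, session_id: str,
                        now: Optional[float] = None) -> bool:
    """Accept a token only once, only for its own session, and only before expiry."""
    parts = token.rsplit(":", 3)
    if len(parts) != 4:
        return False
    tok_session, expires, nonce, sig = parts
    expected = _sign(f"{tok_session}:{expires}:{nonce}")
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False  # tampered or forged
    current = time.time() if now is None else now
    if tok_session != session_id or current > int(expires):
        return False  # wrong session or expired
    if nonce in _redeemed_nonces:
        return False  # already used
    _redeemed_nonces.add(nonce)
    return True
```

Because the signature covers the session ID, a token leaked from one session cannot be replayed against another, and the nonce check makes a second upload with the same token fail.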

Enterprise Architecture

The system is divided into distinct trust zones, each with a clear responsibility and governed boundaries between them. Nothing crosses from one zone to the next without passing through an explicit interface.

Customer zone — the device and channels; untrusted by default.
Public edge and API protection — the hardened perimeter; the only zone exposed to the public internet.
Application zone — stateless services that orchestrate sessions and business logic; holds no customer data at rest.
Enterprise and operator systems — CRM, device registry, telecom provisioning, warranty, ticketing/field service; reached only through controlled integrations.
AI and agent zone — Gemini Enterprise, Vertex AI models, and the custom diagnostic agents; isolated so model behavior can be governed independently.
Data and knowledge zone — temporary image storage, grounded knowledge stores, and the analytics warehouse.
Operations and security zone — IAM, network controls, audit logging, and monitoring.

The benefit is containment: a request from the untrusted customer zone can reach sensitive systems only through a chain of explicit boundaries, so a problem in one zone doesn't propagate to the others.
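
One way to make the "explicit boundaries" rule checkable is a default-deny allowlist of zone-to-zone interfaces: any hop not on the list is refused. The zone names and edges below are illustrative assumptions based on the zones listed above (the document does not enumerate the permitted interfaces), and the operations zone is left out for brevity.

```python
# Hypothetical zone names and permitted interfaces as (source, destination) pairs.
ALLOWED_INTERFACES = {
    ("customer", "edge"),
    ("edge", "application"),
    ("application", "enterprise"),
    ("application", "ai_agents"),
    ("application", "data"),
    ("ai_agents", "data"),
}


def path_permitted(path: list) -> bool:
    """True only if every hop crosses an explicitly allowed interface (default deny)."""
    return all((src, dst) in ALLOWED_INTERFACES for src, dst in zip(path, path[1:]))


print(path_permitted(["customer", "edge", "application", "enterprise"]))  # True
print(path_permitted(["customer", "application"]))  # False: skips the edge
```

The pairs are directional, so permitting the application zone to call the data zone says nothing about calls in the opposite direction.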

Agent Orchestration

The reasoning layer is a pipeline of specialized agents, each with a narrow job, coordinated by Gemini Enterprise as orchestrator; Vertex AI Gemini serves inference and the Vertex AI Agent Engine hosts the custom diagnostic agents.

Entry gate. The Policy and Safety Agent screens every request for unsafe instructions and confirms data-handling permissions before any diagnosis runs.
Parallel analysis. Three specialists examine the photo simultaneously: the Visual Device Identification Agent (model, revision, serial/QR), the Cable Validation Agent (cabling faults), and the LED Diagnostic Agent (indicator states).
Fusion and gating. A Diagnostic Fusion Step combines the three results, followed by a confidence and risk check — low confidence routes back for a clearer photo or human review; high confidence proceeds.
Grounding and response. The Knowledge Retrieval Agent pulls the relevant manuals, SOPs, LED matrix entries, and known issues, and the Grounded Response Generator turns the diagnosis plus that evidence into customer-facing steps. A Final Safety and Compliance Check screens the output before the customer ever sees it.
Escalation path. When human help is warranted, the CRM Escalation Agent and Human Handover Agent assemble a structured summary and route it into ticketing, CRM, or field service.
Learning. The Analytics Agent records the outcome for downstream analysis.

The principle throughout is decomposition with control: many narrow agents, a fusion step to reconcile them, confidence gating, and safety checks guarding both the entrance and the exit.
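
The control flow of the pipeline (entry gate, parallel specialists, fusion, confidence gate) can be sketched with `asyncio`. This is a minimal illustration, not Gemini Enterprise or Agent Engine code: the three specialist functions are hypothetical stubs returning fixed findings for a hypothetical model `XR-100`, the fusion rule (take the weakest confidence) is an assumption, and the retrieval, response-generation, and final safety stages are omitted.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    agent: str
    label: str
    confidence: float


# Hypothetical stubs for the three specialists; real ones would call a model.
async def identify_device(image_ref: str) -> Finding:
    return Finding("device", "model=XR-100", 0.93)


async def validate_cabling(image_ref: str) -> Finding:
    return Finding("cabling", "wan_cable_in_lan_port", 0.88)


async def read_leds(image_ref: str) -> Finding:
    return Finding("leds", "wan_off", 0.81)


def policy_gate(request: dict) -> bool:
    """Entry gate: run no diagnosis unless diagnostic processing was consented to."""
    return bool(request.get("consent"))


def fuse(findings: list) -> tuple:
    """Assumed fusion rule: merge labels; overall confidence is the weakest finding."""
    return {f.agent: f.label for f in findings}, min(f.confidence for f in findings)


async def diagnose(request: dict, threshold: float = 0.8) -> dict:
    if not policy_gate(request):
        return {"status": "rejected"}
    image_ref = request["image_ref"]
    # The three specialists examine the same photo concurrently.
    findings = await asyncio.gather(
        identify_device(image_ref), validate_cabling(image_ref), read_leds(image_ref)
    )
    diagnosis, confidence = fuse(list(findings))
    if confidence < threshold:
        return {"status": "needs_clearer_photo_or_review", "confidence": confidence}
    # Knowledge retrieval, response generation, and the final safety check would follow.
    return {"status": "ready_for_grounding", "diagnosis": diagnosis, "confidence": confidence}
```

Taking the minimum is a conservative choice: one uncertain specialist is enough to trigger a re-capture or review, which matches the "never guess" stance.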

Diagnostic Flows

Two flows cover the majority of traffic, each choreographed so every system is called at the right moment with the least data necessary.

First-time setup sequence

Authenticate the customer and capture consent, then create a session and return a session ID.
Issue a signed upload URL; the device uploads the image, raising an image-uploaded event that notifies the orchestrator.
The orchestrator sends the prompt, context, and image reference to Gemini, which returns a structured JSON result (model, ports, labels, cabling, confidence).
Validate the serial or model against the device registry.
Retrieve grounding content — manual, LED matrix, operator SOP.
Check line and device activation via telecom provisioning when needed.
Return a recommended action with a confidence score, and store anonymized metadata.
Branch to the outcome: confirm a successful setup, or open a ticket carrying the diagnostic summary; record the final outcome.
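
The structured JSON result in step 3 is only useful if it is checked before anything acts on it. A minimal validation sketch, assuming the field names listed in step 3 and a hypothetical shape for each (for example, `cabling` as a mapping from port name to what is plugged into it). Malformed JSON raises `json.JSONDecodeError`, which is a subclass of `ValueError`, so callers handle a single exception type.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Identification:
    model: str
    ports: list        # e.g. ["WAN", "LAN1", "LAN2"]
    labels: list       # printed markings read from the panel
    cabling: dict      # hypothetical shape: port name -> what is plugged into it
    confidence: float  # 0.0-1.0


_EXPECTED_TYPES = {"model": str, "ports": list, "labels": list,
                   "cabling": dict, "confidence": (int, float)}


def parse_identification(raw: str) -> Identification:
    """Parse the model's JSON reply, rejecting anything outside the expected shape."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key, expected in _EXPECTED_TYPES.items():
        if key not in data:
            raise ValueError(f"missing field: {key}")
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(data[key], expected) or isinstance(data[key], bool):
            raise ValueError(f"wrong type for field: {key}")
    if not 0.0 <= data["confidence"] <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return Identification(data["model"], data["ports"], data["labels"],
                          data["cabling"], float(data["confidence"]))
```

A reply that fails validation can be treated like a low-confidence result: ask for a new photo or route the session to review rather than guessing.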

LED diagnostic flow

When a customer reports that the internet still isn't working, the system shifts to fault diagnosis driven by the front-panel lights.

Gemini analyzes the visible LEDs from the front-panel photo.
If a still image can't capture the state — for example, a blink — the assistant requests a five-second video and analyzes the pattern.
It classifies the LED status, retrieves the model-specific LED matrix, and compares against firmware notes and known issues.
The comparison resolves to a category with a specific action: power off/booting → safe power-on; WAN disconnected → reseat the WAN cable; authentication failed → check the telecom activation API; firmware update in progress → don't power off; Wi-Fi disabled → re-enable Wi-Fi; hardware fault suspected → escalate to replacement.
The outcome is stored.
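
Step 4 is essentially a table lookup followed by a priority rule. In the sketch below, the LED names, colors, patterns, and the matrix for the model `XR-100` are all hypothetical; only the categories and their actions come from the list above. The priority order is also an assumption, chosen so that a firmware update in progress always wins, because its action (do not power off) must override any other advice.

```python
from typing import NamedTuple, Optional, Tuple


class LedReading(NamedTuple):
    led: str       # hypothetical LED names: "power", "wan", "wifi"
    color: str     # "green", "amber", "red", "off"
    pattern: str   # "steady", "slow_blink", "fast_blink", "off"


# Hypothetical LED matrix for a hypothetical model "XR-100".
LED_MATRIX_XR100 = {
    ("power", "off", "off"): "power_off_or_booting",
    ("power", "green", "slow_blink"): "power_off_or_booting",
    ("power", "amber", "fast_blink"): "firmware_update",
    ("power", "red", "steady"): "hardware_fault",
    ("wan", "off", "off"): "wan_disconnected",
    ("wan", "amber", "steady"): "authentication_failed",
    ("wifi", "off", "off"): "wifi_disabled",
}

# Category -> action, following the diagnostic flow described above.
ACTIONS = {
    "firmware_update": "Do not power off; wait for the update to finish.",
    "hardware_fault": "Escalate to replacement.",
    "power_off_or_booting": "Guide a safe power-on.",
    "wan_disconnected": "Reseat the WAN cable.",
    "authentication_failed": "Check line activation via the telecom activation API.",
    "wifi_disabled": "Re-enable Wi-Fi.",
}

# Assumed precedence: the firmware-update warning overrides everything else.
PRIORITY = ["firmware_update", "hardware_fault", "power_off_or_booting",
            "wan_disconnected", "authentication_failed", "wifi_disabled"]


def diagnose_leds(readings: list, matrix: dict) -> Optional[Tuple[str, str]]:
    """Map observed LED states to the highest-priority category and its action."""
    found = {matrix[r] for r in readings if r in matrix}
    for category in PRIORITY:
        if category in found:
            return category, ACTIONS[category]
    return None  # no match: ask for a clearer photo or video, or escalate
```

Keeping the matrix as data rather than code means each router model only needs its own table, while the priority rule and actions stay shared.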

Reading the blink pattern, not just the color, is what separates ambiguous states from definite ones, and the "do not power off during a firmware update" branch alone guards against one of the most common ways a customer bricks a new device.
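
A still photo records a single instant, so it cannot tell a steady light from one caught mid-blink. Given per-frame brightness for one LED, already extracted from the five-second clip (that extraction step is assumed and not shown), the pattern can be classified by counting off-to-on transitions. The thresholds, the 2 Hz cut-off, the two-cycle minimum, and the category names are hypothetical.

```python
from typing import Sequence


def classify_blink(brightness: Sequence[float], fps: float,
                   on_threshold: float = 0.5, fast_hz: float = 2.0) -> str:
    """Classify one LED as off, steady, slow_blink, fast_blink, or inconclusive.

    `brightness` holds one normalized value (0.0-1.0) per video frame.
    """
    lit = [b >= on_threshold for b in brightness]
    if not any(lit):
        return "off"
    if all(lit):
        return "steady"
    # Each off-to-on transition marks the start of one blink cycle.
    rising_edges = sum(1 for prev, cur in zip(lit, lit[1:]) if cur and not prev)
    if rising_edges < 2:
        # A single change (e.g. the LED switched on mid-clip) is not a blink;
        # ask for another clip instead of guessing.
        return "inconclusive"
    duration_s = len(brightness) / fps
    return "fast_blink" if rising_edges / duration_s >= fast_hz else "slow_blink"
```

At 10 frames per second, a five-second clip of an LED that is on for half a second and off for half a second yields four rising edges, about 0.8 Hz, and is classified as a slow blink.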

Security & Privacy Controls

A router photo is sensitive data — it can contain a serial number, a MAC address, a QR code, and an incidental view of the customer's home — so controls apply at every stage. Privacy here is enforced by the architecture, not left to policy.

On the way in. Every session is anchored by an explicit consent record, and all traffic travels under TLS.
Encryption (CMEK). Data at rest is encrypted with Cloud KMS using customer-managed keys.
Detection and redaction. Cloud DLP scans for PII, MAC addresses, and serial numbers, masking or redacting them before they propagate.
Access control. IAM and RBAC restrict every service and person to the minimum access required.
Network isolation. VPC Service Controls draw a perimeter around the sensitive zones, limiting data exfiltration even if credentials are compromised.
Model protection. Model Armor screens for prompt injection and unsafe content before and after generation.
Auditability. Audit logs and Access Transparency record who and what touched the data.
Automatic deletion. Temporary images are deleted within 24–72 hours by lifecycle policy.
Anonymized analytics and minimum-necessary handoff. Only anonymized metadata reaches analytics, and escalations carry only the data required to act, through a separate restricted support environment.
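
In this design, detection and redaction are handled by Cloud DLP. As a simplified stand-in that does not call the Cloud DLP API, the sketch below shows the two transformations the data model relies on: MAC addresses replaced by a keyed hash and serial numbers masked to their last four characters. The serial-number pattern, the function names, and the four-character rule are assumptions; in practice the hashing key would come from a managed secret store rather than code.

```python
import hashlib
import hmac
import re

# Six hex pairs separated by ":" or "-", e.g. 00:1A:2B:3C:4D:5E.
MAC_RE = re.compile(r"\b(?:[0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}\b")
# Hypothetical serial format for illustration: "SN" followed by 8-12 characters.
SERIAL_RE = re.compile(r"\bSN[0-9A-Z]{8,12}\b")


def hash_mac(mac: str, key: bytes) -> str:
    """Keyed hash of a normalized MAC: stable per device, not reversible without the key."""
    normalized = mac.lower().replace("-", ":")
    return hmac.new(key, normalized.encode(), hashlib.sha256).hexdigest()[:16]


def mask_serial(serial: str, keep: int = 4) -> str:
    """Replace all but the last `keep` characters with asterisks."""
    return "*" * (len(serial) - keep) + serial[-keep:]


def redact(text: str, key: bytes) -> str:
    """Hash MAC addresses and mask serial numbers before the text propagates."""
    text = MAC_RE.sub(lambda m: "mac:" + hash_mac(m.group(0), key), text)
    return SERIAL_RE.sub(lambda m: mask_serial(m.group(0)), text)
```

Normalizing case and separators before hashing means the same device maps to the same token whether its MAC was read as `00-1a-...` or `00:1A:...`, so analytics can still count repeat devices without storing the address.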

The principle is consistent: collect the minimum, protect it in depth while in use, and retire it on a schedule.
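
The automatic-deletion control maps naturally onto Object Lifecycle Management in Cloud Storage. The helper below builds a rule shaped like the `lifecycle` field of a Cloud Storage bucket resource; the `age` condition counts whole days since object creation, so the 24–72 hour window corresponds to an age of 1 to 3. Lifecycle actions run asynchronously and can trail the moment the condition is met, so the window is best treated as a target rather than a hard deadline. The function name and the range check are illustrative.

```python
import json


def temp_image_lifecycle(max_age_days: int) -> dict:
    """Bucket lifecycle config that deletes temporary router images after max_age_days."""
    # The design's 24-72 hour window corresponds to an age of 1-3 whole days.
    if not 1 <= max_age_days <= 3:
        raise ValueError("max_age_days must be between 1 and 3")
    return {
        "lifecycle": {
            "rule": [
                {"action": {"type": "Delete"}, "condition": {"age": max_age_days}}
            ]
        }
    }


if __name__ == "__main__":
    print(json.dumps(temp_image_lifecycle(1), indent=2))
```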

Data Model

The data model is organized around one central entity — the diagnostic session — with every other record linked back to it, giving the operator a complete, queryable record without retaining more than necessary.

DIAGNOSTIC_SESSION carries the identifying and outcome fields: a session_id key, a hashed customer_id, operator_id, country, language, consent_timestamp, and the flags that capture how the session ended (human_escalation, image_deleted, outcome). Six entities reference it via session_id:

DEVICE — model, hardware revision, firmware, masked serial, hashed MAC, batch.
IMAGE_ANALYSIS — confidence scores, detected ports and LEDs, cable status, image quality.
DIAGNOSIS — category, root cause, recommended action, resolved flag, confidence score.
KNOWLEDGE_REFERENCE — the IDs of the grounding sources used (manual, firmware note, LED matrix, SOP, known issue).
ESCALATION_TICKET — present only when a case is handed to a human.
ANALYTICS_EVENT — the anonymized signals that feed analytics.

The design is auditable (every diagnosis links to the knowledge that grounded it), privacy-preserving by default (IDs hashed, serials masked at the schema level), and analytics-ready (confidence, categories, and outcomes captured as structured fields).
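
A minimal Python rendering of the central entity and two of its dependents shows how the `session_id` link makes every diagnosis traceable to its grounding sources. Field names follow the descriptions above for DIAGNOSTIC_SESSION and are otherwise assumptions, as is the `audit_trail` helper; the remaining entities would follow the same pattern.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class DiagnosticSession:
    customer_id: str            # stored already hashed, never the raw ID
    operator_id: str
    country: str
    language: str
    consent_timestamp: datetime
    session_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    human_escalation: bool = False
    image_deleted: bool = False
    outcome: Optional[str] = None


@dataclass
class Diagnosis:
    session_id: str
    category: str
    root_cause: str
    recommended_action: str
    resolved: bool
    confidence_score: float


@dataclass
class KnowledgeReference:
    session_id: str
    source_type: str   # "manual", "firmware_note", "led_matrix", "sop", "known_issue"
    source_id: str


def audit_trail(session: DiagnosticSession, diagnoses: list, references: list) -> dict:
    """Collect one session's diagnoses together with the grounding sources they cite."""
    sid = session.session_id
    return {
        "session_id": sid,
        "outcome": session.outcome,
        "diagnoses": [d for d in diagnoses if d.session_id == sid],
        "grounding": sorted((r.source_type, r.source_id)
                            for r in references if r.session_id == sid),
    }
```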

Deployment Topology

A single production region in Korea, built almost entirely from managed, independently scalable Google Cloud services:

Cloud Run (stateless app tier): Session API, Image Intake API, Prompt Builder, Diagnostic Orchestrator, Case Integration API, Feedback Collector.
AI & agents: Vertex AI Gemini (inference) and Vertex AI Agent Engine (custom ADK agents); grounding from Gemini Enterprise data stores.
Messaging, task queues & secrets: Cloud Tasks, Pub/Sub, Secret Manager.
Data: BigQuery (anonymized metadata), Cloud Storage (temporary image bucket), Cloud KMS + Cloud DLP (encryption and sensitive-data scanning).
External systems: CRM, ServiceNow / Zendesk / Salesforce, Field Service, Device Registry, Telecom Provisioning, Warranty — via controlled integrations.
Restricted support env: Support Agent Console with its own RBAC and a Redaction Service.
Observability: Cloud Logging, Cloud Monitoring, Looker dashboards.

Results and Impact

Measured Impact

Four metrics, shown as before → after against the original baselines, plus one target and one qualitative outcome:

Setup time: 25–40 min → 10–15 min.
Self-install completion: 60–70% → 80–90% — fewer abandoned activations and returns.
Support calls / 1,000 installs: 180–250 → 80–120 — frees capacity for complex issues.
First-contact resolution: 55–65% → 75–85% — cases arrive pre-diagnosed with full context.
Cabling issues resolved without an agent: target of 70%+.
Product intelligence: surfaces confusing layouts, firmware failure patterns, and setup-driven returns — compounding over time.

Together this makes self-service the fastest, cheapest, most reliable path — and increasingly capable as the feedback loop matures.
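
The before → after ranges also imply relative reductions. A small calculation makes them explicit. It uses only the published ranges, so the midpoint figure is a convenience rather than a measured median, and the largest and smallest changes simply pair the range endpoints.

```python
def pct_change(before: float, after: float) -> float:
    """Percentage change from before to after; negative values are reductions."""
    return (after - before) / before * 100


def range_change(before: tuple, after: tuple) -> tuple:
    """Return (largest, midpoint, smallest) change implied by two (low, high) ranges."""
    (b_lo, b_hi), (a_lo, a_hi) = before, after
    largest = pct_change(b_hi, a_lo)    # worst baseline paired with best result
    midpoint = pct_change((b_lo + b_hi) / 2, (a_lo + a_hi) / 2)
    smallest = pct_change(b_lo, a_hi)   # best baseline paired with worst result
    return round(largest, 1), round(midpoint, 1), round(smallest, 1)


print(range_change((180, 250), (80, 120)))  # support calls: (-68.0, -53.5, -33.3)
print(range_change((25, 40), (10, 15)))     # setup minutes: (-75.0, -61.5, -40.0)
```

Read this way, support calls per 1,000 installs fall by roughly a third to two thirds (about 53% at the midpoints), and setup time by 40% to 75% (about 62% at the midpoints).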

Conclusion

The router is the example, but the pattern is the point. The same approach fits any product a customer must set up or troubleshoot on their own — appliances, medical devices, networking gear, smart-home and industrial hardware. Wherever there's a gap between what a person can see and what they can describe, a multimodal assistant that observes the device directly closes it.

Stripped to essentials, the reusable loop is:

Multimodal capture — the customer shows the device with a photo or short video instead of describing it.
Grounded diagnosis — interpret it against the manufacturer's own knowledge and validate against systems of record.
Confidence-gated guidance — return a fix only when sure enough; otherwise ask for clearer input or defer.
Structured escalation — hand a fully documented case to a human rather than starting from zero.

What makes it deployable is the discipline around it: consent before processing, sensitive data masked and encrypted under operator keys, inputs retired on schedule, minimum-necessary handoffs, and an anonymized analytics loop. The knowledge sources, diagnosis categories, and constraints change by industry, but the architecture doesn't — the same template re-points at new hardware without a redesign. The router assistant is one instance of a general capability: turning a customer's camera into a diagnostic instrument, and trusted enterprise knowledge into the expertise that reads it.

Roman Reznikov

CEO & Partner