
365UI Case Studies

Delivered systems, not demo scripts.

These case studies summarize delivered or validated 365UI capabilities across private knowledge Q&A, million-scale retrieval, production optimization, and workflow automation agents.

  • S&P 500 enterprise production environment
  • 1M+ vectors / 778K+ documents
  • 137s → 6.6s retrieval latency
  • 12,000+ automated fixes

01 / AI-native BMC autonomous vulnerability discovery & validation (latest)

BMC Red-Team Lab

232 skills · 7 assessment lanes · OpenBMC upstream confirmed · Fail-Closed Governor

Challenge

In an era of fully automated AI lock-picks (Mythos and its successors), enterprises can no longer rely on once-a-year scans or one-shot external pentests. Baseboard management controllers (BMCs), the most sensitive and lowest-level server management surface, demand a 24/7, self-hosted, self-scheduled red-team loop. Generic agent shells ("OpenClaw + raw LLM") pointed at production BMCs are unguarded bombs on physical hardware.

Execution focus

  • 232 specialized red-team skills routed across 7 deep assessment lanes — Unauth-DAST, Web UI deep parsing, Redfish privilege mapping, IPMI / OEM command audit, iKVM remote desktop, supply-chain SBOM, and the OpenBMC / libpldm collaboration track.
  • ADR 0005 multi-model Council architecture: a Hunter (Claude Opus 4.7, Thinking) and a Skeptic (GPT-5.5, Extra Reasoning) debate every candidate finding, and a Governor written in deterministic code makes the final Fail-Closed call, guarding against single-model hallucination and prompt-injection contamination.
  • Proof Ladder: every finding is promoted step by step — static candidate → deployed daemon reachability → controlled lab reproduction → exploitability confirmation. No "AI memo for executives" passes as a security conclusion; every high / critical finding ships with a replayable evidence pack.
  • ADR 0002 purple closed-loop discipline: every offensive primitive auto-generates a paired Sigma detection rule that lands in the SIEM in parallel with remediation. Dangerous phases require explicit `--allow-*` approval, watchdog preflight and after-action checks, and a global PANIC halt; confirmed vulnerabilities automatically start a coordinated-disclosure timer.
  • Field result: the OpenBMC `libpldm decode_get_types_resp()` report progressed from a source-level out-of-bounds (OOB) read to deployed pldmd reachability, evidence from a controlled fake MCTP peer path, and a candidate fix shape. The OpenBMC security team responded and indicated they will address further issues in this area. The same completion-code-first pattern extends to hardening candidates in the FRU, BIOS, Platform, Firmware Update, and IBM OEM decoders.
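The Council's final step can be illustrated with a minimal sketch, assuming each model returns a structured verdict that cites evidence artifacts recorded by the lab. The `Verdict` type, its fields, and the rejection rules below are hypothetical, not the ADR 0005 implementation; the point is that the Governor is plain deterministic code that resolves every doubt (disagreement, missing evidence, or citations of evidence the lab never recorded) to rejection.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    PROMOTE = "promote"
    REJECT = "reject"


@dataclass(frozen=True)
class Verdict:
    # Hypothetical structured output from one Council model.
    model: str
    confirms: bool       # does this model believe the finding is real?
    evidence_ids: tuple  # IDs of lab evidence artifacts the model cites


def governor(hunter: Verdict, skeptic: Verdict, known_evidence: set) -> Decision:
    """Deterministic, fail-closed decision over two model verdicts."""
    for v in (hunter, skeptic):
        if not v.confirms:
            return Decision.REJECT  # any dissent blocks promotion
        if not v.evidence_ids:
            return Decision.REJECT  # claims without evidence are rejected
        if not set(v.evidence_ids) <= known_evidence:
            # Citing artifacts the lab never produced suggests hallucination
            # or injected content, so fail closed.
            return Decision.REJECT
    return Decision.PROMOTE


lab_evidence = {"pcap-001", "asan-017"}
hunter = Verdict("hunter", True, ("asan-017",))
skeptic = Verdict("skeptic", True, ("pcap-001",))
injected = Verdict("skeptic", True, ("made-up-99",))
print(governor(hunter, skeptic, lab_evidence))   # Decision.PROMOTE
print(governor(hunter, injected, lab_evidence))  # Decision.REJECT
```

Keeping the final call out of any model means a single contaminated or hallucinating model can at most block a finding, never promote one.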

Impact

  • A real "engineered red team" that counters Mythos-class AI 0-day tooling without depending on any external 0-day broker
  • OpenBMC upstream security thread engaged: the evidence chain has moved into fix collaboration
  • Every offensive primitive has a paired Sigma defensive rule, with the purple closed loop landing in the SIEM
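The completion-code-first hardening pattern from the field result can be sketched outside C, assuming it means reading and checking a response's completion code before touching any other field. The decoder below is hypothetical and does not reproduce libpldm's API or a real PLDM message layout: it assumes a buffer whose first byte is a completion code followed by a fixed 8-byte payload. Python slicing cannot read out of bounds, so the sketch only shows the ordering that a C decoder must enforce: length check, completion code, early return on error, then a bounds check before the payload is read.

```python
from __future__ import annotations

SUCCESS = 0x00    # assumed "success" completion code
PAYLOAD_LEN = 8   # assumed fixed payload size for this hypothetical response


class DecodeError(ValueError):
    pass


def decode_response(buf: bytes) -> tuple[int, bytes | None]:
    """Return (completion_code, payload); payload is None for error responses."""
    # 1. The completion code itself must be present before anything is read.
    if len(buf) < 1:
        raise DecodeError("empty response")
    cc = buf[0]
    # 2. Completion code first: an error response may legitimately be short,
    #    so never decode payload fields from it.
    if cc != SUCCESS:
        return cc, None
    # 3. Only a success response is bounds-checked and decoded further.
    if len(buf) < 1 + PAYLOAD_LEN:
        raise DecodeError(f"truncated payload: {len(buf) - 1} < {PAYLOAD_LEN} bytes")
    return cc, bytes(buf[1:1 + PAYLOAD_LEN])


print(decode_response(bytes([0x05])))  # (5, None)
```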

02 / S&P 500 enterprise environment

Enterprise Private AI Assistant Platform

S&P 500 · Production · Multi-tenant

Challenge

The customer needed a controlled AI assistant over web content, SharePoint, PDFs, and Office files, with multi-tenant configuration, that would not send private data into a generic chat tool.

Execution focus

  • Built a multi-source ingestion and document parsing pipeline into a unified searchable data layer.
  • Supported multi-tenant configuration and zero-code deployment patterns across industries.
  • Preserved private deployment, permission control, tracing, and evaluation for long-running production operations.
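One common way to combine multi-tenant configuration with permission control is to filter candidate documents by tenant and access group before any of them reach the model. The sketch below uses hypothetical `Doc` and `User` records to illustrate that pattern; it is not the platform's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    tenant: str
    allowed_groups: frozenset  # groups permitted to read this document


@dataclass(frozen=True)
class User:
    user_id: str
    tenant: str
    groups: frozenset


def visible_docs(user: User, candidates: list) -> list:
    """Keep documents from the user's own tenant that share at least one group."""
    return [
        d for d in candidates
        if d.tenant == user.tenant and d.allowed_groups & user.groups
    ]


alice = User("alice", "acme", frozenset({"finance"}))
docs = [
    Doc("kb-1", "acme", frozenset({"finance"})),    # same tenant, shared group
    Doc("kb-2", "acme", frozenset({"hr"})),         # same tenant, no shared group
    Doc("kb-3", "globex", frozenset({"finance"})),  # other tenant
]
print([d.doc_id for d in visible_docs(alice, docs)])  # ['kb-1']
```

Filtering before generation, rather than asking the model to withhold content, keeps the permission decision deterministic and auditable.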

Impact

  • Running in production
  • Supports multi-tenant architecture
  • Reusable zero-code deployment path

03 / Enterprise knowledge Q&A

Million-Scale High-Accuracy Retrieval

1M+ vectors · 778K+ documents · 8 tools

Challenge

Keyword search could not answer complex enterprise knowledge questions, while vector-only retrieval created recall gaps and weak evidence.

Execution focus

  • Combined semantic retrieval, keyword matching, and reciprocal rank fusion (RRF) over 1M+ vectors and 778K+ documents.
  • Added multi-stage reranking, metadata filters, parent/sibling expansion, and multi-hop reasoning.
  • Used agents to coordinate database queries, web search, and other tools so static documents and live data could share one answer path.
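Reciprocal rank fusion merges ranked lists by summing 1/(k + rank) for each document across lists, so a document ranked well by both semantic and keyword retrieval rises to the top without calibrating their raw scores against each other. A minimal sketch follows; `k = 60` is the constant commonly used for RRF, and the document IDs are made up for illustration.

```python
from collections import defaultdict


def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several best-first ranked lists of document IDs with RRF.

    Ranks are 1-based; a document missing from a list simply gets no
    contribution from that list.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


semantic = ["d3", "d1", "d7"]  # e.g. vector-search results
keyword = ["d1", "d9", "d3"]   # e.g. keyword/BM25 results
print([doc for doc, _ in rrf([semantic, keyword])])  # ['d1', 'd3', 'd9', 'd7']
```

Here `d1` (ranks 2 and 1) narrowly beats `d3` (ranks 1 and 3), and both outrank documents found by only one retriever; multi-stage reranking would then operate on this fused list.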

Impact

  • Improved recall quality for complex questions
  • Grounded answers in traceable evidence
  • Enabled multi-tool coordination

04 / AI platform engineering

Production Retrieval Pipeline Optimization

98.9% accuracy · 95% latency reduction · 260x import speed

Challenge

The early RAG pipeline had high latency and insufficient import throughput, blocking realistic enterprise production usage.

Execution focus

  • Optimized retrieval, reranking, context expansion, chunking, and resource scheduling.
  • Removed unnecessary calls and improved data import concurrency and batching.
  • Established repeatable evaluations so accuracy, latency, and throughput could be tracked over time.
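A repeatable evaluation can be as simple as running a fixed question set through the pipeline on every change and recording accuracy and latency, so a regression in either shows up immediately. The sketch below assumes a hypothetical `answer(question)` callable and case-insensitive exact-match scoring; real enterprise benchmarks usually need graded or model-assisted scoring.

```python
import statistics
import time
from typing import Callable


def evaluate(answer: Callable[[str], str], cases: list[tuple[str, str]]) -> dict:
    """Run every (question, expected) case once; report accuracy and latency."""
    correct = 0
    latencies = []
    for question, expected in cases:
        start = time.perf_counter()
        got = answer(question)
        latencies.append(time.perf_counter() - start)
        correct += int(got.strip().lower() == expected.strip().lower())
    return {
        "accuracy": correct / len(cases),
        "p50_latency_s": statistics.median(latencies),
        "max_latency_s": max(latencies),
    }


# Toy pipeline standing in for the real RAG system.
cases = [("capital of France?", "Paris"), ("2 + 2?", "4")]
report = evaluate(lambda q: "Paris" if "France" in q else "5", cases)
print(report["accuracy"])  # 0.5
```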

Impact

  • Reduced response latency from 137 seconds to 6.6 seconds
  • Improved import throughput by 260x
  • Reached 98.9% enterprise benchmark accuracy
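The 137-second and 6.6-second figures are consistent with the "95% latency reduction" headline, which corresponds to roughly a 20x speedup:

```latex
\text{reduction} = \frac{137\,\text{s} - 6.6\,\text{s}}{137\,\text{s}} = \frac{130.4}{137} \approx 0.952 \approx 95\%,
\qquad
\text{speedup} = \frac{137\,\text{s}}{6.6\,\text{s}} \approx 20.8\times
```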

05 / Code quality and recruiting operations

Workflow Automation Agents

12,000+ fixes · JD parsing · Review gate

Challenge

The team had large volumes of repetitive expert work that needed the speed of automation while retaining human review and auditability.

Execution focus

  • Code-Fix Agent identified, repaired, and verified code quality issues at scale.
  • AI Recruiter extracted screening criteria from job descriptions (JDs) and generated candidate shortlists.
  • Kept automated actions and human confirmation in the same workflow to avoid uncontrolled black-box execution.
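Keeping automated actions and human confirmation in one workflow can be sketched as a small state machine: the agent may only propose a fix, a named reviewer must approve it, nothing is applied without that approval, and every transition is appended to an audit log. The `ProposedFix` type, its states, and the rule that unverified fixes cannot be approved are hypothetical, not the Code-Fix Agent's actual interface.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ProposedFix:
    fix_id: str
    diff: str
    tests_passed: bool                 # result of the agent's own verification step
    status: str = "proposed"           # proposed -> approved/rejected -> applied
    audit: list = field(default_factory=list)

    def _log(self, actor: str, action: str) -> None:
        # Append-only audit trail: UTC timestamp, who acted, and what they did.
        self.audit.append((datetime.now(timezone.utc).isoformat(), actor, action))

    def review(self, reviewer: str, approve: bool) -> None:
        if self.status != "proposed":
            raise RuntimeError(f"cannot review fix in state {self.status!r}")
        # A fix whose automated verification failed can never be approved.
        if approve and not self.tests_passed:
            raise RuntimeError("verification failed; fix cannot be approved")
        self.status = "approved" if approve else "rejected"
        self._log(reviewer, self.status)

    def apply(self, actor: str) -> None:
        # The review gate: nothing is applied without prior human approval.
        if self.status != "approved":
            raise PermissionError(f"fix {self.fix_id} is {self.status}, not approved")
        self.status = "applied"
        self._log(actor, "applied")


fix = ProposedFix("F-1", "- x = 1\n+ x = 2", tests_passed=True)
fix.review("alice", approve=True)
fix.apply("code-fix-agent")
print(fix.status, len(fix.audit))  # applied 2
```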

Impact

  • Resolved 12,000+ code quality issues
  • Automated JD screening criteria extraction
  • Preserved review gates and audit trails

Want to know whether your workflow is a good pilot candidate?

Bring one business workflow, one real dataset, and one success metric. We can start with a 30-minute diagnosis before deciding whether a pilot is worth building.

Discuss a similar project