vert-suite autonomously co-optimises CPU and GPU power on bare-metal AI servers — achieving energy savings with no changes to your software, inference stack, or hardware.
Every number below is verified at the server plug using a state-of-the-art power analyser — not estimated, not simulated.
On a cutting-edge workstation GPU, each generated token carries a significant energy cost. With millions of tokens processed daily, that adds up fast. And without active optimisation, both CPU and GPU are left running at full speed regardless of actual workload demand.
* All watt-second readings measured at the server wall socket. Test system: NVIDIA RTX Pro 6000 Blackwell (96 GB VRAM), 24-core workstation, 128 GB RAM, 2,050 W PSU.
A production-ready software platform for autonomous GPU/CPU efficiency orchestration — inference engine agnostic, with no hardware changes, no application modifications, and no manual tuning required.
Autonomous bare-metal GPU and CPU management for AI/LLM workloads. Empirically validated energy savings of over 30% per token — with no changes to your application or inference stack.
Specify your maximum acceptable throughput reduction. vert-suite automatically scans the GPU/CPU operating envelope and locks in the profile delivering the deepest energy savings within your constraint.
Reduce OpEx, extend platform lifespan, and lower your infrastructure's carbon footprint — without replacing or upgrading hardware.
Leverage eBPF for deep system observability with minimal operational disruption — complemented by physical out-of-band telemetry for independent power validation.
Least-privilege agents, just-in-time capabilities, and mTLS-secured control-plane traffic. Transparent telemetry with out-of-band validation for full auditability.
Inference engine agnostic — validated on vLLM and llama.cpp, and compatible with other leading engines. A single self-contained library, portable across Linux distributions and Kubernetes flavours, deployable in minutes with no operators, scheduler extensions, or YAML modifications.
The industry standard is to hand over the keys to the kingdom — permanent, unconditional root access to your kernel, drivers, and hardware. You shouldn't have to compromise your cluster's security just to run your compute efficiently.
vert-suite is built on a strict principle of minimal authority. Instead of deploying a privileged monolith, the architecture is split in two: a standard unprivileged worker, and a minimal escort that grants just-in-time access only when needed — and only for the exact duration required.
Public-key signature verification at the gate — no agent is admitted without a valid signature.
Main agent operates as a standard worker within an unprivileged boundary — CAP_SYS_RAWIO and CAP_SYS_ADMIN never granted permanently.
A Security Escort briefly unlocks the required resource, completes the exact task, and immediately relocks — no standing privileges.
All control-plane traffic is encrypted via mTLS tunnels — no plaintext communication between components.
Digital identity verification prevents any component from operating outside its authorised scope.
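The just-in-time access model above can be illustrated with a small sketch. This is not vert-suite's actual API — the `LockedResource` and `security_escort` names are hypothetical — but it shows the pattern: the resource is locked by default, unlocked only for one task, and relocked even if that task fails.

```python
from contextlib import contextmanager

class LockedResource:
    """Hypothetical stand-in for a privileged hardware control knob."""
    def __init__(self, name):
        self.name = name
        self.unlocked = False

    def apply(self, setting):
        if not self.unlocked:
            raise PermissionError(f"{self.name} is locked")
        return f"{self.name} <- {setting}"

@contextmanager
def security_escort(resource):
    resource.unlocked = True           # grant: capability enabled just-in-time
    try:
        yield resource
    finally:
        resource.unlocked = False      # revoke: no standing privileges

gpu_power = LockedResource("gpu-power-limit")
with security_escort(gpu_power) as r:
    result = r.apply("300W")           # privileged work happens only here
# outside the escort, the resource is locked again — any access raises
```

The `finally` branch is the important part: revocation is unconditional, so a crash inside the privileged section can never leave standing access behind.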
No YAML changes, no Kubernetes operators, no code modifications to your inference stack.
Tell vert-suite your maximum acceptable throughput reduction. It autonomously scans the full GPU/CPU operating envelope.
Real-time CPU/GPU co-optimisation locks in the deepest energy savings within your constraint.
vert-suite does not offer a fixed menu of profiles. It offers a continuous spectrum. Simply tell the software the maximum throughput reduction you can accept — say, no more than 10%, 15%, or 20% — and it automatically identifies the optimal GPU/CPU operational mode to deliver the deepest possible energy savings within that constraint.
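The constraint-driven selection described above reduces to a simple rule: among all operating points whose throughput reduction fits the budget, take the one with the deepest energy savings. The sketch below is illustrative only — the operating points are hypothetical numbers, not measured vert-suite profiles.

```python
# Hypothetical (profile, throughput_drop_%, energy_saving_%) datapoints
operating_points = [
    ("P0-baseline", 0.0,  0.0),
    ("P1",          4.0, 18.0),
    ("P2",          9.0, 27.0),
    ("P3",         14.0, 34.0),
    ("P4",         19.0, 38.0),
]

def select_profile(points, max_throughput_drop):
    """Deepest energy saving subject to the user's throughput constraint."""
    feasible = [p for p in points if p[1] <= max_throughput_drop]
    return max(feasible, key=lambda p: p[2])

profile, drop, saving = select_profile(operating_points, max_throughput_drop=10.0)
# With a 10% budget, this sketch selects "P2": 9% slower, 27% less energy
```

In practice the software scans a continuous envelope rather than a fixed list, but the decision rule is the same: maximise savings subject to the SLA bound.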
Three example datapoints — any point on the spectrum is achievable
The software identifies the energy-saving profile that keeps throughput within 10% of baseline. Deep savings with near-transparent operational impact.
A marginal additional throughput budget unlocks significantly deeper energy reductions. Consistently the highest-value point on the spectrum across all tested models.
Accepts a larger throughput trade-off to push energy savings to their maximum. Ideal for cost-capped batch workloads where latency is not time-critical.
Five leading open-weight models. Three SLA profiles. All energy independently verified at the server wall-socket using a state-of-the-art power analyser — against unoptimised SotA baselines.
* All watt-second readings measured at the server wall socket on a single NVIDIA RTX Pro 6000 Blackwell GPU server.
Continuous, zero-intervention AI agent loops create a sustained, demanding inference load. We validated vert-suite in exactly this scenario — using Claude Code acting as an autonomous coding agent running non-stop research iterations on a dedicated GPU server.
Average UK commercial electricity rate: 25.5p/kWh · 24/7 continuous operation · Savings scale linearly with fleet size
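A back-of-envelope sketch of that arithmetic, using the quoted UK rate. The 800 W average draw is a hypothetical assumption for illustration, and the single 30% figure stands in for the page's "over 30%" savings claim.

```python
RATE_GBP_PER_KWH = 0.255   # average UK commercial rate, 25.5p/kWh
HOURS_PER_YEAR = 24 * 365  # 24/7 continuous operation

def annual_cost_gbp(avg_watts, rate=RATE_GBP_PER_KWH):
    """Yearly electricity cost for one server at the given average draw."""
    kwh = avg_watts / 1000 * HOURS_PER_YEAR
    return kwh * rate

baseline  = annual_cost_gbp(800)          # ~GBP 1,787 per server-year
optimised = annual_cost_gbp(800 * 0.70)   # assuming a 30% energy saving
savings_per_server = baseline - optimised # scales linearly with fleet size
```

Because the model is linear in both power draw and server count, fleet-wide savings are just this per-server figure multiplied by the number of servers.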
Verticular has secured the 'Growth Catalyst Early Stage' grant to advance EPIC (Energy-efficient Processing for Intensive Computing) — using AI to dynamically manage both CPU and GPU power and cut energy costs for AI data centres without performance trade-offs.
We are proud members of the NVIDIA Inception program, giving us early access to the latest GPU ecosystem and enabling us to optimise AI workloads at the deepest hardware level.
Verticular was founded by two tech leaders who spent their careers engineering at the frontier of wireless networks and AI systems — and who saw the cost and energy problem coming first-hand.
Our mission is to solve the dual problem of rising cloud costs and the massive energy footprint of AI — building deep, chip-level software that lowers OpEx, maximises hardware ROI, and meets sustainability goals without compromising SLAs.
Request a demo and we'll show you wall-socket power measurements on a real GPU server — before and after vert-suite — so you can see the savings for yourself, not just take our word for it.