Model Lifecycle ≠ Container Lifecycle — Why Jozu Splits Them
Why prebuilt inference containers like NVIDIA NIMs miss the mark on lifecycle, control, and separation of concerns.
If you’ve played with NVIDIA NIMs, you’ve probably had the same thought I did: this is a strong step forward for getting models into production. It’s fast. It’s GPU-optimized. It wraps Triton, models, and runtime into a neat container. For many teams, that’s enough.
At Jozu, though, we’re solving the same production inference problem—but from the perspective of platform and DevOps teams, not just model consumers.
🎯 The Problem We’re Solving: Infra Teams Need Control, Not Just Speed
NIMs are fantastic for what they do: help teams quickly run high-performance inference workloads on NVIDIA hardware. But many of the teams we work with—infra, platform, and SRE leads inside organizations scaling their AI infrastructure—need something more composable, secure, and lifecycle-aware. They need:
Full control over which model goes where and how it’s updated
Clear provenance and auditability across environments
The ability to decouple containers from models, just as apps were decoupled from hardware
That’s why we built Jozu Rapid Inference Containers (RICs): a packaging and runtime architecture that treats models as first-class, injectable artifacts, not frozen blobs inside prebuilt containers.
🧱 What Are Jozu RICs?
At a high level, RICs are base inference containers (built on runtimes such as vLLM, Triton, or llama.cpp) that dynamically inject the right model at pull time using ModelKits, our open packaging format.
A ModelKit includes (sketched in the Kitfile example after this list):
Model weights (e.g., GGUF, safetensors, ONNX)
Optional adapters (e.g., LoRA)
Inference config (Triton, vLLM, etc.)
Metadata, licenses, SBOMs, and attestations
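Under the hood, a ModelKit is described by a Kitfile. Here’s a minimal sketch; the package name, file paths, and the LoRA adapter entry are made up for illustration:

```yaml
manifestVersion: "1.0"
package:
  name: llama3-finance            # hypothetical package name
  version: 1.2.0
  authors: ["platform-team"]
model:
  name: llama3-8b-instruct
  path: ./weights/llama3-8b-instruct.Q4_K_M.gguf   # GGUF weights
  framework: llama.cpp
  parts:
    - path: ./adapters/finance-lora                # optional LoRA adapter
      type: lora-adapter
docs:
  - path: ./LICENSE
    description: Model license
  - path: ./sbom.spdx.json
    description: SBOM covering the packaged artifacts
```

When the kit is packed, each of these pieces becomes a content-addressed layer in an OCI artifact, which is what makes the provenance and attestation story later in this post possible.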
🧠 Models ≠ Containers
Most model serving setups today — including NVIDIA NIMs — bundle the model directly into the container image. That might feel convenient, but it tightly couples what you serve (the model) with how you serve it (the container).
This creates real problems:
Promoting a model to production means building and shipping a new container.
Rolling back a model requires knowing which image it was embedded in.
You can’t track or attest to models independently of the containers they're wrapped in.
Jozu RICs break this coupling.
We treat the container as the runtime, and the model as a pluggable artifact, delivered just in time at pull.
This gives you major advantages (a Kubernetes sketch of the pattern follows this list):
✅ Run the same container across staging, production, or air-gapped environments — just inject different models.
✅ Promote, attest, or roll back individual models without touching containers or deployments.
✅ Automate model lifecycle using GitOps: model changes are observable, auditable, and versioned like application code.
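Concretely, here’s one way the pattern can look in Kubernetes. This is a hand-written sketch, not generated Jozu output: the kit image, the RIC base image, and the registry paths are all assumptions.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-inference
spec:
  replicas: 1
  selector:
    matchLabels: { app: llm-inference }
  template:
    metadata:
      labels: { app: llm-inference }
    spec:
      # Init container pulls the ModelKit and unpacks the model into a shared volume.
      initContainers:
        - name: fetch-model
          image: ghcr.io/kitops-ml/kitops:latest            # assumed image name
          command: ["kit", "unpack", "registry.example.com/team/llama3:prod", "-d", "/models"]
          volumeMounts:
            - { name: model-store, mountPath: /models }
      containers:
        # The RIC runtime image never changes; only the injected model does.
        - name: ric-runtime
          image: registry.example.com/ric/vllm-base:stable  # hypothetical RIC base image
          args: ["--model", "/models"]
          volumeMounts:
            - { name: model-store, mountPath: /models, readOnly: true }
      volumes:
        - name: model-store
          emptyDir: {}
```

Swapping models is then just a matter of changing the tag the init container pulls, which is exactly the kind of change GitOps tooling is built to track.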
Think of it like Helm charts for models—one runtime, many artifacts, full control.
⚖️ Where NIMs Fall Short—and What Jozu Solves Instead
To be fair — if you’re a product team at an enterprise shipping a known, stable model behind an API, NIMs might just work.
They’re great for:
Use cases that only run on NVIDIA GPUs
Scenarios with a few approved models reused broadly
Teams that prioritize convenience over lifecycle governance, customization, or traceability
We’re not here to replace NIMs for those teams.
But if you're asking:
How do I track where every model came from?
How do I promote a fine-tuned adapter without repacking a container?
How do I plug models into Kubernetes-native CI/CD?
Then you’re thinking the same way we are.
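To make the middle question concrete: promoting a fine-tuned model becomes a registry operation, not an image rebuild. A sketch using the open-source kit CLI, with placeholder registry and tag names:

```sh
# Pull the candidate ModelKit from staging
kit pull registry.example.com/team/llama3:staging

# Re-tag it for production -- the container image is untouched
kit tag registry.example.com/team/llama3:staging \
        registry.example.com/team/llama3:prod

# Push the new tag; a GitOps controller watching this tag can roll it out
kit push registry.example.com/team/llama3:prod
```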
🔒 The Bigger Picture: Secure ModelOps
Model security isn’t just about scanning containers anymore. It’s about:
Provenance: Where did this model come from? Who trained it?
Reproducibility: Can I build it again, exactly?
Attestation: Can I prove no one tampered with it?
That’s what RICs and ModelKits are built for — and where we think the industry is headed.
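Because ModelKits are stored as OCI artifacts, existing supply-chain tooling can target the model itself rather than the container around it. A sketch assuming Sigstore’s cosign, with placeholder references and digests:

```sh
# Sign the ModelKit by digest, independently of any container image
cosign sign registry.example.com/team/llama3@sha256:<digest>

# Later, verify the model artifact before deploying it
cosign verify registry.example.com/team/llama3@sha256:<digest> \
  --certificate-identity <signer-identity> \
  --certificate-oidc-issuer <issuer-url>
```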
✅ Want to Try It?
Jozu’s CLI, `kit`, is open source. With it you can (see the quickstart after this list):
Package your own models as ModelKits
Use RIC-compatible base images
Deploy via Kubernetes and GitOps
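A minimal first run might look like this, assuming a Kitfile in the current directory; registry and repository names are placeholders:

```sh
# Package the model, config, and metadata described by ./Kitfile
kit pack . -t registry.example.com/team/my-model:v1

# Push the ModelKit to any OCI-compatible registry
kit push registry.example.com/team/my-model:v1

# Pull it back, or unpack just the model weights on a serving node
kit pull registry.example.com/team/my-model:v1
kit unpack registry.example.com/team/my-model:v1 --model -d ./out
```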
We’d love feedback — especially if you’ve used NIMs and hit their limits.
NVIDIA NIMs and Jozu RICs are both trying to solve a hard problem: how to bring models to production without chaos.
If you’re committed to NVIDIA’s ecosystem, NIMs offer a convenient starting point, though with trade-offs in flexibility and control.
If you want lifecycle-aware, security-conscious, DevOps-aligned model delivery, Jozu RICs might be the alternative you didn’t know you needed.