Monorepos for AI Projects: The Good, the Bad, and the Ugly
Recently, I met a team that used an AI‑focused monorepo containing everything: notebooks, training pipelines, microservices, and infrastructure code. In this post I share my observations on how their data scientists, engineers, and DevOps teams collaborated, and where things broke down. I also explore how KitOps helped introduce structure at a critical point: the transition from experimentation to production.
✅ The Good
When a monorepo works, it works because the alignment and velocity benefits outweigh any drawbacks. Here is what I noticed:
Shared context: Everyone, from data scientists to platform engineers, had visibility into the same repository, which fostered fast collaboration and fewer misunderstandings.
Fast iteration: A data scientist could tweak a model and then message an engineer to wire it up to an API within the same codebase.
Unified CI/CD: Teams could run pipelines for end‑to‑end tests, integrate model‑training jobs into GitHub Actions, and deploy inference microservices using the same scripts.
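To give a feel for what that looked like, here is a minimal sketch of a shared CI entry point; the stage names and script paths are hypothetical, not the team's actual layout:

```python
# ci.py -- a hypothetical single entry point shared by every pipeline stage.
import subprocess
import sys

# Each stage maps to the same scripts, whether triggered by a pull
# request, a scheduled training run, or a deployment.
STAGES = {
    "test": ["pytest", "tests/"],
    "train": ["python", "pipelines/train.py"],
    "deploy": ["bash", "scripts/deploy_inference.sh"],
}

def run_stage(name: str) -> None:
    """Run one pipeline stage, failing the build if it fails."""
    subprocess.run(STAGES[name], check=True)

if __name__ == "__main__":
    # e.g. `python ci.py test train` from a GitHub Actions step
    for stage in sys.argv[1:]:
        run_stage(stage)
```

Everyone calling the same scripts is exactly what made iteration fast, and, as we will see, also what made the pipelines brittle later.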
❌ The Bad
This setup had major flaws, some of them critical to production readiness.
No model provenance: Models trained in notebooks were often dumped into S3 buckets with ad‑hoc names; there was no versioning or traceability. Teams embedded their names in the model filenames, a practice that did not age well (the sketch after this list shows the pattern).
Reproducibility gaps: Because experiments were often driven from notebooks, they lacked pinned dependencies or runtime configuration. Rerunning a past experiment was, at best, guesswork.
Security blind spots: With no SBOMs or attestations, the security team had no idea what was running in production, creating a compliance risk.
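The first two problems compounded each other. A hypothetical reconstruction of the upload pattern (bucket and file names invented) shows how little context survives the notebook session:

```python
# Hypothetical reconstruction of the ad-hoc upload anti-pattern.
import boto3

s3 = boto3.client("s3")

# No version, no commit, no dataset reference, no pinned environment --
# just a name typed in a notebook cell. Six months later, nobody can
# say what produced "teamA_final_v2_REAL.pt" or how to retrain it.
s3.upload_file("model.pt", "ml-models", "teamA_final_v2_REAL.pt")
```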
😬 The Ugly
Some things technically “worked,” but only through tribal knowledge, individual heroics, or duct‑taped workflows.
Manual model handoffs: Data scientists pinged infrastructure engineers on Slack with pointers to model files. There was no formalized way to package a model.
Inconsistent naming conventions: Some model folders were named `teamXXX_final_modelv2`, while others used names like `modelXX_2024_05_19`. Pipelines frequently broke when a model name changed or a new model appeared.
Overloaded CI pipelines: A single Git push could retrigger training, redeploy the inference container, and run unrelated tests. The infrastructure was brittle because the monorepo lacked boundaries between experimentation and production (the sketch after this list shows the kind of path‑based boundary that was missing).
Blurred ownership: When a model in production failed, nobody knew whether to call the data scientist, ML engineer, or platform SRE. The repository did not encode accountability.
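For illustration, a boundary can be as simple as gating CI jobs on the paths a push actually touched. This is a hypothetical sketch with assumed directory names, not what the team had:

```python
# Hypothetical sketch: choose CI jobs from the changed paths, so a
# notebook edit no longer retrains and redeploys everything.
import subprocess

def changed_paths(base: str = "origin/main") -> list[str]:
    """List files changed relative to the base branch."""
    diff = subprocess.run(
        ["git", "diff", "--name-only", base],
        capture_output=True, text=True, check=True,
    )
    return diff.stdout.splitlines()

def jobs_to_run(paths: list[str]) -> set[str]:
    """Map changed paths to the pipeline jobs they should trigger."""
    jobs = set()
    for path in paths:
        if path.startswith(("notebooks/", "training/")):
            jobs.add("train")
        if path.startswith("services/inference/"):
            jobs.add("deploy")
        if path.startswith("tests/"):
            jobs.add("test")
    return jobs

if __name__ == "__main__":
    print(sorted(jobs_to_run(changed_paths())))
```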
🧰 How KitOps Helped
KitOps introduced structure at the artifact level without forcing the team to refactor the entire repository.
Clear handoff via ModelKit artifacts
Data scientists used the `kit` CLI and `pykitops` to export trained models as self‑contained, versioned ModelKits that included:
Weights
Metadata (input and output schema)
Optional model cards as README files
Runtime dependencies (for example, tokenizers, configuration files, and sometimes Python code)
These kits became immutable units that downstream teams could trust.
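Here is a minimal sketch of that export flow, driving the `kit` CLI from Python. The Kitfile fields follow the KitOps documentation, but the package name, registry, and file paths are invented for illustration (the same flow can be scripted with `pykitops`):

```python
# Minimal sketch: describe the artifact in a Kitfile, then pack and
# push it with the kit CLI. Names, paths, and registry are illustrative.
import subprocess
from pathlib import Path

KITFILE = """\
manifestVersion: "1.0"
package:
  name: sentiment-classifier
  version: 1.2.0
model:
  name: sentiment-classifier
  path: ./model.pt
code:
  - path: ./tokenizer/
docs:
  - path: ./README.md
"""

Path("Kitfile").write_text(KITFILE)

TAG = "registry.example.com/ml/sentiment-classifier:1.2.0"

# Pack weights, metadata, and docs into one immutable, versioned artifact...
subprocess.run(["kit", "pack", ".", "-t", TAG], check=True)
# ...and push it to the registry for downstream teams to pull.
subprocess.run(["kit", "push", TAG], check=True)
```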
Decoupled training and inference
ModelKits were pushed to an OCI‑compatible registry, where inference microservices could pull them at runtime. Training scripts no longer needed to be bundled into deployment images, and the same model could be pulled into staging, production, or offline evaluation environments with confidence, allowing platform engineers to treat inference containers as cattle rather than pets.
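At the serving end, the flow might look like this sketch: the container stays generic, and the model reference becomes configuration (tag, environment variable, and paths are invented):

```python
# Hypothetical inference-service startup: pull the ModelKit at runtime
# instead of baking weights into the container image.
import os
import subprocess

# Which model to serve is configuration, so one container image works
# in staging, production, and offline evaluation alike.
tag = os.environ.get(
    "MODELKIT_TAG", "registry.example.com/ml/sentiment-classifier:1.2.0"
)

# Unpack the ModelKit's contents into the serving directory.
subprocess.run(["kit", "unpack", tag, "-d", "/srv/model"], check=True)

# ...load the weights from /srv/model and start serving from here.
```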
Auditability and compliance
The team had not yet added SBOMs to ModelKits, but they recorded the monorepo commit SHA as an attestation alongside each ModelKit they created. This gave the security team visibility into what was running in production and where it came from, easing a key compliance bottleneck.
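One lightweight way to do this, sketched below, is to capture the commit SHA at pack time and ship it inside the kit. The exact mechanism here (a PROVENANCE.txt listed in the Kitfile's docs section, plus a SHA‑stamped tag) is an assumption, not necessarily the team's implementation:

```python
# Sketch: stamp each ModelKit with the monorepo commit that produced it,
# so every deployed model traces back to an exact source tree.
import subprocess
from pathlib import Path

sha = subprocess.run(
    ["git", "rev-parse", "HEAD"],
    capture_output=True, text=True, check=True,
).stdout.strip()

# Ship the provenance record inside the kit (assumes the Kitfile's
# docs section lists PROVENANCE.txt).
Path("PROVENANCE.txt").write_text(f"monorepo commit: {sha}\n")

# Encode the commit in the tag as well, so it is visible in the registry.
tag = f"registry.example.com/ml/sentiment-classifier:sha-{sha[:12]}"
subprocess.run(["kit", "pack", ".", "-t", tag], check=True)
subprocess.run(["kit", "push", tag], check=True)
```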
Standardization without a repo rewrite
The team adopted one simple convention: if a model is going to production, it must be exported as a ModelKit. That rule turned Git chaos into structured deployment boundaries.
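A convention like that is easy to enforce mechanically. Here is a hypothetical deploy gate; it assumes `kit inspect --remote`, which reads a ModelKit's manifest from the registry without pulling it, so treat the exact flags as an assumption:

```python
# Hypothetical deploy gate: production deploys only accept references
# that resolve to a ModelKit in the registry.
import subprocess
import sys

def is_modelkit(reference: str) -> bool:
    """True if the reference resolves to a ModelKit in the registry."""
    result = subprocess.run(
        ["kit", "inspect", "--remote", reference], capture_output=True
    )
    return result.returncode == 0

if __name__ == "__main__":
    ref = sys.argv[1]
    if not is_modelkit(ref):
        sys.exit(f"refused: {ref} is not a ModelKit; run `kit pack` first")
    print(f"ok: deploying {ref}")
```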
Monorepos are a double‑edged sword. Their collaboration benefits are impressive, but scaling them, especially in AI and ML systems, requires discipline.
KitOps did not “fix” the monorepo; arguably, it did not need fixing. Instead, it created clean seams where they mattered most: at the handoff between teams and in the lifecycle from experimentation to production. That was enough.