Secure Supply Chain with ModelKits Explained
This article is a guide on establishing a secure software supply chain using ModelKits. It is aimed at technical practitioners in AI/ML deployment pipelines who require effective methods for ensuring the authenticity and integrity of software artifacts. The document outlines the roles of attestations, provenance, and immutability in building a trusted pipeline and discusses best practices, compliance considerations, and emerging trends.
The Genesis: Attestations, Provenance, and Immutability
What Are Attestations?
An attestation is a verifiable piece of metadata that certifies specific properties or events related to an AI/ML artifact. This can include information about how the model was built, which datasets were used, how those datasets were collected, or simply where the artifact came from.
Attestations help establish trust by providing a record of the processes and checks that the AI/ML artifacts have undergone. They offer multiple benefits:
Enhanced Security: Attestations help ensure that artifacts haven’t been tampered with and that they originate from a trusted source.
Auditability: They provide a clear, verifiable record of an artifact’s history, which is valuable for auditing and forensic investigations.
Compliance: In regulated industries, attestations can serve as evidence that an artifact meets required standards and policies.
Operational Confidence: By verifying attestations before deployment, organizations can have greater confidence that only approved and vetted artifacts are running in production.
What Is Provenance?
Provenance refers to the comprehensive record of an artifact’s origin, history, and the processes it has undergone from its creation to its current state. In the context of AI/ML artifacts (or other software artifacts), provenance may include details such as the origin of the model (for example, a hash or URL), the training process including the datasets used, lifecycle events such as validations and security checks, and the chain of custody.
The Relationship of Attestations and Provenance
Attestations and provenance are closely related, yet they serve distinct purposes within a secure supply chain. Provenance acts as a detailed record of the artifact’s lifecycle, capturing every significant step, transformation, or interaction that the artifact experiences. It provides a complete historical record, while attestations serve as formal, verifiable assertions about the artifact’s state or properties at a specific point in time. An attestation might assert that the artifact meets certain criteria or has passed specific tests.
Common Secure Supply Chain Scenarios
Rapid Verification: In many automated environments, verifying a signed attestation is quicker than reconstructing the entire provenance.
In-Depth Auditing: If a potential security issue arises, the provenance data can be reviewed in depth to understand the artifact’s full history.
Regulatory Compliance: Some standards require both a summary assertion (attestation) and a detailed audit trail (provenance) to meet strict compliance requirements.
What Is Immutability?
Immutability refers to the property of an artifact or data structure that prevents it from being changed after its creation. Immutability plays a crucial role in the effectiveness and trustworthiness of attestations and provenance.
Consistency Between Attestation and Artifact
When an artifact is immutable, it guarantees that the conditions and properties described in the attestation (such as build parameters, security scans, or provenance details) remain accurate over time. Since the content is fixed, a verifier can confidently check that a digest or hash matches the one recorded in the attestation. Any deviation indicates tampering or an inconsistency, leading to a clear rejection of the artifact.
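The digest check described above can be sketched with standard tools. This is a minimal illustration, assuming a local file stands in for the artifact; the file name and contents are made up for the example, and `sha256sum` is assumed to be available (GNU coreutils).

```shell
#!/bin/sh
# Sketch: detect tampering by comparing a file's SHA-256 digest with the
# digest recorded at attestation time. File name and contents are illustrative.

workdir=$(mktemp -d)
artifact="$workdir/model.bin"
printf 'model weights v1' > "$artifact"

# Digest recorded when the (hypothetical) attestation was issued.
recorded_digest=$(sha256sum "$artifact" | cut -d' ' -f1)

check_digest() {
  # $1 = file, $2 = expected hex digest; prints "match" or "mismatch"
  actual=$(sha256sum "$1" | cut -d' ' -f1)
  if [ "$actual" = "$2" ]; then echo "match"; else echo "mismatch"; fi
}

before=$(check_digest "$artifact" "$recorded_digest")

# Simulate tampering: a single appended byte changes the digest.
printf 'x' >> "$artifact"
after=$(check_digest "$artifact" "$recorded_digest")

echo "before tampering: $before"
echo "after tampering:  $after"
```

In a registry, the same comparison happens against the digest of the stored artifact rather than a local file, but the principle is identical: any change to the content produces a different digest.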
Enhanced Security Posture
In a secure supply chain, immutability combined with a signed attestation creates a robust chain-of-custody. Each artifact remains exactly as it was when it passed through each stage of the pipeline, reducing the risk of undetected modifications.
Simplified Compliance and Audit Processes
Immutable artifacts, paired with their corresponding attestations, provide a consistent and verifiable audit trail. Auditors can rely on the fact that the artifact's properties haven’t changed since the attestation was issued, which supports the strict traceability and non-repudiation that many regulatory frameworks require.
Operational Stability
Immutability ensures that the artifact running in production is exactly the one that was approved and vetted. If an issue arises, operators can roll back to a previous, known-good version with confidence that it has not been modified since it was last verified.
How It All Works for ModelKits
Generation
In an AI/ML pipeline, whether during training, fine-tuning, or data extraction, the first step for secure packaging is creating a ModelKit. As the ModelKit is generated, an accompanying attestation can be produced that captures essential metadata.
The attestation metadata might include details such as the build environment, the results of scans, or the outcome of compliance tests. This attestation is often signed using cryptographic keys, ensuring that it comes from a trusted source and has not been tampered with.
How to Create Attestations for ModelKits
Package Artifacts as a ModelKit and Automate Its Generation
Package models, datasets, code, and documentation to create an immutable package and store it on a secure OCI registry.
Tools: Use the kit CLI or pykitops to automate the creation of ModelKits in your pipelines.
Automate: Include this step in your AI/ML pipelines so that every artifact that is a candidate for production is a ModelKit.
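As a rough sketch of what this pipeline step could look like with the kit CLI: the function name `package_and_push` is hypothetical, and the sketch assumes that kit is installed, that the project directory contains a Kitfile, and that the runner has already authenticated to the registry (for example with `kit login`).

```shell
#!/bin/sh
# Sketch of a pipeline step that packs a project into a ModelKit and pushes it.
# Assumptions: kit CLI installed, Kitfile present in the directory,
# registry login already done. The function name is illustrative.

package_and_push() {
  # $1 = project directory containing a Kitfile
  # $2 = full reference, e.g. jozu.ml/myorg/mymodel:latest
  kit pack "$1" -t "$2" && kit push "$2"
}

# Example invocation (commented out so the sketch has no side effects):
# package_and_push . jozu.ml/myorg/mymodel:latest
```

Chaining the two commands with `&&` means the push only happens if packing succeeded, so a failed build never publishes a partial artifact.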
Define Essential Metadata
Determine which details are critical for your attestation. Common metadata elements include:
Build Environment: Information about the operating system, tools, and configuration used during the build.
Security and Compliance Checks: Results from vulnerability scans, static analysis, or other compliance tests.
Pipeline Steps: A record of the build and test stages executed.
Timestamps and Identifiers: The build date/time and unique identifiers (e.g., a hash or version tag) for the ModelKit.
Automate Attestation Generation
Incorporate attestation creation into your pipeline by automating the process:
Scripting and Tools: Use scripts or dedicated tools to automatically collect and format the metadata (for example, output the data in JSON or YAML).
Integration: Embed this step within your existing CI/CD workflow (using systems like Jenkins, GitLab CI, or GitHub Actions) so that every build triggers an attestation generation.
Sign the Attestation
To ensure that the attestation is both authentic and tamper-evident:
Digital Signing: Use cryptographic keys to sign the attestation. This confirms that it was produced by a trusted source.
Key Management: Securely manage your private keys—tools like Cosign can help streamline signing and verification processes.
Storage and Association
The generated attestation is then associated with the ModelKit. This can be done by storing it in an OCI registry alongside the ModelKit or in an external attestation store. Tools and standards such as Cosign or Notary can be used to facilitate this association.
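One concrete way this association can work is Cosign's tag-based storage convention, in which the attestation lives in the same repository under a tag derived from the artifact's digest. The sketch below only computes that tag name. The digest is a placeholder, the function name is hypothetical, and the exact storage layout may differ between Cosign versions and registry configurations.

```shell
#!/bin/sh
# Sketch: derive the tag under which Cosign's tag-based scheme stores an
# attestation for a given artifact digest. The digest below is a placeholder.

attestation_tag() {
  # $1 = digest such as sha256:<hex>; prints sha256-<hex>.att
  printf '%s.att\n' "$(printf '%s' "$1" | tr ':' '-')"
}

repo="jozu.ml/myorg/mymodel"
digest="sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

att_ref="$repo:$(attestation_tag "$digest")"
echo "$att_ref"
```

Because the tag is computed from the digest, the attestation is tied to one exact version of the ModelKit rather than to a movable tag such as latest.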
Verification
Before deploying a ModelKit, systems (like a Kubernetes admission controller or a CI/CD pipeline) can verify the attestation. Verification ensures that the ModelKit was built under the expected conditions, and that the metadata hasn’t been altered. It checks the digital signature against a trusted key. If the attestation is valid, it can be trusted to meet the security and compliance requirements. If not, the deployment can be halted or flagged for further review.
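In a CI/CD pipeline, the gate described above could be sketched as a small shell function that refuses to continue when verification fails. The function name `gate_deploy` is hypothetical, and the sketch assumes Cosign is installed and that the attestation was created with Cosign's custom predicate type.

```shell
#!/bin/sh
# Sketch: halt a deployment step unless the ModelKit's attestation verifies.
# Assumes cosign is installed; the function name is illustrative.

gate_deploy() {
  # $1 = ModelKit reference, $2 = path to the trusted public key
  if cosign verify-attestation --key "$2" --type custom "$1" >/dev/null 2>&1; then
    echo "verified: $1"
    return 0
  fi
  echo "rejected: $1 (attestation missing or invalid)" >&2
  return 1
}

# Example use in a pipeline (commented out; requires registry access):
# gate_deploy jozu.ml/myorg/mymodel:latest cosign.pub || exit 1
```

The function relies only on Cosign's exit status, so the surrounding pipeline can decide whether a failure should stop the deployment outright or flag it for review.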
How to Create, Sign, Store, and Verify a Simple Attestation Using Cosign
These instructions assume that you have just created a ModelKit, tagged it with jozu.ml/myorg/mymodel:latest, and pushed it to the OCI registry.
Generate a Key Pair with Cosign
Cosign makes it easy to generate a key pair. This command creates two files: a private key (cosign.key) and a public key (cosign.pub).
cosign generate-key-pair
# Consider storing keys in a secure location or HSM and set up key rotation policies.
Generate Attestation
Create an attestation file containing metadata about your ModelKit. For example:
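# Example metadata; replace these values with those from your own pipeline.
ARTIFACT_ID="mymodel-123abc"
BUILD_ENV=$(uname -a)
TRAINING_PARAMS="learning_rate=0.01, epochs=50"
# Use UTC in ISO 8601 format so the timestamp is unambiguous and machine-readable.
TIMESTAMP=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
# Values are interpolated verbatim, so they must not contain double quotes or backslashes.
cat <<EOF > attestation.json
{
  "artifact_id": "$ARTIFACT_ID",
  "build_environment": "$BUILD_ENV",
  "training_parameters": "$TRAINING_PARAMS",
  "timestamp": "$TIMESTAMP"
}
EOF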
This script generates an attestation.json file containing some example metadata. In real-world pipelines, use a well-known attestation format rather than ad hoc JSON.
Sign the Attestation and Attach It to Your ModelKit
cosign attest --key cosign.key --predicate attestation.json --registry-username=<registry_user> --registry-password=<registry_pass> jozu.ml/myorg/mymodel:latest
Verify the Attestation
To confirm that your attestation has been correctly signed and stored, use the verification command:
cosign verify-attestation --key cosign.pub jozu.ml/myorg/mymodel:latest
This command retrieves the attestation from the OCI registry, verifies its digital signature using the public key, and, if verification succeeds, prints the signed envelope that contains the attestation details.
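The attestation content inside the envelope that Cosign prints is base64-encoded, so it usually needs one more decoding step before it is readable. The sketch below shows one way to do that with sed and base64. The helper name `decode_attestation_payload` is hypothetical, the envelope in the demonstration is hand-built and unsigned rather than real Cosign output, and `base64 -d` assumes a GNU-style base64.

```shell
#!/bin/sh
# Sketch: decode the base64 payload of a single DSSE envelope read from stdin.
# sed keeps the example dependency-free; a JSON-aware tool such as jq is more robust.

decode_attestation_payload() {
  sed -n 's/.*"payload"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | base64 -d
}

# Local demonstration with a hand-built, unsigned envelope (not real Cosign output).
statement='{"predicate":{"artifact_id":"mymodel-123abc"}}'
payload=$(printf '%s' "$statement" | base64 | tr -d '\n')
envelope="{\"payloadType\":\"application/vnd.in-toto+json\",\"payload\":\"$payload\",\"signatures\":[]}"

decoded=$(printf '%s\n' "$envelope" | decode_attestation_payload)
echo "$decoded"

# Against a real registry (requires cosign and network access):
# cosign verify-attestation --key cosign.pub jozu.ml/myorg/mymodel:latest \
#   | decode_attestation_payload
```

Decoding is only meaningful after verification has succeeded; the payload of an unverified envelope should not be trusted.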
Standards and Formats for Attestations
A critical element of a secure supply chain (SSC) is the standardization of how attestations are formatted, signed, and verified. By adhering to industry standards, organizations ensure that the metadata accompanying artifacts is both interoperable and verifiable across different systems and tools. Below are some of the most prominent standards and formats used in the SSC landscape:
in‑toto: in‑toto is a framework designed to capture the complete supply chain of an artifact. It records every significant step—from code commit to artifact generation—in a detailed, cryptographically verifiable manner. The in‑toto format allows for complex workflows to be attested, ensuring that each stage of the build, test, and deployment process is recorded in an immutable log. This depth of detail is invaluable in environments where understanding the full history of an artifact is crucial for trust and compliance.
DSSE (Dead Simple Signing Envelope): DSSE provides a lightweight, standardized container for signing arbitrary JSON payloads. It is designed to encapsulate attestation data in a manner that is both simple and secure, making it easier to integrate into automated pipelines. DSSE’s simplicity and focus on digital signature management help maintain the integrity of the attestation, ensuring that any tampering can be quickly detected. Tools like Cosign use DSSE to wrap attestations, providing a consistent method for signing and verifying artifact metadata.
OCI Image Attestation Specification: When artifacts are stored in OCI-compliant registries (as container images and ModelKits are), attestations can be stored in the registry alongside the artifact and linked to it by digest. This ties the metadata to one exact version of the artifact, so that its provenance, security, and compliance status can be verified before deployment.
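To make the DSSE item above more concrete: DSSE does not sign the raw payload directly but a "pre-authentication encoding" (PAE) that binds the payload type to the payload body. The sketch below builds that byte string for text payloads and reproduces the example from the DSSE specification. The function name `pae` is illustrative, and because shell variables cannot hold arbitrary binary data, this is only suitable for textual payloads.

```shell
#!/bin/sh
# Sketch: DSSE pre-authentication encoding (PAE) for text payloads.
# PAE(type, body) = "DSSEv1" SP LEN(type) SP type SP LEN(body) SP body
# where LEN is the byte length written as an ASCII decimal number.

pae() {
  # $1 = payload type, $2 = payload body
  type_len=$(( $(printf '%s' "$1" | wc -c) ))
  body_len=$(( $(printf '%s' "$2" | wc -c) ))
  printf 'DSSEv1 %d %s %d %s' "$type_len" "$1" "$body_len" "$2"
}

result=$(pae "http://example.com/HelloWorld" "hello world")
echo "$result"
# prints: DSSEv1 29 http://example.com/HelloWorld 11 hello world
```

Including the lengths and the payload type in the signed bytes prevents an attacker from reinterpreting a validly signed payload as a different type of document.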
Our example used a simple, custom JSON document rather than a standardized attestation format, to keep the process easy to follow. (When it signs, Cosign still wraps that document in an in-toto statement and a DSSE envelope.) By adopting these standards for the attestation content as well, organizations can ensure that every ModelKit or AI artifact is accompanied by attestations that other tools can interpret and verify. These formats facilitate secure signing and verification, and they improve interoperability between tools and platforms, which strengthens the overall security and auditability of the software supply chain.
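For comparison with the simple JSON used earlier, the sketch below writes the same kind of metadata as an in-toto Statement (v1), which names the subject by digest and labels the predicate with a type URI. Several things here are assumptions for illustration only: the predicateType URI is hypothetical, the subject name is reused from the earlier example, and the digest is taken from a local stand-in file, whereas a real statement would carry the digest of the ModelKit in the registry.

```shell
#!/bin/sh
# Sketch: write an in-toto Statement (v1) around example ModelKit metadata.
# The predicateType URI and the local stand-in file are illustrative only.

workdir=$(mktemp -d)
printf 'model weights v1' > "$workdir/model.bin"
digest=$(sha256sum "$workdir/model.bin" | cut -d' ' -f1)

cat > "$workdir/statement.json" <<EOF
{
  "_type": "https://in-toto.io/Statement/v1",
  "subject": [
    { "name": "jozu.ml/myorg/mymodel", "digest": { "sha256": "$digest" } }
  ],
  "predicateType": "https://example.com/modelkit-attestation/v1",
  "predicate": {
    "artifact_id": "mymodel-123abc",
    "training_parameters": "learning_rate=0.01, epochs=50"
  }
}
EOF

cat "$workdir/statement.json"
```

The subject digest is what binds the statement to one specific artifact, and the predicate type tells a verifier how to interpret the predicate body.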
Alignment with Regulatory Standards
Having covered the core technical mechanisms behind ModelKits (generating secure artifacts, signing attestations, and storing them immutably), we now turn to regulatory compliance. For organizations that use ModelKits in their secure supply chain, aligning with established frameworks such as NIST, ISO, GDPR, and industry-specific standards helps protect critical assets, streamlines audit processes, and supports forensic investigations. The items below describe how ModelKits support each of these frameworks.
NIST SP 800-161 (Supply Chain Risk Management):
ModelKits incorporate cryptographic signatures and immutable logs to ensure traceability and non-repudiation. These features support NIST requirements for tracking and verifying the origin and integrity of artifacts. The comprehensive provenance data maintained throughout the pipeline also aids in risk assessment and incident response.
ISO/IEC 27001 (Information Security Management):
With built-in mechanisms for secure artifact storage, digital signing, and continuous verification, ModelKits support ISO/IEC 27001 objectives by ensuring that only approved, tamper-evident artifacts are deployed. The structured metadata and audit trails facilitate regular security assessments and help demonstrate compliance during certification audits.
GDPR (General Data Protection Regulation):
Although GDPR primarily focuses on data privacy, ModelKits can play a role in ensuring that data handling processes are auditable and that any data transformations or access events in the AI/ML pipelines are properly logged. This helps organizations enforce data protection policies and manage consent-related requirements.
Industry-Specific Regulations:
For sectors such as finance, healthcare, or critical infrastructure, ModelKits can be tailored to capture compliance-specific metadata (e.g., financial audit trails or patient data access logs), thus meeting stricter regulatory requirements.
By integrating compliance considerations into every stage—from artifact generation to verification—ModelKits provide a solution for organizations looking to meet diverse regulatory requirements. The use of standardized attestations and immutable storage not only fortifies the security of the software supply chain but also offers clear, verifiable audit trails that are essential for compliance with NIST, ISO/IEC 27001, GDPR, and other regulatory frameworks.