AI Model Registry

A Permanent Home
for Your AI Models

Weights scattered across S3 buckets, laptops, and forgotten repos? Model gives every model on your team a versioned, documented, discoverable address — and keeps it there.

See How It Works
Immutable versioning
Content-addressed
Stateless pull API

The Problem

Five Models In, Nobody Knows Which One Is In Production

The first model lives in a home directory. The second in a shared bucket with a README nobody updates. By the fifth, weights are scattered across so many places that a simple compliance question — which models are in use? — has no clean answer.

  • No single place to find which version is running in production
  • Versions drift silently — the wrong model stays in the inference path
  • Lineage lost — nobody knows which base model a fine-tune came from
  • Compliance asks for a model inventory; you don't have one
🗂
s3://ml-team/models/final_v3_USE_THIS.pt
Last modified 8 months ago · no metadata
LOST
💻
~/Desktop/model_prod_backup_copy.bin
Owner: Alex (left company) · unknown license
RISK
📁
github.com/acme/models-private (archived)
Version tag: "v2" · which v2?
STALE
⚙️
inference-cluster-01:/mnt/weights/model.onnx
In production · origin unknown
???
model.ms/acme/support-bot · v1.4.2
Versioned · signed · documented · discoverable
✓ MODEL
The Solution

A Registry Built Around
the Questions You Actually Ask

Model treats weights as first-class objects — not generic blobs. Every feature is shaped by what teams actually need to know about a model.

📌

Stable Addresses

Every model and version gets a permanent URL and a canonical pull command. The path from finding a model to running it is short by design.

🔐

Immutable Versions

Versions are content-addressed and signed. The model running in production today is provably the same one reviewed last quarter — no silent swaps.
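What makes this guarantee checkable is content addressing: the version's identity is a function of its bytes. A minimal sketch in Python (standard library only; the registry's actual hash and signature scheme isn't specified on this page):

```python
import hashlib

def version_id(artifact: bytes) -> str:
    # Content address: the ID is derived from the bytes themselves, so any
    # change to the weights yields a different ID (illustrative scheme).
    return "sha256:" + hashlib.sha256(artifact).hexdigest()

weights = b"...model weights..."
vid = version_id(weights)

assert version_id(weights) == vid            # same bytes, same address
assert version_id(weights + b"\x00") != vid  # any change, new address
```

Signing the address (rather than the artifact) is then enough to pin "the version reviewed last quarter" to exact bytes.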

🔗

Full Lineage Graph

Fine-tunes point back to base models. Quantized versions point to their originals. The whole ancestry is browsable and queryable — no archaeology required.

Features

Everything a Serious
Model Registry Needs

Built around the realities of how ML teams actually produce, deploy, and govern AI models at scale.

📋

Model Cards

First-class documentation for every model. Purpose, limitations, training data, metrics, and license — authored in markdown, rendered cleanly in the dashboard.

GOVERNANCE
🗄

Deduplication at Rest

Two fine-tunes that share 90% of their weights don't pay for that storage twice. Content-addressed deduplication runs automatically across all versions.

STORAGE
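One way to picture content-addressed deduplication is a chunk store keyed by hash. This is a fixed-size-chunk sketch, not the registry's actual storage layout (which isn't documented here):

```python
import hashlib

CHUNK = 1024  # tiny for illustration; real systems chunk at MiB scale

def push(store: dict, blob: bytes) -> list:
    """Store each chunk under its content hash; chunks already present
    (from any other version) cost nothing extra."""
    manifest = []
    for i in range(0, len(blob), CHUNK):
        chunk = blob[i:i + CHUNK]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # no-op on a duplicate chunk
        manifest.append(digest)
    return manifest

store = {}
base = b"".join(bytes([i]) * CHUNK for i in range(10))  # ten distinct chunks
fine_tune = base[:9 * CHUNK] + bytes([99]) * CHUNK      # last chunk changed

push(store, base)
push(store, fine_tune)

# Two full versions pushed, but only 11 unique chunks kept instead of 20.
assert len(store) == 11
```

Each version keeps only its manifest (the ordered list of chunk hashes), so near-identical fine-tunes share almost all underlying storage.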
🔍

Discovery & Search

Search by name, task, license, framework, or free-form text. Public and private models in one catalog, with permissions clearly drawn between them.

DISCOVERY
🌐

CDN-Accelerated Downloads

Content served from a distribution layer close to major cloud regions. Pulling a 50 GB checkpoint takes minutes — not the better part of an hour.

PERFORMANCE
🔑

Granular Access Controls

Service accounts scoped to specific models and actions. The API key your inference cluster uses can't accidentally publish a new version it's only supposed to read.

SECURITY
📊

Evaluation Attachments

Benchmark results flow back from connected evaluation platforms and attach to the version that produced them — enriching metadata without duplicating it.

EVALS
📜

Audit Logs

Every read and every write captured. The trail of who used which model when is always available — ready for compliance review without extra tooling.

COMPLIANCE
🔄

Training Pipeline Integration

A successful training run publishes itself. A fine-tune that finishes pushes its version automatically with lineage, metadata, and metrics already attached.

AUTOMATION
🏢

Self-Hosted Edition

Same API, same SDKs, same dashboard — running entirely on your infrastructure. Full data residency for environments where it's non-negotiable.

ENTERPRISE
How It Works

Familiar Ergonomics.
Purpose-Built for Models.

If you've used a code host or package registry, Model will feel immediately familiar — source control and package managers solved these ergonomic problems well long ago.

1

Create a Model

Give it a name, owner, and license. A model card is generated automatically from the initial metadata.

2

Push the First Version

Push weights, tokenizer, config, and evaluation results in a single command. Each push is content-addressed and returns a stable identifier.

3

Iterate with New Versions

Fine-tune, quantize, merge — each result is a new immutable version with lineage automatically recorded. Old versions are never modified.

model push acme/support-bot
# Push a new version
$ model push acme/support-bot \
  --weights ./output/model.safetensors \
  --config ./config.json \
  --base llama-3-8b-instruct@v2.1.0 \
  --tag "v1.4.2"

# Output
✓ Uploading artifacts... 14.4 GB
✓ Deduplication saved 13.8 GB
✓ Lineage recorded from base
✓ Version signed & published

model.ms/acme/support-bot@v1.4.2
1

Reference by Name & Version

Use the stable model identifier from anywhere in your stack — training code, inference servers, evaluation harnesses.

2

Content-Addressed Pull

Files already cached locally aren't redownloaded. CDN acceleration serves the rest from the closest available region.

3

Serve Without Extra Plumbing

Inference platforms that integrate directly fetch weights once on boot — no extra configuration on the application side.

SDK pull — Python
import model

# Pull by name — latest or pinned version
m = model.pull("acme/support-bot@v1.4.2")

# Ready-to-run handle
tokenizer = m.load_tokenizer()
pipe = m.load_pipeline(
  task="text-generation",
  device="cuda"
)

# Cache hit — 0 bytes downloaded
Loaded from local cache 14.4 GB
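The cache-hit behavior follows directly from content addressing: when the file name is its hash, existence in the cache is proof of validity. A sketch of that logic (the `model` SDK's internals aren't documented; `download` stands in for the network layer):

```python
import hashlib
import tempfile
from pathlib import Path

def fetch(digest: str, cache_dir: Path, download) -> bytes:
    """Content-addressed fetch: a file named by its hash that already exists
    locally is, by construction, the right bytes. No re-download needed."""
    path = cache_dir / digest
    if path.exists():
        return path.read_bytes()                     # cache hit
    data = download(digest)                          # cache miss
    assert hashlib.sha256(data).hexdigest() == digest, "corrupt download"
    path.write_bytes(data)
    return data

blob = b"...weights..."
digest = hashlib.sha256(blob).hexdigest()
calls = []

with tempfile.TemporaryDirectory() as d:
    cache = Path(d)
    def download(h):
        calls.append(h)
        return blob
    fetch(digest, cache, download)   # first pull downloads and verifies
    fetch(digest, cache, download)   # second pull is a pure cache hit

assert calls == [digest]             # downloaded exactly once
```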
1

Automatic Recording

Lineage is captured when one model is derived from another — no one has to remember to write it down.

2

Browse or Query

The full ancestry graph is browsable in the dashboard and queryable through the API — base models, fine-tunes, quantizations, and merges.

3

Answer Hard Questions

Which production models descend from a given base? Which fine-tunes share a common ancestor? What changed between last quarter's model and today's?

model lineage acme/support-bot
# Query lineage via API
$ model lineage acme/support-bot@v1.4.2

└─ llama-3-8b-instruct@v2.1.0 (base)
  ├─ acme/support-bot@v1.0.0
  ├─ acme/support-bot@v1.3.0
  │  └─ acme/support-bot@v1.3.1-q4 (quant)
  └─ acme/support-bot@v1.4.2 (current)

# 5 descendants in production
# Training run: run_20241102_a3f9
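Questions like these reduce to traversals over the base-model pointers. A toy reconstruction of the tree above (version names are from this page; the parent map and query function are illustrative, not the real API):

```python
# Each version records a single pointer to the version it was derived from.
PARENT = {
    "acme/support-bot@v1.0.0":    "llama-3-8b-instruct@v2.1.0",
    "acme/support-bot@v1.3.0":    "llama-3-8b-instruct@v2.1.0",
    "acme/support-bot@v1.3.1-q4": "acme/support-bot@v1.3.0",
    "acme/support-bot@v1.4.2":    "llama-3-8b-instruct@v2.1.0",
}

def descendants(base: str) -> set:
    """Everything that transitively derives from `base`."""
    out = set()
    changed = True
    while changed:
        changed = False
        for child, parent in PARENT.items():
            if (parent == base or parent in out) and child not in out:
                out.add(child)
                changed = True
    return out

# Which models descend from a given base?
assert descendants("llama-3-8b-instruct@v2.1.0") == set(PARENT)
# Which versions derive from v1.3.0?
assert descendants("acme/support-bot@v1.3.0") == {"acme/support-bot@v1.3.1-q4"}
```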
1

Team & Org-Level Access

Private models visible only to their owner org. Fine-grained controls decide who can read, write, and publish within that org.

2

Scoped Service Accounts

Inference cluster keys read only. Training pipeline keys can write but not delete. Each service account is scoped to exactly what it needs.

3

Review & Approval Flows

Require approval before a new version is promoted to production. Enterprises can enforce this for every model that touches customer data.

model sa create
# Create a scoped service account
$ model sa create inference-cluster \
  --model acme/support-bot \
  --actions read,pull

✓ Service account created
TOKEN: msk_inf_••••••••••••

# Cannot write or publish
$ model push acme/support-bot ...
403 Forbidden: write not permitted

# Audit log entry created automatically
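The enforcement model behind that 403 can be sketched as a scope check on every call. Scope names mirror the CLI above; the data structures here are invented for illustration:

```python
# Each service account token carries a (model, actions) scope.
SCOPES = {
    "msk_inf": {"model": "acme/support-bot", "actions": {"read", "pull"}},
}

def authorize(token: str, model: str, action: str) -> bool:
    """Least-privilege gate: the action must be inside the token's scope."""
    scope = SCOPES.get(token)
    return (scope is not None
            and scope["model"] == model
            and action in scope["actions"])

assert authorize("msk_inf", "acme/support-bot", "pull")
assert not authorize("msk_inf", "acme/support-bot", "push")  # 403 Forbidden
assert not authorize("msk_inf", "acme/other-model", "pull")  # wrong model
```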
Why Model

Four Distinctions That
Matter in Production

🧬

Models as Primary Objects

Every feature is designed around what teams actually ask about a model — not what you'd ask about a generic blob. That orientation shows up everywhere, from search to model cards.

🔒

Reproducibility Taken Seriously

Content-addressed, signed versions. Lineage from base models to fine-tunes. Evaluation results attached to the exact version that earned them. The answer to what's in production is always trustworthy.

🌐

Openness Without Compromise

Public community models alongside private fine-tunes, in the same catalog. Pulling a model out and using it elsewhere is one command. No lock-in, because there's no need for one.

⚙️

Operational Care in the Details

Storage deduplication, CDN-served downloads, and audit logs that cover every read and write — the parts most teams notice only when they break, built right from the start.

Lineage Graph — Live View
🌐
llama-3-8b-instruct@v2.1.0 BASE

Public · MIT · 14.2 GB · Meta

🔬
acme/support-bot@v1.0.0 FINE-TUNE

Private · run_20240601 · F1: 0.82

🔬
acme/support-bot@v1.4.2 FINE-TUNE

Private · run_20241102 · F1: 0.91 · PROD

acme/support-bot@v1.4.2-q4 QUANT

Private · INT4 · 3.6 GB · serving-only

Use Cases

Built for Teams That
Build on Models

From research labs to enterprise compliance teams, Model scales with the complexity of your model lifecycle.

🔬

Research Teams

Canonical home for the artifacts of research work. Public visibility for the parts worth sharing, private visibility for everything else. Every checkpoint from every experiment has a stable address — six months later, reproduction is one pull command away.

Checkpoints · Experiments · Publications
⚙️

Platform & MLOps Teams

Source of truth for inference and training infrastructure. Replace the tangle of S3 buckets and ad-hoc scripts with a single registry that feeds both. Service accounts scoped exactly to what each system needs — nothing more.

Inference · Training · Automation
🏢

Enterprise & Compliance

Enforce review and approval workflows for every model that touches customer data. Audit trails that hold up to regulatory review. Know exactly which models are in use, who approved them, and when — without archaeology.

Audit Logs · Approvals · Governance
🧪

Heavy Fine-Tuning Workflows

Every fine-tune produces a new version with a clear pointer to its base model, attached metrics, and an automatic record of the training run. Six months later, when someone asks why the production model behaves the way it does, the answer is in the registry — not buried in Slack threads.

Lineage · Metrics · Versioning
Integrations

Works With Your
Existing ML Stack

Clean HTTP API, Git-like CLI, and SDKs in every language your team uses. Integrates with the wider AI tooling landscape rather than competing with it.

🐍Python
🟨JavaScript
🐹Go
🦀Rust
⌨️CLI

Platform Connectors

Training Platforms
Inference Services
Evaluation Harnesses
HuggingFace Hub
PyTorch / ONNX
Weights & Biases
GitHub Actions
Webhook Pipelines
Self-Hosted Edition
Security & Compliance

Built for Regulated Environments

The operational features that make a registry safe to run inside a real enterprise — present from day one, not bolted on later.

✍️

Content-Addressed & Signed

Every version is cryptographically signed and content-addressed. The model in production is provably the one that was reviewed.

📜

Full Audit Trail

Every read and write logged. Who pulled which model, when, from where. Exportable to your SIEM or compliance platform.

🔑

Scoped Access Controls

Per-model, per-action service accounts. Inference keys can't publish. Training keys can't delete. Least-privilege by default.

🔄

Approval Workflows

Require review before any model version is promoted to production. Enforced at the registry level — not an honor system.

🏢

Self-Hosted for Air-Gap

Same API, same SDKs, same dashboard on your infrastructure. Docker, Kubernetes, and air-gapped deployments supported.

🛡

License Enforcement

License metadata is structured and queryable. Know the license of every model in use — and block pulls of ones that don't comply.
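Because license metadata is structured, the pull-time gate can be a simple set-membership check against org policy. A sketch with invented model names and an invented policy format:

```python
# Structured license metadata per version (names and policy are hypothetical).
CATALOG = {
    "example/permissive-model@v1": {"license": "apache-2.0"},
    "example/copyleft-model@v2":   {"license": "gpl-3.0"},
}
ALLOWED = {"apache-2.0", "mit", "bsd-3-clause"}

def pull_allowed(model: str) -> bool:
    """Registry-level gate: a pull succeeds only when the version's
    license matches org policy."""
    return CATALOG[model]["license"] in ALLOWED

assert pull_allowed("example/permissive-model@v1")
assert not pull_allowed("example/copyleft-model@v2")  # blocked at pull time
```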

SOC 2 Type II
GDPR Ready
HIPAA Compatible
ISO 27001
Air-Gap Supported
Signed Artifacts
What Teams Say

Trusted by Teams That
Manage Models Seriously

★★★★★

We went from a compliance team question we couldn't answer — "which models are in production?" — to having that be a one-API-call question. Model paid for itself in the first week.

DK
David K.
Head of ML Platform, FinCo AI
★★★★★

The lineage graph is what sold us. When a support fine-tune started behaving strangely, we traced it back to a base model checkpoint with a data issue in under ten minutes. Previously that would have taken days.

SR
Sana R.
ML Engineer, Inference Labs
★★★★★

Deduplication alone made the switch worthwhile. We had six fine-tunes of the same base model stored separately. Model collapsed that to a single base plus diffs. Storage bill dropped immediately.

TW
Tom W.
Platform Lead, NLP Startup
FAQ

Common Questions

How does deduplication work across model versions?

Weights are stored as content-addressed objects using a rolling hash of the file contents. Two versions that share blocks — even if the files have different names — automatically share the underlying storage. A fine-tune that modifies 10% of a model's weights stores only that 10% as new data. Deduplication runs automatically at push time with no configuration needed.
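The point of content-defined chunking is that boundaries follow the data rather than byte offsets, so an insertion doesn't shift every downstream chunk. A toy version, using a single delimiter byte as a degenerate stand-in for a real rolling hash (e.g. Rabin fingerprints), which keeps the idea visible in a few lines:

```python
import hashlib

def chunks(data: bytes, delim: int = 0xFF):
    """Toy content-defined chunking: cut a boundary after every delimiter
    byte. Real systems derive boundaries from a rolling hash, but the
    principle is the same: boundaries follow content, not offsets."""
    start = 0
    for i, b in enumerate(data):
        if b == delim:
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]

def addresses(data: bytes) -> set:
    return {hashlib.sha256(c).hexdigest() for c in chunks(data)}

base = b"".join(bytes([i]) * 100 + b"\xff" for i in range(10))
edited = b"PATCH" + base          # insert bytes at the very front

# The insertion shifts every byte offset, yet 9 of 10 chunk addresses still
# match, so only the one changed chunk is stored as new data.
assert len(addresses(base) & addresses(edited)) == 9
```

With fixed-size chunks, the same five-byte insertion would have shifted every boundary and forced every chunk to be re-stored.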

Can I use Model alongside HuggingFace Hub?

Yes. Model is designed to integrate with the wider ecosystem rather than replace it. Public models from HuggingFace can be mirrored into your Model registry, and the CLI and SDK support pulling directly from HuggingFace identifiers. Teams typically use HuggingFace for community model discovery and Model for managing their own private fine-tunes and production artifacts.

What happens if I need to delete a model version?

Versions are immutable once published — they cannot be silently modified. You can archive a version, which removes it from the default search surface while preserving its artifacts and audit trail. Hard deletion is available for administrators on Enterprise plans and requires a two-person approval workflow by default. This is deliberate: a registry where versions can disappear without a trace is not a reliable source of truth.

How does lineage recording work for fine-tuned models?

When you push a fine-tune, you specify the base model identifier in the push command (or it's inferred automatically by connected training platforms). Model records this pointer as part of the version metadata and builds the lineage graph incrementally as new versions are pushed. Lineage is queryable through the API and browsable through the dashboard — you can traverse the full ancestry of any model and find all descendants of any base.

Is there a self-hosted option for regulated industries?

Yes. The self-hosted edition exposes the same API surface as the hosted service and works with the same SDKs, CLI, and dashboard. Weights never leave your infrastructure. Supported deployment targets include Docker, Kubernetes, and fully air-gapped environments. Self-hosted is available on Enterprise plans with deployment support included. The same deduplication, lineage, and access control features are available in both editions.

How fast are downloads for large model checkpoints?

Downloads are served from a content-distribution layer placed close to the major cloud regions (AWS, GCP, Azure, and their equivalents). A 50 GB checkpoint typically pulls in 3–8 minutes from within the same cloud region, and 15–25 minutes cross-region. The client is content-addressed, so files already cached locally are never re-downloaded. Enterprise plans can add reserved capacity for predictable latency on inference cluster boot.

Get Started Today

Give Your Models the
Home They Deserve.

Create an account, push your first model, and have a stable identifier in minutes. No configuration required to get started.

Explore Features