From MinIO to Garage: Benchmark, Migration and Production Feedback

How we benchmarked Garage against MinIO, migrated a production data lake with zero downtime, and what we learned after a month of coexistence and two weeks as the system of record.

From theory to production

Since February 2026, the public minio/minio repository has been in archive mode and the vendor is pushing users toward AIStor, a proprietary product whose entry price runs into the tens of thousands of euros. The question is no longer whether to migrate, but where to go and how to do it cleanly.

We covered the theory in our previous article ("Self-Hosted Object Storage: Understanding Alternatives to MinIO"). Since then, we have put it into practice on a client's infrastructure: a French B2B software vendor specializing in telephony reporting and monitoring. We proceeded in stages:

  • Late March: full benchmark in a dedicated sandbox, on realistic Parquet workloads.
  • Early April: deployment of Garage alongside MinIO on the client's K3s cluster, both serving reads and writes while MinIO remained the system of record.
  • May 7: cutover — Garage becomes the system of record, MinIO switches to a read-only D-1 mirror.
  • Today: about two weeks in production, zero intervention, full acceptance testing complete.

Here is the field report, step by step.

This article is intended for any team considering a migration or a technology choice for a self-hosted S3 data lake. The numbers and methodology are reproducible; the benchmark code is public on github.com/DataKhiProject/benchmark-garage.

Step 1 — The benchmark: what we measured

Before committing to a migration, we needed numbers. We built a reproducible test bench — the code is public on github.com/DataKhiProject/benchmark-garage — with a simple goal: compare Garage v2.2.0 and MinIO RELEASE.2025-09-07 on workloads close to those of an analytics data lake.

Methodology

  • 43,860 Parquet files representing ~5 GB: one file per day for each of 60 fictitious "clients" over 731 days (2023–2024), i.e. 60 × 731 files. This reproduces the typical structure of a data lake partitioned by client / date.
  • 14 queries in four families: point reads (Q1–Q4), wide reads (Q5–Q9), glob listing (Q10), and single or concurrent writes (W1–W4).
  • 5 runs per query plus an uncounted warmup.
  • Two topologies tested: 3-node cluster (Garage replication_factor=2, MinIO erasure coding 3+3) and single-node (Garage RF=1, MinIO standalone).
  • Client side: DuckDB (httpfs extension) for Parquet reads and boto3 for writes, with the storage nodes behind an nginx load balancer. Measured network bandwidth: 940 Mbit/s.
  • Infrastructure: three Proxmox VMs (Xeon E5-2683 v3, 8 vCPU / 8 GB RAM / HDD storage), with Docker containers capped at 2 CPUs and 4 GB RAM each to simulate a "small budget" constraint.
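
The file count follows directly from the layout: one Parquet file per client per day. A minimal sketch that rebuilds such a key scheme, assuming Hive-style `client_id=<id>/date=<YYYY-MM-DD>` prefixes chosen here purely for illustration (the benchmark repository may name its keys differently):

```python
from datetime import date, timedelta
from typing import List


def partition_keys(n_clients: int = 60,
                   start: date = date(2023, 1, 1),
                   end: date = date(2024, 12, 31)) -> List[str]:
    """Build one object key per (client, day) with Hive-style partitioning.

    The naming scheme is a hypothetical illustration, not the benchmark's exact one.
    """
    n_days = (end - start).days + 1  # inclusive range: 365 (2023) + 366 (2024) = 731
    keys = []
    for client in range(1, n_clients + 1):
        for offset in range(n_days):
            day = start + timedelta(days=offset)
            keys.append(f"client_id={client:03d}/date={day.isoformat()}/data.parquet")
    return keys


keys = partition_keys()
print(len(keys))  # 43860
print(keys[0])    # client_id=001/date=2023-01-01/data.parquet
```

A point read then touches a single key, while wide reads such as Q5 span hundreds of keys under one client prefix, which is why listing and glob performance weigh so much in this kind of data lake.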

Results — 3-node cluster

| Workload | MinIO (p50) | Garage (p50) | Garage gain |
|---|---|---|---|
| Small file read (Q1) | 904 ms | 148 ms | 6.1× |
| Heavy client / 1 month read (Q4) | 1,620 ms | 760 ms | 2.1× |
| 2-year read / 730 files (Q5) | 33,463 ms | 21,388 ms | 1.6× |
| Glob 43,860 files (Q10) | 21,137 ms | 4,577 ms | 4.6× |
| 20-row write (W1) | 750 ms | 101 ms | 7.4× |
| 10 concurrent workers write (W3) | 1,887 ms | 201 ms | 9.4× |

  • Reads: Garage 1.6× to 6.1× faster
  • Writes: Garage 3× to 9.4× faster
  • Listing: Garage 4.6× faster
  • Observed container CPU: MinIO 82–124% on average, Garage 13–25% (expressed as a share of one core, hence values above 100% on 2-CPU containers), a factor of 5 to 6 in Garage's favor
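
Each latency above is a median (p50) over 5 timed runs after one uncounted warmup, and the "Garage gain" column is the MinIO p50 divided by the Garage p50. A minimal harness for that protocol, as a sketch: `run_query` stands for any zero-argument callable (a DuckDB query, a boto3 write) and, like `p50_ms` and `gain`, is our own illustrative name rather than part of the published benchmark code:

```python
import statistics
import time
from typing import Callable


def p50_ms(run_query: Callable[[], object], runs: int = 5, warmup: int = 1) -> float:
    """Median wall-clock latency in milliseconds, excluding warmup runs."""
    for _ in range(warmup):
        run_query()  # warms caches and connections; not recorded
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def gain(minio_p50_ms: float, garage_p50_ms: float) -> float:
    """Speed-up factor of Garage over MinIO, rounded to one decimal as in the tables."""
    return round(minio_p50_ms / garage_p50_ms, 1)


print(gain(904, 148))   # 6.1 (Q1, 3-node)
print(gain(1887, 201))  # 9.4 (W3, 3-node)
```

With only 5 runs, the median is less sensitive than the mean to a single slow run, which matters on shared HDD-backed VMs.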

Results — single-node

The single-node setup narrows the gaps seen on the cluster (less parallelism to exploit), but Garage remains consistently ahead:

| Workload | MinIO (p50) | Garage (p50) | Garage gain |
|---|---|---|---|
| Point read (Q1) | 935 ms | 469 ms | 2.0× |
| 2-year read (Q5) | 22,348 ms | 18,526 ms | 1.2× |
| Glob 43,860 files (Q10) | 11,039 ms | 5,269 ms | 2.1× |
| Single write (W1) | 83 ms | 30 ms | 2.8× |
| 10,000-row write (W2) | 1,667 ms | 456 ms | 3.7× |

Average CPU: MinIO 57%, Garage 20% — about 3× less for Garage.

On the scope of these numbers

The test bench ran on a configuration representative of an economical cluster or a shared K3s (HDD, 2 CPU / 4 GB containers). On bare-metal NVMe with more RAM, we expect the gap in pure reads to narrow. The CPU savings and the concurrent-write advantage, however, stem from the architecture (CRDT vs stateful erasure coding), so we expect them to persist. In other words: Garage was ahead on every workload we tested, and the magnitude of the gap will vary with hardware.

The surprising resilience test

We stopped a cluster node during a heavy read (Q5, 7.3 million rows). First observation: through the load balancer, MinIO keeps responding while Garage returns 502 Bad Gateway.

A black mark for Garage? No. When we take the nginx LB out of the picture and query any surviving Garage node directly, the request goes through: Garage relies on internal redirection between nodes, which assumes the topology is not masked behind a classic LB. Once the access pattern is adjusted, consistency is fully preserved (the expected 43,860 files are returned, in 162 seconds, without error).

Takeaway: Garage does not follow the same proxy model as MinIO. Putting a 3-node cluster behind a "classic" nginx configuration breaks resilience. This is documented, but easy to miss.
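
One way to keep resilience without a "classic" LB is to hand clients the list of Garage nodes and let them fail over. A minimal sketch of that pattern: `call_with_failover` and `get_object_bytes` are hypothetical helpers of ours, and the node URLs (port 3900, as in Garage's example configuration), bucket and key are assumptions to adapt:

```python
from typing import Callable, Sequence, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_failover(
    endpoints: Sequence[str],
    request: Callable[[str], T],
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Send `request` to each endpoint in turn and return the first success.

    Any surviving Garage node can answer, so when one node is down
    (connection refused, timeout) we move on to the next one.
    """
    errors = []
    for endpoint in endpoints:
        try:
            return request(endpoint)
        except retry_on as exc:
            errors.append(f"{endpoint}: {exc!r}")
    raise ConnectionError("all endpoints failed: " + "; ".join(errors))


# Hypothetical node addresses; adapt to your own cluster.
GARAGE_NODES = ["http://garage-1:3900", "http://garage-2:3900", "http://garage-3:3900"]


def get_object_bytes(endpoint: str, bucket: str = "raw", key: str = "example.parquet") -> bytes:
    """Fetch one object from the given node with boto3 (imported lazily)."""
    import boto3  # third-party; credentials come from the usual AWS env vars/config

    s3 = boto3.client("s3", endpoint_url=endpoint, region_name="garage")
    return s3.get_object(Bucket=bucket, Key=key)["Body"].read()


# data = call_with_failover(GARAGE_NODES, get_object_bytes)
```

In real code you would narrow `retry_on` to connection-level errors (for boto3, for example `botocore.exceptions.EndpointConnectionError`), so that a genuine `NoSuchKey` is not retried on every node.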

Step 2 — Why we chose Garage

Beyond the numbers, three factors weighed in:

  1. CPU / memory footprint divided by 3 to 6. On modest bare-metal machines, this is what makes the difference between a node that can host the rest of the stack (orchestrator, database, monitoring) and a node dedicated solely to storage.
  2. CRDT model on the Garage side (vs stateful erasure coding on MinIO's side). Simpler to reason about, more tolerant of asymmetric failures, and particularly suited to a geo-distributed multi-datacenter deployment.
  3. Sovereignty and license. Garage is led by Deuxfleurs (a French/European collective) and licensed under AGPLv3, with no contractual surprises. On a B2B topic like customer data storage, this argument resonates with our prospects.

For the record: we also evaluated SeaweedFS and Ceph. SeaweedFS targets very large filesystem-oriented volumes, and Ceph remains operationally disproportionate for a deployment of this size. Garage also wins on a point that matters in B2B: a community and maintainer anchored in Europe, with whom direct dialogue remains possible.

The only real downside we identified: Garage is younger, and its ecosystem (third-party clients, monitoring tools) is smaller. This shows mainly on the observability side, where you have to assemble your own tooling.

Step 3 — Parallel deployment with MinIO (April)

Rather than switching overnight, we deployed Garage alongside MinIO on the client's K3s cluster. For one month (April 8 → May 7), MinIO remained the system of record, but we had:

  • A Garage v2.2.0 node active in the namespace, with its 4 mirror buckets fed by rclone.
  • Test scripts (DuckDB httpfs, boto3, MinIO client) querying both backends in parallel.
  • Downstream SFTP tools (report delivery to end customers) able to connect to either backend.

This phase allowed us to:

  • Validate SigV4 compliance of the various S3 clients in the codebase (Airflow, Python ETL, DuckDB, SFTPGo). In the process we discovered that DuckDB and boto3 required an explicit region (region='garage'), whereas MinIO had worked with their defaults. Found in sandbox, fixed in sandbox.
  • Measure the real drift between the two backends over weeks: zero unexplained divergence.
  • Test the rollback pattern before needing it.
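
A sketch of one way to centralize the region setting so that no client is forgotten. It assumes Garage's `[s3_api]` section sets `s3_region = "garage"` (clients must sign with that same region); the host name, the `S3Settings` class and the helper functions are illustrative, and the DuckDB statement uses the httpfs secrets syntax (`CREATE SECRET ... TYPE s3`):

```python
from dataclasses import dataclass

GARAGE_REGION = "garage"  # assumed to match s3_region in Garage's [s3_api] config


@dataclass(frozen=True)
class S3Settings:
    host: str          # e.g. "garage:3900" (hypothetical service name)
    access_key: str
    secret_key: str
    use_ssl: bool = False


def _sql_literal(value: str) -> str:
    """Quote a string for SQL, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def duckdb_secret_sql(s: S3Settings, name: str = "garage") -> str:
    """CREATE SECRET statement for DuckDB httpfs with an explicit SigV4 region."""
    return (
        f"CREATE OR REPLACE SECRET {name} ("
        f"TYPE s3, KEY_ID {_sql_literal(s.access_key)}, SECRET {_sql_literal(s.secret_key)}, "
        f"REGION {_sql_literal(GARAGE_REGION)}, ENDPOINT {_sql_literal(s.host)}, "
        f"URL_STYLE 'path', USE_SSL {'true' if s.use_ssl else 'false'})"
    )


def boto3_client_kwargs(s: S3Settings) -> dict:
    """Keyword arguments for boto3.client(**kwargs) with an explicit region."""
    scheme = "https" if s.use_ssl else "http"
    return {
        "service_name": "s3",
        "endpoint_url": f"{scheme}://{s.host}",
        "region_name": GARAGE_REGION,  # without it, requests failed SigV4 checks for us
        "aws_access_key_id": s.access_key,
        "aws_secret_access_key": s.secret_key,
    }
```

The SQL string would be passed to `duckdb.connect().execute(...)` and the dict to `boto3.client(**kwargs)`. Keeping both in one module is a cheap guard against the kind of forgotten client that caused our only cutover incident.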

This coexistence phase is what really de-risked the cutover. By the time we switched, we had almost no unknowns left.

Step 4 — The cutover (May 7)

The client is a French software vendor specializing in telephony reporting and monitoring for SMBs and mid-sized enterprises. Data processed: ~70 end-customer organizations, several hundred thousand calls per day, aggregated in an S3 data lake then loaded into a star-schema PostgreSQL for Power BI dashboards. Volume at migration time: ~1.8 GB of Parquet across 4 buckets (raw, staging, debug, reports).

Approach: coexistence then switchover

We wanted zero downtime and instant rollback if any sign of instability appeared. The chosen sequence:

  1. Final re-sync (the day before): a full rclone sync from MinIO to Garage. 105,588 files / 2.0 GiB validated as identical across the 4 buckets.
  2. Code switchover: a single Variable.set("minio_secret_key", garage_key) on the Airflow side, plus an endpoint change in the configs. Keeping the variable name avoided any code churn: every script was left untouched.
  3. Cutover: performed in a night-time window, followed by a restart of the Airflow components and close observation of the first daily ETL runs.
  4. Reverse mirror: a Kubernetes CronJob now syncs Garage → MinIO every evening at 6 PM UTC. MinIO becomes a read-only D-1 mirror, kept for ultra-fast rollback during the first weeks.
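
The switchover in step 2 and the rollback path fit in a few lines. A sketch, assuming a key/value store with `get` and `set`; in Airflow, `airflow.models.Variable` provides both as class methods and could be passed in directly. The backup name `minio_secret_key_minio_backup` is the one we used; the function names and `DictStore` are illustrative:

```python
from typing import Dict, Protocol

ACTIVE_KEY = "minio_secret_key"               # name kept as-is so no script changes
BACKUP_KEY = "minio_secret_key_minio_backup"  # MinIO key kept for instant rollback


class VariableStore(Protocol):
    """The two operations we need from the variable store."""
    def get(self, key: str) -> str: ...
    def set(self, key: str, value: str) -> None: ...


def switch_to_garage(store: VariableStore, garage_key: str) -> None:
    """Save the current (MinIO) key, then point ACTIVE_KEY at Garage."""
    current = store.get(ACTIVE_KEY)
    if current == garage_key:
        return  # already switched: do not overwrite the backup with the Garage key
    store.set(BACKUP_KEY, current)
    store.set(ACTIVE_KEY, garage_key)


def rollback_to_minio(store: VariableStore) -> None:
    """Restore the MinIO key saved by switch_to_garage."""
    store.set(ACTIVE_KEY, store.get(BACKUP_KEY))


class DictStore:
    """In-memory stand-in used for testing; not part of Airflow."""
    def __init__(self, data: Dict[str, str]):
        self.data = dict(data)

    def get(self, key: str) -> str:
        return self.data[key]

    def set(self, key: str, value: str) -> None:
        self.data[key] = value
```

With Airflow, `switch_to_garage(Variable, garage_key)` (after `from airflow.models import Variable`) would apply the same sequence. The endpoint change in the configs still has to be made separately.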

The cutover itself took about two hours, most of it spent in the re-sync and restart window.

The only notable incident

On the first post-cutover run, some S3 requests failed with a SigV4 signature error. The explicit region was missing from an S3 client we had not exercised in the sandbox (the code path for monthly report generation, which had not run in pre-prod). A targeted commit fixed it in under thirty minutes: exactly the kind of friction a benchmark does not reveal and that a well-managed coexistence usually surfaces early.

Step 5 — Two weeks as the system of record

About two weeks after the cutover, we ran full acceptance testing before decommissioning MinIO. Here are the indicators:

| Indicator | Result |
|---|---|
| Airflow DAG runs (daily phases 1 and 2) | 100% success, zero human intervention |
| S3/SigV4 errors in scheduler logs | 0 over the entire period |
| Garage ↔ MinIO mirror consistency (rclone check) | 99.6% to 100% depending on the bucket; the only gaps are objects written since the last mirror run (6 PM UTC), which is expected for a D-1 mirror |
| Historical files (pre-cutover) | Bit-for-bit identical between Garage and MinIO, verified on a sample |
| Business tests (deduplication rules, FK, status distribution) | All pass (35+ checks) |
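
A figure such as 99.6% comes from comparing both sides object by object. A sketch of that comparison, assuming each bucket listing has already been reduced to a mapping key → (size, last-modified in UTC), for instance from paginated ListObjectsV2 calls. `mirror_report` and its "expected lag" rule are our illustration of how the D-1 gaps are read, not rclone's internals; it compares sizes only, whereas rclone check also compares checksums where both sides expose them:

```python
from datetime import datetime, timezone
from typing import Dict, Tuple

Listing = Dict[str, Tuple[int, datetime]]  # key -> (size in bytes, last modified, UTC)


def mirror_report(source: Listing, mirror: Listing, last_mirror_run: datetime) -> dict:
    """Compare a source bucket listing with its D-1 mirror.

    Objects missing or different in the mirror but written after the last
    mirror run are expected lag, not divergence.
    """
    matched, expected_lag, divergent = 0, [], []
    for key, (size, modified) in source.items():
        if key in mirror and mirror[key][0] == size:
            matched += 1
        elif modified > last_mirror_run:
            expected_lag.append(key)  # written since the last sync: normal for D-1
        else:
            divergent.append(key)     # older object missing or different: investigate
    total = len(source)
    return {
        "match_pct": round(100.0 * matched / total, 1) if total else 100.0,
        "expected_lag": sorted(expected_lag),
        "divergent": sorted(divergent),
    }


last_run = datetime(2026, 5, 20, 18, 0, tzinfo=timezone.utc)  # hypothetical 6 PM UTC run
report = mirror_report(
    {"raw/a.parquet": (10, last_run.replace(hour=9)),
     "raw/b.parquet": (20, last_run.replace(hour=21))},
    {"raw/a.parquet": (10, last_run.replace(hour=9))},
    last_run,
)
print(report)  # {'match_pct': 50.0, 'expected_lag': ['raw/b.parquet'], 'divergent': []}
```

Objects present only in the mirror (deleted on the source since the last run) are not reported here; a fuller check would also list them.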

The Garage → MinIO mirror CronJob has now run for several consecutive nights, each run completing in under 17 minutes. The Garage stack runs on a single K3s node, uses a fraction of the RAM MinIO needed, and has not required the slightest intervention.
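
Both the pre-cutover re-sync (MinIO → Garage) and the nightly reverse mirror (Garage → MinIO) reduce to one rclone command per bucket. A sketch of a small Python wrapper such a CronJob could run; the remote names `minio` and `garage` are assumed to be defined in the rclone configuration, and `--checksum` / `--transfers` are standard rclone flags given as examples rather than the exact ones we use:

```python
import subprocess
from typing import List, Sequence

BUCKETS = ("raw", "staging", "debug", "reports")  # the 4 buckets of the migration


def sync_commands(src: str, dst: str, buckets: Sequence[str] = BUCKETS,
                  transfers: int = 8) -> List[List[str]]:
    """One `rclone sync` per bucket: makes dst match src (deletes objects absent from src)."""
    return [
        ["rclone", "sync", f"{src}:{b}", f"{dst}:{b}", "--checksum", "--transfers", str(transfers)]
        for b in buckets
    ]


def check_commands(src: str, dst: str, buckets: Sequence[str] = BUCKETS) -> List[List[str]]:
    """One `rclone check` per bucket, verifying that every src object exists in dst."""
    return [["rclone", "check", f"{src}:{b}", f"{dst}:{b}", "--one-way"] for b in buckets]


def run_all(commands: List[List[str]]) -> None:
    """Run commands in sequence; raises CalledProcessError at the first non-zero exit."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Reverse mirror after cutover: Garage is the source, MinIO the D-1 copy.
# run_all(sync_commands("garage", "minio") + check_commands("garage", "minio"))
```

Swapping the arguments, `sync_commands("minio", "garage")`, gives the direction of the final pre-cutover re-sync.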

What the benchmark didn't say, but production revealed

  • For DuckDB reads on a daily partition (the typical reporting data lake use case), end users do not notice any difference. Both solutions are fast enough that user-perceived time is dominated by dashboard rendering, not by S3.
  • Concurrent writes during a backfill (reprocessing several months of history in parallel for heavy clients) are where Garage made the difference: no observable contention, whereas MinIO had started to show latency spikes in pre-prod.
  • After a pod restart, Garage returns to a stable state noticeably faster. Irrelevant in normal operation, a lifesaver during incidents.
  • On the observability side, Garage offers less than MinIO out of the box (no rich built-in UI). We hooked up a third-party Garage WebUI, but it is less turnkey.
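
The backfill pattern that exposed MinIO's latency spikes is essentially W3 at a larger scale: many independent writes issued in parallel. A minimal sketch with the standard library's `ThreadPoolExecutor`; `backfill` and `put_partition` are illustrative names (the latter is a placeholder for whatever writes one partition), and the default of 10 workers simply echoes the W3 setting:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, List


def backfill(partitions: Iterable[str], put_partition: Callable[[str], None],
             workers: int = 10) -> List[str]:
    """Write partitions with `workers` concurrent writers; return the keys that failed.

    Failures are collected rather than raised, so the run completes and only the
    failed partitions need to be retried.
    """
    failed = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(put_partition, key): key for key in partitions}
        for future in as_completed(futures):
            if future.exception() is not None:
                failed.append(futures[future])
    return sorted(failed)
```

With boto3, `put_partition` could wrap `s3.put_object(Bucket=..., Key=key, Body=...)`; boto3 clients are generally thread-safe, so one client can be shared across the workers.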

Business side: what does it concretely change?

Beyond the latency curves and CPU graphs, the migration brings tangible effects on the operations and commercial sides:

  • Lighter server footprint: storage now uses a fraction of the K3s node's resources, whereas MinIO took a large share. This frees CPU/RAM budget for the rest of the stack (orchestrator, database, monitoring) without changing machines. In the medium term, this is concrete headroom to add clients or use cases without new hardware investment.
  • Reversibility regained: MinIO's new licensing model carried a slow-burning but real dependency risk. Garage, under AGPLv3 and backed by a European collective, closes this concern for good. The license chain and governance can now be formally documented in an RFP response or a security review.
  • Credibility on sovereignty: for a B2B SaaS vendor whose end customers are French SMBs and mid-sized enterprises subject to data protection requirements, being able to present a 100% open source, France-hosted stack is a real commercial argument, and it eases discussions with customers' IT departments.
  • Clear resilience posture: the Garage → MinIO D-1 mirror, and eventually a multi-zone geo-distributed cluster, provide a continuity plan that MinIO alone (on a single node) did not, without doubling the cloud bill.

These effects are rarely visible in a demo, but they matter over time. They are what makes a technically successful migration also a commercially useful migration.

What about larger volumes?

Fair question: our 1.8 GB in production is modest. Three elements frame the answer for anyone wanting to scale to several TB:

  • Architecture designed for horizontal scale: each Garage node added increases capacity and throughput roughly in proportion, and layout changes are designed to move as little data as possible. The CRDT model avoids the contention points typical of stateful erasure coding.
  • Glob test (Q10) on 43,860 files: this is the operation that hurts an object storage system as the namespace grows. Garage is 4.6× faster than MinIO on our infrastructure, a gap we attribute to the listing implementation rather than to caching, and which we therefore expect to hold as the namespace grows.
  • At TB scale, the main variable stops being the S3 engine and becomes the disk subsystem (HDD vs SSD vs NVMe, RAID, cache). Garage's asynchronous, event-driven design should keep I/O costs more contained than MinIO, which has to keep its erasure-coded stripes strictly consistent.

Garage is designed to scale: the benchmark supports this at small scale, and production confirms its stability. A multi-zone geo-distributed target providing several usable TB remains within reach of a moderate bare-metal budget, making Garage a realistic option for a vendor that wants to control its infrastructure end to end.
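
For first-order sizing of such a target: with replication factor RF, every object is stored RF times, so usable capacity is at most total raw capacity divided by RF, and less when zones are uneven. A rough sketch, assuming each of the RF copies must land in a different zone (a simplifying model of zone-redundant placement, not Garage's exact layout algorithm) and ignoring headroom for metadata; `usable_capacity_tb` and the zone sizes are illustrative:

```python
from typing import Sequence


def usable_capacity_tb(zone_capacities_tb: Sequence[float], replication_factor: int,
                       tol: float = 1e-9) -> float:
    """Largest C (TB) such that RF copies of C fit, with at most one copy per zone.

    Placement is feasible iff sum(min(zone, C)) >= RF * C. The slack of this
    inequality is concave in C and zero at C = 0, so the feasible values form
    an interval [0, C*] and C* can be found by bisection.
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be >= 1")

    def fits(c: float) -> bool:
        # tol absorbs floating-point rounding in the sum
        return sum(min(z, c) for z in zone_capacities_tb) >= replication_factor * c - tol

    lo, hi = 0.0, sum(zone_capacities_tb) / replication_factor  # naive raw / RF bound
    if fits(hi):
        return hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if fits(mid):
            lo = mid
        else:
            hi = mid
    return lo


print(usable_capacity_tb([4, 4, 4], 3))            # 4.0: three equal zones, RF=3
print(round(usable_capacity_tb([8, 4, 4], 3), 3))  # 4.0: the two smaller zones are the limit
```

The second case shows why zones should be sized evenly: the extra 4 TB in the largest zone adds nothing at RF=3, since every object also needs a copy in each of the two smaller zones.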

Known limits and best practices

For anyone wanting to reproduce the approach, here are some pitfalls to know:

  1. No "dumb" application load balancer in front of Garage in cluster mode. Favor either direct multi-endpoint access on the client side, or a proxy that respects internal redirection.
  2. Single-node = SPOF. Single-node Garage brings CPU gain and simplicity, but you need a solid backup plan (D-1 mirror, LMDB snapshot, or the host's Backup Agent). In our case, the read-only coexistence with MinIO acts as a safety net during the consolidation phase.
  3. Explicit SigV4 region in all S3 clients: region='garage' or equivalent. Indispensable for DuckDB, boto3, and any tool that strictly enforces SigV4 signing.
  4. Airflow variables: keep the original names (here minio_secret_key) to avoid a code overhaul. Store the MinIO key as backup (minio_secret_key_minio_backup) for instant rollback.
  5. Inter-datacenter writes: if deploying a multi-zone geo-distributed cluster, synchronous cross-DC writes are slower than on LAN. Anticipate this for hot-path jobs.
  6. Long coexistence phase (at least 2–4 weeks): this is what makes the migration pay off. Skipping this step shifts the risk onto the cutover.

In summary

| Point | Verdict |
|---|---|
| Raw performance | Garage ahead, 1.2× to 9.4× depending on workload and topology |
| Resource footprint | 3× to 6× less CPU on Garage |
| Resilience | OK on both sides, provided you respect the Garage model (direct access) |
| Sovereignty | Garage AGPLv3, European governance |
| Ecosystem maturity | MinIO ahead, but Garage is sufficient for most cases |
| Migration | Feasible in a short cycle (benchmark in March → coexistence in April → cutover in May), with no downtime |
| Economic profile | Compatible with a modest bare-metal budget, from single-node to geo-distributed cluster |

For multi-client analytics data lake workloads, where performance, cost control and European compliance all matter, Garage is now our reference choice. The benchmark confirmed the intuition, the coexistence de-risked the implementation, and production validates the decision: two weeks without incident, on a real workload, with zero intervention.

Are you running a self-hosted data lake project, a MinIO migration, or a sovereign S3 deployment? Datakhi supports its clients across the full chain — technology choice, benchmark on their data, progressive migration, operations. Let's talk.

Further reading