Zero‑Downtime Node.js Deployments with PM2 & Git Hooks (Cluster Mode, Atomic Releases, Rollbacks)

Learn a battle‑tested, zero‑downtime deployment workflow for Node.js using PM2 (cluster mode) and Git hooks. Includes atomic releases, health checks, rollbacks, log rotation, and CI/CD‑ready patterns.

✍️ Short Summary

Ship updates with no downtime using PM2 + Git hooks. You’ll push to a remote bare repo, a post‑receive hook builds a new atomic release, swaps a current → releases/<rev> symlink, and triggers pm2 reload (cluster mode) for truly seamless restarts. Includes rollback, health checks, log rotation, and security hardening.

📎 Table of Contents

Start Here: What is PM2?
Why Zero‑Downtime & How This Works
Architecture Overview
Prerequisites
Directory Layout (Atomic Releases)
Option A: Git Bare Repo + post‑receive Hook (Recommended)
PM2 Configuration (ecosystem.config.js)
Health Checks & Readiness Gates
Log Rotation & Observability
Rollback Strategy (Instant)
Optional: Using PM2 Deploy & CI/CD
Reverse Proxy (NGINX / Caddy)
Security Hardening
Troubleshooting Checklist
PM2 Deep Dive: Concepts, Options & Best Practices
✅ Conclusion / Next Steps
🔗 Related Articles
Appendix: Full Example Files

Note: All domains use example.com as a placeholder. This guide is platform‑agnostic and avoids hosting‑panel‑specific steps.

🟢 Start Here — What is PM2?

PM2 is a production process manager for Node.js. It keeps your app running 24/7 and scales it across CPU cores. In cluster mode, it performs zero‑downtime reloads, so users don't see an outage during deploys. PM2 also standardizes logs, environment variables, and startup on reboot, which makes it a good fit for single‑server or small‑cluster setups.

🎯 What PM2 Does (at a glance)

Keep‑alive & auto‑restart: If your app crashes, PM2 restarts it instantly.
Scale across cores (cluster mode): Spawn N workers behind an internal load balancer.
Zero‑downtime reloads: In cluster mode, pm2 reload replaces workers one by one, so traffic keeps flowing as long as the app shuts down gracefully.
Unified logs & rotation: View live logs and add rotation with pm2-logrotate.
Startup on boot: Generate a systemd unit and persist process lists.
Config as code: One ecosystem.config.js captures all runtime settings.

🚫 What PM2 Is Not

Not a web server/reverse proxy. Use NGINX or Caddy for TLS, HTTP/2, compression, static files, and routing.
Not a build or CI tool. Pair with your Git hooks/CI pipeline to build artifacts and trigger reloads.
Not an orchestrator. For many hosts/containers, consider Kubernetes or a PaaS.

🤔 When to Use (and When Not To)

Use PM2 when you manage Node.js on your own VM and want a simple, reliable path to scaling + zero‑downtime deploys.
Consider alternatives if you’re already on a full container/orchestration stack (Kubernetes), or a PaaS (e.g., Render/Fly/Heroku) that handles process supervision for you.

⚡ 2‑Minute Quick Start

npm i -g pm2
pm2 start server.js --name example-app                 # single process (fork mode)
# or, instead of the line above:
pm2 start server.js --name example-app -i max          # cluster across all cores
pm2 reload example-app                                 # zero‑downtime reload (cluster mode)
pm2 startup                                            # prints a sudo command; run it once
pm2 save                                               # persist the process list for reboot
pm2 logs example-app                                   # live logs

🧩 Key Terms

Fork vs Cluster: Fork = 1 process. Cluster = N workers on the same port with zero‑downtime reloads.
Reload vs Restart: Reload is rolling replacement (no downtime, cluster). Restart stops then starts (brief blip).
Instances: Number of worker processes (e.g., 2, 4, or max).
Ecosystem file: ecosystem.config.js—your app’s runtime config (env, logs, scaling, timeouts).
Sticky sessions: Needed for Socket.IO's HTTP long‑polling transport, so that all of a client's requests reach the same worker.

🧠 Mental Model (Simple Diagram)

Browser → NGINX/Caddy → PM2 (LB) → [Worker 1, Worker 2, ...]
                                 ↳ pm2 reload swaps workers one‑by‑one

❓FAQ

Do I still need NGINX/Caddy? Yes—terminate TLS, serve static files, and proxy to the app.
Where are logs? pm2 logs (live). Configure rotation via pm2 install pm2-logrotate.
TypeScript? Build to JS for prod (tsc). Source maps via --enable-source-maps.
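WebSockets/Socket.IO? Stock PM2 cluster mode has no sticky sessions. Either restrict clients to the WebSocket transport, or use the @socket.io/pm2 fork described in the Socket.IO docs.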

🚀 Why Zero‑Downtime & How This Works

Goal: Users never feel upgrades. Technique: PM2 runs your app in cluster mode (multiple processes). A push triggers a new release build and then pm2 reload performs a graceful, rolling restart—old workers drain traffic while new ones boot.

Core guarantees:

Atomic releases: A new, versioned folder is built and switched in one step.
Reproducible builds: Use npm ci and a clean releases/<rev> directory.
Safe rollbacks: Repoint current symlink to a previous good release.

🧭 Architecture Overview

Developer → git push → [Server: bare repo]
                         └─ post‑receive hook → create releases/<rev>
                                               → npm ci && build
                                               → update symlink current → releases/<rev>
                                               → pm2 startOrReload ecosystem.config.js --env production
                                               → health check & verify

Screenshots (placeholders):

[Screenshot] PM2 dashboard after reload (pm2 status).
[Screenshot] Release directories and current symlink.
[Screenshot] /healthz endpoint returning 200 OK.

✅ Prerequisites

Linux server with Node.js 18+ and Git 2.4+.
PM2 globally installed: npm i -g pm2.
A dedicated non‑root user (e.g., deploy).
Reverse proxy (NGINX/Caddy) pointing to your app’s port (e.g., 3000).
Environment variables available on the server (via .env or PM2 envs). Do not commit secrets.

Tip: Ensure the user running Git hooks has node and pm2 in PATH. If you use a version manager (e.g., nvm), export PATH inside the hook.

🗂️ Directory Layout (Atomic Releases)

/var/www/example-app
├─ repo.git/            # bare repo (server remote)
├─ releases/            # timestamped or <rev> folders
├─ shared/              # .env, uploads/, tmp/, etc.
├─ current → releases/<rev>   # symlink switched atomically
└─ ecosystem.config.js

🧩 Option A: Git Bare Repo + post‑receive Hook (Recommended)

This option is simple, fast, and CI/CD‑ready without extra services.

1) Create folders & permissions

sudo mkdir -p /var/www/example-app/{releases,shared}
sudo mkdir -p /var/www/example-app/repo.git
sudo mkdir -p /var/log/pm2                      # log dir used by the ecosystem file
sudo chown -R deploy:deploy /var/www/example-app /var/log/pm2

2) Initialize bare repo (server)

cd /var/www/example-app/repo.git
git init --bare

Add this server as a remote in your local repo, e.g.:

# on your laptop/workstation
git remote add production deploy@server:/var/www/example-app/repo.git

3) post‑receive hook (server)

Create hooks/post-receive in the bare repo and make it executable.

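cat > /var/www/example-app/repo.git/hooks/post-receive <<'HOOK'
#!/usr/bin/env bash
set -euo pipefail

APP_DIR=/var/www/example-app
RELEASES_DIR="$APP_DIR/releases"
CURRENT_LINK="$APP_DIR/current"
ECOSYSTEM="$APP_DIR/ecosystem.config.js"
SHARED_DIR="$APP_DIR/shared"
DEPLOY_BRANCH=main

# ensure PATH for node/pm2 (adjust if using nvm/asdf)
export PATH=/usr/local/bin:/usr/bin:$PATH

# git passes one "oldrev newrev ref" line per pushed ref; deploy only DEPLOY_BRANCH
newrev=""
while read -r oldrev rev ref; do
  if [ "$ref" = "refs/heads/$DEPLOY_BRANCH" ]; then newrev="$rev"; fi
done
# nothing to deploy (other branch or tag pushed) or the branch was deleted (all-zero rev)
if [ -z "$newrev" ] || [[ "$newrev" =~ ^0+$ ]]; then
  echo "No update to $DEPLOY_BRANCH; skipping deploy."
  exit 0
fi

# derive revision id and a release folder
REV=$(echo "$newrev" | cut -c1-7)
RELEASE="$RELEASES_DIR/$REV"

mkdir -p "$RELEASE"
GIT_WORK_TREE="$RELEASE" git checkout -f "$newrev"
# hooks run with GIT_DIR set; unset it so git calls made by npm (e.g., git deps) are unaffected
unset GIT_DIR

# link shared assets (env, uploads, etc.) before building, in case the build reads them
if [ -f "$SHARED_DIR/.env" ]; then
  ln -sfn "$SHARED_DIR/.env" "$RELEASE/.env"
fi

cd "$RELEASE"
# build tools usually live in devDependencies: install everything, build if a
# "build" script exists, then drop dev deps (adapt for pnpm/yarn)
npm ci --include=dev
npm run build --if-present
npm prune --omit=dev

# atomic switch: create the new link beside the old one, then rename over it (GNU mv -T)
PREV=$(readlink "$CURRENT_LINK" || true)
ln -sfn "$RELEASE" "$CURRENT_LINK.tmp"
mv -Tf "$CURRENT_LINK.tmp" "$CURRENT_LINK"

# startOrReload starts the app if PM2 doesn't know it yet, otherwise reloads (zero-downtime in cluster mode)
pm2 startOrReload "$ECOSYSTEM" --env production

# health check with retries; on failure, point current back and reload the previous release
if curl -fsS --retry 5 --retry-delay 2 --retry-connrefused http://127.0.0.1:3000/healthz >/dev/null; then
  echo "Health OK"
else
  echo "Health FAILED"
  if [ -n "$PREV" ]; then
    ln -sfn "$PREV" "$CURRENT_LINK.tmp"
    mv -Tf "$CURRENT_LINK.tmp" "$CURRENT_LINK"
    pm2 startOrReload "$ECOSYSTEM" --env production
    echo "Rolled back to $PREV"
  fi
  exit 1
fi

# persist pm2 process list on reboot
pm2 save
HOOK
chmod +x /var/www/example-app/repo.git/hooks/post-receive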

Why atomic? If build fails, the current symlink is untouched, so the live app stays healthy.
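You may prefer to script deploys or rollbacks in Node.js rather than in the shell. In that case, you can make the symlink switch atomic by creating a temporary link and renaming it over current. On Linux, rename(2) replaces the existing link in one step, so current never goes missing. This is a minimal sketch under that assumption; switchCurrent and the .tmp-<pid> suffix are hypothetical names.

```javascript
const fs = require("node:fs");

// Hypothetical helper: atomically repoint `linkPath` (e.g. /var/www/example-app/current)
// at `targetDir` (e.g. /var/www/example-app/releases/<rev>).
function switchCurrent(linkPath, targetDir) {
  const tmpLink = `${linkPath}.tmp-${process.pid}`;
  try {
    fs.unlinkSync(tmpLink); // remove a leftover temp link from an interrupted run
  } catch (err) {
    if (err.code !== "ENOENT") throw err;
  }
  fs.symlinkSync(targetDir, tmpLink); // build the new link beside the old one
  fs.renameSync(tmpLink, linkPath);   // rename(2) replaces the old link in one step
  return fs.readlinkSync(linkPath);
}

module.exports = { switchCurrent };
```

A rollback is then switchCurrent("/var/www/example-app/current", "/var/www/example-app/releases/<good_rev>"), followed by pm2 reload as in Rollback Strategy.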

4) First push

git push production main

The hook builds a release, switches current, and starts the app under PM2 on the first push. Later pushes trigger a graceful pm2 reload.

⚙️ PM2 Configuration (ecosystem.config.js)

Create /var/www/example-app/ecosystem.config.js:

module.exports = {
  apps: [
    {
      name: "example-app",
      cwd: "/var/www/example-app/current", // run from the symlink so each reload uses the new release
      script: "./server.js",          // your entry file, relative to cwd
      exec_mode: "cluster",           // enables zero‑downtime reloads
      instances: "max",               // or a fixed number like 2 or 4 ("max" is 1 worker on a 1-vCPU VM)
      watch: false,
      max_memory_restart: "512M",
      env: {
        NODE_ENV: "production",
        PORT: 3000
      },
      env_production: {
        NODE_ENV: "production"
      },
      kill_timeout: 5000,              // allow workers to drain
      listen_timeout: 8000,            // readiness window
      out_file: "/var/log/pm2/example-app.out.log",
      error_file: "/var/log/pm2/example-app.err.log",
      merge_logs: true
    }
  ]
};

Important: Zero‑downtime requires cluster mode (multiple instances) or a blue‑green strategy. In fork mode (single process), reload behaves like a restart (brief interruption).
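You can check both prerequisites before a deploy. This is a minimal sketch with a few assumptions. The ecosystem file is assumed to export { apps: [...] } as above. instances: "max" is read as the CPU count reported by os.cpus(), and any other value is read literally as a number. checkZeroDowntime is a hypothetical helper.

```javascript
const os = require("node:os");

// Hypothetical pre-deploy check against the zero-downtime prerequisites above:
// cluster mode and at least 2 workers. Returns a list of human-readable problems.
function checkZeroDowntime(config, cpuCount = os.cpus().length) {
  const problems = [];
  for (const app of config.apps || []) {
    // Assumption: "max" means one worker per CPU; other values are read as numbers.
    const workers = app.instances === "max" ? cpuCount : Number(app.instances ?? 1);
    if (app.exec_mode !== "cluster") {
      problems.push(`${app.name}: exec_mode is "${app.exec_mode ?? "fork"}", so reload acts like restart`);
    }
    if (!(workers >= 2)) {
      problems.push(`${app.name}: ${workers} worker(s); this guide recommends at least 2`);
    }
  }
  return problems;
}

module.exports = { checkZeroDowntime };
```

For example, checkZeroDowntime(require("/var/www/example-app/ecosystem.config.js")) returns an empty array when every app qualifies. A non-empty result can fail a CI step before anything is deployed.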

Persist PM2 on reboot:

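pm2 startup    # prints a sudo command that installs the boot service; run it once
pm2 save       # persist the current process list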

🩺 Health Checks & Readiness Gates

Expose a simple health endpoint (e.g., Express):

app.get("/healthz", (req, res) => {
  res.status(200).json({ ok: true, uptime: process.uptime() });
});

Readiness tips:

Defer accepting traffic until DB connections are ready.
Use listen_timeout + kill_timeout in PM2 to gracefully replace workers.
If your reverse proxy supports active health checks, point them at /healthz so it stops routing to an unhealthy backend.
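The readiness tips can be combined. Listen only after dependencies are connected, signal PM2 once listening, and expose a readiness endpoint next to /healthz. This is a minimal sketch using only Node's http module. The readiness flag, createHealthHandler, start, and the /readyz path are illustrative assumptions, and connectDependencies stands in for your own DB setup.

```javascript
const http = require("node:http");

// Readiness flag (hypothetical); set it once dependencies such as the DB are connected.
const readiness = { ready: false };

// /healthz = liveness (process is up); /readyz = readiness (safe to route traffic).
function createHealthHandler(state) {
  return (req, res) => {
    const send = (status, body) => {
      res.writeHead(status, { "Content-Type": "application/json" });
      res.end(JSON.stringify(body));
    };
    if (req.url === "/healthz") return send(200, { ok: true, uptime: process.uptime() });
    if (req.url === "/readyz") return send(state.ready ? 200 : 503, { ready: state.ready });
    return send(404, { error: "not found" });
  };
}

// Startup order: connect dependencies first, then listen, then tell PM2 (wait_ready).
async function start(connectDependencies) {
  await connectDependencies(); // assumed async, e.g. opening a DB pool
  readiness.ready = true;
  const server = http.createServer(createHealthHandler(readiness));
  server.listen(process.env.PORT || 3000, () => process.send?.("ready"));
  return server;
}

module.exports = { createHealthHandler, start, readiness };
```

You can set readiness.ready back to false at the start of a graceful shutdown. A health-checking proxy then stops routing to the worker before it exits.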

📊 Log Rotation & Observability

Install PM2’s logrotate module:

pm2 install pm2-logrotate
pm2 set pm2-logrotate:max_size 10M
pm2 set pm2-logrotate:retain 7
pm2 set pm2-logrotate:compress true
pm2 set pm2-logrotate:dateFormat YYYY-MM-DD_HH-mm-ss

Live insights:

pm2 status
pm2 logs example-app
pm2 monit

⏪ Rollback Strategy (Instant)

List available releases:

ls -1 /var/www/example-app/releases

Switch back atomically and reload:

cd /var/www/example-app
ln -sfn releases/<good_rev> current.tmp && mv -Tf current.tmp current   # atomic rename (GNU mv)
pm2 reload ecosystem.config.js --env production

Keep the last N releases (e.g., 5) and prune older ones via a nightly cron.
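The pruning job can be sketched in Node.js as follows. It assumes three things: every release is a directory directly under releases/, directory modification time approximates deploy order, and the release that current points to must never be deleted, however old it is. selectReleasesToPrune and pruneReleases are hypothetical names. The selection logic is kept pure so it can be tested without touching the filesystem.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Pure selection: keep the newest `keep` releases plus whatever `current` points to,
// and return the names that are safe to delete (oldest last).
function selectReleasesToPrune(releases, currentName, keep = 5) {
  return [...releases]
    .sort((a, b) => b.mtimeMs - a.mtimeMs)   // newest first
    .slice(keep)                             // everything beyond the newest `keep`
    .map((r) => r.name)
    .filter((name) => name !== currentName); // never delete the live release
}

// Hypothetical cron entry point: prune <appDir>/releases, sparing the `current` target.
function pruneReleases(appDir = "/var/www/example-app", keep = 5) {
  const releasesDir = path.join(appDir, "releases");
  const currentName = path.basename(fs.readlinkSync(path.join(appDir, "current")));
  const releases = fs
    .readdirSync(releasesDir, { withFileTypes: true })
    .filter((entry) => entry.isDirectory())
    .map((entry) => ({
      name: entry.name,
      mtimeMs: fs.statSync(path.join(releasesDir, entry.name)).mtimeMs,
    }));
  const doomed = selectReleasesToPrune(releases, currentName, keep);
  for (const name of doomed) {
    fs.rmSync(path.join(releasesDir, name), { recursive: true, force: true });
  }
  return doomed;
}

module.exports = { selectReleasesToPrune, pruneReleases };
```

A nightly crontab entry could then be 0 3 * * * node -e 'require("/opt/scripts/prune-releases.js").pruneReleases()' (the script path is hypothetical).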

🔄 Optional: Using PM2 Deploy & CI/CD

PM2 has a built‑in deploy system that can pull from Git and run post-deploy commands. It’s CI/CD‑friendly and uses your ecosystem file. Example snippet:

module.exports = {
  apps: [ /* ...same as above... */ ],
  deploy: {
    production: {
      user: "deploy",
      host: ["server"],
      ref: "origin/main",
      repo: "git@your-vcs:org/example-app.git",
      path: "/var/www/example-app",
      "post-deploy": "npm ci && npm run build --if-present && npm prune --omit=dev && pm2 startOrReload ecosystem.config.js --env production"
    }
  }
}

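This fits well into CI. Run pm2 deploy ecosystem.config.js production setup once, then pm2 deploy ecosystem.config.js production for each release. PM2 deploy manages its own source/, current, and shared/ folders under path, so use it instead of the Git‑hook setup, not alongside it. Pull environment‑specific secrets from your runner’s vault; avoid committing secrets.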

🌐 Reverse Proxy (NGINX / Caddy)

NGINX example:

upstream example_app {
  server 127.0.0.1:3000;
}
server {
  server_name example.com;
  location / {
    proxy_pass http://example_app;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
  }
}

Caddy example:

example.com {
  encode gzip
  reverse_proxy 127.0.0.1:3000
}

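If your proxy supports active health checks, point them at /healthz. Examples are Caddy's health_uri subdirective inside reverse_proxy and the health_check directive in NGINX Plus. Open‑source NGINX only marks backends down passively, after requests fail.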

🔐 Security Hardening

Create a locked‑down deploy user; limit sudo.
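Store .env in shared/ with strict permissions (e.g., owned by deploy, chmod 600).
Only allow fast‑forward pushes (git config receive.denyNonFastForwards true in the bare repo); protect the main branch in your VCS.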
Ensure hooks are executable and owned by the deploy user.
Keep Node/PM2 updated; enable OS auto‑security updates.

🧰 Troubleshooting Checklist

Hook didn’t fire? Check that it is named exactly post-receive and is executable (chmod +x hooks/post-receive).
node: command not found in hook? Export PATH to Node/PM2; avoid relying on an interactive shell.
Operation must be run in a work tree? Use GIT_WORK_TREE="$RELEASE" git checkout -f "$newrev".
Build succeeds but app fails? Check pm2 logs, and confirm that shared/.env is symlinked into the release and actually loaded by the app.
No zero‑downtime? Confirm exec_mode: "cluster" and instances >= 2.
Memory leaks / crashes? Set max_memory_restart, add monitoring, profile hotspots.

🧠 PM2 Deep Dive — Concepts, Options & Best Practices

🔍 What PM2 Is (and Isn’t)

Process manager for Node.js: starts, keeps alive, restarts on failure, and reloads with zero downtime in cluster mode.
Not a build tool or CI server—pair it with your Git hooks/CI to ship artifacts.

🧪 Install, Update, and Verify

npm i -g pm2@latest
pm2 -v
pm2 update             # after upgrading, respawn the in-memory daemon; apps restart, the process list is kept

🧬 Process Models

Fork mode: 1 process. Simple, but reload behaves like a restart (tiny blip). Suitable for workers/queues.
Cluster mode: N processes share the same port. pm2 reload replaces workers one by one → zero‑downtime web apps.
- Choose instances: "max" for all CPU cores or a fixed number (e.g., 2 or 4).
- Socket.IO? Its HTTP long‑polling transport needs sticky sessions, which stock PM2 cluster mode does not provide (see Sticky Sessions for Realtime Apps).

🔄 Reload Semantics & Graceful Lifecycle

pm2 reload <name>: rolling replacement (cluster mode). Old worker drains, new worker boots.
pm2 restart <name>: stop then start (brief interruption).
pm2 stop <name>: take the app offline.

Graceful readiness (recommended):

The app calls process.send('ready') once it is listening (PM2's wait_ready option).
PM2 waits up to listen_timeout for that signal before it treats the new worker as online and stops the old one.

Ecosystem snippet:

{
  name: "example-app",
  script: "server.js",
  exec_mode: "cluster",
  instances: 2,
  wait_ready: true,       // app will call process.send('ready')
  listen_timeout: 8000,   // how long PM2 waits for 'ready'
  kill_timeout: 5000      // how long to let old worker drain
}

App code:

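const http = require('http');

// minimal handler so the example runs as-is; replace with your app
const handler = (req, res) => res.end('ok');
const server = http.createServer(handler);

server.listen(process.env.PORT || 3000, () => {
  if (process.send) process.send('ready'); // tells PM2 this worker is ready (wait_ready)
});

process.on('SIGINT', gracefulExit);  // PM2's default kill signal
process.on('SIGTERM', gracefulExit);

function gracefulExit() {
  server.close(() => process.exit(0));             // stop accepting; exit after in-flight requests
  server.closeIdleConnections?.();                 // Node 18.2+: drop idle keep-alive sockets
  setTimeout(() => process.exit(1), 4000).unref(); // hard timeout, below kill_timeout (5000 ms)
}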

🧾 Ecosystem File — Common Options (Cheat Sheet)

module.exports = {
  apps: [{
    name: "example-app",
    script: "./server.js",
    args: "",                 // extra CLI args to your script
    exec_mode: "cluster",     // or "fork"
    instances: "max",         // or a number
    cwd: "/var/www/example-app/current", // working dir
    watch: false,              // change to true ONLY for dev
    ignore_watch: ["node_modules", "logs", "tmp"],
    max_memory_restart: "512M",
    min_uptime: "10s",        // consider app unstable before this
    max_restarts: 10,          // cap restarts for flapping apps
    exp_backoff_restart_delay: 200, // ms; grows exponentially
    env: { NODE_ENV: "production", PORT: 3000 },
    env_production: { NODE_ENV: "production" },
    out_file: "/var/log/pm2/example-app.out.log",
    error_file: "/var/log/pm2/example-app.err.log",
    merge_logs: true,
    log_date_format: "YYYY-MM-DD HH:mm:ss Z",
    node_args: "--enable-source-maps --max-old-space-size=512",
  }]
}

🔑 Environment & Secrets

Prefer PM2 env vars (env, env_production) for non‑secret config.
For secrets, keep a .env in shared/ and symlink it into each release, as the post‑receive hook does.
If you use dotenv, load it at the top of your entry file.
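If you would rather not add the dotenv package, a small loader covers plain KEY=VALUE files. This is a stand‑in for illustration, not dotenv itself. It skips blank lines and # comments and strips one pair of matching quotes. It does not expand variables or support multi‑line values. It never overwrites variables that are already set, so values from PM2's env blocks take precedence. parseEnv and loadEnv are hypothetical names. Node 20.6 and later also offer a built‑in --env-file flag.

```javascript
const fs = require("node:fs");

// Parse simple KEY=VALUE lines (hypothetical helper, not the dotenv package).
function parseEnv(text) {
  const out = {};
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (!line || line.startsWith("#")) continue; // skip blanks and comments
    const eq = line.indexOf("=");
    if (eq <= 0) continue;                        // not KEY=VALUE
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    const q = value[0];
    if ((q === '"' || q === "'") && value.length >= 2 && value.endsWith(q)) {
      value = value.slice(1, -1);                 // strip matching quotes
    }
    out[key] = value;
  }
  return out;
}

// Load .env (the symlink into shared/) without overriding variables PM2 already set.
function loadEnv(file = ".env") {
  if (!fs.existsSync(file)) return;
  for (const [key, value] of Object.entries(parseEnv(fs.readFileSync(file, "utf8")))) {
    if (process.env[key] === undefined) process.env[key] = value;
  }
}

module.exports = { parseEnv, loadEnv };
```

Call loadEnv() as the first statement of your entry file. When PM2's cwd is the release directory (as with cwd: "/var/www/example-app/current" in the cheat sheet), the relative .env path resolves to the symlink in the active release.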

💾 Startup on Reboot & State Persistence

pm2 startup systemd -u deploy --hp /home/deploy   # prints the sudo command that installs the unit
pm2 save                                         # persist process list
pm2 resurrect                                    # restore from dump
pm2 unstartup systemd                            # remove integration

🪵 Logs & Rotation

Live tail: pm2 logs <name> or all apps: pm2 logs.

Install pm2-logrotate once and configure it as shown in Log Rotation & Observability.

🧰 Everyday Commands

pm2 list                        # overview
pm2 status                      # alias of pm2 list
pm2 describe example-app        # full metadata
pm2 reload example-app          # zero‑downtime reload
pm2 restart example-app         # hard restart
pm2 stop example-app            # stop
pm2 delete example-app          # remove from PM2
pm2 env 0                       # show env of app with id 0
pm2 monit                       # ncurses dashboard

📈 Observability & Metrics

Add /healthz and /readyz endpoints; wire your reverse proxy health checks.
Use pm2 monit for CPU/mem; export app metrics to Prometheus (e.g., prom-client) and include GC stats.
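For CI smoke tests or synthetic monitoring, you can write the /healthz probe in Node with retries, similar in spirit to curl's --retry option. This is a minimal sketch assuming Node 18+ (global fetch and AbortSignal.timeout). waitForHealthy and its default attempt count, delay, and timeout are illustrative choices, not a PM2 API.

```javascript
// Hypothetical smoke-test helper (Node 18+: global fetch, AbortSignal.timeout).
// Polls `url` until it answers 2xx or `attempts` run out; resolves to true/false.
async function waitForHealthy(url, { attempts = 5, delayMs = 2000, timeoutMs = 3000 } = {}) {
  for (let i = 1; i <= attempts; i++) {
    try {
      const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
      await res.arrayBuffer(); // drain the body so the connection is released
      if (res.ok) return true;
    } catch {
      // connection refused or timed out: count as a failed attempt
    }
    if (i < attempts) await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return false;
}

module.exports = { waitForHealthy };
```

In a CI step, node -e 'require("./smoke.js").waitForHealthy("https://example.com/healthz").then((ok) => process.exit(ok ? 0 : 1))' fails the job when the probe never succeeds (smoke.js is a hypothetical file name).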

🧱 Blue‑Green with PM2 (Alternative Pattern)

Run two app names (e.g., app-blue, app-green) on distinct ports.
Deploy to the idle color, smoke‑test /healthz, then flip the reverse proxy upstream.
Keep the previous color for instant rollback.
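You can script the flip itself. This minimal sketch makes three assumptions: app-blue listens on 3001 and app-green on 3002 (hypothetical ports), the NGINX config includes a generated upstream file, and you pass in your own async health probe. The proxy reload itself (for example nginx -s reload) is left to the caller.

```javascript
const fs = require("node:fs");

// Hypothetical port map for the two PM2 apps (app-blue, app-green).
const PORTS = { blue: 3001, green: 3002 };

const idleColor = (active) => (active === "blue" ? "green" : "blue");

// NGINX upstream for one color; the main config is assumed to `include` this file.
function renderUpstream(color) {
  return `upstream example_app {\n  server 127.0.0.1:${PORTS[color]};\n}\n`;
}

// Smoke-test the idle color, then atomically rewrite the upstream file (temp + rename).
// `isHealthy(url)` is any async probe that resolves to true/false.
async function flip(active, upstreamFile, isHealthy) {
  const next = idleColor(active);
  const url = `http://127.0.0.1:${PORTS[next]}/healthz`;
  if (!(await isHealthy(url))) {
    throw new Error(`${next} failed ${url}; keeping ${active}`);
  }
  const tmp = `${upstreamFile}.tmp`;
  fs.writeFileSync(tmp, renderUpstream(next));
  fs.renameSync(tmp, upstreamFile);
  return next; // caller then reloads the proxy, e.g. `nginx -s reload`
}

module.exports = { PORTS, idleColor, renderUpstream, flip };
```

Because the previous color keeps running, rolling back is just another flip in the opposite direction.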

🧵 TypeScript & Source Maps

Prefer building to JS during deploy (tsc -p .) and run built files.
If running TS directly, use ts-node in dev only; in prod, JS is safer.
Enable stack traces with original lines:
```
node_args: "--enable-source-maps"
```

🌐 Sticky Sessions for Realtime Apps

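In cluster mode, Socket.IO's HTTP long‑polling transport needs every request of a session to reach the same worker. Stock PM2 cluster mode has no session affinity. The Socket.IO docs describe the @socket.io/pm2 fork for this. Alternatively, restrict clients to the WebSocket transport, which uses a single long‑lived connection.
Ensure your reverse proxy forwards WebSocket upgrade requests to the app port (or terminates TLS, then proxies to it).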

🧯 Advanced Troubleshooting

node: command not found in hooks: export PATH in the hook; avoid relying on interactive shells or nvm being sourced.
Reload isn’t zero‑downtime: verify exec_mode: "cluster" and instances >= 2; check wait_ready/listen_timeout logic.
Flapping restarts: tune min_uptime, max_restarts, exp_backoff_restart_delay; inspect pm2 logs.
Memory pressure: increase --max-old-space-size, fix leaks, or scale instances.
File watchers in prod: keep watch: false to avoid CPU spikes.

✅ Quick PM2 Checklist

Cluster mode with ≥2 instances
wait_ready, listen_timeout, kill_timeout set
Log rotation configured
pm2 startup + pm2 save in place
Health checks wired in proxy & CI smoke tests
Rollback plan tested (previous release kept)

✅ Conclusion / Next Steps

PM2 cluster mode plus atomic releases via Git hooks gives you zero‑downtime deploys (as long as workers shut down gracefully and pass their health check), clean rollbacks, and predictable builds. Next, wire this into your CI pipeline, add synthetic monitoring for /healthz, and automate release pruning.

Quick next steps:

Create the bare repo and post‑receive hook.
Add the PM2 ecosystem file with cluster mode.
Push to production and verify pm2 status + /healthz.
Schedule log rotation and release cleanup.

🔗 Related Articles (suggested)

PM2 Deep Dive: Cluster Mode, Reloads, and Graceful Shutdowns
Git Hooks 101: pre‑receive, post‑receive, and Secure Deployment Patterns
Managing Secrets: .env vs environment vaults in CI/CD
Observability Stack for Node.js (logs, metrics, traces)
Blue‑Green vs Rolling Deploys: When to Choose What

🧩 Appendix – Full Example Files

A) `ecosystem.config.js`

module.exports = {
  apps: [
    {
      name: "example-app",
      cwd: "/var/www/example-app/current", // symlink, so reloads pick up the new release
      script: "./server.js",
      exec_mode: "cluster",
      instances: 2,
      watch: false,
      max_memory_restart: "512M",
      env: { NODE_ENV: "production", PORT: 3000 },
      env_production: { NODE_ENV: "production" },
      kill_timeout: 5000,
      listen_timeout: 8000,
      out_file: "/var/log/pm2/example-app.out.log",
      error_file: "/var/log/pm2/example-app.err.log",
      merge_logs: true
    }
  ]
};

B) `hooks/post-receive`

(From the section above; ensure chmod +x.)

C) Example Express server with health endpoint

const express = require("express");
const app = express();

app.get("/healthz", (req, res) => res.json({ ok: true }));

app.get("/", (req, res) => res.send("Hello from zero‑downtime deploy!"));

app.listen(process.env.PORT || 3000, () => {
  console.log("Server started");
});
