Zero‑Downtime Node.js Deployments with PM2 & Git Hooks (Cluster Mode, Atomic Releases, Rollbacks)
Learn a battle‑tested, zero‑downtime deployment workflow for Node.js using PM2 (cluster mode) and Git hooks. Includes atomic releases, health checks, rollbacks, log rotation, and CI/CD‑ready patterns.
✍️ Short Summary
Ship updates with no downtime using PM2 + Git hooks. You push to a remote bare repo; a post‑receive hook builds a new release, atomically repoints the `current` symlink at `releases/<rev>`, and triggers `pm2 reload` (cluster mode) for a rolling restart. Includes rollback, health checks, log rotation, and security hardening.
📎 Table of Contents
- Why Zero‑Downtime & How This Works
- Architecture Overview
- Prerequisites
- Directory Layout (Atomic Releases)
- Option A: Git bare repo + post‑receive hook (recommended)
- PM2 Configuration (ecosystem file)
- Health Checks & Readiness Gates
- Log Rotation & Observability
- Rollback Strategy (Instant)
- Optional: Using PM2 Deploy & CI/CD
- NGINX/Caddy Reverse Proxy (snippet)
- Security Hardening
- Troubleshooting Checklist
- ✅ Conclusion / Next Steps
- 🔗 Related Articles
Note: Examples use `example.com` in place of real domains. The guide is platform‑agnostic and skips control‑panel‑specific steps.
🟢 Start Here — What is PM2?
PM2 is a production process manager for Node.js. It keeps your app running 24/7, scales it across CPU cores, and performs zero‑downtime reloads so users never see an outage during deploys. PM2 also standardizes logs, environment variables, and startup on reboot—perfect for single‑server or small‑cluster setups.
🎯 What PM2 Does (at a glance)
- Keep‑alive & auto‑restart: If your app crashes, PM2 restarts it instantly.
- Scale across cores (cluster mode): Spawn N workers behind an internal load balancer.
- Zero‑downtime reloads: `pm2 reload` replaces workers one‑by‑one without dropping connections.
- Unified logs & rotation: View live logs and add rotation with `pm2-logrotate`.
- Startup on boot: Generate a systemd unit and persist process lists.
- Config as code: One `ecosystem.config.js` captures all runtime settings.
🚫 What PM2 Is Not
- Not a web server/reverse proxy. Use NGINX or Caddy for TLS, HTTP/2, compression, static files, and routing.
- Not a build or CI tool. Pair with your Git hooks/CI pipeline to build artifacts and trigger reloads.
- Not an orchestrator. For many hosts/containers, consider Kubernetes or a PaaS.
🤔 When to Use (and When Not To)
Use PM2 when you manage Node.js on your own VM and want a simple, reliable path to scaling + zero‑downtime deploys.
Consider alternatives if you’re already on a full container/orchestration stack (Kubernetes), or a PaaS (e.g., Render/Fly/Heroku) that handles process supervision for you.
⚡ 2‑Minute Quick Start
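npm i -g pm2
pm2 start server.js --name example-app # single process (fork mode)
# or, instead of the line above, cluster across all cores:
pm2 start server.js --name example-app -i max
pm2 reload example-app # zero‑downtime reload (cluster mode)
pm2 startup # prints a sudo command; run it to enable start on boot
pm2 save # persist the current process list
pm2 logs example-app # live logs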
🧩 Key Terms
- Fork vs Cluster: Fork = 1 process. Cluster = N workers on the same port with zero‑downtime reloads.
- Reload vs Restart: Reload is rolling replacement (no downtime, cluster). Restart stops then starts (brief blip).
- Instances: Number of worker processes (e.g., `2`, `4`, or `max`).
- Ecosystem file: `ecosystem.config.js` holds your app's runtime config (env, logs, scaling, timeouts).
- Sticky sessions: Needed for Socket.IO with HTTP long‑polling, so every request from a client reaches the same worker.
🧠 Mental Model (Simple Diagram)
Browser → NGINX/Caddy → PM2 (LB) → [Worker 1, Worker 2, ...]
↳ pm2 reload swaps workers one‑by‑one
❓FAQ
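- Do I still need NGINX/Caddy? Yes. Use it to terminate TLS, serve static files, and proxy to the app.
- Where are logs? `pm2 logs` (live). Configure rotation via `pm2 install pm2-logrotate`.
- TypeScript? Build to JS for prod (`tsc`). Source maps via `--enable-source-maps`.
- WebSockets? Plain WebSocket connections work in cluster mode without extra setup. Socket.IO with HTTP long‑polling needs sticky sessions, which stock PM2 does not provide; the Socket.IO project maintains the `@socket.io/pm2` fork for that case.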
🚀 Why Zero‑Downtime & How This Works
Goal: Users never feel upgrades. Technique: PM2 runs your app in cluster mode (multiple processes). A push triggers a new release build, and then `pm2 reload` performs a graceful, rolling restart: old workers drain traffic while new ones boot.
Core guarantees:
- Atomic releases: A new, versioned folder is built and switched in one step.
- Reproducible builds: Use `npm ci` and a clean `releases/<rev>` directory.
- Safe rollbacks: Repoint the `current` symlink to a previous good release.
🧭 Architecture Overview
Developer → git push → [Server: bare repo]
└─ post‑receive hook → create releases/<rev>
→ npm ci && build
→ update symlink current → releases/<rev>
→ pm2 startOrReload ecosystem.config.js --env production
→ health check & verify
Screenshots (placeholders):
- [Screenshot] PM2 dashboard after reload (`pm2 status`).
- [Screenshot] Release directories and `current` symlink.
- [Screenshot] `/healthz` endpoint returning 200 OK.
✅ Prerequisites
- Linux server with Node.js 18+ and Git 2.4+.
- PM2 globally installed: `npm i -g pm2`.
- A dedicated non‑root user (e.g., `deploy`).
- Reverse proxy (NGINX/Caddy) pointing to your app's port (e.g., 3000).
- Environment variables available on the server (via `.env` or PM2 envs). Do not commit secrets.
Tip: Ensure the user running Git hooks has `node` and `pm2` in PATH. If you use a version manager (e.g., nvm), export PATH inside the hook.
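One way to confirm the PATH tip before the first push is a small Node.js check run as the deploy user. This is a minimal sketch: the helper names `findOnPath` and `missingTools` are hypothetical, and the default tool list (`node`, `npm`, `pm2`) is an assumption about what the hook calls.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Return the first executable file named `cmd` found in `pathVar`, or null.
function findOnPath(cmd, pathVar = process.env.PATH || "") {
  for (const dir of pathVar.split(path.delimiter).filter(Boolean)) {
    const candidate = path.join(dir, cmd);
    try {
      fs.accessSync(candidate, fs.constants.X_OK);
      if (fs.statSync(candidate).isFile()) return candidate;
    } catch {
      // not in this directory; keep looking
    }
  }
  return null;
}

// List the tools the hook needs that cannot be resolved on the given PATH.
function missingTools(tools = ["node", "npm", "pm2"], pathVar = process.env.PATH || "") {
  return tools.filter((tool) => findOnPath(tool, pathVar) === null);
}
```

Calling `missingTools(["node", "npm", "pm2"], "/usr/local/bin:/usr/bin")` with the same PATH value the hook exports shows what a non-interactive shell would fail to find.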
🗂️ Directory Layout (Atomic Releases)
/var/www/example-app
├─ repo.git/ # bare repo (server remote)
├─ releases/ # timestamped or <rev> folders
├─ shared/ # .env, uploads/, tmp/, etc.
├─ current → releases/<rev> # symlink switched atomically
└─ ecosystem.config.js
🧩 Option A: Git Bare Repo + post‑receive Hook (Recommended)
This option is simple, fast, and CI/CD‑ready without extra services.
1) Create folders & permissions
sudo mkdir -p /var/www/example-app/{releases,shared}
sudo mkdir -p /var/www/example-app/repo.git
sudo chown -R deploy:deploy /var/www/example-app
2) Initialize bare repo (server)
cd /var/www/example-app/repo.git
git init --bare
Add this server as a remote in your local repo, e.g.:
# on your laptop/workstation
git remote add production deploy@server:/var/www/example-app/repo.git
3) post‑receive hook (server)
Create `hooks/post-receive` in the bare repo and make it executable.
cat > /var/www/example-app/repo.git/hooks/post-receive <<'HOOK'
#!/usr/bin/env bash
set -euo pipefail
APP_DIR=/var/www/example-app
RELEASES_DIR="$APP_DIR/releases"
CURRENT_LINK="$APP_DIR/current"
ECOSYSTEM="$APP_DIR/ecosystem.config.js"
SHARED_DIR="$APP_DIR/shared"
BRANCH=refs/heads/main # only pushes to this ref are deployed
# ensure PATH for node/pm2 (adjust if using nvm/asdf)
export PATH=/usr/local/bin:/usr/bin:$PATH
# Git writes one "<oldrev> <newrev> <ref>" line per pushed ref to stdin
newrev=""
while read -r _oldrev rev ref; do
  if [ "$ref" = "$BRANCH" ]; then newrev="$rev"; fi
done
# nothing to do if the branch was not pushed or was deleted (all-zero revision)
if [ -z "${newrev//0/}" ]; then
  echo "No deployable push to $BRANCH"
  exit 0
fi
# derive revision id and a release folder
REV=$(echo "$newrev" | cut -c1-7)
RELEASE="$RELEASES_DIR/$REV"
PREVIOUS=$(readlink "$CURRENT_LINK" 2>/dev/null || true) # remembered for rollback
mkdir -p "$RELEASE"
GIT_WORK_TREE="$RELEASE" git checkout -f "$newrev"
cd "$RELEASE"
# install all deps (build tools usually live in devDependencies), build, then drop dev deps
# with pnpm: `pnpm i --frozen-lockfile`, build, then `pnpm prune --prod`
npm ci
npm run build --if-present
npm prune --omit=dev
# link shared assets (env, uploads, etc.)
if [ -f "$SHARED_DIR/.env" ]; then
  ln -sfn "$SHARED_DIR/.env" "$RELEASE/.env"
fi
# atomic switch: create the new link beside the old one, then rename over it
ln -sfn "$RELEASE" "$CURRENT_LINK.tmp"
mv -Tf "$CURRENT_LINK.tmp" "$CURRENT_LINK"
# startOrReload starts the app on first deploy and reloads it afterwards
pm2 startOrReload "$ECOSYSTEM" --env production
# health check with retries; persist the process list only once the app answers
for attempt in 1 2 3 4 5; do
  if curl -fsS http://127.0.0.1:3000/healthz >/dev/null; then
    echo "Health OK"
    pm2 save
    exit 0
  fi
  sleep 2
done
echo "Health FAILED"
# roll back to the previous release if there was one
if [ -n "$PREVIOUS" ]; then
  ln -sfn "$PREVIOUS" "$CURRENT_LINK.tmp"
  mv -Tf "$CURRENT_LINK.tmp" "$CURRENT_LINK"
  pm2 reload "$ECOSYSTEM" --env production
fi
exit 1
HOOK
chmod +x /var/www/example-app/repo.git/hooks/post-receive
Why atomic? If the build fails, the `current` symlink is untouched, so the live app keeps serving the previous release.
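The same guarantee can be expressed in Node.js for readers who script deploys outside Bash. This is a sketch under one assumption: the release and the link live on the same POSIX filesystem, where renaming over an existing path is atomic. The function name `switchSymlink` is hypothetical.

```javascript
const fs = require("node:fs");

// Point `linkPath` at `targetDir` without a moment where the link is missing:
// create a temporary symlink beside it, then rename it over the old link.
function switchSymlink(targetDir, linkPath) {
  const tmpLink = `${linkPath}.tmp-${process.pid}`;
  try {
    fs.unlinkSync(tmpLink); // clear a leftover from an interrupted run
  } catch (err) {
    if (err.code !== "ENOENT") throw err;
  }
  fs.symlinkSync(targetDir, tmpLink);
  fs.renameSync(tmpLink, linkPath); // replaces the old symlink in one step
}
```

A reader following `current` therefore sees either the old release or the new one, never a missing link.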
4) First push
git push production main
The hook builds a release, switches `current`, and performs a graceful `pm2 reload`.
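Git passes the hook one `<oldrev> <newrev> <ref>` line per pushed ref on stdin, and a branch deletion arrives with an all-zero new revision. The sketch below shows how a hook could pick the revision to deploy from that input. The function name `revisionToDeploy` and the default branch `refs/heads/main` are assumptions for illustration.

```javascript
const ZERO_REV = /^0+$/;

// Parse post-receive stdin and return the new revision pushed to `branchRef`,
// or null when that branch was not part of the push or was deleted.
function revisionToDeploy(stdinText, branchRef = "refs/heads/main") {
  let rev = null;
  for (const line of stdinText.split("\n")) {
    const [, newrev, ref] = line.trim().split(/\s+/);
    if (ref === branchRef && newrev && !ZERO_REV.test(newrev)) rev = newrev;
  }
  return rev;
}
```

Filtering by ref keeps pushes to feature branches or tags from triggering a production release.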
⚙️ PM2 Configuration (ecosystem.config.js)
Create `/var/www/example-app/ecosystem.config.js`:
module.exports = {
  apps: [
    {
      name: "example-app",
      cwd: "/var/www/example-app/current", // run from the release the symlink points at
      script: "./server.js", // your entry file, relative to cwd
      exec_mode: "cluster", // enables zero‑downtime reloads
      instances: "max", // or a fixed number like 2 or 4
      watch: false,
      max_memory_restart: "512M",
      env: {
        NODE_ENV: "production",
        PORT: 3000
      },
      env_production: {
        NODE_ENV: "production"
      },
      kill_timeout: 5000, // allow workers to drain
      listen_timeout: 8000, // readiness window
      // /var/log/pm2 must exist and be writable by the deploy user
      out_file: "/var/log/pm2/example-app.out.log",
      error_file: "/var/log/pm2/example-app.err.log",
      merge_logs: true
    }
  ]
};
Important: Zero‑downtime requires cluster mode (multiple instances) or a blue‑green strategy. In fork mode (single process), `reload` behaves like a restart (brief interruption).
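A quick pre-deploy lint can catch a config that silently falls back to restart behavior. The sketch below is illustrative only: `resolveInstances` and `reloadWarnings` are hypothetical helpers, and the handling of `"max"`, `0`, and negative values reflects my reading of PM2's `instances` option rather than anything stated in this guide.

```javascript
const os = require("node:os");

// Resolve an `instances` setting to a worker count for this machine.
// Assumption: "max" or 0 means one worker per core; -N means cores minus N.
function resolveInstances(instances, cores = os.cpus().length) {
  if (instances === "max" || instances === 0) return cores;
  const n = Number(instances);
  if (!Number.isInteger(n)) return 1; // unset or unparsable: a single process
  return n < 0 ? Math.max(1, cores + n) : n;
}

// Return human-readable warnings for one app entry of an ecosystem file.
function reloadWarnings(app, cores = os.cpus().length) {
  const warnings = [];
  if (app.exec_mode !== "cluster") {
    warnings.push('exec_mode is not "cluster": reload behaves like a restart');
  }
  if (resolveInstances(app.instances, cores) < 2) {
    warnings.push("fewer than 2 workers: this guide recommends at least 2 for rolling reloads");
  }
  return warnings;
}
```

For example, `reloadWarnings(require("./ecosystem.config.js").apps[0])` returns an empty array for the config shown above on a multi-core host.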
Persist PM2 on reboot:
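pm2 startup # prints a sudo command; run it once to install the boot service
pm2 save # snapshot the current process list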
🩺 Health Checks & Readiness Gates
Expose a simple health endpoint (e.g., Express):
app.get("/healthz", (req, res) => {
  res.status(200).json({ ok: true, uptime: process.uptime() });
});
Readiness tips:
- Defer accepting traffic until DB connections are ready.
- Use `listen_timeout` + `kill_timeout` in PM2 to gracefully replace workers.
- If your reverse proxy supports active health checks, point them at the app's `/healthz` endpoint.
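To separate "the process is alive" from "the process can serve traffic", a liveness and a readiness probe can share one small handler. The sketch below uses only Node's request/response interface; `createProbeHandler`, the `/readyz` path, and the `state.ready` flag are assumptions for illustration, not part of PM2 or Express.

```javascript
// Build a request handler exposing /healthz (process is up) and /readyz
// (dependencies are up). The app flips `state.ready` once startup finishes.
function createProbeHandler(state, next = (req, res) => { res.statusCode = 404; res.end(); }) {
  return (req, res) => {
    if (req.url === "/healthz") {
      res.statusCode = 200;
      return res.end(JSON.stringify({ ok: true }));
    }
    if (req.url === "/readyz") {
      res.statusCode = state.ready ? 200 : 503;
      return res.end(JSON.stringify({ ready: state.ready }));
    }
    return next(req, res); // hand everything else to the real app
  };
}

// Usage sketch:
//   const state = { ready: false };
//   require("node:http").createServer(createProbeHandler(state)).listen(3000);
//   connectToDatabase().then(() => { state.ready = true; }); // hypothetical init
```

Returning 503 from `/readyz` until initialization completes lets a deploy script or proxy wait without mistaking a booting worker for a broken one.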
📊 Log Rotation & Observability
Install PM2’s logrotate module:
pm2 install pm2-logrotate
pm2 set pm2-logrotate:max_size 10M
pm2 set pm2-logrotate:retain 7
pm2 set pm2-logrotate:compress true
pm2 set pm2-logrotate:dateFormat YYYY-MM-DD_HH-mm-ss
Live insights:
pm2 status
pm2 logs example-app
pm2 monit
⏪ Rollback Strategy (Instant)
List available releases:
ls -1 /var/www/example-app/releases
Switch back atomically and reload:
cd /var/www/example-app
ln -sfn releases/<good_rev> current
pm2 reload ecosystem.config.js --env production
Keep the last N releases (e.g., 5) and prune older ones via a nightly cron.
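The pruning step could look like the following Node.js sketch, which a nightly cron job might run as the deploy user. It assumes the directory layout from this guide (`releases/` and a `current` symlink under the app directory) and orders releases by directory modification time; the function name `pruneReleases` is hypothetical.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Delete all but the newest `keep` release directories, never touching the
// one that `current` points at. Returns the names that were removed.
function pruneReleases(appDir, keep = 5) {
  const releasesDir = path.join(appDir, "releases");
  let live = null;
  try {
    live = path.basename(fs.realpathSync(path.join(appDir, "current")));
  } catch {
    // no current symlink yet; nothing is protected
  }
  const byNewest = fs
    .readdirSync(releasesDir, { withFileTypes: true })
    .filter((entry) => entry.isDirectory())
    .map((entry) => ({
      name: entry.name,
      mtimeMs: fs.statSync(path.join(releasesDir, entry.name)).mtimeMs,
    }))
    .sort((a, b) => b.mtimeMs - a.mtimeMs);
  const doomed = byNewest.slice(keep).filter((r) => r.name !== live).map((r) => r.name);
  for (const name of doomed) {
    fs.rmSync(path.join(releasesDir, name), { recursive: true, force: true });
  }
  return doomed;
}
```

Protecting the live release matters after a rollback, when `current` may point at a directory that is no longer among the newest N.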
🔄 Optional: Using PM2 Deploy & CI/CD
PM2 has a built‑in `deploy` system that can pull from Git and run `post-deploy` commands. It's CI/CD‑friendly and uses your ecosystem file. Example snippet:
module.exports = {
  apps: [ /* ...same as above... */ ],
  deploy: {
    production: {
      user: "deploy",
      host: ["server"],
      ref: "origin/main",
      repo: "git@your-vcs:org/example-app.git",
      path: "/var/www/example-app",
      // install dev deps for the build, then prune them before reloading
      "post-deploy": "npm ci && npm run build && npm prune --omit=dev && pm2 startOrReload ecosystem.config.js --env production"
    }
  }
};
Works great with CI. Use environment‑specific secrets from your runner’s vault; avoid committing secrets.
🌐 Reverse Proxy (NGINX / Caddy)
NGINX example:
upstream example_app {
    server 127.0.0.1:3000;
}
server {
    server_name example.com;
    location / {
        proxy_pass http://example_app;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
Caddy example:
example.com {
    encode gzip
    reverse_proxy 127.0.0.1:3000
}
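If your proxy supports active health checks (for example, the `health_uri` option of Caddy's `reverse_proxy`), point them at `/healthz`. Open‑source NGINX offers only passive checks; active checks are an NGINX Plus feature.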
🔐 Security Hardening
- Create a locked‑down `deploy` user; limit sudo.
- Store `.env` in `shared/` with strict permissions.
- Only allow fast‑forward pushes; protect main branch in your VCS.
- Ensure hooks are executable and owned by the deploy user.
- Keep Node/PM2 updated; enable OS auto‑security updates.
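"Strict permissions" for the shared `.env` can be verified at boot or in the deploy script. The sketch below treats mode 600 (owner read/write only) as the target, which is an assumption on my part; the helper names `isOwnerOnly` and `assertPrivateFile` are hypothetical.

```javascript
const fs = require("node:fs");

// True when the mode grants nothing to group or others (e.g., 0o600 or 0o400).
function isOwnerOnly(mode) {
  return (mode & 0o077) === 0;
}

// Throw if the secrets file is accessible to anyone but its owner.
function assertPrivateFile(filePath) {
  const { mode } = fs.statSync(filePath);
  if (!isOwnerOnly(mode)) {
    const shown = (mode & 0o777).toString(8).padStart(3, "0");
    throw new Error(`${filePath} has mode ${shown}; expected 600 or stricter`);
  }
}
```

Calling `assertPrivateFile("/var/www/example-app/shared/.env")` early in startup turns a permissions slip into a visible failure instead of a silent exposure.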
🧰 Troubleshooting Checklist
- Hook didn't fire? File not executable (`chmod +x hooks/post-receive`).
- `node: command not found` in hook? Export PATH to Node/PM2; avoid relying on an interactive shell.
- `Operation must be run in a work tree`? Use `GIT_WORK_TREE="$RELEASE" git checkout -f "$newrev"`.
- Build succeeds but app fails? Check `pm2 logs` and ensure env vars are linked from `shared/.env`.
- No zero‑downtime? Confirm `exec_mode: "cluster"` and `instances >= 2`.
- Memory leaks / crashes? Set `max_memory_restart`, add monitoring, profile hotspots.
🧠 PM2 Deep Dive — Concepts, Options & Best Practices
🔍 What PM2 Is (and Isn’t)
- Process manager for Node.js: starts, keeps alive, restarts on failure, and reloads with zero downtime in cluster mode.
- Not a build tool or CI server. Pair it with your Git hooks/CI to ship artifacts.
🧪 Install, Update, and Verify
npm i -g pm2@latest
pm2 -v
pm2 update # restart the PM2 daemon on the new version; managed processes are saved and restored
🧬 Process Models
- Fork mode: 1 process. Simple, but `reload` behaves like a restart (tiny blip). Suitable for workers/queues.
- Cluster mode: N processes share the same port. `pm2 reload` replaces workers one by one → zero‑downtime web apps.
  - Choose `instances: "max"` for all CPU cores or a fixed number (e.g., 2 or 4).
  - Socket.IO with HTTP long‑polling? Every request from a client must reach the same worker. Stock PM2 has no sticky‑session option, so either use the `@socket.io/pm2` fork maintained by the Socket.IO project or restrict clients to the WebSocket transport.
🔄 Reload Semantics & Graceful Lifecycle
- `pm2 reload <name>`: rolling replacement (cluster mode). Old worker drains, new worker boots.
- `pm2 restart <name>`: stop then start (brief interruption).
- `pm2 stop <name>`: take the app offline.
Graceful readiness (recommended):
- App signals it's ready using PM2's `wait_ready` mechanism.
- PM2 waits up to `listen_timeout` for the ready signal before routing traffic.
Ecosystem snippet:
{
  name: "example-app",
  script: "server.js",
  exec_mode: "cluster",
  instances: 2,
  wait_ready: true, // app will call process.send('ready')
  listen_timeout: 8000, // how long PM2 waits for 'ready'
  kill_timeout: 5000 // how long to let old worker drain
}
App code:
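const http = require('http');
// placeholder request handler; replace with your app (e.g., an Express instance)
const handler = (req, res) => res.end('ok');
const server = http.createServer(handler);
server.listen(process.env.PORT || 3000, () => {
  // process.send exists only when the process has an IPC channel (as under PM2)
  if (process.send) process.send('ready');
});
process.on('SIGINT', gracefulExit);
process.on('SIGTERM', gracefulExit);
function gracefulExit() {
  server.close(() => process.exit(0)); // finish in-flight requests
  // hard timeout; keep it below kill_timeout (5000 ms) so PM2 does not kill the worker first
  setTimeout(() => process.exit(1), 4000).unref();
}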
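When startup involves asynchronous work such as opening a database pool, the ready signal can be delayed until that work completes. The following is a minimal sketch: `signalReadyAfter` is a hypothetical helper, and the tasks passed to it stand in for whatever initialization your app performs.

```javascript
// Run async startup tasks and only then tell PM2 the worker is ready.
// `send` defaults to process.send, which exists only when the process was
// spawned with an IPC channel; outside PM2 the function just resolves false.
async function signalReadyAfter(
  tasks,
  send = process.send ? process.send.bind(process) : null
) {
  await Promise.all(tasks.map((task) => task()));
  if (send) send("ready");
  return Boolean(send);
}

// Usage sketch (connectToDatabase is hypothetical):
//   server.listen(3000, () => signalReadyAfter([connectToDatabase]));
```

If any task rejects, no `ready` message is sent, so with `wait_ready: true` PM2 stops waiting once `listen_timeout` elapses.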
🧾 Ecosystem File — Common Options (Cheat Sheet)
module.exports = {
  apps: [{
    name: "example-app",
    script: "./server.js",
    args: "", // extra CLI args to your script
    exec_mode: "cluster", // or "fork"
    instances: "max", // or a number
    cwd: "/var/www/example-app/current", // working dir
    watch: false, // change to true ONLY for dev
    ignore_watch: ["node_modules", "logs", "tmp"],
    max_memory_restart: "512M",
    min_uptime: "10s", // consider app unstable before this
    max_restarts: 10, // cap restarts for flapping apps
    exp_backoff_restart_delay: 200, // ms; grows exponentially
    env: { NODE_ENV: "production", PORT: 3000 },
    env_production: { NODE_ENV: "production" },
    out_file: "/var/log/pm2/example-app.out.log",
    error_file: "/var/log/pm2/example-app.err.log",
    merge_logs: true,
    log_date_format: "YYYY-MM-DD HH:mm:ss Z",
    node_args: "--enable-source-maps --max-old-space-size=512",
  }]
}
🔑 Environment & Secrets
- Prefer PM2 env vars (`env`, `env_production`) for non‑secret config.
- For secrets, keep a `.env` in `shared/` and symlink it into each release (as the post‑receive hook does).
- If you use `dotenv`, load it at the top of your entry file.
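Whichever source the variables come from, checking for them at boot makes a missing `.env` symlink fail loudly. This is a small sketch; `requireEnv` is a hypothetical helper and the variable names in the usage line are placeholders.

```javascript
// Return the requested variables from `env`, throwing one error that lists
// every missing or empty name so a bad deploy fails at boot, not mid-request.
function requireEnv(names, env = process.env) {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return Object.fromEntries(names.map((name) => [name, env[name]]));
}

// Usage sketch: const { DATABASE_URL, SESSION_SECRET } = requireEnv(["DATABASE_URL", "SESSION_SECRET"]);
```

Combined with a health check in the deploy hook, a crash at boot surfaces during the deploy rather than after traffic has shifted.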
💾 Startup on Reboot & State Persistence
pm2 startup systemd -u deploy --hp /home/deploy # generate unit
pm2 save # persist process list
pm2 resurrect # restore from dump
pm2 unstartup systemd # remove integration
🪵 Logs & Rotation
- Live tail: `pm2 logs <name>` or all apps: `pm2 logs`.
- Install rotation once:
  pm2 install pm2-logrotate
  pm2 set pm2-logrotate:max_size 10M
  pm2 set pm2-logrotate:retain 7
  pm2 set pm2-logrotate:compress true
  pm2 set pm2-logrotate:dateFormat YYYY-MM-DD_HH-mm-ss
🧰 Everyday Commands
pm2 list # overview
pm2 status # alias of pm2 list
pm2 describe example-app # full metadata
pm2 reload example-app # zero‑downtime reload
pm2 restart example-app # hard restart
pm2 stop example-app # stop
pm2 delete example-app # remove from PM2
pm2 env 0 # show env of app with id 0
pm2 monit # ncurses dashboard
📈 Observability & Metrics
- Add `/healthz` and `/readyz` endpoints; wire your reverse proxy health checks.
- Use `pm2 monit` for CPU/mem; export app metrics to Prometheus (e.g., `prom-client`) and include GC stats.
🧱 Blue‑Green with PM2 (Alternative Pattern)
- Run two app names (e.g., `app-blue`, `app-green`) on distinct ports.
- Deploy to the idle color, smoke‑test `/healthz`, then flip the reverse proxy upstream.
- Keep the previous color for instant rollback.
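The bookkeeping behind those three steps is small enough to sketch. In the example below, the port numbers 3001 and 3002 and the function name `planDeploy` are assumptions for illustration; the guide itself only requires that the two colors listen on distinct ports.

```javascript
// Hypothetical color-to-port mapping.
const COLORS = {
  blue: { name: "app-blue", port: 3001 },
  green: { name: "app-green", port: 3002 },
};

// Given the color currently receiving traffic, describe the next deploy:
// which PM2 app to update, where to smoke-test it, and what to fall back to.
function planDeploy(liveColor) {
  if (!Object.hasOwn(COLORS, liveColor)) {
    throw new Error(`unknown color: ${liveColor}`);
  }
  const idleColor = liveColor === "blue" ? "green" : "blue";
  return {
    deployTo: COLORS[idleColor].name,
    smokeTestUrl: `http://127.0.0.1:${COLORS[idleColor].port}/healthz`,
    rollbackTo: COLORS[liveColor].name,
  };
}
```

Because the previously live color keeps running untouched, rollback is just flipping the proxy upstream back to `rollbackTo`.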
🧵 TypeScript & Source Maps
- Prefer building to JS during deploy (`tsc -p .`) and run built files.
- If running TS directly, use `ts-node` in dev only; in prod, JS is safer.
- Enable stack traces with original lines: `node_args: "--enable-source-maps"`
🌐 Sticky Sessions for Realtime Apps
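- For Socket.IO in cluster mode, requests from one client must always reach the same worker while HTTP long‑polling is in use. Stock PM2 has no `--sticky` flag; the options are the `@socket.io/pm2` fork (maintained by the Socket.IO project) or limiting clients to the WebSocket transport, which does not need sticky routing.
- Configure the reverse proxy to pass WebSocket upgrade requests through to the app port (TLS can still terminate at the proxy).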
🧯 Advanced Troubleshooting
- `node: command not found` in hooks: export PATH in the hook; avoid relying on interactive shells or `nvm` being sourced.
- Reload isn't zero‑downtime: verify `exec_mode: "cluster"` and `instances >= 2`; check `wait_ready` / `listen_timeout` logic.
- Flapping restarts: tune `min_uptime`, `max_restarts`, `exp_backoff_restart_delay`; inspect `pm2 logs`.
- Memory pressure: increase `--max-old-space-size`, fix leaks, or scale instances.
- File watchers in prod: keep `watch: false` to avoid CPU spikes.
✅ Quick PM2 Checklist
- Cluster mode with ≥2 instances
- `wait_ready`, `listen_timeout`, `kill_timeout` set
- Log rotation configured
- `pm2 startup` + `pm2 save` in place
- Health checks wired in proxy & CI smoke tests
- Rollback plan tested (previous release kept)
✅ Conclusion / Next Steps
With PM2 cluster mode + atomic releases via Git hooks, you get zero‑downtime deploys, clean rollbacks, and predictable builds. Next, wire this into your CI pipeline, add synthetic monitoring for `/healthz`, and automate release pruning.
Quick next steps:
- Create the bare repo and post‑receive hook.
- Add the PM2 ecosystem file with cluster mode.
- Push to production and verify `pm2 status` + `/healthz`.
- Schedule log rotation and release cleanup.
🔗 Related Articles (suggested)
- PM2 Deep Dive: Cluster Mode, Reloads, and Graceful Shutdowns
- Git Hooks 101: pre‑receive, post‑receive, and Secure Deployment Patterns
- Managing Secrets: `.env` vs environment vaults in CI/CD
- Observability Stack for Node.js (logs, metrics, traces)
- Blue‑Green vs Rolling Deploys: When to Choose What
🧩 Appendix – Full Example Files
A) ecosystem.config.js
module.exports = {
  apps: [
    {
      name: "example-app",
      cwd: "/var/www/example-app/current", // run from the release the symlink points at
      script: "./server.js",
      exec_mode: "cluster",
      instances: 2,
      watch: false,
      max_memory_restart: "512M",
      env: { NODE_ENV: "production", PORT: 3000 },
      env_production: { NODE_ENV: "production" },
      kill_timeout: 5000,
      listen_timeout: 8000,
      out_file: "/var/log/pm2/example-app.out.log",
      error_file: "/var/log/pm2/example-app.err.log",
      merge_logs: true
    }
  ]
};
B) hooks/post-receive
(From the section above; ensure `chmod +x`.)
C) Example Express server with health endpoint
const express = require("express");
const app = express();
app.get("/healthz", (req, res) => res.json({ ok: true }));
app.get("/", (req, res) => res.send("Hello from zero‑downtime deploy!"));
app.listen(process.env.PORT || 3000, () => {
  console.log("Server started");
});