Zero‑Downtime Node.js Deployments with PM2 & Git Hooks (Cluster Mode, Atomic Releases, Rollbacks)
Learn a battle‑tested, zero‑downtime deployment workflow for Node.js using PM2 (cluster mode) and Git hooks. Includes atomic releases, health checks, rollbacks, log rotation, and CI/CD‑ready patterns.
✍️ Short Summary
Ship updates with no downtime using PM2 + Git hooks. You'll push to a remote bare repo, a post‑receive hook builds a new atomic release, swaps a current → releases/<rev> symlink, and triggers pm2 reload (cluster mode) for truly seamless restarts. Includes rollback, health checks, log rotation, and security hardening.
📎 Table of Contents
- Start Here: What is PM2?
- Why Zero‑Downtime & How This Works
- Architecture Overview
- Prerequisites
- Directory Layout (Atomic Releases)
- Option A: Git bare repo + post‑receive hook (recommended)
- PM2 Configuration (ecosystem file)
- Health Checks & Readiness Gates
- Log Rotation & Observability
- Rollback Strategy (Instant)
- Optional: Using PM2 Deploy & CI/CD
- Reverse Proxy (NGINX / Caddy)
- Security Hardening
- Troubleshooting Checklist
- PM2 Deep Dive
- ✅ Conclusion / Next Steps
- 🔗 Related Articles
Note: example.com is a placeholder; replace it with your real domain. The guide avoids hosting‑panel‑specific steps and is platform‑agnostic.
🟢 Start Here -- What is PM2?
PM2 is a production process manager for Node.js. It keeps your app running 24/7, scales it across CPU cores, and, in cluster mode, performs zero‑downtime reloads so users don't see an outage during deploys. PM2 also standardizes logs, environment variables, and startup on reboot, which makes it a good fit for single‑server or small‑cluster setups.
🎯 What PM2 Does (at a glance)
- Keep‑alive & auto‑restart: If your app crashes, PM2 restarts it automatically.
- Scale across cores (cluster mode): Spawn N workers behind an internal load balancer.
- Zero‑downtime reloads: pm2 reload replaces workers one by one without dropping connections (cluster mode).
- Unified logs & rotation: View live logs and add rotation with pm2-logrotate.
- Startup on boot: Generate a systemd unit and persist process lists.
- Config as code: One ecosystem.config.js file captures all runtime settings.
🚫 What PM2 Is Not
- Not a web server/reverse proxy. Use NGINX or Caddy for TLS, HTTP/2, compression, static files, and routing.
- Not a build or CI tool. Pair it with your Git hooks/CI pipeline to build artifacts and trigger reloads.
- Not an orchestrator. For many hosts/containers, consider Kubernetes or a PaaS.
🤔 When to Use (and When Not To)
Use PM2 when you manage Node.js on your own VM and want a simple, reliable path to scaling + zero‑downtime deploys.
Consider alternatives if you're already on a full container/orchestration stack (Kubernetes), or a PaaS (e.g., Render/Fly/Heroku) that handles process supervision for you.
⚡ 2‑Minute Quick Start
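npm i -g pm2
pm2 start server.js --name example-app # single process (fork mode)
# or, instead of the line above:
pm2 start server.js --name example-app -i max # cluster mode across all cores
pm2 reload example-app # zero‑downtime reload (cluster mode)
pm2 startup # prints a command; run it once with sudo
pm2 save # save the process list so it is restored on reboot
pm2 logs example-app # live logs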
🧩 Key Terms
- Fork vs Cluster: Fork = 1 process. Cluster = N workers on the same port with zero‑downtime reloads.
- Reload vs Restart: Reload is a rolling replacement (no downtime in cluster mode). Restart stops then starts (brief blip).
- Instances: Number of worker processes (e.g., 2, 4, or max).
- Ecosystem file: ecosystem.config.js, your app's runtime config (env, logs, scaling, timeouts).
- Sticky sessions: Socket.IO needs each client to keep reaching the same worker (especially with HTTP long‑polling); PM2's cluster balancer does not provide this on its own.
🧠 Mental Model (Simple Diagram)
Browser → NGINX/Caddy → PM2 (LB) → [Worker 1, Worker 2, ...]
↳ pm2 reload swaps workers one‑by‑one
❓FAQ
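- Do I still need NGINX/Caddy? Yes: terminate TLS, serve static files, and proxy to the app.
- Where are logs? pm2 logs (live); log files default to ~/.pm2/logs. Configure rotation via pm2 install pm2-logrotate.
- TypeScript? Build to JS for production (tsc) and get source‑mapped stack traces with node --enable-source-maps.
- WebSockets? Socket.IO in cluster mode needs sticky sessions, which stock PM2 doesn't provide; the Socket.IO docs cover a PM2 setup based on the @socket.io/pm2 fork.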
🚀 Why Zero‑Downtime & How This Works
Goal: users never notice upgrades. Technique: PM2 runs your app in cluster mode (multiple processes). A push triggers a new release build, and then pm2 reload performs a rolling restart: each new worker boots and starts listening before the old worker it replaces is told to shut down and finish its in‑flight requests.
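The rolling‑reload loop can be modelled in a few lines of JavaScript. This is a simplified illustration, not PM2's source. spawnWorker, waitReady and stopWorker are hypothetical callbacks. The sketch also assumes the reload moves on to stopping the old worker once listenTimeoutMs elapses, even if no ready signal arrived.

```javascript
// Simplified model of a rolling reload (illustration only, not PM2's implementation).

// Resolve with the promise's value, or with "timeout" after `ms`, whichever comes first.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve("timeout"), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Replace each old worker with a fresh one, one at a time.
async function rollingReload(oldWorkers, hooks, listenTimeoutMs = 8000) {
  const { spawnWorker, waitReady, stopWorker } = hooks; // hypothetical callbacks
  const replacements = [];
  for (const oldWorker of oldWorkers) {
    const fresh = await spawnWorker(); // 1. boot the new worker; the old one still serves
    await withTimeout(waitReady(fresh), listenTimeoutMs); // 2. wait for "ready", but bounded
    await stopWorker(oldWorker); // 3. retire the old worker (the callback should let it drain)
    replacements.push(fresh);
  }
  return replacements;
}
```

The old worker is only stopped after its replacement is up, so capacity never falls below the original worker count during the loop.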
Core guarantees:
- Atomic releases: A new, versioned folder is built first, then switched in with a single symlink swap.
- Reproducible builds: Use npm ci and a clean releases/<rev> directory.
- Safe rollbacks: Repoint the current symlink to a previous good release.
🧭 Architecture Overview
Developer → git push → [Server: bare repo]
└─ post‑receive hook → create releases/<rev>
→ npm ci && build
→ update symlink current → releases/<rev>
→ pm2 startOrReload ecosystem.config.js --env production
→ health check & verify
Screenshots (placeholders):
- [Screenshot] PM2 dashboard after reload (pm2 status).
- [Screenshot] Release directories and the current symlink.
- [Screenshot] /healthz endpoint returning 200 OK.
✅ Prerequisites
- Linux server with Node.js 18+ and Git 2.4+.
- PM2 installed globally: npm i -g pm2.
- A dedicated non‑root user (e.g., deploy).
- curl on the server (the post‑receive hook's health check uses it).
- Reverse proxy (NGINX/Caddy) pointing to your app's port (e.g., 3000).
- Environment variables available on the server (via .env or PM2 envs). Do not commit secrets.
Tip: Ensure the user running Git hooks has node and pm2 on its PATH. Hooks don't load your interactive shell profile, so if you use a version manager (e.g., nvm), export PATH inside the hook.
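If you're unsure what a hook will see, a small preflight script run as the deploy user can check a PATH string directly. This is a sketch: findOnPath is a hypothetical helper, and hookPath assumes the /usr/local/bin:/usr/bin value used in the example hook. Run it with an absolute Node path (for example /usr/local/bin/node preflight.js, where the file name is also just an example), so the check itself doesn't depend on PATH.

```javascript
// Hypothetical preflight: report where (or whether) each command resolves on a given PATH string.
const fs = require("node:fs");
const path = require("node:path");

function findOnPath(bin, pathEnv = process.env.PATH || "") {
  for (const dir of pathEnv.split(path.delimiter)) {
    if (!dir) continue; // skip empty PATH entries
    const candidate = path.join(dir, bin);
    try {
      fs.accessSync(candidate, fs.constants.X_OK); // exists and is executable
      if (fs.statSync(candidate).isFile()) return candidate;
    } catch {
      // not in this directory; keep looking
    }
  }
  return null;
}

// Check the PATH the hook exports, not your interactive shell's PATH.
const hookPath = "/usr/local/bin:/usr/bin";
for (const bin of ["node", "npm", "pm2"]) {
  console.log(`${bin}: ${findOnPath(bin, hookPath) ?? "NOT FOUND"}`);
}
```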
🗂️ Directory Layout (Atomic Releases)
/var/www/example-app
├─ repo.git/ # bare repo (server remote)
├─ releases/ # timestamped or <rev> folders
├─ shared/ # .env, uploads/, tmp/, etc.
├─ current → releases/<rev> # symlink switched atomically
└─ ecosystem.config.js
🧩 Option A: Git Bare Repo + post‑receive Hook (Recommended)
This option is simple, fast, and CI/CD‑ready without extra services.
1) Create folders & permissions
sudo mkdir -p /var/www/example-app/{releases,shared}
sudo mkdir -p /var/www/example-app/repo.git
sudo mkdir -p /var/log/pm2 # log path used by the ecosystem file
sudo chown -R deploy:deploy /var/www/example-app /var/log/pm2
2) Initialize bare repo (server)
cd /var/www/example-app/repo.git
git init --bare
Add this server as a remote in your local repo, e.g.:
# on your laptop/workstation
git remote add production deploy@server:/var/www/example-app/repo.git
3) post‑receive hook (server)
Create hooks/post-receive in the bare repo and make it executable.
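cat > /var/www/example-app/repo.git/hooks/post-receive <<'HOOK'
#!/usr/bin/env bash
set -euo pipefail
APP_DIR=/var/www/example-app
RELEASES_DIR="$APP_DIR/releases"
CURRENT_LINK="$APP_DIR/current"
ECOSYSTEM="$APP_DIR/ecosystem.config.js"
SHARED_DIR="$APP_DIR/shared"
DEPLOY_REF=refs/heads/main
HEALTH_URL=http://127.0.0.1:3000/healthz
# ensure PATH for node/pm2 (adjust if using nvm/asdf)
export PATH=/usr/local/bin:/usr/bin:$PATH
# git sends one "oldrev newrev refname" line per pushed ref; deploy only main
newrev=""
while read -r _oldrev rev ref; do
  if [ "$ref" = "$DEPLOY_REF" ]; then newrev="$rev"; fi
done
if [ -z "$newrev" ]; then
  echo "No push to $DEPLOY_REF; nothing to deploy."
  exit 0
fi
# derive revision id and a release folder
REV=$(echo "$newrev" | cut -c1-7)
RELEASE="$RELEASES_DIR/$REV"
mkdir -p "$RELEASE"
GIT_WORK_TREE="$RELEASE" git checkout -f "$newrev"
# hooks export GIT_DIR; unset it so git calls made by npm scripts (e.g. husky) don't hit the bare repo
unset GIT_DIR
cd "$RELEASE"
# link shared assets (env, uploads, etc.) before building; some builds read .env
if [ -f "$SHARED_DIR/.env" ]; then
  ln -sfn "$SHARED_DIR/.env" "$RELEASE/.env"
fi
# install all deps (build tools usually live in devDependencies), build if a
# "build" script exists, then drop dev deps.
# pnpm: pnpm i --frozen-lockfile, pnpm run --if-present build, pnpm prune --prod
npm ci
npm run --if-present build
npm prune --omit=dev
# remember the live release so a failed health check can roll back
PREVIOUS=$(readlink "$CURRENT_LINK" || true)
# atomic switch: create the new link under a temp name, then rename(2) it over the old one
ln -sfn "$RELEASE" "$CURRENT_LINK.tmp"
mv -Tf "$CURRENT_LINK.tmp" "$CURRENT_LINK"
# startOrReload starts the app on the first deploy and reloads it afterwards
# (zero-downtime in cluster mode)
pm2 startOrReload "$ECOSYSTEM" --env production
# health check with retries; roll back to the previous release if it never passes
for _ in $(seq 1 10); do
  if curl -fsS "$HEALTH_URL" >/dev/null; then
    echo "Health OK"
    pm2 save # persist pm2 process list on reboot
    exit 0
  fi
  sleep 1
done
echo "Health FAILED"
if [ -n "$PREVIOUS" ]; then
  ln -sfn "$PREVIOUS" "$CURRENT_LINK.tmp"
  mv -Tf "$CURRENT_LINK.tmp" "$CURRENT_LINK"
  pm2 reload "$ECOSYSTEM" --env production
  echo "Rolled back to $PREVIOUS"
fi
exit 1
HOOK
chmod +x /var/www/example-app/repo.git/hooks/post-receive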
Why atomic? If the build fails, the current symlink is untouched, so the live app stays healthy.
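If you script releases in Node instead of shell, the same create‑then‑rename trick looks like this. It is a minimal sketch that assumes a POSIX filesystem, where rename(2) atomically replaces an existing symlink. swapSymlink is a hypothetical helper name.

```javascript
// Sketch: atomically repoint a symlink (e.g. current -> releases/<rev>) on POSIX.
const fs = require("node:fs");
const path = require("node:path");

function swapSymlink(linkPath, target) {
  // Build the new link under a temporary name in the same directory (same filesystem).
  const tmp = path.join(path.dirname(linkPath), `.${path.basename(linkPath)}.tmp-${process.pid}`);
  try {
    fs.unlinkSync(tmp); // clear a leftover from an interrupted run
  } catch (err) {
    if (err.code !== "ENOENT") throw err;
  }
  fs.symlinkSync(target, tmp);
  // rename(2) replaces the old link in one step: readers see the old or the new target, never a gap.
  fs.renameSync(tmp, linkPath);
}
```

Example call: swapSymlink("/var/www/example-app/current", "/var/www/example-app/releases/abc1234"), where the revision is a placeholder. Depending on the coreutils version, ln -sfn on its own may remove the old link before creating the new one, which leaves a brief window with no current link. The shell equivalent of this sketch is ln -sfn "$RELEASE" current.tmp && mv -Tf current.tmp current (GNU mv).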
4) First push
git push production main
The hook builds a release and switches current. It then starts the app on the first push, or performs a graceful pm2 reload on later pushes.
⚙️ PM2 Configuration (ecosystem.config.js)
Create /var/www/example-app/ecosystem.config.js:
module.exports = {
apps: [
{
name: "example-app",
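cwd: "/var/www/example-app/current", // run from the live release via the current symlink
script: "./server.js", // your entry file, relative to cwd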
exec_mode: "cluster", // enables zero‑downtime reloads
instances: "max", // or a fixed number like 2 or 4
watch: false,
max_memory_restart: "512M",
env: {
NODE_ENV: "production",
PORT: 3000
},
env_production: {
NODE_ENV: "production"
},
kill_timeout: 5000, // allow workers to drain
listen_timeout: 8000, // readiness window
out_file: "/var/log/pm2/example-app.out.log",
error_file: "/var/log/pm2/example-app.err.log",
merge_logs: true
}
]
};
Important: Zero‑downtime requires cluster mode (multiple instances) or a blue‑green strategy. In fork mode (single process), reload behaves like a restart (brief interruption).
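To catch these settings before a deploy, a small check could load the ecosystem file and flag risky values. This is a sketch under stated assumptions. checkZeroDowntime and resolveInstances are hypothetical helpers. The instance mapping follows PM2's documented conventions: "max" or 0 means all cores, and -1 means all cores minus one.

```javascript
// Hypothetical pre-deploy lint for one entry of ecosystem.config.js `apps`.

// Map PM2's `instances` value to a worker count ("max"/0 = all cores, -1 = all cores minus one).
function resolveInstances(instances, cpuCount) {
  if (instances === "max" || instances === 0) return cpuCount;
  if (instances === -1) return cpuCount - 1;
  return Number(instances ?? 1); // one instance when unset
}

function checkZeroDowntime(app, cpuCount) {
  const problems = [];
  if (app.exec_mode !== "cluster") {
    problems.push('exec_mode is not "cluster"; in fork mode reload behaves like a restart');
  }
  const n = resolveInstances(app.instances, cpuCount);
  if (!(n >= 2)) {
    problems.push(`instances resolves to ${n}; use at least 2 workers`);
  }
  if (app.watch) {
    problems.push("watch should be false in production");
  }
  if (app.kill_timeout === undefined) {
    problems.push("set kill_timeout explicitly so old workers have time to drain");
  }
  return problems;
}
```

Usage, assuming the file sits in the working directory: checkZeroDowntime(require("./ecosystem.config.js").apps[0], require("node:os").cpus().length) returns an empty array when nothing is flagged.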
Persist PM2 on reboot:
pm2 startup # prints a command; run it once with sudo to install the systemd unit
pm2 save
🩺 Health Checks & Readiness Gates
Expose a simple health endpoint (e.g., Express):
app.get("/healthz", (req, res) => {
res.status(200).json({ ok: true, uptime: process.uptime() });
});
Readiness tips:
- Defer accepting traffic until DB connections (and other dependencies) are ready.
- Use listen_timeout + kill_timeout in PM2 to gracefully replace workers.
- If your proxy supports active health checks, point them at the app's /healthz endpoint.
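One way to separate liveness from readiness is sketched below, using Node's built‑in http module. The names createApp, connectDb and markDraining are hypothetical, and connectDb stands in for whatever startup work your app really does. process.send('ready') is the message PM2's wait_ready option waits for.

```javascript
// Sketch: liveness (/healthz) vs readiness (/readyz) using only node:http.
// connectDb is a hypothetical async startup step (DB pool, cache warm-up, ...).
const http = require("node:http");

function createApp(connectDb) {
  let ready = false; // true after startup, false again once draining starts

  const server = http.createServer((req, res) => {
    if (req.url === "/healthz") {
      // Liveness: the process is up and its event loop is responding.
      res.writeHead(200, { "content-type": "application/json" });
      return res.end(JSON.stringify({ ok: true, uptime: process.uptime() }));
    }
    if (req.url === "/readyz") {
      // Readiness: 200 only while this worker should receive traffic.
      res.writeHead(ready ? 200 : 503);
      return res.end(ready ? "ready" : "not ready");
    }
    res.writeHead(200, { "content-type": "text/plain" });
    res.end("hello");
  });

  async function start(port) {
    await connectDb(); // defer listening until dependencies are up
    await new Promise((resolve) => server.listen(port, resolve));
    ready = true;
    // PM2's wait_ready waits for this message; process.send exists only with an IPC parent.
    if (process.send) process.send("ready");
    return server;
  }

  function markDraining() {
    ready = false; // call from your SIGINT handler before server.close()
  }

  return { server, start, markDraining };
}
```

In production you would call createApp(yourConnectFn).start(process.env.PORT || 3000), where yourConnectFn is your own startup routine.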
📊 Log Rotation & Observability
Install PM2's logrotate module:
pm2 install pm2-logrotate
pm2 set pm2-logrotate:max_size 10M
pm2 set pm2-logrotate:retain 7
pm2 set pm2-logrotate:compress true
pm2 set pm2-logrotate:dateFormat YYYY-MM-DD_HH-mm-ss
Live insights:
pm2 status
pm2 logs example-app
pm2 monit
⏪ Rollback Strategy (Instant)
List available releases:
ls -1 /var/www/example-app/releases
Switch back atomically and reload:
cd /var/www/example-app
ln -sfn releases/<good_rev> current.tmp && mv -Tf current.tmp current # rename(2) swaps the link atomically
pm2 reload ecosystem.config.js --env production
Keep the last N releases (e.g., 5) and prune older ones via a nightly cron.
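Here is a sketch of that nightly prune in Node. It assumes this guide's layout (releases/<rev> plus a current symlink) and uses folder modification time as the age signal, because short commit hashes don't sort by date. pruneReleases is a hypothetical helper. It never deletes the release that current points to, and dryRun lets you preview the result.

```javascript
// Sketch: keep the newest `keep` release folders plus whatever `current` points to.
const fs = require("node:fs");
const path = require("node:path");

function pruneReleases(appDir, keep = 5, { dryRun = false } = {}) {
  const releasesDir = path.join(appDir, "releases");
  let live = null;
  try {
    live = fs.realpathSync(path.join(appDir, "current")); // resolved live release
  } catch {
    // no current link yet
  }
  const releases = fs
    .readdirSync(releasesDir, { withFileTypes: true })
    .filter((d) => d.isDirectory())
    .map((d) => {
      const full = path.join(releasesDir, d.name);
      return { full, mtimeMs: fs.statSync(full).mtimeMs };
    })
    .sort((a, b) => b.mtimeMs - a.mtimeMs); // newest first

  const removed = [];
  for (const r of releases.slice(keep)) {
    if (fs.realpathSync(r.full) === live) continue; // never delete the live release
    if (!dryRun) fs.rmSync(r.full, { recursive: true, force: true });
    removed.push(path.basename(r.full));
  }
  return removed;
}
```

A cron entry such as 0 3 * * * could run a small script that calls pruneReleases("/var/www/example-app", 5) as the deploy user.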
🔄 Optional: Using PM2 Deploy & CI/CD
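PM2 has a built‑in deploy system that can pull from Git and run post‑deploy commands. It's CI/CD‑friendly and uses your ecosystem file. It manages its own layout under path (source/, current/, shared/), so use it instead of the post‑receive hook, not alongside it on the same path. Example snippet: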
module.exports = {
apps: [ /* ...same as above... */ ],
deploy: {
production: {
user: "deploy",
host: ["server"],
ref: "origin/main",
repo: "git@your-vcs:org/example-app.git",
path: "/var/www/example-app",
"post-deploy": "npm ci && npm run --if-present build && npm prune --omit=dev && pm2 startOrReload ecosystem.config.js --env production"
}
}
}
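This works well from CI: run pm2 deploy ecosystem.config.js production setup once, then pm2 deploy ecosystem.config.js production for each release. Use environment‑specific secrets from your runner's vault; avoid committing secrets.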
🌐 Reverse Proxy (NGINX / Caddy)
NGINX example:
upstream example_app {
server 127.0.0.1:3000;
}
server {
server_name example.com;
location / {
proxy_pass http://example_app;
proxy_set_header Host $host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
}
}
Caddy example:
example.com {
encode gzip
reverse_proxy 127.0.0.1:3000
}
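If your proxy supports active health checks (for example, Caddy's health_uri option inside reverse_proxy, or NGINX Plus's health_check), point them at /healthz. Open‑source NGINX only does passive checks (max_fails/fail_timeout on upstream servers).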
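For CI smoke tests, or a proxy‑independent check after a deploy, the sketch below polls the health URL with per‑request timeouts and retries. It needs Node 18+ for the global fetch and AbortSignal.timeout. waitForHealthy is a hypothetical helper, and the attempt and delay defaults are arbitrary.

```javascript
// Sketch: poll a health URL until it returns 2xx or the attempts run out.
async function waitForHealthy(url, { attempts = 10, delayMs = 1000, timeoutMs = 2000 } = {}) {
  for (let i = 1; i <= attempts; i++) {
    try {
      const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
      await res.arrayBuffer(); // drain the body so the connection is released
      if (res.ok) return true;
      console.error(`attempt ${i}: HTTP ${res.status}`);
    } catch (err) {
      console.error(`attempt ${i}: ${err.message}`); // connection refused, timeout, ...
    }
    if (i < attempts) await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return false;
}
```

As the last step of a CI job, for example: waitForHealthy("http://127.0.0.1:3000/healthz").then((ok) => process.exit(ok ? 0 : 1)).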
🔐 Security Hardening
- Create a locked‑down deploy user; limit sudo.
- Store .env in shared/ with strict permissions (e.g., chmod 600, owned by deploy).
- Only allow fast‑forward pushes (git config receive.denyNonFastForwards true in repo.git); protect the main branch in your VCS.
- Ensure hooks are executable and owned by the deploy user.
- Keep Node/PM2 updated; enable OS auto‑security updates.
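To enforce the .env rule at deploy time, a deploy script could refuse to continue when the file is readable by group or others, or is owned by someone else. This is a minimal POSIX‑only sketch; assertPrivateFile is a hypothetical helper name.

```javascript
// Hypothetical deploy-time guard for shared/.env (POSIX permissions only).
const fs = require("node:fs");

function assertPrivateFile(file) {
  const st = fs.statSync(file); // follows symlinks, so a release's .env link checks the real file
  const loose = st.mode & 0o077; // any group/other read, write or execute bit
  if (loose !== 0) {
    throw new Error(`${file} has mode ${(st.mode & 0o777).toString(8)}; run: chmod 600 ${file}`);
  }
  // Expect the deploy user (the one running this check) to own the file.
  if (typeof process.getuid === "function" && st.uid !== process.getuid()) {
    throw new Error(`${file} is owned by uid ${st.uid}, not by the current user`);
  }
}
```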
🧰 Troubleshooting Checklist
- Hook didn't fire? File not executable (chmod +x hooks/post-receive).
- node: command not found in hook? Export PATH to Node/PM2; avoid relying on an interactive shell.
- Operation must be run in a work tree? Use GIT_WORK_TREE="$RELEASE" git checkout -f "$newrev".
- Build succeeds but app fails? Check pm2 logs and ensure env vars are linked from shared/.env.
- No zero‑downtime? Confirm exec_mode: "cluster" and instances >= 2.
- Memory leaks / crashes? Set max_memory_restart, add monitoring, profile hotspots.
🧠 PM2 Deep Dive -- Concepts, Options & Best Practices
This section covers the PM2 concepts, options, and practices you need to operate it confidently in production.
🔍 What PM2 Is (and Isn't)
- Process manager for Node.js: starts, keeps alive, restarts on failure, and reloads with zero downtime in cluster mode.
- Not a build tool or CI server; pair it with your Git hooks/CI to ship artifacts.
🧪 Install, Update, and Verify
npm i -g pm2@latest
pm2 -v
pm2 update # refresh PM2 runtime + agent without losing processes
🧬 Process Models
- Fork mode: 1 process. Simple, but reload behaves like a restart (tiny blip). Suitable for workers/queues.
- Cluster mode: N processes share the same port. pm2 reload replaces workers one by one → zero‑downtime web apps.
  - Choose instances: "max" for all CPU cores or a fixed number (e.g., 2 or 4).
  - WebSockets/Socket.IO? Clients must keep reaching the same worker, which PM2's cluster balancer does not do on its own. The Socket.IO docs describe a PM2 setup based on the @socket.io/pm2 fork (which adds sticky sessions) plus @socket.io/cluster-adapter.
🔄 Reload Semantics & Graceful Lifecycle
- pm2 reload <name>: rolling replacement (cluster mode). A new worker boots; once it is online, the old worker drains and exits.
- pm2 restart <name>: stop then start (brief interruption).
- pm2 stop <name>: take the app offline.
Graceful readiness (recommended):
- App signals it's ready using PM2's wait_ready mechanism.
- PM2 waits up to listen_timeout for the ready signal before treating the new worker as online and stopping the old one.
Ecosystem snippet:
{
name: "example-app",
script: "server.js",
exec_mode: "cluster",
instances: 2,
wait_ready: true, // app will call process.send('ready')
listen_timeout: 8000, // how long PM2 waits for 'ready'
kill_timeout: 5000 // how long to let old worker drain
}
App code:
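const http = require('http');
// Minimal handler for illustration; swap in your real app (e.g., an Express app).
const handler = (req, res) => res.end('ok');
const server = http.createServer(handler);
server.listen(process.env.PORT || 3000, () => {
  // Tell PM2 this worker is ready (process.send exists only with an IPC channel, e.g. under PM2).
  if (process.send) process.send('ready');
});
// PM2 sends SIGINT on reload/stop; SIGTERM covers systemd and other supervisors.
process.on('SIGINT', gracefulExit);
process.on('SIGTERM', gracefulExit);
function gracefulExit() {
  server.close(() => process.exit(0)); // stop accepting; exit once in-flight requests finish
  // Hard stop just before PM2's kill_timeout (5000 ms above) would send SIGKILL.
  setTimeout(() => process.exit(1), 4500).unref();
}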
🧾 Ecosystem File -- Common Options (Cheat Sheet)
module.exports = {
apps: [{
name: "example-app",
script: "./server.js",
args: "", // extra CLI args to your script
exec_mode: "cluster", // or "fork"
instances: "max", // or a number
cwd: "/var/www/example-app/current", // working dir
watch: false, // change to true ONLY for dev
ignore_watch: ["node_modules", "logs", "tmp"],
max_memory_restart: "512M",
min_uptime: "10s", // consider app unstable before this
max_restarts: 10, // cap restarts for flapping apps
exp_backoff_restart_delay: 200, // ms; grows exponentially
env: { NODE_ENV: "production", PORT: 3000 },
env_production: { NODE_ENV: "production" },
out_file: "/var/log/pm2/example-app.out.log",
error_file: "/var/log/pm2/example-app.err.log",
merge_logs: true,
log_date_format: "YYYY-MM-DD HH:mm:ss Z",
node_args: "--enable-source-maps --max-old-space-size=512",
}]
}
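As a worked example of exp_backoff_restart_delay, the sketch below computes an approximate restart schedule. It assumes the behaviour described in PM2's restart‑strategy docs: the first delay equals the configured value, each later delay grows by about 1.5x, and delays cap at 15,000 ms. The docs also describe a reset after roughly 30 s of stable uptime, which this sketch does not model. backoffSchedule is a hypothetical helper; verify the numbers against your PM2 version.

```javascript
// Approximate PM2 exponential-backoff schedule (assumed: x1.5 per crash, 15 s cap).
function backoffSchedule(baseMs, crashes, { factor = 1.5, capMs = 15000 } = {}) {
  const delays = [];
  for (let i = 0; i < crashes; i++) {
    delays.push(Math.min(baseMs * factor ** i, capMs)); // delay before restart i + 1
  }
  return delays;
}

// With exp_backoff_restart_delay: 200, the first six restarts would wait about:
const schedule = backoffSchedule(200, 6);
console.log(schedule.join(", ")); // 200, 300, 450, 675, 1012.5, 1518.75
```

Under these assumptions, with a 200 ms base the 15 s cap first applies at the twelfth consecutive restart.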
🔑 Environment & Secrets
- Prefer PM2 env vars (env, env_production) for non‑secret config.
- For secrets, keep a .env in shared/ and symlink it into each release, as the post‑receive hook does.
- If you use dotenv, load it at the top of your entry file (require('dotenv').config()).
💾 Startup on Reboot & State Persistence
pm2 startup systemd -u deploy --hp /home/deploy # generate the unit; as non-root it prints a sudo command to run
pm2 save # persist process list
pm2 resurrect # restore from dump
pm2 unstartup systemd # remove integration
🪵 Logs & Rotation
- Live tail: pm2 logs <name>, or all apps: pm2 logs.
- Install rotation once:
pm2 install pm2-logrotate
pm2 set pm2-logrotate:max_size 10M
pm2 set pm2-logrotate:retain 7
pm2 set pm2-logrotate:compress true
pm2 set pm2-logrotate:dateFormat YYYY-MM-DD_HH-mm-ss
🧰 Everyday Commands
pm2 list # overview
pm2 status # alias of pm2 list
pm2 describe example-app # full metadata
pm2 reload example-app # zero‑downtime reload
pm2 restart example-app # hard restart
pm2 stop example-app # stop
pm2 delete example-app # remove from PM2
pm2 env 0 # show env of app with id 0
pm2 monit # terminal dashboard (CPU, memory, logs)
📈 Observability & Metrics
- Add /healthz and /readyz endpoints; wire your reverse proxy health checks to them.
- Use pm2 monit for CPU/memory; export app metrics to Prometheus (e.g., prom-client) and include GC stats.
🧱 Blue‑Green with PM2 (Alternative Pattern)
- Run two app names (e.g., app-blue, app-green) on distinct ports.
- Deploy to the idle color, smoke‑test /healthz, then flip the reverse proxy upstream.
- Keep the previous color for instant rollback.
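The "flip" step can be sketched as two small functions: one picks the idle color, the other renders the upstream block NGINX should load next. The ports, app names, and the idea of keeping the upstream in its own include file are assumptions for illustration. idleColor and renderUpstream are hypothetical helpers.

```javascript
// Sketch of the blue-green "flip" (ports and app names are illustrative assumptions).
const COLORS = {
  blue: { app: "app-blue", port: 3001 },
  green: { app: "app-green", port: 3002 },
};

// The color that is not serving traffic right now is the deploy target.
function idleColor(active) {
  if (!Object.hasOwn(COLORS, active)) throw new Error(`unknown color: ${active}`);
  return active === "blue" ? "green" : "blue";
}

// NGINX upstream block pointing the proxy at one color's port.
function renderUpstream(color) {
  const { port } = COLORS[color];
  return ["upstream example_app {", `    server 127.0.0.1:${port};`, "}", ""].join("\n");
}

const active = "blue";
const target = idleColor(active);
console.log(`deploy to ${COLORS[target].app}, then load:\n${renderUpstream(target)}`);
```

A deploy script would then follow these steps:
1. Start or reload the idle app (for example, pm2 startOrReload for app-green).
2. Smoke‑test /healthz on the idle port.
3. Write renderUpstream(target) to the include file.
4. Apply it with nginx -t && nginx -s reload.

The old color keeps running for rollback.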
🧵 TypeScript & Source Maps
- Prefer building to JS during deploy (tsc -p .) and run the built files.
- If running TS directly, use ts-node in dev only; in prod, compiled JS is safer.
- Enable stack traces with original lines:
node_args: "--enable-source-maps"
🌐 Sticky Sessions for Realtime Apps
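- For Socket.IO in cluster mode, each client must keep reaching the same worker. Stock PM2 cluster mode balances connections without affinity. Either follow the Socket.IO docs' PM2 setup (the @socket.io/pm2 fork plus @socket.io/cluster-adapter), or run separate fork‑mode instances on distinct ports behind a proxy with session affinity (e.g., NGINX ip_hash).
- Your proxy must also forward WebSocket upgrades (in NGINX: proxy_http_version 1.1 plus the Upgrade and Connection headers).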
🧯 Advanced Troubleshooting
- node: command not found in hooks: export PATH in the hook; avoid relying on interactive shells or nvm being sourced.
- Reload isn't zero‑downtime: verify exec_mode: "cluster" and instances >= 2; check wait_ready/listen_timeout logic.
- Flapping restarts: tune min_uptime, max_restarts, exp_backoff_restart_delay; inspect pm2 logs.
- Memory pressure: increase --max-old-space-size, fix leaks, or scale instances.
- File watchers in prod: keep watch: false to avoid CPU spikes and unexpected restarts.
✅ Quick PM2 Checklist
- Cluster mode with ≥2 instances
- wait_ready, listen_timeout, kill_timeout set
- Log rotation configured
- pm2 startup + pm2 save in place
- Health checks wired in proxy & CI smoke tests
- Rollback plan tested (previous release kept)
✅ Conclusion / Next Steps
With PM2 cluster mode and atomic releases via Git hooks, you get zero‑downtime deploys (as long as new workers come up healthy), quick rollbacks, and predictable builds. Next, wire this into your CI pipeline, add synthetic monitoring for /healthz, and automate release pruning.
Quick next steps:
- Create the bare repo and post‑receive hook.
- Add the PM2 ecosystem file with cluster mode.
- Push to production and verify pm2 status and /healthz.
- Schedule log rotation and release cleanup.
🔗 Related Articles (suggested)
- PM2 Deep Dive: Cluster Mode, Reloads, and Graceful Shutdowns
- Git Hooks 101: pre‑receive, post‑receive, and Secure Deployment Patterns
- Managing Secrets: .env vs environment vaults in CI/CD
- Observability Stack for Node.js (logs, metrics, traces)
- Blue‑Green vs Rolling Deploys: When to Choose What
🧩 Appendix - Full Example Files
A) ecosystem.config.js
module.exports = {
apps: [
{
name: "example-app",
cwd: "/var/www/example-app/current",
script: "./server.js",
exec_mode: "cluster",
instances: 2,
watch: false,
max_memory_restart: "512M",
env: { NODE_ENV: "production", PORT: 3000 },
env_production: { NODE_ENV: "production" },
kill_timeout: 5000,
listen_timeout: 8000,
out_file: "/var/log/pm2/example-app.out.log",
error_file: "/var/log/pm2/example-app.err.log",
merge_logs: true
}
]
};
B) hooks/post-receive
(From the section above; ensure chmod +x.)
C) Example Express server with health endpoint
const express = require("express");
const app = express();
app.get("/healthz", (req, res) => res.json({ ok: true }));
app.get("/", (req, res) => res.send("Hello from zero‑downtime deploy!"));
app.listen(process.env.PORT || 3000, () => {
console.log("Server started");
});