The Production-Ready VPS Playbook: OS Hardening, Gateways, TLS, Observability & Ops

✍️ Short Summary

This add‑on to your main guide fills common gaps teams hit when taking a KVM NVMe VPS from first boot to production: OS baseline, secrets management, TLS & security headers, reverse proxy best practices (Nginx/Caddy), observability, backups/DR, Cloudflare edge config, performance tuning, runbooks, and stack‑specific connection snippets (MongoDB included).


📎 Table of Contents

  1. OS & Security Baseline

  2. Secrets & Configuration Management

  3. Edge Gateways: Nginx vs Caddy

  4. TLS & App Security Headers

  5. Process Management Patterns

  6. Logging & Observability

  7. Backups & Disaster Recovery

  8. Networking: Cloudflare, Real‑IP, WebSockets

  9. Performance Tuning Cheatsheet

  10. Compliance & Data Handling

  11. Runbooks: On‑Call & Incident Basics

  12. Templates & Snippets

    • Nginx SSR/WebSockets

    • Caddy reverse proxy

    • Cloudflare Real‑IP

    • systemd service template

    • .env example

  13. Stack DB Connect Quick‑Refs (MongoDB)

  14. Conclusion / Next Steps


1) 🔒 OS & Security Baseline

Goal: A hardened, predictable foundation for every stack.

Checklist

  • Create non‑root sudo user; disable password SSH; use ed25519 keys

  • UFW: deny incoming traffic by default; allow only 22, 80, and 443

  • Fail2ban with jails for SSH, Nginx, and auth logs

  • Unattended‑upgrades for security patches

  • Time sync via chrony (NTP); set the timezone explicitly

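  • Kernel/VM: disable Transparent Huge Pages (THP) where your database recommends it (MongoDB did so before 8.0; check the production notes for your version); set vm.swappiness=1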

  • FD limits: LimitNOFILE=64000 via systemd drop‑in

  • Swap: create a swap file (1–2× RAM) or confirm existing swap leaves enough headroom to avoid out-of-memory kills

  • Filesystem: prefer ext4/xfs on NVMe; consider noatime/lazytime mounts
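
The kernel items in this checklist can be verified from a script. The sketch below assumes the targets are THP disabled and vm.swappiness=1, and reads the standard Linux sysfs/procfs paths; the function names are illustrative, not part of any tool.

```python
import re
from pathlib import Path
from typing import List

# Standard Linux locations for the two kernel settings in the checklist.
THP_PATH = Path("/sys/kernel/mm/transparent_hugepage/enabled")
SWAPPINESS_PATH = Path("/proc/sys/vm/swappiness")


def active_thp_mode(text: str) -> str:
    """Return the bracketed (active) mode from e.g. 'always madvise [never]'."""
    match = re.search(r"\[(\w+)\]", text)
    if match is None:
        raise ValueError(f"unexpected THP format: {text!r}")
    return match.group(1)


def baseline_findings(thp_text: str, swappiness_text: str) -> List[str]:
    """Compare raw kernel values against the assumed targets (THP off, swappiness 1)."""
    findings = []
    if active_thp_mode(thp_text) != "never":
        findings.append("THP is not disabled")
    if int(swappiness_text.strip()) != 1:
        findings.append("vm.swappiness is not 1")
    return findings


def read_host_findings() -> List[str]:
    """Read the live values on a Linux host and report deviations."""
    return baseline_findings(THP_PATH.read_text(), SWAPPINESS_PATH.read_text())
```

Calling read_host_findings() on the VPS returns an empty list when both settings match; the pure baseline_findings() function can be exercised anywhere with captured values.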


2) 🔑 Secrets & Configuration Management

Goal: Keep credentials safe, versionable, and environment‑specific.

  • Store application config in .env with 0600 permissions

  • For production, reference env via systemd EnvironmentFile=

  • Rotate secrets quarterly; maintain a secrets changelog

  • Optional: age/sops for encrypted config in Git

  • Separate staging and production env files and buckets
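
As a rough illustration of the 0600 rule, the following sketch refuses to load an env file whose permissions are looser than owner read/write. It handles only simple KEY=VALUE lines; the function names are hypothetical, and in production systemd's EnvironmentFile= does the loading for you.

```python
import os
import stat


def env_file_is_private(path: str) -> bool:
    """True when the file mode is exactly 0600 (owner read/write only)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode == 0o600


def load_env_file(path: str) -> dict:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    if not env_file_is_private(path):
        raise PermissionError(f"{path} must have mode 0600")
    values = {}
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            # Split on the first '=' only, so URIs with query strings survive intact.
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    return values
```

A check like env_file_is_private() also fits in a deploy script or CI step, so a mis-set mode is caught before the service starts.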


3) 🌐 Edge Gateways: Nginx vs Caddy

When to choose Nginx

  • Familiarity, fine‑grained performance knobs, advanced caching

When to choose Caddy

  • Fastest path to automatic HTTPS, simple reverse proxy, clean config

Rule of thumb: Use Caddy for 1–3 services and quick wins; use Nginx when you need granular control, complex caching, or legacy features.


4) 🔐 TLS & App Security Headers

Use Let's Encrypt certificates, issued either by Certbot (with Nginx) or by Caddy's automatic TLS.

Minimum headers

  • Strict-Transport-Security: max-age=31536000; includeSubDomains; preload (add preload only once every subdomain serves HTTPS, since removal from preload lists is slow)

  • Content-Security-Policy (start with default-src 'self' and expand)

  • X-Frame-Options: SAMEORIGIN

  • X-Content-Type-Options: nosniff

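  • Referrer-Policy: no-referrer-when-downgrade at minimum; prefer strict-origin-when-cross-origin, which avoids sending full URLs to other origins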

  • Permissions-Policy as needed
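
One way to keep this list honest is to audit real responses against it. The sketch below takes a dictionary of response headers (however you obtained them) and reports which of the minimum headers are absent; REQUIRED_HEADERS and both function names are illustrative choices, and Permissions-Policy is left out because the list marks it "as needed".

```python
from typing import Dict, List

# The minimum set from the list above (Permissions-Policy is optional there).
REQUIRED_HEADERS = [
    "Strict-Transport-Security",
    "Content-Security-Policy",
    "X-Frame-Options",
    "X-Content-Type-Options",
    "Referrer-Policy",
]


def missing_security_headers(headers: Dict[str, str]) -> List[str]:
    """Return required header names not present (header names are case-insensitive)."""
    present = {name.lower() for name in headers}
    return [name for name in REQUIRED_HEADERS if name.lower() not in present]


def hsts_max_age(value: str) -> int:
    """Extract max-age in seconds from an HSTS header value; 0 if absent."""
    for directive in value.split(";"):
        name, _, arg = directive.strip().partition("=")
        if name.lower() == "max-age":
            return int(arg.strip().strip('"'))
    return 0
```

For example, a response carrying only X-Content-Type-Options would be reported as missing the other four headers, in list order.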


5) 🧩 Process Management Patterns

Pick one process manager per service; don't double‑wrap.

  • Node.js: PM2 or systemd (not both). PM2 for clustering/zero‑downtime; systemd for OS‑native control

  • Python: Gunicorn/Uvicorn behind Nginx/Caddy, managed by systemd

  • Rails: Puma + systemd; assets precompiled in CI/CD

  • .NET: Kestrel + Nginx/Caddy; systemd service

  • Go/Rust: single binary + systemd; graceful shutdown; health endpoints
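
Graceful shutdown and health endpoints apply to every stack in this list. As a minimal sketch in Python (served by any WSGI server such as Gunicorn), the app below answers /healthz with 200 normally and 503 once SIGTERM has been received, so a proxy or monitor can stop sending traffic while in-flight requests finish. The /healthz path and all names are assumptions for illustration.

```python
import json
import signal
import threading

# Set once the process has been asked to stop; request handlers can consult it.
draining = threading.Event()


def install_sigterm_handler() -> None:
    """Flip the draining flag on SIGTERM (call from the main thread at startup)."""
    signal.signal(signal.SIGTERM, lambda signum, frame: draining.set())


def health_app(environ, start_response):
    """Minimal WSGI app: /healthz returns 200 normally and 503 while draining."""
    if environ.get("PATH_INFO") != "/healthz":
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [b'{"error": "not found"}']
    is_draining = draining.is_set()
    status = "503 Service Unavailable" if is_draining else "200 OK"
    body = json.dumps({"status": "draining" if is_draining else "ok"}).encode()
    start_response(status, [("Content-Type", "application/json")])
    return [body]
```

systemd sends SIGTERM on stop by default, so the same flag works whether the stop comes from a deploy or a reboot.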


6) 📊 Logging & Observability

Target: See issues before users do.

  • Structured JSON logs (app + proxy); centralize to Loki/ELK

  • Metrics: node_exporter, app exporters (e.g., MongoDB exporter), Prometheus + Grafana dashboards

  • Uptime: Prometheus alerts + external pingers

  • Logrotate policies; compress & age out
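
For the structured-logging item, a small formatter built on Python's standard logging module may be enough to start with. The field names (ts, level, logger, message) are an assumption; align them with whatever your Loki or ELK pipeline expects.

```python
import json
import logging
import time


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(record.created)),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name: str = "app") -> logging.Logger:
    """Attach a JSON-formatting stream handler (stderr) to the named logger."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

Under systemd, stderr output lands in the journal, where a log shipper can pick it up without the app knowing about the central store.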


7) 💾 Backups & Disaster Recovery

  • Filesystem snapshots (provider) + logical dumps (DB tools)

  • 3‑2‑1 rule: 3 copies, 2 media, 1 offsite (S3/B2)

  • Restore tests monthly; document RTO/RPO targets

  • DB specifics (see templates): Postgres (pgBackRest), MySQL (Percona/XtraBackup or mysqldump), MongoDB (mongodump + snapshots)
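
The 3‑2‑1 rule is easy to state and easy to drift away from, so it can help to encode it as a check over a backup inventory. This is a sketch under assumptions: BackupCopy and its fields are hypothetical, and the inventory is expected to list every copy you count, including the live data if you count it as one of the three.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BackupCopy:
    location: str   # hypothetical label, e.g. "vps-disk" or "s3-bucket"
    media: str      # e.g. "nvme", "provider-snapshot", "object-storage"
    offsite: bool   # True when stored outside the VPS provider/region


def satisfies_3_2_1(copies: List[BackupCopy]) -> bool:
    """At least 3 copies, on at least 2 distinct media, with at least 1 offsite."""
    distinct_media = {copy.media for copy in copies}
    has_offsite = any(copy.offsite for copy in copies)
    return len(copies) >= 3 and len(distinct_media) >= 2 and has_offsite
```

For instance, live data on NVMe plus a provider snapshot plus an S3 dump passes, while three copies that all sit with the same provider fail the offsite condition.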


8) 🛰️ Networking: Cloudflare, Real‑IP, WebSockets

  • Enable the Cloudflare proxy (orange cloud) on DNS A/AAAA records; restore the visitor address from the CF-Connecting-IP header at the origin

  • Respect real‑client IP in app & logs (Nginx/Caddy snippets below)

  • WebSockets: ensure upgrade headers and keepalive tuning

  • Rate limits/WAF: start with sensible defaults; log violations
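
At the application level, the key point about real‑client IPs is to trust CF-Connecting-IP only when the TCP peer is actually a Cloudflare proxy; otherwise anyone can spoof the header. A minimal sketch, assuming a plain dict of request headers keyed exactly as shown and using only two example Cloudflare ranges (the full, current list must come from Cloudflare):

```python
import ipaddress
from typing import Dict

# Two example Cloudflare ranges only; load the complete published list in real use.
TRUSTED_PROXIES = [
    ipaddress.ip_network("173.245.48.0/20"),
    ipaddress.ip_network("103.21.244.0/22"),
]


def client_ip(peer_addr: str, headers: Dict[str, str]) -> str:
    """Return the visitor IP, honoring CF-Connecting-IP only from trusted peers."""
    peer = ipaddress.ip_address(peer_addr)
    from_trusted_proxy = any(peer in net for net in TRUSTED_PROXIES)
    claimed = headers.get("CF-Connecting-IP")
    if from_trusted_proxy and claimed:
        try:
            # Normalize and validate the claimed address before using it.
            return str(ipaddress.ip_address(claimed.strip()))
        except ValueError:
            pass  # malformed header: fall back to the peer address
    return peer_addr
```

If the reverse proxy already rewrites the remote address, the app sees the corrected peer and this check becomes a second line of defense rather than the primary one.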


9) 🚀 Performance Tuning Cheatsheet

  • Keepalive: keepalive_timeout 15s; HTTP/2 enabled

  • Compression: Brotli (prefer) with sane min sizes; fallback gzip

  • Static: long Cache-Control + content hashing

  • DB: create indexes early; monitor slow logs; pool sizes appropriate

  • Redis: enable AOF persistence and set an explicit maxmemory policy; for WooCommerce/Magento, keep the session store on a separate instance from the cache

  • Queues: RabbitMQ/Redis with dead‑letter policies
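
The static-assets item (long Cache-Control plus content hashing) works because a file's name changes whenever its bytes change, so old copies can be cached indefinitely. Most bundlers do this for you; the sketch below only illustrates the idea, with an assumed 8-character SHA-256 prefix and illustrative Cache-Control values.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_name(filename: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a content digest before the extension: app.js -> app.<digest>.js."""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


def cache_control_for(hashed: bool) -> str:
    """Hashed assets can be cached for a year; unhashed ones should revalidate."""
    if hashed:
        return "public, max-age=31536000, immutable"
    return "no-cache"
```

HTML entry points that reference the hashed files should stay on the revalidating policy, otherwise clients never learn about new asset names.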


10) 🛡️ Compliance & Data Handling

  • Data residency: pin region; document data flows

  • Encrypt PII at rest (DB or app‑level). Use KMS or sealed secrets

  • GDPR basics: retention schedules, right‑to‑erasure playbook


11) 🧭 Runbooks: On‑Call & Incident Basics

  • Who to page: roles & contact ladder

  • First 5 minutes: check health endpoints, error rates, DB connections, disk free, TLS expiry

  • Rollback: documented blue/green or tag‑based deployment rollback

  • Postmortem: template with action items & owners
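
Two of the "first 5 minutes" checks, disk free and TLS expiry, are simple enough to script ahead of an incident. The sketch below uses only the Python standard library; the thresholds (14 days, 10% free) are placeholder assumptions, and the certificate's notAfter string is expected in the format Python's ssl module reports, e.g. 'Jan  5 09:34:43 2018 GMT'.

```python
import shutil
import ssl
import time
from typing import List, Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days remaining before a certificate's notAfter timestamp."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / 86400


def disk_free_percent(path: str = "/") -> float:
    """Free space on the filesystem containing path, as a percentage."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total * 100


def triage_findings(days_left: float, free_pct: float,
                    min_days: float = 14.0, min_free_pct: float = 10.0) -> List[str]:
    """Turn raw measurements into human-readable findings for the on-call."""
    findings = []
    if days_left < min_days:
        findings.append(f"TLS certificate expires in {days_left:.1f} days")
    if free_pct < min_free_pct:
        findings.append(f"only {free_pct:.1f}% disk free")
    return findings
```

Running the same functions on a schedule and alerting on non-empty findings turns the runbook step into an early warning instead of a diagnosis.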


12) 🔧 Templates & Snippets

Nginx (SSR + WebSockets)

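# The map block belongs in the http {} context (e.g., a conf.d include), not inside server {}.
map $http_upgrade $connection_upgrade { default upgrade; '' close; }
server {
  # Plain HTTP only; add a 443 listener with certificates (e.g., via Certbot) before production.
  listen 80; server_name example.com;
  location / {
    proxy_pass http://127.0.0.1:3000;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    # HTTP/1.1 plus the Upgrade/Connection headers are required for WebSocket handshakes.
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection $connection_upgrade;
    # Idle upstream connections (including quiet WebSockets) are closed after 60s.
    proxy_read_timeout 60s;
  }
}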

Caddy (auto‑TLS + reverse proxy)

example.com {
  encode zstd gzip
  reverse_proxy 127.0.0.1:3000
  header {
    Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
    X-Content-Type-Options "nosniff"
    Referrer-Policy "no-referrer-when-downgrade"
  }
}

Cloudflare Real‑IP (Nginx)

# Pull latest ranges from Cloudflare docs periodically
set_real_ip_from 173.245.48.0/20;
set_real_ip_from 103.21.244.0/22;
# ... (other ranges)
real_ip_header CF-Connecting-IP;

systemd service (template)

[Unit]
Description=App Service
After=network.target

[Service]
User=app
EnvironmentFile=/etc/app/app.env
WorkingDirectory=/var/www/app
ExecStart=/usr/bin/node server.js
Restart=always
RestartSec=3
LimitNOFILE=64000

[Install]
WantedBy=multi-user.target

.env (example)

PORT=3000
NODE_ENV=production
MONGODB_URI=mongodb://appuser:STRONGPASS@127.0.0.1:27017/appdb?authSource=appdb
REDIS_URL=redis://127.0.0.1:6379
SESSION_SECRET=replace_me

13) 🗄️ Stack DB Connect Quick‑Refs (MongoDB)

Node.js (Mongoose)

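const mongoose = require('mongoose');
mongoose.connect(process.env.MONGODB_URI, {
  maxPoolSize: 10, serverSelectionTimeoutMS: 5000
}).catch((err) => {
  // Fail fast so the process manager (systemd/PM2) can restart the service
  console.error('MongoDB connection failed:', err.message);
  process.exit(1);
});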

FastAPI (Motor)

import os
import motor.motor_asyncio as mtr

client = mtr.AsyncIOMotorClient(os.getenv("MONGODB_URI"), maxPoolSize=10)
db = client.appdb  # database name matches the URI in the .env example

Django (Mongo/Alt‑ORM)

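import os

# settings.py: djongo is a third-party backend; confirm it supports your Django version
DATABASES = {
  'default': {
    'ENGINE': 'djongo',
    'NAME': 'appdb',
    'CLIENT': {'host': os.environ.get('MONGODB_URI')}
  }
}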

Rails (Mongoid)

production:
  clients:
    default:
      uri: <%= ENV['MONGODB_URI'] %>

Security notes

  • Auth enabled, local bind or IP allow‑list

  • UFW deny 27017 by default; allow only trusted IPs

  • Replica set for change streams/transactions when needed

  • Backups: daily mongodump + provider snapshots; test restores


✅ Conclusion / Next Steps

  • Apply the baseline & security hardening

  • Choose Nginx or Caddy per your complexity needs

  • Wire observability + backups from day one

  • Use the snippets to accelerate SSR, WebSockets, and Real‑IP correctness

  • Add the MongoDB pieces into each stack chapter


🔗 Related Articles (suggested)

  • Choosing Nginx vs Caddy for App Gateways (trade‑offs & configs)

  • Zero‑downtime Deployments (blue/green, canaries, rollbacks)

  • Prometheus, Grafana & Loki on a Single VPS (quick start)

  • Redis vs RabbitMQ for Queues (when to pick which)

