16 min read · Updated 2026-02-16

Running Enterprise Node.js Applications at Scale: Lessons from Production

Battle-tested strategies for building, deploying, and maintaining Node.js applications that handle millions of requests while staying reliable and maintainable.


The moment everything changed#

Our Node.js app was humming along nicely. 10k requests per minute, stable memory usage, happy users. Then we landed a major client. Overnight, traffic spiked to 500k requests per minute.

The app crashed. Hard.

That week taught me more about scalability than the previous three years combined. Let me share what I learned about running Node.js at enterprise scale.

Part 1: Foundation - Application Architecture#

The Cluster Module: Your First Line of Defense#

Node.js runs your JavaScript on a single thread. On a server with 8 CPU cores, one process uses at most 12.5% of the available CPU. Unacceptable.

// server.ts
import cluster from "cluster";
import os from "os";
import { createServer } from "./app";
 
const numCPUs = os.cpus().length;
 
if (cluster.isPrimary) {
  console.log(`Primary ${process.pid} is running`);
  console.log(`Forking ${numCPUs} workers...`);
 
  // Fork workers
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
 
  // Handle worker crashes (but don't refork workers we kill on purpose)
  let shuttingDown = false;
 
  cluster.on("exit", (worker, code, signal) => {
    console.log(`Worker ${worker.process.pid} died (${signal || code})`);
 
    if (!shuttingDown) {
      console.log("Starting a new worker...");
      cluster.fork();
    }
  });
 
  // Graceful shutdown
  process.on("SIGTERM", () => {
    console.log("SIGTERM received, shutting down gracefully");
    shuttingDown = true;
 
    for (const id in cluster.workers) {
      cluster.workers[id]?.kill();
    }
  });
} else {
  // Workers share the same port
  const server = createServer();
 
  server.listen(3000, () => {
    console.log(`Worker ${process.pid} started`);
  });
 
  // Graceful shutdown for workers
  process.on("SIGTERM", () => {
    console.log(`Worker ${process.pid} shutting down...`);
    server.close(() => {
      process.exit(0);
    });
  });
}

But there's a better way: PM2, which gives you clustering, automatic restarts, and zero-downtime reloads out of the box:

// ecosystem.config.js
module.exports = {
  apps: [
    {
      name: "api",
      script: "./dist/server.js",
      instances: "max", // Use all CPUs
      exec_mode: "cluster",
 
      // Memory management
      max_memory_restart: "512M",
 
      // Logging
      error_file: "./logs/error.log",
      out_file: "./logs/out.log",
      log_date_format: "YYYY-MM-DD HH:mm:ss Z",
 
      // Environment
      env: {
        NODE_ENV: "development",
      },
      env_production: {
        NODE_ENV: "production",
      },
 
      // Auto-restart on file changes (enable only in development)
      watch: false,
 
      // Graceful shutdown
      kill_timeout: 5000,
      wait_ready: true, // requires the app to call process.send("ready") (see Part 5)
      listen_timeout: 10000,
    },
  ],
};

# Start with PM2
pm2 start ecosystem.config.js --env production
 
# Monitor
pm2 monit
 
# Logs
pm2 logs
 
# Zero-downtime reload
pm2 reload api

[Figure: PM2 cluster mode visualization]


Part 2: Request Handling - Making Every Millisecond Count#

Async Everything, Block Nothing#

// ❌ BAD: Blocking operations
import fs from "fs";
import { createHash } from "crypto";
 
app.get("/bad", (req, res) => {
  // Blocks the event loop!
  const data = fs.readFileSync("/large-file.json");
 
  // CPU-intensive operation
  const hash = createHash("sha256").update(data).digest("hex");
 
  res.json({ hash });
});
 
// ✅ GOOD: Non-blocking
import { readFile } from "fs/promises";
import { createHash } from "crypto";
import { Worker } from "worker_threads";
 
app.get("/good", async (req, res) => {
  try {
    // Non-blocking file read
    const data = await readFile("/large-file.json");
 
    // Offload CPU work to worker thread
    const hash = await hashInWorker(data);
 
    res.json({ hash });
  } catch (error) {
    res.status(500).json({ error: "Internal server error" });
  }
});
 
function hashInWorker(data: Buffer): Promise<string> {
  return new Promise((resolve, reject) => {
    const worker = new Worker("./hash-worker.js", {
      workerData: data,
    });
 
    worker.on("message", resolve);
    worker.on("error", reject);
    worker.on("exit", (code) => {
      if (code !== 0) {
        reject(new Error(`Worker stopped with exit code ${code}`));
      }
    });
  });
}

Worker thread for CPU-intensive tasks:

// hash-worker.js
import { parentPort, workerData } from "worker_threads";
import { createHash } from "crypto";
 
const hash = createHash("sha256").update(workerData).digest("hex");
 
parentPort?.postMessage(hash);
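
Spawning a fresh Worker per request adds measurable startup cost under load; a thread pool amortizes it. A minimal sketch assuming the third-party piscina package (whose workers export a function instead of using parentPort); the file names here are illustrative:

// hash-pool.ts (sketch; assumes `npm install piscina`)
import Piscina from "piscina";
 
const hashPool = new Piscina({
  // hash-pool-worker.js is a hypothetical sibling file, shown below
  filename: new URL("./hash-pool-worker.js", import.meta.url).href,
  maxThreads: 4, // cap CPU-bound concurrency
});
 
export function hashInPool(data: Buffer): Promise<string> {
  return hashPool.run(data);
}
 
// hash-pool-worker.js (sketch)
import { createHash } from "crypto";
 
export default (data) => createHash("sha256").update(data).digest("hex");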

Connection Pooling: Don't Create, Reuse#

// lib/db.ts
import { Pool, Client } from "pg";
 
const dbConfig = {
  host: process.env.DB_HOST,
  port: Number(process.env.DB_PORT),
  database: process.env.DB_NAME,
  user: process.env.DB_USER,
  password: process.env.DB_PASSWORD,
};
 
// ❌ BAD: New connection per request
export async function queryBad(sql: string) {
  const client = new Client(dbConfig);
  await client.connect();
  const result = await client.query(sql);
  await client.end();
  return result;
}
 
// ✅ GOOD: Connection pool (exported for health checks and shutdown)
export const pool = new Pool({
  ...dbConfig,
 
  // Pool configuration
  max: 20, // Maximum pool size
  min: 5, // Minimum pool size
  idleTimeoutMillis: 30000, // Close idle connections after 30s
  connectionTimeoutMillis: 2000, // Timeout if all connections busy
 
  // Performance
  statement_timeout: 30000, // Query timeout
  query_timeout: 30000,
});
 
export async function query(sql: string, params?: any[]) {
  const client = await pool.connect();
 
  try {
    return await client.query(sql, params);
  } finally {
    client.release(); // Return to pool
  }
}
 
// Monitor pool health
pool.on("connect", () => {
  console.log("New client connected to pool");
});
 
pool.on("error", (err) => {
  console.error("Unexpected error on idle client", err);
});
 
// Graceful shutdown
export async function closePool() {
  await pool.end();
  console.log("Database pool closed");
}

Caching: The Ultimate Performance Multiplier#

// lib/cache.ts
import Redis from "ioredis";
 
export const redis = new Redis({
  host: process.env.REDIS_HOST,
  port: Number(process.env.REDIS_PORT),
  password: process.env.REDIS_PASSWORD,
 
  // Connection pool
  maxRetriesPerRequest: 3,
  enableReadyCheck: true,
  retryStrategy: (times) => {
    const delay = Math.min(times * 50, 2000);
    return delay;
  },
});
 
export async function cached<T>(
  key: string,
  ttlSeconds: number,
  fn: () => Promise<T>,
): Promise<T> {
  // Try cache first
  const cached = await redis.get(key);
 
  if (cached) {
    return JSON.parse(cached);
  }
 
  // Cache miss, compute value
  const value = await fn();
 
  // Store in cache
  await redis.setex(key, ttlSeconds, JSON.stringify(value));
 
  return value;
}
 
export async function invalidate(pattern: string) {
  // KEYS is O(N) and blocks Redis; fine for small keyspaces,
  // but prefer SCAN (e.g. ioredis scanStream) at scale
  const keys = await redis.keys(pattern);
 
  if (keys.length > 0) {
    await redis.del(...keys);
  }
}
 
// Usage
app.get("/api/posts/:slug", async (req, res) => {
  const post = await cached(
    `post:${req.params.slug}`,
    3600, // 1 hour
    async () => {
      return db.post.findUnique({
        where: { slug: req.params.slug },
        include: { author: true, tags: true },
      });
    },
  );
 
  if (!post) {
    return res.status(404).json({ error: "Not found" });
  }
 
  res.json(post);
});
 
// Invalidate on update
app.put("/api/posts/:slug", async (req, res) => {
  await db.post.update({
    where: { slug: req.params.slug },
    data: req.body,
  });
 
  // Clear cache
  await invalidate(`post:${req.params.slug}`);
 
  res.json({ success: true });
});
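
One caveat with read-through caching: when a hot key expires, every concurrent request misses and recomputes at once (a cache stampede). A minimal in-process mitigation, sketched below, collapses concurrent misses for the same key into a single computation (it does not coordinate across instances):

// lib/cache.ts (addition, sketch)
const inFlight = new Map<string, Promise<any>>();
 
export function cachedOnce<T>(
  key: string,
  ttlSeconds: number,
  fn: () => Promise<T>,
): Promise<T> {
  const pending = inFlight.get(key);
  if (pending) return pending;
 
  const promise = cached(key, ttlSeconds, fn).finally(() => {
    inFlight.delete(key);
  });
 
  inFlight.set(key, promise);
  return promise;
}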

Multi-level caching:

// lib/multi-cache.ts
import NodeCache from "node-cache";
 
// L1: In-memory cache (fastest, per-instance)
const memoryCache = new NodeCache({
  stdTTL: 60, // 1 minute default
  checkperiod: 120, // Cleanup interval
  useClones: false, // Don't clone objects (faster)
});
 
// L2: Redis (shared across instances)
import { redis } from "./cache";
 
export async function multiCached<T>(
  key: string,
  ttl: { memory: number; redis: number },
  fn: () => Promise<T>,
): Promise<T> {
  // L1: Check memory
  const memCached = memoryCache.get<T>(key);
  if (memCached !== undefined) {
    return memCached;
  }
 
  // L2: Check Redis
  const redisCached = await redis.get(key);
  if (redisCached) {
    const value = JSON.parse(redisCached);
    memoryCache.set(key, value, ttl.memory);
    return value;
  }
 
  // Cache miss: compute
  const value = await fn();
 
  // Store in both caches
  memoryCache.set(key, value, ttl.memory);
  await redis.setex(key, ttl.redis, JSON.stringify(value));
 
  return value;
}

Part 3: Error Handling - When Things Go Wrong#

Comprehensive error handling#

// lib/error-handler.ts
import { Request, Response, NextFunction } from "express";
 
export class AppError extends Error {
  constructor(
    public statusCode: number,
    public message: string,
    public isOperational: boolean = true,
  ) {
    super(message);
    Error.captureStackTrace(this, this.constructor);
  }
}
 
// Global error handler
export function errorHandler(
  err: Error,
  req: Request,
  res: Response,
  next: NextFunction,
) {
  // Log error
  console.error("Error:", {
    message: err.message,
    stack: err.stack,
    url: req.url,
    method: req.method,
    ip: req.ip,
  });
 
  // Send to error tracking service
  if (process.env.NODE_ENV === "production") {
    // trackError: a stand-in for your APM SDK (Sentry, Datadog, etc.)
    trackError(err, {
      user: req.user?.id,
      url: req.url,
      method: req.method,
    });
  }
 
  // Operational errors (expected)
  if (err instanceof AppError && err.isOperational) {
    return res.status(err.statusCode).json({
      error: err.message,
    });
  }
 
  // Programming errors (unexpected)
  res.status(500).json({
    error:
      process.env.NODE_ENV === "production"
        ? "Internal server error"
        : err.message,
  });
}
 
// Async wrapper to catch promise rejections
export function asyncHandler(
  fn: (req: Request, res: Response, next: NextFunction) => Promise<any>,
) {
  return (req: Request, res: Response, next: NextFunction) => {
    Promise.resolve(fn(req, res, next)).catch(next);
  };
}
 
// Usage
app.get(
  "/api/posts/:id",
  asyncHandler(async (req, res) => {
    const post = await db.post.findUnique({
      where: { id: req.params.id },
    });
 
    if (!post) {
      throw new AppError(404, "Post not found");
    }
 
    res.json(post);
  }),
);
 
// Apply error handler (must be last!)
app.use(errorHandler);

Circuit breaker pattern#

// lib/circuit-breaker.ts
export class CircuitBreaker {
  private failures = 0;
  private lastFailureTime = 0;
  private state: "closed" | "open" | "half-open" = "closed";
 
  constructor(
    private threshold: number = 5,
    private resetTimeout: number = 30000, // 30s in "open" before retrying
  ) {}
 
  async execute<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === "open") {
      const now = Date.now();
 
      if (now - this.lastFailureTime >= this.resetTimeout) {
        this.state = "half-open";
        console.log("Circuit breaker: moving to half-open");
      } else {
        throw new Error("Circuit breaker is open");
      }
    }
 
    try {
      const result = await fn();
 
      if (this.state === "half-open") {
        this.reset();
      }
 
      return result;
    } catch (error) {
      this.recordFailure();
      throw error;
    }
  }
 
  private recordFailure() {
    this.failures++;
    this.lastFailureTime = Date.now();
 
    if (this.failures >= this.threshold) {
      this.state = "open";
      console.log("Circuit breaker: opened after", this.failures, "failures");
    }
  }
 
  private reset() {
    this.failures = 0;
    this.state = "closed";
    console.log("Circuit breaker: reset to closed");
  }
}
 
// Usage with external API
const apiCircuitBreaker = new CircuitBreaker(5, 30000);
 
async function callExternalAPI(data: any) {
  return apiCircuitBreaker.execute(async () => {
    const response = await fetch("https://external-api.com/endpoint", {
      method: "POST",
      body: JSON.stringify(data),
      headers: { "Content-Type": "application/json" },
    });
 
    if (!response.ok) {
      throw new Error(`API error: ${response.status}`);
    }
 
    return response.json();
  });
}
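
Circuit breakers pair naturally with retries: retry transient failures with backoff, and let an open breaker fail each attempt fast when the dependency is truly down. A minimal sketch (the helper name and full-jitter backoff are my choices):

async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 100,
): Promise<T> {
  let lastError: unknown;
 
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      // Exponential backoff with full jitter
      const delay = Math.random() * baseDelayMs * 2 ** i;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
 
  throw lastError;
}
 
// Usage: retries wrap the breaker (data is a placeholder payload)
const result = await withRetry(() => callExternalAPI(data));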

[Figure: circuit breaker state transitions (closed, open, half-open)]


Part 4: Monitoring & Observability#

Structured logging#

// lib/logger.ts
import winston from "winston";
 
const logger = winston.createLogger({
  level: process.env.LOG_LEVEL || "info",
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.errors({ stack: true }),
    winston.format.json(),
  ),
  defaultMeta: {
    service: "api",
    environment: process.env.NODE_ENV,
  },
  transports: [
    new winston.transports.File({
      filename: "logs/error.log",
      level: "error",
      maxsize: 10485760, // 10MB
      maxFiles: 5,
    }),
    new winston.transports.File({
      filename: "logs/combined.log",
      maxsize: 10485760,
      maxFiles: 5,
    }),
  ],
});
 
// Console in development
if (process.env.NODE_ENV !== "production") {
  logger.add(
    new winston.transports.Console({
      format: winston.format.combine(
        winston.format.colorize(),
        winston.format.simple(),
      ),
    }),
  );
}
 
export default logger;
 
// Usage
logger.info("User logged in", { userId: "123", ip: "1.2.3.4" });
logger.error("Database connection failed", { error: err.message });
logger.warn("High memory usage", { usage: process.memoryUsage() });

Request tracing#

// middleware/tracing.ts
import { v4 as uuidv4 } from "uuid";
import { Request, Response, NextFunction } from "express";
import logger from "@/lib/logger";
import { metrics } from "@/lib/metrics";
 
export function tracingMiddleware(
  req: Request,
  res: Response,
  next: NextFunction,
) {
  // Generate unique request ID
  const requestId = (req.headers["x-request-id"] as string) || uuidv4();
  req.id = requestId;
 
  // Add to all log statements
  req.log = logger.child({ requestId });
 
  // Track timing
  const start = Date.now();
 
  res.on("finish", () => {
    const duration = Date.now() - start;
 
    req.log.info("Request completed", {
      method: req.method,
      url: req.url,
      statusCode: res.statusCode,
      duration,
      userAgent: req.headers["user-agent"],
      ip: req.ip,
    });
 
    // Send metrics
    metrics.recordHttpRequest({
      method: req.method,
      path: req.route?.path || req.url,
      statusCode: res.statusCode,
      duration,
    });
  });
 
  next();
}
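
For TypeScript, req.id and req.log don't exist on Express's Request type out of the box; a small declaration-merging file (a sketch, the path is illustrative) makes them type-check:

// types/express.d.ts (sketch)
import type { Logger } from "winston";
 
declare global {
  namespace Express {
    interface Request {
      id: string;
      log: Logger;
    }
  }
}
 
export {};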

Metrics collection#

// lib/metrics.ts
import { Counter, Histogram, Registry } from "prom-client";
 
const registry = new Registry();
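 
// Also collect Node's default runtime metrics (event-loop lag, GC, heap);
// collectDefaultMetrics ships with prom-client
import { collectDefaultMetrics } from "prom-client";
collectDefaultMetrics({ register: registry });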
 
// HTTP request counter
const httpRequestCounter = new Counter({
  name: "http_requests_total",
  help: "Total number of HTTP requests",
  labelNames: ["method", "path", "status"],
  registers: [registry],
});
 
// HTTP request duration
const httpRequestDuration = new Histogram({
  name: "http_request_duration_ms",
  help: "Duration of HTTP requests in ms",
  labelNames: ["method", "path", "status"],
  buckets: [10, 50, 100, 200, 500, 1000, 2000, 5000],
  registers: [registry],
});
 
// Database query duration
const dbQueryDuration = new Histogram({
  name: "db_query_duration_ms",
  help: "Duration of database queries in ms",
  labelNames: ["operation"],
  buckets: [1, 5, 10, 25, 50, 100, 250, 500, 1000],
  registers: [registry],
});
 
export const metrics = {
  recordHttpRequest(data: {
    method: string;
    path: string;
    statusCode: number;
    duration: number;
  }) {
    const labels = {
      method: data.method,
      path: data.path,
      status: data.statusCode.toString(),
    };
 
    httpRequestCounter.inc(labels);
    httpRequestDuration.observe(labels, data.duration);
  },
 
  recordDbQuery(operation: string, duration: number) {
    dbQueryDuration.observe({ operation }, duration);
  },
 
  getRegistry() {
    return registry;
  },
};
 
// Expose metrics endpoint
app.get("/metrics", async (req, res) => {
  res.set("Content-Type", registry.contentType);
  res.end(await registry.metrics());
});
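
The recordDbQuery histogram needs call sites; one convenient pattern is a thin timing wrapper around the pool helper from lib/db.ts (a sketch combining the modules defined above):

// lib/timed-query.ts (sketch)
import { metrics } from "@/lib/metrics";
import { query } from "@/lib/db";
 
export async function timedQuery(sql: string, params?: any[]) {
  const start = Date.now();
 
  try {
    return await query(sql, params);
  } finally {
    metrics.recordDbQuery("query", Date.now() - start);
  }
}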

Health checks#

// routes/health.ts
import { Router } from "express";
import { pool } from "@/lib/db";
import { redis } from "@/lib/cache";
 
const router = Router();
 
// Liveness probe (is the app running?)
router.get("/health/live", (req, res) => {
  res.json({ status: "ok" });
});
 
// Readiness probe (is the app ready to serve traffic?)
router.get("/health/ready", async (req, res) => {
  const checks = {
    database: await checkDatabase(),
    redis: await checkRedis(),
  };
 
  const allHealthy = Object.values(checks).every((c) => c.healthy);
 
  res.status(allHealthy ? 200 : 503).json({
    status: allHealthy ? "ready" : "not ready",
    checks,
    timestamp: new Date().toISOString(),
  });
});
 
async function checkDatabase(): Promise<{
  healthy: boolean;
  latency?: number;
}> {
  const start = Date.now();
 
  try {
    await pool.query("SELECT 1");
    return { healthy: true, latency: Date.now() - start };
  } catch (error) {
    return { healthy: false };
  }
}
 
async function checkRedis(): Promise<{ healthy: boolean; latency?: number }> {
  const start = Date.now();
 
  try {
    await redis.ping();
    return { healthy: true, latency: Date.now() - start };
  } catch (error) {
    return { healthy: false };
  }
}
 
export default router;

Part 5: Deployment Strategies#

Zero-downtime deployments#

// server.ts
import express from "express";
import { createServer } from "http";
import { Socket } from "net";
import { pool } from "./lib/db";
 
const app = express();
const server = createServer(app);
 
// Track active connections so stragglers can be force-closed on shutdown
const connections = new Set<Socket>();
 
server.on("connection", (conn) => {
  connections.add(conn);
 
  conn.on("close", () => {
    connections.delete(conn);
  });
});
 
// Graceful shutdown
function gracefulShutdown(signal: string) {
  console.log(`${signal} received, shutting down gracefully`);
 
  // Stop accepting new connections
  server.close(() => {
    console.log("HTTP server closed");
 
    // Close database connections
    pool.end().then(() => {
      console.log("Database pool closed");
      process.exit(0);
    });
  });
 
  // Force close after timeout
  setTimeout(() => {
    console.error("Forced shutdown after timeout");
    connections.forEach((conn) => conn.destroy());
    process.exit(1);
  }, 30000); // 30 seconds
}
 
process.on("SIGTERM", () => gracefulShutdown("SIGTERM"));
process.on("SIGINT", () => gracefulShutdown("SIGINT"));
 
const PORT = Number(process.env.PORT) || 3000;
 
server.listen(PORT, () => {
  console.log(`Server running on port ${PORT}`);
 
  // Signal PM2 that app is ready
  if (process.send) {
    process.send("ready");
  }
});

Docker multi-stage build#

# Dockerfile
# Stage 1: Build
FROM node:20-alpine AS builder
 
WORKDIR /app
 
# Copy package files
COPY package*.json ./
COPY tsconfig.json ./
 
# Install all dependencies (dev deps are needed for the TypeScript build)
RUN npm ci
 
# Copy source
COPY src ./src
 
# Build TypeScript
RUN npm run build
 
# Drop dev dependencies so the final image gets production-only node_modules
RUN npm prune --omit=dev && npm cache clean --force
 
# Stage 2: Production
FROM node:20-alpine
 
WORKDIR /app
 
# Security: Run as non-root
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nodejs -u 1001
 
# Copy only necessary files from builder
COPY --from=builder --chown=nodejs:nodejs /app/node_modules ./node_modules
COPY --from=builder --chown=nodejs:nodejs /app/dist ./dist
COPY --from=builder --chown=nodejs:nodejs /app/package*.json ./
 
USER nodejs
 
EXPOSE 3000
 
# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=40s \
  CMD node -e "require('http').get('http://localhost:3000/health/live', (r) => process.exit(r.statusCode === 200 ? 0 : 1))"
 
CMD ["node", "dist/server.js"]

Kubernetes deployment#

# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  labels:
    app: api
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: myregistry/api:latest
          ports:
            - containerPort: 3000
          env:
            - name: NODE_ENV
              value: production
            - name: DB_HOST
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: host
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi
          livenessProbe:
            httpGet:
              path: /health/live
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 3000
            initialDelaySeconds: 10
            periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: api
spec:
  selector:
    app: api
  ports:
    - port: 80
      targetPort: 3000
  type: LoadBalancer
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
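
One detail worth aligning: Kubernetes sends SIGTERM on pod shutdown and waits terminationGracePeriodSeconds (30s by default) before SIGKILL, so that window should exceed the app's forced-shutdown timeout from Part 5. The relevant pod-template fragment:

# in the Deployment's pod template spec
    spec:
      terminationGracePeriodSeconds: 45 # longer than the app's 30s forced-shutdown window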

[Figure: Kubernetes deployment architecture]


Part 6: Performance Optimization#

Memory management#

// Monitor memory usage
setInterval(() => {
  const usage = process.memoryUsage();
 
  logger.info("Memory usage", {
    rss: `${Math.round(usage.rss / 1024 / 1024)}MB`,
    heapTotal: `${Math.round(usage.heapTotal / 1024 / 1024)}MB`,
    heapUsed: `${Math.round(usage.heapUsed / 1024 / 1024)}MB`,
    external: `${Math.round(usage.external / 1024 / 1024)}MB`,
  });
 
  // Alert if memory usage is high
  if (usage.heapUsed / usage.heapTotal > 0.9) {
    logger.warn("High memory usage detected");
  }
}, 60000); // Every minute
 
// Force garbage collection in development
// (global.gc exists only when Node is started with --expose-gc)
if (process.env.NODE_ENV === "development" && global.gc) {
  setInterval(() => {
    global.gc();
    logger.debug("Manual garbage collection triggered");
  }, 300000); // Every 5 minutes
}
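
When the memory alert fires, a heap snapshot is the fastest way to find the leak. Node can write one on demand via the built-in v8 module (the /debug route is illustrative; lock it down or omit it in production):

// debug-heap.ts (sketch)
import { writeHeapSnapshot } from "v8";
 
app.post("/debug/heap-snapshot", (req, res) => {
  // Synchronous: blocks the event loop while the snapshot is written.
  // Writes Heap-<timestamp>.heapsnapshot to the working directory;
  // open it in Chrome DevTools' Memory tab.
  const file = writeHeapSnapshot();
  res.json({ file });
});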

Stream large responses#

// ❌ BAD: Load entire file into memory
app.get("/download", async (req, res) => {
  const data = await readFile("/large-file.zip");
  res.send(data); // OOM for large files!
});
 
// ✅ GOOD: Stream the file
import { createReadStream } from "fs";
 
app.get("/download", (req, res) => {
  const stream = createReadStream("/large-file.zip");
 
  res.setHeader("Content-Type", "application/zip");
  res.setHeader("Content-Disposition", "attachment; filename=file.zip");
 
  stream.pipe(res);
 
  stream.on("error", (err) => {
    logger.error("Stream error", { error: err.message });
    res.status(500).end();
  });
});
 
// Streaming database results
// (Prisma's findMany returns an array, not an async iterator,
// so fetch in keyset-paginated batches instead)
app.get("/export", async (req, res) => {
  res.setHeader("Content-Type", "text/csv");
  res.setHeader("Content-Disposition", "attachment; filename=export.csv");
 
  // Write CSV header
  res.write("id,name,email\n");
 
  let cursor: string | undefined;
 
  for (;;) {
    const batch = await db.user.findMany({
      take: 1000,
      ...(cursor ? { skip: 1, cursor: { id: cursor } } : {}),
      orderBy: { id: "asc" },
      select: { id: true, name: true, email: true },
    });
 
    for (const user of batch) {
      res.write(`${user.id},${user.name},${user.email}\n`);
    }
 
    if (batch.length < 1000) break;
    cursor = batch[batch.length - 1].id;
  }
 
  res.end();
});

Request payload limits#

import express from "express";
 
const app = express();
 
// Prevent large payloads from overwhelming the server
app.use(express.json({ limit: "10mb" }));
app.use(express.urlencoded({ extended: true, limit: "10mb" }));
 
// Rate limiting
import rateLimit from "express-rate-limit";
 
const limiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 100, // Limit each IP to 100 requests per window
  message: "Too many requests from this IP",
  standardHeaders: true,
  legacyHeaders: false,
});
 
app.use("/api/", limiter);

Part 7: Security Hardening#

// middleware/security.ts
import helmet from "helmet";
import cors from "cors";
 
// Security headers
app.use(
  helmet({
    contentSecurityPolicy: {
      directives: {
        defaultSrc: ["'self'"],
        styleSrc: ["'self'", "'unsafe-inline'"],
        scriptSrc: ["'self'"],
        imgSrc: ["'self'", "data:", "https:"],
      },
    },
    hsts: {
      maxAge: 31536000,
      includeSubDomains: true,
      preload: true,
    },
  }),
);
 
// CORS
app.use(
  cors({
    origin: process.env.ALLOWED_ORIGINS?.split(",") || [],
    credentials: true,
    maxAge: 86400, // 24 hours
  }),
);
 
// Request sanitization
import mongoSanitize from "express-mongo-sanitize";
import xss from "xss-clean";
 
app.use(mongoSanitize()); // Prevent NoSQL injection
app.use(xss()); // Prevent XSS (note: xss-clean is unmaintained; also escape output at render time)
 
// Input validation
import { body, validationResult } from "express-validator";
 
app.post(
  "/api/users",
  body("email").isEmail().normalizeEmail(),
  body("password").isLength({ min: 12 }),
  body("name").trim().isLength({ min: 1, max: 100 }),
  async (req, res) => {
    const errors = validationResult(req);
 
    if (!errors.isEmpty()) {
      return res.status(400).json({ errors: errors.array() });
    }
 
    // Process request...
  },
);
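
Finally, configuration and secrets: validate required environment variables at boot so a missing secret fails fast at startup instead of on the first request (a minimal sketch; the variable list is illustrative):

// config/validate-env.ts (sketch)
const required = ["DB_HOST", "DB_PASSWORD", "REDIS_HOST", "ALLOWED_ORIGINS"];
 
for (const name of required) {
  if (!process.env[name]) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
}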

The Enterprise Checklist#

Architecture

  • Multi-process with cluster/PM2
  • Load balancing
  • Service discovery
  • Circuit breakers

Performance

  • Connection pooling
  • Multi-level caching
  • Async everywhere
  • Streaming for large data

Reliability

  • Graceful shutdown
  • Health checks
  • Error boundaries
  • Automatic restarts

Observability

  • Structured logging
  • Request tracing
  • Metrics collection
  • APM integration

Security

  • Input validation
  • Rate limiting
  • Security headers
  • Secrets management

Deployment

  • Zero-downtime deploys
  • Blue-green deployments
  • Auto-scaling
  • Rollback strategy

The Real Numbers#

After implementing these patterns, here's what we saw:

  • Throughput: 10k → 800k requests/min
  • P99 latency: 850ms → 120ms
  • Memory usage: 512MB → 280MB per instance
  • Crash rate: 3-4/day → 0 (zero crashes in 90 days)
  • MTTR (Mean Time To Recovery): 15min → 2min

Key Takeaways#

  1. Think in processes: One Node process ≠ one server
  2. Async is non-negotiable: Block the event loop = game over
  3. Cache aggressively: Memory > Redis > Database > External API
  4. Monitor everything: You can't improve what you don't measure
  5. Design for failure: Services will fail. Handle it gracefully.

Running Node.js at scale isn't magic. It's understanding the runtime, applying proven patterns, and monitoring relentlessly.

What's your biggest Node.js scaling challenge? Let's solve it together.