Scaling Serverless Architecture for Enterprise Automation
Marcus Rivera
Principal Engineer

The inside story of how NukeSend’s SignalCore platform handles 2.8B events/day at 99.99% uptime on AWS Lambda, cutting infrastructure cost by 62% and holding p99 latency under 100 ms for real-time email personalization.
Introduction
Black Friday 2023: NukeSend sent 4.3M emails in 5 minutes, peaking at 860K events/second. SignalCore, our fully serverless event mesh, scaled from 8K to 1.2M concurrent Lambda executions without pre-warming, held p99 latency at 87 ms, and cost 62% less than the previous container-based ECS stack. This post dissects the architectural decisions, cold-start mitigation, and cost-optimization tricks that make enterprise-grade serverless viable at planet scale.
Architecture Overview
Event-driven mesh: (1) API Gateway → Lambda authorizer (JWT + mTLS) → Kinesis Data Streams (1,000 shards, auto-scaled in on-demand mode); (2) Lambda@Edge for geo personalization; (3) EventBridge for cross-service choreography; (4) Step Functions Express for sagas under 5 minutes; (5) DynamoDB on-demand for idempotency; (6) S3 Object Lambda for dynamic image resizing. Apart from the ML inference functions described under cold starts, functions are packaged as 10 MB ARM64 custom runtimes using Rust + Tokio for minimal cold-start time.
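The DynamoDB idempotency step in (5) is commonly built as a conditional write keyed on the event ID. A minimal sketch, assuming a hypothetical table `signalcore-idempotency` with partition key `pk` and a TTL attribute `expires_at`, plus an assumed 24-hour retention window (none of these names or values come from SignalCore itself). It only builds the arguments one would hand to a DynamoDB `PutItem` call, so it runs without AWS credentials:

```python
import time

# Hypothetical names and values for illustration; not taken from SignalCore.
TABLE_NAME = "signalcore-idempotency"
RETENTION_SECONDS = 24 * 60 * 60  # assumed retention window for dedup records


def build_idempotency_put(event_id, now=None):
    """Return PutItem kwargs that succeed only the first time event_id is seen."""
    now = int(time.time()) if now is None else int(now)
    return {
        "TableName": TABLE_NAME,
        "Item": {
            "pk": {"S": f"event#{event_id}"},
            # TTL attribute: DynamoDB removes the record some time after this epoch second.
            "expires_at": {"N": str(now + RETENTION_SECONDS)},
        },
        # The write is rejected if an item with this key already exists,
        # which is how a redelivered event is detected.
        "ConditionExpression": "attribute_not_exists(pk)",
    }


params = build_idempotency_put("evt-123", now=1_700_000_000)
print(params["Item"]["pk"]["S"])  # event#evt-123
```

With boto3 these arguments would go to `boto3.client("dynamodb").put_item(**params)`; a `ConditionalCheckFailedException` then signals a duplicate delivery that the function can skip.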
Cold Start Mitigation
Provisioned Concurrency is reserved only for the auth and billing functions (2% of the total). The remaining functions are warmed by a scheduled EventBridge rule that fires every 4 minutes and invokes a dummy ping (128 MB memory) so their execution environments stay in the microVM pool. SnapStart is enabled for the Java-based ML inference functions, cutting init duration from 3.2 s to 480 ms. Lambda layers carry 180 MB of shared libraries (OpenCV, Torch), reducing package size by 64%.
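The scheduled ping only helps if the handler recognizes it and returns before doing real work. A minimal sketch of that pattern, assuming the EventBridge rule sends a constant JSON input containing a hypothetical `warmer` flag; `process` is a placeholder for the function's actual logic:

```python
import json

WARMER_KEY = "warmer"  # hypothetical marker set in the scheduled rule's constant input


def process(event):
    """Placeholder for the real business logic."""
    return {"processed": len(event.get("records", []))}


def handler(event, context):
    """Lambda entry point that answers warm-up pings before doing any real work."""
    if isinstance(event, dict) and event.get(WARMER_KEY):
        # Return immediately: the only goal is to keep this execution environment alive.
        return {"statusCode": 200, "body": json.dumps({"warmed": True})}
    return {"statusCode": 200, "body": json.dumps(process(event))}


print(handler({"warmer": True}, None)["body"])        # {"warmed": true}
print(handler({"records": [1, 2, 3]}, None)["body"])  # {"processed": 3}
```

One caveat with this approach: a single periodic ping keeps roughly one execution environment warm per function, so it reduces cold starts at low traffic but does not guarantee warm capacity during a sudden spike.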
Cost Optimization
Compute Savings Plans cover the 70% baseline; on-demand spikes are billed per ms. Tiered memory tuning: I/O-bound functions run at 256 MB ($0.0000000033/ms at the published us-east-1 ARM64 rate), while CPU-bound ML inference uses 3 GB but finishes 5× faster than at its previous, smaller memory setting, for a net cost 38% lower. S3 Intelligent-Tiering moves 82% of attachment objects to Glacier within 30 days, saving $48K/month. Step Functions Express workflows under 5 s cost $0.000025 per execution vs $0.00025 for Standard, a 90% reduction for high-volume paths.
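The memory-tuning argument rests on Lambda billing being proportional to allocated memory times billed duration: a larger setting is cheaper whenever the speedup exceeds the memory ratio. A small sketch of that arithmetic follows. The 1,024 MB / 500 ms baseline and the per-GB-second price are assumptions chosen for illustration (the post does not state the baseline), so the 40% result is not the measured 38% figure:

```python
def invocation_cost(memory_mb, duration_ms, price_per_gb_second):
    """Cost of one invocation: GB allocated x seconds billed x unit price."""
    return (memory_mb / 1024) * (duration_ms / 1000) * price_per_gb_second


def relative_saving(old_cost, new_cost):
    """Fractional saving of new_cost relative to old_cost."""
    return 1 - new_cost / old_cost


# Assumed inputs: a 1,024 MB baseline taking 500 ms versus 3,072 MB finishing 5x faster.
PRICE = 0.0000133334  # assumed USD per GB-second; check current regional pricing
baseline = invocation_cost(1024, 500, PRICE)
tuned = invocation_cost(3072, 100, PRICE)
print(f"{relative_saving(baseline, tuned):.0%}")  # 40%

# The Step Functions figures quoted above: $0.000025 (Express) vs $0.00025 (Standard).
print(f"{relative_saving(0.00025, 0.000025):.0%}")  # 90%
```

Because the unit price cancels out of the ratio, the saving depends only on the memory ratio and the speedup: 3× the memory at 5× the speed costs 3/5 of the baseline.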
Observability
OpenTelemetry instrumentation exports traces to AWS X-Ray with 0.1% sampling for cost control. Embedded Metric Format (EMF) emits custom CloudWatch metrics as structured log lines on stdout, so functions make no PutMetricData calls and pay no per-request API charges. Alarms are wired to PagerDuty, with automatic rollback via CloudFormation StackSets when p99 latency >200 ms or error rate >1%. A distributed tracing correlation ID propagates through 12 services, giving end-to-end visibility down to the email click.
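For readers unfamiliar with EMF, the stdout approach amounts to printing one JSON object per data point with an `_aws` metadata block that tells CloudWatch which fields are metrics and which are dimensions. A minimal sketch; the namespace, metric name, and dimension used here are hypothetical, not SignalCore's actual names:

```python
import json
import time


def emf_record(namespace, metric_name, value, unit, dimensions, timestamp_ms=None):
    """Build one CloudWatch Embedded Metric Format record as a dict."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    record = {
        "_aws": {
            "Timestamp": timestamp_ms,
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    # One dimension set, made of all supplied dimension names.
                    "Dimensions": [list(dimensions.keys())],
                    "Metrics": [{"Name": metric_name, "Unit": unit}],
                }
            ],
        },
        # The metric value and dimension values live at the top level.
        metric_name: value,
    }
    record.update(dimensions)
    return record


def emit(record):
    """Write the record to stdout, where Lambda forwards it to CloudWatch Logs."""
    print(json.dumps(record))


# Hypothetical names for illustration only.
emit(emf_record("SignalCore/Demo", "PersonalizeLatency", 27, "Milliseconds",
                {"Service": "personalizer"}))
```

Extra top-level fields, such as a correlation ID, can be added to the same record; they are kept as searchable log properties without becoming metrics.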
Security
IAM Access Analyzer validates every policy change; each function is scoped to a single DynamoDB table. Secrets live in Parameter Store with a 3-hour TTL, enforced by a Lambda extension that also rotates the JWT signing keys. Network isolation relies on VPC Lattice: functions have no NAT gateway, and egress is allowed only to AWS services via PrivateLink. Code signing with AWS Signer ensures only trusted artifacts deploy; the pipeline is blocked if a scan finds a CVE with a CVSS score above 7.
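The 3-hour TTL can be pictured as a small cache in front of the secret store that refetches once the entry is older than the TTL. The sketch below shows that idea in-process rather than as a Lambda extension; the class name, the injectable clock, and the fake fetcher standing in for Parameter Store are all assumptions made for illustration:

```python
import time

THREE_HOURS = 3 * 60 * 60


class TtlSecretCache:
    """Cache a secret in memory and refetch it once the TTL has elapsed."""

    def __init__(self, fetch, ttl_seconds=THREE_HOURS, clock=time.monotonic):
        self._fetch = fetch      # callable returning the current secret value
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be exercised in tests
        self._value = None
        self._expires_at = None

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value


# Demo with a fake clock and a counter standing in for Parameter Store.
fake_now = [0.0]
calls = []


def fake_fetch():
    calls.append(1)
    return f"key-v{len(calls)}"


cache = TtlSecretCache(fake_fetch, clock=lambda: fake_now[0])
print(cache.get())  # key-v1
print(cache.get())  # key-v1 (served from cache)
fake_now[0] = THREE_HOURS
print(cache.get())  # key-v2 (TTL elapsed, refetched)
```

In a real function the fetcher would read from Parameter Store, for example through boto3's `ssm.get_parameter(Name=..., WithDecryption=True)`, and the cache would live at module scope so it survives across warm invocations.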
Performance Benchmarks
Black Friday peak: 2.8B events, 1.2M concurrent Lambda executions, zero throttling. p50 latency 27 ms, p99 87 ms. Cost per 1M emails dropped from $18.40 (ECS Fargate) to $6.95 (serverless), a 62% reduction. Cold-start ratio <0.3% during the spike. Zero incidents recorded, meeting the 99.99% SLA.
Future Work
Migrate long-running ML retraining jobs to AWS Batch with Spot for 70% savings, pilot Lambda Powertools for Rust for even faster cold-starts, and evaluate AWS Lambda SnapStart for Python to break the 100ms barrier.