Service catalog

Fractional senior cloud architect — on call for the hard parts.

Most engagements are recurring: a few hours a week, scoped working sessions, and written records so context survives. Some are fixed projects — migrations, buildouts, audit prep. Either way, scope is agreed in writing before any billable work starts.

Service 01

AWS architecture & buildouts

Greenfield and retrofit cloud designs that the team on the ground can actually operate. VPCs, IAM Identity Center, bastion + SSM Session Manager, RDS, ALB/NLB, CloudFront, Route 53 — chosen and sized against the workload, not a generic reference architecture.

Typical scope

  • VPC + subnet + security group design
  • IAM Identity Center / SSO with custom permission sets
  • Bastion + multi-key SSH or SSM-only access
  • RDS Multi-AZ, encryption, parameter groups
  • ALB / NLB with health checks and rolling-patch target groups

What you get

  • Decision log of trade-offs considered
  • Runbook covering day-two operations
  • Backup, restore, and rollback procedures
  • Cost model with forward projection

Service 02

Cost optimization reviews

A structured pass through your AWS bill, compute inventory, storage classes, backup retention, data-transfer paths, and unused resources. Findings are prioritized by monthly impact and ease of change — no "turn everything off" advice, no ten-page PDFs that end in a shrug.

Typical scope

  • EC2 and RDS right-sizing against real CloudWatch utilization
  • Cross-region backup retention review
  • Unused ELBs, NAT gateways, EIPs, snapshots, volumes
  • S3 lifecycle & storage class review
  • Reserved / Savings Plan fit analysis
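
To illustrate what right-sizing against real utilization can look like, here is a minimal sketch. It assumes CPU percentages have already been exported from CloudWatch into a plain list. The function names and the 40% / 80% cut-offs are illustrative assumptions, not a fixed rule applied in every review.

```python
from statistics import quantiles


def p95(samples):
    """Return the 95th percentile of a list of CPU utilization percentages."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) yields 99 cut points; index 94 is the 95th percentile.
    return quantiles(samples, n=100, method="inclusive")[94]


def classify(samples, low=40.0, high=80.0):
    """Label an instance by its p95 CPU (thresholds are assumed, tune per workload)."""
    peak = p95(samples)
    if peak < low:
        return "downsize-candidate"
    if peak > high:
        return "upsize-candidate"
    return "keep"
```

Using a high percentile rather than the average keeps short bursts visible, so an instance that idles most of the day but peaks hard is not flagged for downsizing by mistake.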

What you get

  • Prioritized recommendation list with $/mo impact
  • Copy-paste CLI commands to enact safe reductions
  • Executed changes (if scoped in)
  • Follow-up check 30 days after changes

Service 03

Incident response & remediation

Something went sideways. TLS failed to renew; a CloudWatch Agent pinned a production box to 100% CPU; GuardDuty stopped notifying; a default-VPC security group opened everything to the world. Short-cycle engagement to stabilize, document, and hand back with a fix that sticks.

Recent examples (generalized)

  • ACM renewal failure on a production telemetry stack
  • CloudWatch Agent CPU runaway brought down from 100% to 7.4%
  • GuardDuty → SNS → email/Teams pipeline restored
  • Permissive default-VPC security group locked down
  • NAT / SSM access restored on a quarantined EC2

What you get

  • Stabilization in hours, not weeks
  • Root cause written down, not guessed at
  • Prevention checklist for the same class of failure
  • Optional: residual monitoring to catch recurrence

Service 04

Observability engineering

Alerts you actually want to receive. CloudWatch metric + log pipelines, alarms tuned with multi-datapoint thresholds, AWS Managed Grafana dashboards built end-to-end, status pages for external users, Windows CWAgent configured without the non-obvious gotchas.

Typical scope

  • CloudWatch metric filters, log groups, alarm families
  • AWS Managed Grafana dashboards (Windows, Linux, SQL, IIS)
  • Alarm categorization framework (page / ticket / FYI)
  • Cronitor or similar public status pages
  • Alert routing: Chatbot, Teams, Slack, email
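
The idea behind multi-datapoint thresholds can be sketched in a few lines. This is a simplified model of an "M out of N" rule, assuming a plain list of metric values with the newest last. The function and parameter names are hypothetical, and missing-data handling is deliberately left out.

```python
def breaches(datapoints, threshold, evaluation_periods, datapoints_to_alarm):
    """Return True if at least M of the last N datapoints exceed the threshold.

    N is evaluation_periods and M is datapoints_to_alarm (names assumed here).
    """
    if not 1 <= datapoints_to_alarm <= evaluation_periods:
        raise ValueError("datapoints_to_alarm must be between 1 and evaluation_periods")
    # Only the most recent N datapoints are considered.
    window = datapoints[-evaluation_periods:]
    breaching = sum(1 for value in window if value > threshold)
    return breaching >= datapoints_to_alarm
```

With a 3-out-of-5 rule, a single spike to 95% stays quiet, while sustained pressure across three of the last five periods fires the alarm. That is usually where the reduction in pager noise comes from.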

What you get

  • Dashboards the team will open during an incident
  • Documented alarm thresholds and why they're set that way
  • Runbooks linked from alarms
  • A reduction in pager noise, measured

Service 05

Compliance readiness

SOC 2 and ISO audit support without gold-plating. Stale IAM users and keys audited out; GuardDuty, Security Hub, and VPC Flow Logs producing evidence that flows into your ticketing tool; MFA and SSO enforced; permissive access closed. Auditor-ready without the six-figure GRC consultancy.

Typical scope

  • IAM audit: stale users, unrotated keys, missing MFA
  • VPC Flow Logs + CloudTrail retention
  • Security Hub → ticketing Lambda with checkpointing
  • GuardDuty delegated admin + SNS pipeline
  • Evidence-pipeline diagrams for auditors
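
As a rough illustration of the unrotated-key check, here is a minimal sketch. It assumes key metadata has already been pulled into a list of dictionaries. The record shape, the function name, and the 90-day limit are assumptions for the example, and the actual limit should come from your own policy.

```python
from datetime import date, timedelta


def stale_keys(keys, today, max_age_days=90):
    """Return the ids of access keys created more than max_age_days before today.

    Each record is assumed to look like {"id": str, "created": date}.
    """
    cutoff = today - timedelta(days=max_age_days)
    return [key["id"] for key in keys if key["created"] < cutoff]


# Example with made-up key ids and dates.
inventory = [
    {"id": "key-old", "created": date(2024, 1, 1)},
    {"id": "key-new", "created": date(2024, 5, 20)},
]
flagged = stale_keys(inventory, today=date(2024, 6, 1))
```

Passing `today` explicitly keeps the check reproducible, which helps when the same output is later attached as audit evidence.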

What you get

  • Findings burndown with owners and dates
  • Evidence trails your auditor can verify
  • Ongoing monitoring so drift is caught early
  • Partnership through the external audit itself

Service 06

Database & platform upgrades

RDS major-version upgrades (MySQL, PostgreSQL) planned ahead of end-of-life. Aurora writer/reader workload diagnosis. Ignition SCADA Cloud Edition in-place upgrades with gateway backup and rollback paths. Postgres schema design with real constraint and index discipline — reviewed by multiple LLMs cross-checking each other before anything lands in production.

Typical scope

  • RDS MySQL and PostgreSQL major-version upgrades
  • Aurora MySQL workload diagnosis (writer vs reader split)
  • Ignition Cloud Edition in-place upgrades + gateway backup
  • Schema design: FKs, check constraints, triggers, indexes
  • Migration from SQL Server to PostgreSQL on AWS

What you get

  • Upgrade plan with pre-flight checks and rollback
  • Tested restore from backup before the upgrade window
  • Post-upgrade verification report
  • Forward-looking runbook for the next major version

Scope is agreed in writing before any billable hour.

Step 1

Discovery call

30–45 minutes. Free. You describe what you have and what's hurting. I leave you with a written read on what's worth doing next — even if we don't end up working together.

Step 2

Scoped SOW

A short written statement of work: outcomes, boundaries, rate, and cadence. You approve before anything starts. Amendments are written too.

Step 3

Recurring or project

Ongoing retainers typically run a few hours a week with fixed working sessions. Fixed projects run to a defined finish line with written deliverables.

Ready to scope something?

Send what you have — current setup, what's breaking, what you're hoping to fix. A short message is enough to start.

Start a conversation