WhiteCliff Studio: Architecture of a One-Person Production Platform

This is a write-up of the systems behind WhiteCliff Studio, a small home-decor e-commerce operation I run on the side. The interesting part isn't the storefront — it's the platform underneath. I built and operate the whole stack myself, on my own AWS account, with my own money on the line.

What it actually does

Strip the business away and the platform is doing four jobs:

Pull data from ~10 third-party sources every few hours into a single store
Reconcile that data with Shopify — pricing, inventory, MAP enforcement, discontinuations
Generate content (captions, room scenes, Pinterest pins, blog posts) and publish to social platforms on a schedule
Render dashboards for me to actually look at what's happening

That's it. Four jobs, ~110 Lambda functions, three deployable services, one person on call.

Repository topology

Three repos, each a separately deployable Serverless Framework workload.

Repo	Functions	What it owns
`wcs-seo-dashboard`	67	Business operations: React dashboard, content automation, pricing intel, analytics, social publishing
`wcs-data-collector`	13	Daily metric collection from GSC, Shopify, GA4, Pinterest, Instagram, Google Ads → DynamoDB
`whitecliff`	30	Inventory sync: vendor feed ingestion, MAP enforcement, discontinuation pipeline → Shopify

The split isn't arbitrary. Each one runs on a different cadence (collection daily, dashboard every 4 hours, inventory event-driven), breaks in different ways, and deploys independently, so a bad inventory deploy can't take the dashboard down.

Stack topology (inside `wcs-seo-dashboard`)

`wcs-seo-dashboard` is split further into three CloudFormation stacks that share one git repo:

Stack	Functions	CF resources (approx. used / limit)	Purpose
`wcs-seo-dashboard-prod` (main / core)	35	~270 / 500	Shared infra (DynamoDB, S3, CloudFront, WAF), data pipeline, blog, pricing
`wcs-dashboard-social-prod`	24	~170 / 500	Content generation + social publishing
`wcs-dashboard-analytics-prod`	8	~60 / 500	Alerts, metric thresholds, reporting

The split exists because CloudFormation has a hard limit of 500 resources per stack, and each Lambda with an HTTP event generates roughly 8 resources. I wrote up the math and the migration in a separate post.
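To make the limit concrete, here is the back-of-envelope arithmetic as a small TypeScript sketch. The ~8 resources per function is the rough figure above, not a value read from a real template; an actual check would count resources in the generated CloudFormation template.

```typescript
// Back-of-envelope CloudFormation budget check. Assumption: each Lambda with
// an HTTP event expands to ~8 resources (function, log group, version,
// API Gateway resource/method, invoke permission, and so on).
const STACK_RESOURCE_LIMIT = 500;

// Estimated resource count for a stack with `functions` HTTP-triggered Lambdas
// plus `shared` non-function resources (tables, buckets, distributions).
function estimateResources(functions: number, perFunction = 8, shared = 0): number {
  return functions * perFunction + shared;
}

// Largest function count that still fits under the per-stack limit.
function maxFunctions(perFunction = 8, shared = 0): number {
  return Math.floor((STACK_RESOURCE_LIMIT - shared) / perFunction);
}

const allInOne = estimateResources(67); // all 67 dashboard functions in one stack
console.log(allInOne, allInOne <= STACK_RESOURCE_LIMIT); // 536 false
console.log(maxFunctions()); // 62, before counting any shared infra
```

Shared infrastructure only lowers that ceiling, which is why the stack that owns DynamoDB, S3, CloudFront, and WAF is the one that fills up first.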

The core stack owns all the shared AWS primitives (DynamoDB table, S3 buckets, CloudFront, alarms SNS topic). The other stacks reference those via CloudFormation exports rather than redefining them. Cross-stack deploys go analytics → social → main so the main stack can resolve exports during update.

Data pipeline

A scheduled Lambda runs every 4 hours and collects from:

Google Search Console (queries, pages, impressions, clicks)
Shopify GraphQL (orders, products, inventory)
GA4 (sessions, conversions, attribution)
Pinterest (organic pin metrics + Pinterest Ads)
Instagram (post metrics)
Google Ads (campaign metrics)
Google Merchant Center (price benchmarks)
Google Sheets (manual entry / overrides)
OpenAI (content generation outputs)
Google Calendar (content scheduling)

Each collector is idempotent — the same window can be re-pulled without writing duplicates. Results land in a single DynamoDB metrics table (partitioned by source) and a snapshot S3 JSON object the frontend consumes statically.
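One common way to get that idempotency, sketched under assumptions: derive each item's key deterministically from source, date, and metric name, so a re-pull overwrites instead of appending. The `SOURCE#`/`DATE#` key layout and the in-memory store are hypothetical stand-ins, not the platform's actual schema.

```typescript
// Hypothetical key layout: partition by source, sort by date + metric.
interface MetricItem {
  pk: string; // e.g. "SOURCE#gsc"
  sk: string; // e.g. "DATE#2024-01-15#clicks"
  value: number;
}

function toItem(source: string, date: string, metric: string, value: number): MetricItem {
  return { pk: `SOURCE#${source}`, sk: `DATE#${date}#${metric}`, value };
}

// In-memory stand-in for the metrics table. Writing an item whose pk+sk
// already exists replaces it, the way a DynamoDB PutItem on an existing key does.
class MetricStore {
  private items: Record<string, MetricItem> = {};

  put(item: MetricItem): void {
    this.items[`${item.pk}|${item.sk}`] = item;
  }

  get(pk: string, sk: string): MetricItem | undefined {
    return this.items[`${pk}|${sk}`];
  }

  count(): number {
    return Object.keys(this.items).length;
  }
}

const store = new MetricStore();
// First pull of a window.
store.put(toItem("gsc", "2024-01-15", "clicks", 120));
store.put(toItem("gsc", "2024-01-15", "impressions", 4000));
// Re-pull of the same window (e.g. after a retry) with a revised number.
store.put(toItem("gsc", "2024-01-15", "clicks", 125));
console.log(store.count()); // 2: the re-pull overwrote, it didn't duplicate
```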

The single-table design is deliberate. With 10 sources and one operator, I want one place to look when something is off. Cross-source queries (margin = revenue from Shopify minus ad spend from Google Ads) happen at read time in the Lambda or in the dashboard, not in pre-joined materialized views I'd have to keep in sync.
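A read-time version of the margin example might look like the sketch below. The row shape and metric names (`revenue`, `ad_spend`) are assumptions for illustration; the point is that the combination happens when data is read, so there is no derived table to keep in sync.

```typescript
// Hypothetical shape of a metric row after it's read back from the table.
interface DailyMetric {
  source: string; // "shopify", "google_ads", ...
  date: string;   // YYYY-MM-DD
  metric: string; // "revenue", "ad_spend", ...
  value: number;
}

// margin = Shopify revenue minus Google Ads spend, computed per day at read time.
function dailyMargin(items: DailyMetric[]): Record<string, number> {
  const margin: Record<string, number> = {};
  for (const item of items) {
    const current = margin[item.date] ?? 0;
    if (item.source === "shopify" && item.metric === "revenue") {
      margin[item.date] = current + item.value;
    } else if (item.source === "google_ads" && item.metric === "ad_spend") {
      margin[item.date] = current - item.value;
    }
    // Rows from other sources are ignored by this particular view.
  }
  return margin;
}

console.log(dailyMargin([
  { source: "shopify", date: "2024-01-15", metric: "revenue", value: 500 },
  { source: "google_ads", date: "2024-01-15", metric: "ad_spend", value: 120 },
  { source: "gsc", date: "2024-01-15", metric: "clicks", value: 90 },
])); // { '2024-01-15': 380 }
```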

Front end

React + TypeScript + Vite, built to a static bundle and dropped on S3 behind CloudFront with a WAF. No backend-for-frontend layer in between. Two read patterns:

Static JSON: most views fetch `${DATA_BASE_URL}/file.json` from CloudFront. Pre-computed by the pipeline. Fast, cheap, can serve a recruiter from cache.
API calls: interactive features (approving content, applying prices, marking vendor items reviewed) hit API Gateway → Lambda → DynamoDB.

The frontend doesn't know it's talking to ~110 Lambdas. It sees ~30 endpoints and a flat JSON tree on S3.

Inventory sync (the part that breaks at 3 a.m.)

Vendor feeds drop into S3 daily. An ingest Lambda parses them, fans out work via SQS to a small fleet of workers. Workers reconcile with Shopify via GraphQL, applying:

Quantity sync — push vendor on-hand to Shopify inventory levels
Pricing logic — cost + margin + competitor benchmark → recommended price, with manual override windows
MAP enforcement — flag products where Shopify price violates Minimum Advertised Price
Discontinuations — products that drop out of two consecutive feeds enter a review queue, not auto-delete
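Two of those rules are simple enough to sketch. The discontinuation rule needs a little state carried between feed runs: a per-SKU count of consecutive misses that resets whenever the SKU reappears. The state shape, the function names, and storing prices as integer cents are assumptions, not the platform's actual code.

```typescript
// Hypothetical per-SKU state persisted between feed runs.
interface SkuState {
  missedFeeds: number;    // consecutive feeds the SKU was absent from
  inReviewQueue: boolean; // flagged for a human; never auto-deleted
}

const MISSES_BEFORE_REVIEW = 2;

// Apply one vendor feed. SKUs present in the feed reset their miss counter;
// absent SKUs increment it and enter review after two consecutive misses.
// Returns the SKUs newly flagged by this run. Clearing a flag is left to a human.
function applyFeed(state: Record<string, SkuState>, feedSkus: string[]): string[] {
  const present: Record<string, boolean> = {};
  for (const sku of feedSkus) {
    present[sku] = true;
    if (!state[sku]) state[sku] = { missedFeeds: 0, inReviewQueue: false };
  }
  const newlyFlagged: string[] = [];
  for (const sku of Object.keys(state)) {
    const s = state[sku];
    if (present[sku]) {
      s.missedFeeds = 0;
      continue;
    }
    s.missedFeeds += 1;
    if (s.missedFeeds >= MISSES_BEFORE_REVIEW && !s.inReviewQueue) {
      s.inReviewQueue = true;
      newlyFlagged.push(sku);
    }
  }
  return newlyFlagged;
}

// MAP check: flag when the live Shopify price is below the vendor's Minimum
// Advertised Price. Prices are integer cents; null means no MAP applies.
function violatesMap(shopifyPriceCents: number, mapCents: number | null): boolean {
  return mapCents !== null && shopifyPriceCents < mapCents;
}

const state: Record<string, SkuState> = {};
applyFeed(state, ["RUG-01", "LAMP-07"]);
applyFeed(state, ["RUG-01"]);              // LAMP-07 missed once
console.log(applyFeed(state, ["RUG-01"])); // [ 'LAMP-07' ]
console.log(violatesMap(1999, 2499));      // true
```

Returning only newly flagged SKUs keeps the review queue append-only: a SKU that stays missing doesn't get re-queued on every run.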

DynamoDB Streams trigger downstream side effects (Shopify metafield updates, content queue invalidation), so the ingest Lambda stays linear and fast. Every worker is idempotent and uses optimistic concurrency on the Shopify side. Shopify's API throttles aggressively, so retries are routine, and idempotency is what makes them safe.
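Retrying throttled calls is usually done with capped exponential backoff plus jitter, so a fleet of workers doesn't retry in lockstep. The sketch below uses the "full jitter" variant. How a throttle is detected (`isThrottled`) is an injected assumption; the exact error shape should be checked against Shopify's rate-limit documentation.

```typescript
// Delay before retry number `attempt` (0-based), using "full jitter":
// a random value in [0, min(capMs, baseMs * 2^attempt)).
function backoffDelayMs(
  attempt: number,
  baseMs: number,
  capMs: number,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}

// Retry `op` while it fails with a throttle error, up to maxAttempts total calls.
// isThrottled and sleep are injected so the sketch stays API-agnostic and testable.
async function withRetries<T>(
  op: () => Promise<T>,
  isThrottled: (err: unknown) => boolean,
  sleep: (ms: number) => Promise<void>,
  maxAttempts = 5,
  baseMs = 500,
  capMs = 10000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      // Non-throttle errors and an exhausted budget propagate to the caller
      // (in an SQS-triggered worker, that lets the message be redelivered later).
      if (!isThrottled(err) || attempt + 1 >= maxAttempts) throw err;
      await sleep(backoffDelayMs(attempt, baseMs, capMs));
    }
  }
}
```

In a real worker, `sleep` would just wrap `setTimeout`. Because each worker is idempotent, a retry that repeats an already-applied write is harmless.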

Content automation

The content pipeline is where AI lives. OpenAI generates captions, blog drafts, room-scene descriptions, and pin variants. Each generation is queued for human approval before publishing — I'd rather have one bad post than spend a Saturday cleaning up ten.

Scheduled publishing uses EventBridge with a per-post scheduled rule for exact-time execution, plus a `rate(6 hours)` fallback rule that catches missed posts. The fallback is the safety net; the exact-time rule is the optimization. Both invoke the same idempotent publisher, so a post picked up by both paths still goes out only once.
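The "same idempotent publisher" idea reduces to claim-then-publish: whichever path claims a post first publishes it, and the other path sees the claim and does nothing. In this sketch an in-memory record stands in for what would need to be an atomic conditional write in production (for example a DynamoDB put guarded by `attribute_not_exists`); the class and function names are hypothetical.

```typescript
type Trigger = "exact" | "fallback";

// Hypothetical publisher shared by the exact-time rule and the fallback sweep.
class IdempotentPublisher {
  // postId -> which path claimed it. In production this check-and-set must be
  // one atomic conditional write; in a single-threaded sketch a record suffices.
  private claims: Record<string, Trigger> = {};
  readonly published: string[] = [];

  // Returns true if this call published the post, false if it was already claimed.
  publish(postId: string, trigger: Trigger): boolean {
    if (this.claims[postId] !== undefined) return false;
    this.claims[postId] = trigger;
    this.published.push(postId); // the real Pinterest/Instagram call would go here
    return true;
  }
}

// Fallback sweep: publish every due post; already-published ones are no-ops.
function fallbackSweep(publisher: IdempotentPublisher, duePostIds: string[]): number {
  return duePostIds.filter((id) => publisher.publish(id, "fallback")).length;
}

const publisher = new IdempotentPublisher();
publisher.publish("post-1", "exact");                        // exact-time rule fires
console.log(fallbackSweep(publisher, ["post-1", "post-2"])); // 1: only post-2 was missed
console.log(publisher.published);                            // [ 'post-1', 'post-2' ]
```

One caveat of claiming before publishing: if the platform API call fails after the claim, the post needs a retry path (for example a status field the fallback sweep rechecks), or it is silently skipped.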

Pinterest and Instagram are the two output platforms. Pinterest pin analytics get refreshed every 6 hours back into the metrics table, which closes the loop on "what worked."

Observability

Boring on purpose:

CloudWatch alarms on every Lambda's error rate and duration
SNS topic receiving all alarms, fanned out to Slack (one channel) and email (severity-filtered)
Structured JSON logs across all services, queryable via CloudWatch Insights
Health endpoint the data pipeline pings on every run; if the pings stop, I get paged
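Structured logging doesn't need a library. CloudWatch Logs Insights automatically discovers fields in JSON-formatted log events, so writing one JSON object per line makes fields like `level` and `source` filterable. A minimal sketch; the field names are illustrative, not a fixed schema.

```typescript
type Level = "debug" | "info" | "warn" | "error";

// Returns a log function bound to one service. Each call writes a single JSON
// object per line. Core fields are spread last so call-site fields can't overwrite them.
function makeLogger(service: string, write: (line: string) => void = (l) => console.log(l)) {
  return (level: Level, msg: string, fields: Record<string, unknown> = {}): void => {
    write(JSON.stringify({ ...fields, ts: new Date().toISOString(), service, level, msg }));
  };
}

const log = makeLogger("wcs-data-collector");
log("error", "collector failed", { source: "ga4", attempt: 3 });
// {"source":"ga4","attempt":3,"ts":"...","service":"wcs-data-collector","level":"error","msg":"collector failed"}
```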

There's no Grafana, no Datadog, no dedicated observability platform. With ~110 functions and one human, the alarm budget is "I should be able to read a quarter's worth of alerts in 10 minutes." That ruled out anything with dashboards I'd have to actively look at.

Secrets, deploys, and the GitHub Actions seam

All secrets live in SSM Parameter Store. No `.env` files in any repo. Lambdas fetch them at cold start and cache them for the lifetime of the execution environment.
GitHub Actions runs tests, AI-assisted code review, and deploy-on-merge-to-main. Each service deploys independently; only the changed service rebuilds.
Sharp ships platform-specific native binaries, so a local `npm install` on macOS produces a build that fails on the Lambda (Linux) runtime. CI handles that by explicitly installing the Linux x64 variant. (I learned this the hard way.)
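The cold-start caching pattern can be sketched as a module-scope cache. The fetcher is injected so the sketch has no AWS dependency; in a real handler it would likely wrap SSM's `GetParameter` API with `WithDecryption` enabled for SecureString values. The parameter name in the usage is hypothetical.

```typescript
type Fetcher = (name: string) => Promise<string>;

// Module scope survives across warm invocations in the same execution
// environment, so this cache lives exactly as long as the environment does.
const cache: Record<string, Promise<string>> = {};

// Cache the promise rather than the value, so parallel lookups within one
// invocation share a single in-flight request instead of each hitting SSM.
function getSecret(name: string, fetcher: Fetcher): Promise<string> {
  if (!(name in cache)) {
    cache[name] = fetcher(name).catch((err) => {
      delete cache[name]; // don't pin a failed fetch for the environment's lifetime
      throw err;
    });
  }
  return cache[name];
}
```

Usage inside a handler would look like `const token = await getSecret("/example/shopify-token", ssmFetcher);`, where both the parameter path and `ssmFetcher` are placeholders.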

Cost shape

The platform is fully serverless and the workload is bursty (4-hour pipeline, event-driven inventory, intermittent dashboard use). The whole thing runs comfortably at AWS Free Tier–adjacent cost. Lambda invocations and DynamoDB on-demand reads are the dominant line items, and even those come to dollars per month, not hundreds of dollars. Static S3 + CloudFront for the dashboard is essentially free.
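Because Lambda bills per request plus per GB-second of compute, a claim like this is easy to sanity-check by hand. The rates below are assumptions (long-standing us-east-1 x86 list prices; they vary by region and architecture and should be checked against current AWS pricing), the workload numbers are hypothetical, and the sketch ignores the free tier and DynamoDB.

```typescript
// Assumed list prices (us-east-1, x86); verify against current AWS pricing.
const USD_PER_MILLION_REQUESTS = 0.2;
const USD_PER_GB_SECOND = 0.0000166667;

// Monthly Lambda cost for one function, ignoring the free tier.
function lambdaMonthlyCostUsd(invocations: number, avgDurationMs: number, memoryMb: number): number {
  const gbSeconds = invocations * (avgDurationMs / 1000) * (memoryMb / 1024);
  const requestCost = (invocations / 1000000) * USD_PER_MILLION_REQUESTS;
  return requestCost + gbSeconds * USD_PER_GB_SECOND;
}

// Hypothetical: a job every 4 hours (6/day, ~180/month), 60 s per run at 1024 MB.
console.log(lambdaMonthlyCostUsd(180, 60000, 1024).toFixed(2)); // 0.18
```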

If this were a 24/7 high-throughput app, the math would flip. For an ops platform you mostly read, serverless is the right answer.

What I'd do differently

A few things, in increasing order of "I should fix this":

Earlier service-domain split. I waited too long. The main stack got close to the 500-resource limit before I split out analytics and social. Splitting under pressure is worse than splitting when you have time.
Less DynamoDB in the dashboard's read path. Static JSON on CloudFront is the right default. I still have a few endpoints that hit DynamoDB on every page view, and they're slower than they should be.
A real feature-flag system. I use simple env-var flags. For a solo developer that's fine; the moment there are two of us it isn't.

What this isn't

It isn't a high-traffic system. It isn't multi-region. It isn't designed for sub-second p99 across millions of requests per minute.

It's designed for a one-person operations team to run a real business on top of, without staffing up to keep the lights on. That's a different optimization target than what most "AWS reference architecture" blog posts imagine, and I think it's a more interesting one.

Platform engineering for a team of one is its own discipline. Most of what I've written here is just the disciplined application of patterns I run in my day job (serverless, idempotent workers, declarative infra, observability that pages on the right thing) at a scale where I can keep all of it in my head.

That's the part I think transfers.