Technical Architecture Document

MotoPartPicker — System Architecture

Full-stack architecture for a motorcycle parts compatibility platform. Designed for simplicity at launch, with clear scale checkpoints to 300K MAU. Every technology choice is optimized for a two-person team.

Stack: SvelteKit · Neon · Fly.io
Auth: BetterAuth · OAuth
Billing: Stripe Webhooks
Storage / Email: R2 · Resend
Scale Target: 300K MAU (Year 3)
Version: v1.0 · April 2026
00 · Architecture Principles

Every technical decision in this document flows from five governing principles. When tradeoffs arise, these serve as the tiebreaker — in this order.

SOLID at service level
Separation of concerns
Defense in depth
12-Factor App
YAGNI, designed to scale

SOLID at Service Level

Each SvelteKit route module has a single responsibility. Business logic lives in service files, not load functions. Interfaces are preferred over direct implementation calls.
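A minimal sketch of the pattern, under assumed names (`CompatibilityRepo` and `listCompatibleParts` are illustrations, not the real modules): the business rule lives in a plain function behind an interface, so it can be unit-tested without SvelteKit or a database.

```typescript
// Hypothetical service-layer sketch. The route's load function would call
// listCompatibleParts; it never touches the database driver directly.
export interface CompatibilityRepo {
  findByBike(bikeId: string): Promise<{ partId: string; status: string }[]>;
}

export async function listCompatibleParts(
  repo: CompatibilityRepo,
  bikeId: string
): Promise<string[]> {
  const records = await repo.findByBike(bikeId);
  // Business rule: only verified or community-confirmed fits are shown.
  return records
    .filter((r) => r.status === "verified" || r.status === "community")
    .map((r) => r.partId);
}
```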

Separation of Concerns

Presentation, business logic, and data access are never mixed. SSR handles public data; client-side handles interactivity. API routes are the only data boundary.

Defense in Depth

Auth is enforced at the route level, the service level, and the database level. No single layer is trusted alone. Secrets never touch source code.

12-Factor App

Config via environment variables. Stateless processes. Port binding. Dev/prod parity. Logs as event streams via structured pino output to stdout.
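One way the "config via environment variables" rule can be enforced is a single boot-time check, so a missing secret fails fast instead of surfacing mid-request. A hedged sketch — the variable names here are assumptions, not the project's actual list:

```typescript
// Validate required env vars once at startup (12-factor: config in the
// environment, never in source). Names below are illustrative.
const REQUIRED = ["DATABASE_URL", "STRIPE_WEBHOOK_SECRET", "RESEND_API_KEY"] as const;

export function loadConfig(env: Record<string, string | undefined>) {
  const missing = REQUIRED.filter((k) => !env[k]);
  if (missing.length > 0) {
    // Fail fast at boot rather than erroring on the first request.
    throw new Error(`Missing required env vars: ${missing.join(", ")}`);
  }
  return {
    databaseUrl: env.DATABASE_URL!,
    stripeWebhookSecret: env.STRIPE_WEBHOOK_SECRET!,
    resendApiKey: env.RESEND_API_KEY!,
  };
}
```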

YAGNI, Designed to Scale

No Redis, no Elasticsearch, no microservices at launch. Postgres handles search, queuing, and sessions. Scale gates are documented and infrastructure-tested.

01 · System Overview

MotoPartPicker is a single SvelteKit application deployed on Fly.io that handles both SSR and API routing. All persistent state lives in Neon Postgres. External services (auth, billing, storage, email) are integrated via official SDKs and webhooks.

Core tables: bikes, parts, compatibility_records, verifications, users, builds, retailers, part_prices, affiliate_clicks, retailer_subscriptions.
02 · Frontend Architecture

SvelteKit provides SSR and client-side hydration in a single framework. SSR is the default for all public pages — critical because "[bike] [part] compatible" queries are high-volume organic search traffic. Interactive features (build planner, part filters, comparison) hydrate on the client after initial load.

SEO is the primary growth channel. Every public bike/part page is fully rendered HTML on first request. No client-side-only rendering for indexable content. Server load functions handle all data fetching before the response is sent.
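A sketch of what such a server load function could look like for a bike page. The service calls (`getBike`, `getCompatibleParts`) are stand-ins for the real service layer, stubbed here so the shape is self-contained:

```typescript
// Hypothetical src/routes/bikes/[bikeId]/+page.server.ts shape.
// SvelteKit runs this on the server before rendering, so the
// compatibility page ships as complete HTML on first request.
type Bike = { id: string; name: string };

async function getBike(id: string): Promise<Bike> {
  return { id, name: "placeholder" }; // real version queries Postgres
}
async function getCompatibleParts(bikeId: string): Promise<string[]> {
  return []; // real version joins compatibility_records
}

export async function load({ params }: { params: { bikeId: string } }) {
  // Fetch in parallel; everything resolves before the response is sent.
  const [bike, parts] = await Promise.all([
    getBike(params.bikeId),
    getCompatibleParts(params.bikeId),
  ]);
  return { bike, parts };
}
```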

Rendering Strategy

SSR — SEO Pages

Bike listing, part detail, compatibility result, build showcase, marketing landing. All rendered server-side with server load functions.

CSR — Interactive

Build planner, live part filters, comparison tool, user dashboard. Hydrated after SSR shell; state managed in Svelte stores.

Feature-Sliced Route Structure

```
// src/routes/ — feature-sliced by domain
routes/
├── (marketing)/            — landing, about, pricing
├── bikes/
│   ├── +page.svelte        — bike selector (year/make/model)
│   └── [bikeId]/
│       └── +page.svelte    — compatible parts for bike
├── parts/
│   └── [partId]/
│       └── +page.svelte    — part detail + prices
├── builds/
│   ├── +page.svelte        — user build list (CSR)
│   └── [buildId]/
│       └── +page.svelte    — build detail (SSR for public)
└── api/                    — all +server.ts endpoints
```

Performance Budget

• LCP (Largest Contentful Paint): <2s
• FID (First Input Delay): <100ms
• CLS (Cumulative Layout Shift): <0.1
03 · Backend Architecture

The backend is a set of SvelteKit +server.ts API routes. Auth is enforced via BetterAuth middleware. Rate limiting uses an in-memory sliding window at launch, graduating to a Postgres-backed counter at 25K MAU.

Auth strategy: BetterAuth with session cookies. Google and GitHub OAuth. Sessions stored in Postgres via BetterAuth's session adapter. No JWTs in local storage. Rate limits: 100 req/min unauthenticated, 300 req/min authenticated.
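The in-memory sliding window can be sketched in a few lines — per-key timestamps pruned on each check. This is an illustration of the approach, not the production module; the limits are the document's (100/min unauthenticated, 300/min authenticated):

```typescript
// Minimal in-memory sliding-window rate limiter, keyed by IP or user id.
const WINDOW_MS = 60_000;
const hits = new Map<string, number[]>();

export function allowRequest(key: string, limit: number, now = Date.now()): boolean {
  // Drop timestamps that have slid out of the window.
  const recent = (hits.get(key) ?? []).filter((t) => now - t < WINDOW_MS);
  if (recent.length >= limit) {
    hits.set(key, recent);
    return false; // caller responds 429
  }
  recent.push(now);
  hits.set(key, recent);
  return true;
}
```

Because the state is a plain `Map`, it resets on deploy and is not shared across machines — which is exactly why the document graduates to a Postgres-backed counter at 25K MAU.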

API Surface

Method · Path · Auth · Description
GET · /api/bikes · none · List bikes, filterable by year/make/model
GET · /api/bikes/[id]/parts · none · Compatible parts for a specific bike
GET · /api/parts/[id] · none · Part detail including prices across retailers
GET · /api/parts/[id]/verifications · none · Community verifications for a part + bike combo
POST · /api/verifications · user · Submit a fit verification for a part on a bike
GET · /api/builds · user · List the authenticated user's builds
POST · /api/builds · user · Create a new build for the authenticated user
PUT · /api/builds/[id] · owner · Update a build (ownership verified server-side)
GET · /api/prices/[partId] · none · Current prices across all retailers for a part
POST · /api/affiliate/click · none · Record an affiliate click for attribution tracking
POST · /api/webhooks/stripe · stripe · Handle billing events: invoice.paid, subscription.updated, etc.
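The "owner" auth level on PUT /api/builds/[id] can be made explicit as a small guard. A sketch with assumed types (the real session and build shapes come from BetterAuth and the schema); returning 404 rather than 403 for a foreign build avoids confirming which build ids exist:

```typescript
// Hypothetical ownership guard for PUT /api/builds/[id].
type Session = { userId: string } | null;
type Build = { id: string; ownerId: string };

export function authorizeBuildUpdate(session: Session, build: Build | undefined): number {
  if (!session) return 401;                          // no valid session cookie
  if (!build) return 404;                            // build does not exist
  if (build.ownerId !== session.userId) return 404;  // hide existence from non-owners
  return 200;                                        // proceed with the update
}
```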
04 · Data Architecture

Everything lives in Postgres. No secondary data stores at launch. Neon's serverless driver handles connection pooling transparently — no need for PgBouncer or a separate pooler.

Neon Serverless Driver

HTTP-based Postgres connection that works in edge runtimes. Built-in connection pooling eliminates the need for PgBouncer at this scale.

Full-Text Search via tsvector

Parts search uses Postgres tsvector + GIN index. No Elasticsearch needed at 300K MAU given read-heavy, low-write search patterns.
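A sketch of the search query this implies. The `to_tsvector` expression must match the GIN index expression exactly for Postgres to use the index; `websearch_to_tsquery` is a Postgres built-in that parses user-style queries safely. The helper below just builds a parameterized statement — the actual driver call is left out:

```typescript
// Build the parts search as a parameterized query (user input never
// concatenated into SQL). Expression mirrors the GIN index definition.
export function partsSearchQuery(term: string): { text: string; values: string[] } {
  return {
    text: `SELECT id, name
           FROM parts
           WHERE to_tsvector('english', name || ' ' || coalesce(description, ''))
                 @@ websearch_to_tsquery('english', $1)
           LIMIT 20`,
    values: [term],
  };
}
```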

Partitioned Affiliate Clicks

The affiliate_clicks table is range-partitioned by month. Old partitions can be archived without downtime.

Neon Branching

Each dev environment and preview deploy gets a Neon branch: instant copy-on-write clone of production schema. Zero cost for idle branches.

```sql
-- Core compatibility relationship
CREATE TABLE compatibility_records (
  id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  bike_id uuid REFERENCES bikes(id),
  part_id uuid REFERENCES parts(id),
  status text NOT NULL CHECK (status IN ('verified', 'community', 'no_data')),
  created_at timestamptz DEFAULT now(),
  UNIQUE (bike_id, part_id)
);

-- Full-text search index on parts
CREATE INDEX parts_search_idx ON parts
  USING gin (to_tsvector('english', name || ' ' || coalesce(description, '')));

-- Affiliate clicks: partitioned by month
CREATE TABLE affiliate_clicks (
  id bigserial,
  part_id uuid,
  retailer_id uuid,
  clicked_at timestamptz DEFAULT now(),
  user_id uuid  -- nullable: anonymous clicks are tracked too
) PARTITION BY RANGE (clicked_at);
```
05 · Integration Architecture

All external integrations are treated as unreliable. Every outbound call uses exponential backoff with a dead-letter audit log for permanent failures.

RevZilla / Amazon

Hourly price scrape via pg-boss background job. Results written to part_prices. Stale prices (>24h) marked as such in the UI.

Stripe Webhooks

Events: invoice.paid, subscription.updated, subscription.deleted. Signature verified with STRIPE_WEBHOOK_SECRET. Idempotent processing via event ID.
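The idempotency step can be isolated from the Stripe SDK entirely. Signature verification itself is done with Stripe's library (`stripe.webhooks.constructEvent`); the sketch below shows only the event-id dedup, with an injected `Set` standing in for what would be a unique-indexed processed_events table in Postgres:

```typescript
// Process a webhook event at most once, keyed by Stripe's event id.
// `seen` is a stand-in for durable storage with a unique constraint.
export async function processOnce(
  eventId: string,
  seen: Set<string>,
  handler: () => Promise<void>
): Promise<boolean> {
  if (seen.has(eventId)) return false; // a retry from Stripe — skip silently
  await handler();
  seen.add(eventId);
  return true;
}
```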

Resend (Email)

Transactional only: fit verification confirmation, build share notifications, weekly digest (opt-in). Templates are React Email components compiled server-side.

Google / GitHub OAuth

Managed entirely by BetterAuth. Zero custom OAuth code. Callback URLs registered per environment. PKCE enforced.

Retry strategy: exponential backoff with dead-letter logging. 3 retries at 5s, 25s, 125s intervals. After the 3rd failure the job is moved to an integration_errors audit table and an alert fires. No silent failures.
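The 5s / 25s / 125s schedule is a base-5 exponential; a small helper makes the policy explicit rather than scattering magic numbers:

```typescript
// Retry delay for the document's backoff schedule:
// attempt 1 → 5_000 ms, attempt 2 → 25_000 ms, attempt 3 → 125_000 ms.
export function retryDelayMs(attempt: number): number {
  return 5_000 * 5 ** (attempt - 1);
}

// After the final retry, the job goes to integration_errors and alerts.
export const MAX_RETRIES = 3;
```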
06 · Background Processing

Background jobs run via pg-boss, a Postgres-based job queue using the SKIP LOCKED pattern. No Redis, no separate worker process — the same Fly.io machine that serves HTTP also processes jobs. At 100K MAU this becomes a dedicated machine.

  • price-update (hourly): scrape RevZilla + Amazon for part prices
  • send-email (on event): transactional notifications via Resend
  • data-quality-check (daily): flag stale prices, orphaned records, low-confidence verifications
  • sitemap-generation (daily): regenerate XML sitemap and ping Google Search Console
Failure handling. 3 retries with exponential backoff. On the 3rd failure pg-boss marks the job as failed, writes to the audit log, and triggers a Sentry alert. The price-update job is designed to be fully idempotent — rerunning never duplicates records.
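One way to keep the schedule auditable is a single job registry that drives both pg-boss scheduling (`boss.schedule(name, cron)`) and worker registration (`boss.work(name, handler)`). The cron strings below are assumptions matching the cadences listed above, not values from the source:

```typescript
// Hypothetical job registry; one table of truth for scheduled jobs.
export const JOBS = [
  { name: "price-update",       cron: "0 * * * *" },  // hourly, on the hour
  { name: "data-quality-check", cron: "0 3 * * *" },  // daily, 03:00 UTC
  { name: "sitemap-generation", cron: "30 3 * * *" }, // daily, 03:30 UTC
] as const;

// send-email is event-driven rather than scheduled: it is enqueued
// on demand (e.g. boss.send("send-email", payload)) by the app code.
```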
07 · Security Architecture

Security is layered: HTTPS at the edge, session auth at the route level, ownership checks at the service level, and column-level encryption at the database. No single layer is trusted alone.

Transport Security

HTTPS everywhere via Fly.io SSL termination. HSTS enforced. X-Frame-Options: DENY. X-Content-Type-Options: nosniff.

CSP Headers

Content Security Policy restricts script sources to self + Google Fonts. CORS: api.motopartpicker.com only. No wildcard origins.
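The headers above can be defined in one place (in SvelteKit, a `hooks.server.ts` handle would apply them to every response). A sketch — the exact CSP directive values are assumptions extrapolated from "self + Google Fonts", not the project's real policy:

```typescript
// Security headers applied to every response. Values are illustrative.
export function securityHeaders(): Record<string, string> {
  return {
    "Strict-Transport-Security": "max-age=63072000; includeSubDomains", // HSTS
    "X-Frame-Options": "DENY",
    "X-Content-Type-Options": "nosniff",
    // Scripts restricted to self; Google Fonts allowed for styles/fonts only.
    "Content-Security-Policy":
      "default-src 'self'; script-src 'self'; " +
      "style-src 'self' https://fonts.googleapis.com; " +
      "font-src 'self' https://fonts.gstatic.com",
  };
}
```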

PII Handling

users.email is encrypted at rest (Neon transparent encryption). display_name is public. No PII in logs.

FTC Compliance

Affiliate disclosure visible on every page containing purchase links. Disclosure text: "We earn a commission from purchases. This doesn't affect our compatibility data."

Secrets Management

All secrets stored in Fly.io secrets (fly secrets set). Exposed as environment variables. Never committed to source. Rotated on team member offboarding.

08 · Scaling Strategy

Three defined scale checkpoints. Each checkpoint has specific infrastructure triggers and a cost estimate. Nothing is provisioned until the trigger is hit.

• Launch (0–5K MAU). Trigger: initial deploy. Fly.io: 1 × shared-cpu-1x (256 MB). Neon: Free tier. Cache: SvelteKit in-memory (5 min TTL).
• Year 1 (5K–25K MAU). Trigger: p95 latency >400ms. Fly.io: 2 × shared-cpu-2x (512 MB). Neon: Pro (~$19/mo). Cache: SvelteKit + CDN for static assets.
• Year 2 (25K–100K MAU). Trigger: DB CPU >60% sustained. Fly.io: 3 × dedicated-cpu-1x (1 GB). Neon: Scale + read replica. Cache: CDN + server-side cache per route.
• Year 3 (100K–300K MAU). Trigger: multiple regions requested. Fly.io: multi-region machines. Neon: Scale (multi-region). Cache: Redis for hot data (re-evaluate).
Cache strategy: server-side first. SvelteKit's server load functions cache bike and parts data for 5 minutes using a simple Map-based LRU store. At Year 2, Cloudflare Pages / R2 CDN caches public HTML responses. Redis is deferred until Postgres read replica can no longer handle read traffic.
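The Map-based store with a 5-minute TTL fits in a few lines. A minimal sketch of the TTL half; a real LRU would also cap entry count and evict the oldest key, which is omitted here:

```typescript
// In-process TTL cache for bike/parts data in server load functions.
const TTL_MS = 5 * 60_000; // the document's 5-minute TTL
const cache = new Map<string, { value: unknown; expires: number }>();

export function cached<T>(key: string, compute: () => T, now = Date.now()): T {
  const hit = cache.get(key);
  if (hit && hit.expires > now) return hit.value as T; // fresh — serve cached
  const value = compute();                             // stale or missing — recompute
  cache.set(key, { value, expires: now + TTL_MS });
  return value;
}
```

Like the rate limiter, this state is per-machine and vanishes on deploy — acceptable for a cache, which is exactly why Redis is deferred.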
09 · Observability

Observability stack is minimal at launch: Sentry for errors, UptimeRobot for uptime checks, pino for structured logs. Metrics are event-driven, not time-series — Postgres query counts and pino log aggregation are sufficient to identify bottlenecks at Year 1 scale.

Sentry (Errors)

Error tracking with source maps. All unhandled exceptions in server load functions and API routes. Release tracking tied to Fly.io deployments.

UptimeRobot

60-second HTTP checks on /api/bikes and the homepage. PagerDuty integration. Alert after 2 consecutive failed checks.

Pino (Structured Logs)

JSON logs to stdout. Fields: level, msg, route, userId, duration_ms, status. No PII in log fields. Shipped to Fly.io log aggregation.

Key Metrics and Alert Thresholds

• api.latency.p95: API response time at 95th percentile. Alert: >500ms sustained for 5 min.
• api.error_rate: ratio of 5xx responses to total requests. Alert: >5% over any 10-min window.
• affiliate.click_through_rate: affiliate clicks per part detail page view. Monitor: drop >30% week-over-week.
• verification.submission_rate: verifications submitted per active user. Monitor: weekly cohort trend.
• uptime.availability: % of time the app is reachable. Alert: <99.5% in any 24h window.
• jobs.failure_rate: background job failure percentage. Alert: any job failing its 3rd retry.
10 · Architecture Decision Records

Five decisions that meaningfully shaped the architecture. Each is considered final for the Year 1 scope with explicit conditions that would trigger reconsideration.

ADR-001

SvelteKit over Next.js

Accepted

Decision: Use SvelteKit as the full-stack framework instead of Next.js or Remix.

Rationale: SvelteKit produces significantly smaller client bundles (no Virtual DOM overhead), has first-class SSR with minimal configuration, and ships one deployment target (Node adapter on Fly.io) with no edge-function complexity. For a 2-person team building a data-rich but interaction-light app, the simpler mental model and faster build times outweigh Next.js's ecosystem size. Reconsider if we hire React engineers who cannot ramp on Svelte.

ADR-002

Neon over Supabase

Accepted

Decision: Use Neon as the managed Postgres provider instead of Supabase or PlanetScale.

Rationale: Neon is pure Postgres — no custom extensions, no proprietary realtime layer, no RLS magic to reason about. Database branching enables true prod-parity dev and PR preview environments at zero cost. Supabase's Auth and Realtime features are compelling but add vendor lock-in we don't need (BetterAuth handles auth, no realtime requirement). Reconsider if pg-boss bottlenecks at high job throughput and we need a Redis-backed queue — at that point Supabase's full platform may make sense.

ADR-003

BetterAuth over Clerk

Accepted

Decision: Use BetterAuth (library) instead of Clerk or Auth0 (managed services).

Rationale: Auth is a core trust feature for a community-driven platform. Self-hosted auth means sessions live in our Postgres database, user data never touches a third-party auth vendor, and there is no per-MAU pricing cliff. BetterAuth is a library, not a service — it compiles into the SvelteKit app with zero cold-start overhead. Clerk's UI components are excellent but the vendor dependency and pricing model are incompatible with our bootstrapped constraints. Reconsider if compliance requirements (SOC 2, HIPAA) mandate a certified auth vendor.

ADR-004

Postgres Full-Text Search over Elasticsearch

Accepted

Decision: Implement parts search using Postgres tsvector + GIN index instead of Elasticsearch or Typesense.

Rationale: The parts catalog is read-heavy and write-sparse. Full-text search on a <500K row table is well within Postgres's capabilities with a GIN-indexed tsvector column. Eliminating a second data store removes an entire failure domain, infrastructure cost (~$50-150/mo), and operational complexity. Search quality at this scale (no ML ranking, no synonyms) does not require Elasticsearch. Reconsider at 1M+ parts records or if search quality scores fall below acceptable in user testing.

ADR-005

pg-boss over BullMQ / Redis

Accepted

Decision: Use pg-boss for background job processing instead of BullMQ (which requires Redis).

Rationale: Adding Redis introduces a second stateful service: separate deployment, separate connection string, separate failure domain, separate monitoring. pg-boss uses the SKIP LOCKED Postgres primitive to provide reliable at-least-once job delivery from the same database we already operate. For 4 job types running at hourly or daily frequency, this is entirely sufficient. Job throughput at launch is <100 jobs/hour — trivially within pg-boss's capacity. Reconsider if job volume exceeds 10,000/hour, which is a Year 3 problem at the earliest.