How I Vibe-Coded Two Enterprise Apps in 8 Months for $13,945

Quick Take

Two production-grade enterprise apps, both Next.js 15 on Azure Container Apps, built solo in eight months with Cursor and Anthropic's frontier models. Solo developer spend: $13,945 across 34,875 Cursor events and 450 active coding hours. The recipe below is the exact stack and rollout plan that produced them. Download it, paste it into Cursor, build your own.

~350,000 lines of TypeScript across two apps, one stack, one rollout pattern
The majority of model spend was on Anthropic Opus-thinking models. The frontier model matters more than the framework
The honest all-in cost was about $31 per hour of active coding time, including the parts where the AI was wrong
Below: the architecture, the exposure-model decision, what worked, what didn't, and the recipe you can feed straight into Cursor

Download the Recipe (PDF)

Download as Markdown

I spent the last eight months shipping two enterprise applications, solo, using Cursor and Anthropic's frontier models. Both are in production now. One is a commercial loan origination system. The other is a multi-app internal platform that consolidates a dozen back-office tools behind a single sign-on. Different problem domains, almost identical stack.

I am writing this for the engineer or technical founder who has heard "vibe coding" and wants to know whether it is real, what it costs, and what the actual architecture looks like on the other side. No hype. Just the numbers, the stack, and the recipe.

The numbers, up front

Pulled directly from the Cursor usage report for my account over the build window:

Metric	Value
Active coding hours	450.3
Active days	185
Cursor events	34,875
Cursor spend (solo developer)	$13,945
Effective rate	~$31 per active coding hour
Model mix	Majority on Anthropic Opus-thinking

Eight months of calendar time, roughly 450 hours of focused coding, $13,945 in AI spend. Two apps shipped. For comparison: a single mid-level contract engineer at $150/hour would have cost $67,500 for the same 450 hours, and would not have written 350,000 lines of code in that window.
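The arithmetic behind those two figures is short enough to check by hand. A minimal sketch, using only the numbers from the table above (the variable names are illustrative):

```typescript
// Figures taken from the usage table above.
const cursorSpendUsd = 13_945;
const activeCodingHours = 450.3;

// Effective AI cost per active coding hour.
const effectiveRateUsd = cursorSpendUsd / activeCodingHours;

// Comparison point from the text: a contractor at $150/hour for a round 450 hours.
const contractorRateUsd = 150;
const contractorCostUsd = contractorRateUsd * 450;

console.log(effectiveRateUsd.toFixed(2)); // "30.97"
console.log(contractorCostUsd);           // 67500
```

The $31 figure is the rounded result of the first division. The contractor number covers labor only, so the two totals are not a like-for-like comparison.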

That is the headline number. The honest version is below.

The stack I landed on (both apps)

Both applications converged on the same architecture. That convergence was not planned. It is what happens when you let the AI pick reasonable defaults, course-correct toward the simplest thing that scales, and stop fighting the framework.

Layer	Choice	Why
Runtime	Next.js 15 (App Router, Turbopack)	Single repo, server components, file-based routing, mature ecosystem
Language	TypeScript 5 on Node 20	End-to-end types, zero context-switching with the AI
ORM	Prisma 6	Type-safe queries, migrations as code, AI-friendly schema
Database	Azure Database for PostgreSQL Flexible Server (v16, pgvector on one app)	Boring, durable, cheap, every AI tool understands it
Cache / queue	Azure Cache for Redis (ioredis)	Sessions, rate limits, lightweight job queues, outbox drains
Auth	NextAuth 4 + Microsoft Entra ID (OIDC SSO)	One-click sign-on for any Microsoft 365 tenant
Validation	Zod 4 (shared between API and forms)	One schema, two surfaces, zero drift
UI	Tailwind 3 + Radix UI primitives	Composable, accessible, no design-system politics
Telemetry	Application Insights (server + browser SDK)	Distributed tracing without standing up another vendor
Secrets	Azure Key Vault (with Container Apps secret references)	App secrets stay out of CI, rotation without redeploy
Email	Azure Communication Services (transactional)	Native to the tenant, cheap, no SendGrid bill
Compute	Azure Container Apps + Container Apps Jobs (cron + on-demand)	Scales to zero, no Kubernetes, predictable pricing
Registry	Azure Container Registry (managed-identity pull)	Private images, OIDC from GitHub, no static creds
CI/CD	GitHub Actions with OIDC federation to Azure	No long-lived deploy secrets, branch-protected main

The two apps together run on:

2 resource groups
2 Container Apps
2 Container Apps Environments
2 PostgreSQL flexible servers
2 Redis caches
Scheduled Container Apps Jobs for background work on both apps
1 shared Microsoft Entra ID tenant with per-app application registrations
0 Kubernetes clusters, 0 surprise bills

Why this stack, specifically

Three principles drove every architectural choice.

Boring at the boundary, interesting in the middle. The plumbing - auth, database, deploy, secrets, telemetry - is all first-party Microsoft or first-party framework defaults. The interesting code is in the business logic, which is also the only code worth writing custom. Every hour spent gluing a non-default service together is an hour not spent on the actual problem.

Static typing end to end. The AI is dramatically better when types flow from the database row through the Prisma client through the API handler through the Zod schema into the React form. One change to a Prisma model surfaces every place it breaks. The compiler is the autopilot for refactors the AI does not catch.
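To make the "one schema, two surfaces" idea concrete, here is a minimal sketch of a single validator shared by an API handler and a form. It is a dependency-free stand-in: in the stack described above a Zod schema plays this role, and the field names, function names, and status codes below are all hypothetical.

```typescript
// Hypothetical input shape; with Zod, this type would be inferred from the schema.
interface LoanApplicationInput {
  borrowerName: string;
  amountCents: number;
}

type ParseResult<T> =
  | { ok: true; value: T }
  | { ok: false; errors: string[] };

// Hand-rolled stand-in for a shared schema's safe-parse step.
function parseLoanApplication(input: unknown): ParseResult<LoanApplicationInput> {
  const errors: string[] = [];
  const record = (typeof input === "object" && input !== null ? input : {}) as Record<string, unknown>;
  const borrowerName = record.borrowerName;
  const amountCents = record.amountCents;

  if (typeof borrowerName !== "string" || borrowerName.trim() === "") {
    errors.push("borrowerName must be a non-empty string");
  }
  if (typeof amountCents !== "number" || !Number.isInteger(amountCents) || amountCents <= 0) {
    errors.push("amountCents must be a positive integer");
  }
  if (errors.length > 0) return { ok: false, errors };
  return {
    ok: true,
    value: { borrowerName: borrowerName as string, amountCents: amountCents as number },
  };
}

// Server side: the API handler validates the request body.
function handleCreateLoan(body: unknown): { status: number; errors?: string[] } {
  const parsed = parseLoanApplication(body);
  return parsed.ok ? { status: 201 } : { status: 400, errors: parsed.errors };
}

// Client side: the form runs the same validator before submitting.
function validateLoanForm(fields: unknown): string[] {
  const parsed = parseLoanApplication(fields);
  return parsed.ok ? [] : parsed.errors;
}
```

Because both surfaces call the same function, adding a field to the input type forces the validator, the handler, and the form to change together, and the compiler flags whatever was missed.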

No long-lived deploy secrets. GitHub federates to Azure via OIDC. Container Apps read their secrets from Key Vault via managed identity. The one exception in GitHub is the DATABASE_URL needed by the migration job, and even that is scoped to the migration runner, not the app. Rotating credentials is a 30-second Key Vault operation, not a deploy.

The combined effect: a stack the AI can navigate end-to-end without you having to remember which alternative path you took last time.

Two paths for internet-facing security

The two apps had different exposure requirements, and that drove the single most consequential decision after the stack itself.

App A has a client portal that must be reachable from the public internet for non-tenant users, plus webhooks from third-party financial services that have to land on a public URL. Microsoft Entra Application Proxy, which pre-authenticates every request against the tenant, was not a fit. Instead, I paid for a third-party professional security review: a senior developer signed off on the auth flows, the magic-link portal, the rate limits, the input validation, and the audit posture before launch. The Container App takes public traffic directly through its managed certificate.

App B is internal-only. Every user authenticates against the same Microsoft Entra ID tenant before they ever see the application. I put Microsoft Entra Application Proxy in front of it. The Container Apps Environment runs with internal ingress, the only public surface is the App Proxy URL, and the App Proxy enforces pre-authentication, conditional access, and traffic inspection before any request reaches my code. It is the security posture of a VPN, without making users install a VPN client. The cost is a per-user monthly charge and one connector VM to keep patched. For an app that should never be reachable from outside the tenant, that is the easiest call in the stack.

Both postures are valid. Which one you pick is a function of who needs to reach the app:

Internal users only, all inside your Entra tenant: put Microsoft Entra Application Proxy in front. The recurring per-user cost buys you VPN-grade isolation, conditional access at the edge, and the lowest possible attack surface. If you are at all uncertain about your app's security grade, this is the safer default.
Any non-tenant users, customer portals, or partner access: stand the Container App up with public ingress and budget for a real, paid security review before launch. If you cannot afford the review, the app should not launch publicly.
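The rule above can be written down as a small decision function. This is only a sketch of the reasoning in this section: the type names, field names, and return values are hypothetical, and it assumes the two inputs that mattered for these two apps are the only ones that matter for yours.

```typescript
type ExposureModel = "entra-app-proxy" | "public-ingress-with-review";

interface Audience {
  // True when every user signs in through your own Entra tenant.
  allUsersInTenant: boolean;
  // True when third parties must deliver webhooks to a public URL.
  needsPublicWebhooks: boolean;
}

// Internal-only apps go behind the proxy; anything else gets public ingress
// plus a paid security review before launch.
function chooseExposureModel(audience: Audience): ExposureModel {
  if (audience.allUsersInTenant && !audience.needsPublicWebhooks) {
    return "entra-app-proxy";
  }
  return "public-ingress-with-review";
}
```

In these terms, App A (non-tenant users and public webhooks) lands on the second branch and App B (tenant users only) on the first.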

The recipe below covers both paths and marks the steps that change depending on which exposure model you pick.

What vibe coding actually felt like

I want to be precise about this because the term has been stretched to mean everything from "AI autocompleted my for-loop" to "GPT-4 wrote my entire business."

What it actually looked like for me:

A working day was three to five hours of focused Cursor sessions. I would open a feature spec - usually a paragraph I had typed into a markdown file in docs/specs/ - and ask the model (almost always Claude Opus in extended thinking mode) to propose a plan. I would read the plan, push back on parts I disagreed with, then ask it to execute. I would watch the diffs, run the type checker, run the tests, and iterate.

The model did the typing. I did the deciding.

Most features took two to four iterations to land. The first pass was usually a working skeleton with a wrong assumption baked in. The second pass corrected the assumption. The third pass tightened the types, added error handling, wrote the tests. Sometimes a fourth pass to fix something the test suite caught.

The honest summary: I was acting as a technical lead reviewing pull requests from an extremely fast, extremely literal junior engineer who never tires, never gets defensive, and occasionally hallucinates a method that does not exist on the Prisma client.

The leverage is real. The supervision is non-negotiable.

Where the AI was strong, and where it wasn't

The pattern was consistent across both apps.

The AI was excellent at:

Scaffolding new API routes that matched the existing conventions in the repo
Wiring up a new Prisma model and the migration that goes with it
Translating a clear English spec into a working server component plus its data loader
Writing the kind of glue code humans hate to write - retry logic, exponential backoff, idempotency keys, audit-log entries
Generating tests from a description of expected behavior, then running them and fixing the failures
Carrying out multi-file refactors through Cursor's Composer or agent mode
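As an illustration of the glue code mentioned in the list above, here is a minimal sketch of retry with exponential backoff. It is not the code from either app: the function names, option names, and the injectable sleep are assumptions made for the example.

```typescript
interface RetryOptions {
  maxAttempts: number; // total tries, including the first; must be at least 1
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number;  // cap on any single delay
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

// Delay before retry number `retryIndex` (0-based): base * 2^retryIndex, capped.
function backoffDelayMs(retryIndex: number, baseDelayMs: number, maxDelayMs: number): number {
  return Math.min(maxDelayMs, baseDelayMs * 2 ** retryIndex);
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Runs `operation` until it succeeds or the attempts are used up,
// then rethrows the last error.
async function withRetry<T>(operation: () => Promise<T>, options: RetryOptions): Promise<T> {
  const sleep = options.sleep ?? defaultSleep;
  let lastError: unknown;
  for (let attempt = 0; attempt < options.maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      const isLastAttempt = attempt === options.maxAttempts - 1;
      if (isLastAttempt) break;
      await sleep(backoffDelayMs(attempt, options.baseDelayMs, options.maxDelayMs));
    }
  }
  throw lastError;
}
```

The sketch retries on every error and adds no jitter. Production code usually retries only on errors known to be transient and randomizes the delay so that many clients do not retry in lockstep.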

The AI was unreliable at:

Architecture decisions where the right answer depended on context outside the repo (compliance, cost ceilings, vendor preference). It would happily pick something defensible but suboptimal.
Anything involving cross-system invariants: webhooks that needed to remain in lockstep with a database row, race conditions in scheduled jobs, distributed locks. I caught real bugs by reading the diff carefully.
Knowing when to stop. Left alone, the model would keep adding "improvements" that were really just rearrangements. I had to be explicit about scope.
Long-context refactors on files past about 1,500 lines. The model would drop subtle things on the floor. Splitting files first was a forcing function the AI liked anyway.
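The cross-system invariants in the list above are easier to review when the invariant is explicit in the code. A minimal sketch of one such invariant, processing each webhook delivery at most once, is below. The event shape and names are hypothetical, and the in-memory store stands in for a database table with a unique constraint on the event id.

```typescript
// Hypothetical event shape; each real provider defines its own payload.
interface WebhookEvent {
  id: string; // provider-assigned, assumed stable across redeliveries
  type: string;
  payload: unknown;
}

// In-memory stand-in for a table with a unique constraint on the event id.
class ProcessedEventStore {
  private readonly seen = new Set<string>();

  // Returns true only for the first caller to claim a given id.
  claim(eventId: string): boolean {
    if (this.seen.has(eventId)) return false;
    this.seen.add(eventId);
    return true;
  }
}

type HandleOutcome = "processed" | "duplicate";

// Applies the side effect once per event id; redeliveries are acknowledged and skipped.
function handleWebhook(
  event: WebhookEvent,
  store: ProcessedEventStore,
  applyEffect: (event: WebhookEvent) => void,
): HandleOutcome {
  if (!store.claim(event.id)) return "duplicate";
  applyEffect(event);
  return "processed";
}
```

Note the gap this sketch leaves open: if `applyEffect` throws after the id is claimed, the event is recorded but never applied. Against a real database, the claim and the effect would need to share one transaction, which is exactly the kind of detail worth checking in the diff.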

The single biggest unlock was the CLAUDE.md (or AGENTS.md) file at the repo root: a two-page document describing the stack, the conventions, the names of internal helpers, and the things the AI should never do. Every session inherited that context for free. Without it, every chat started from zero.

Where the money went

The bill is interesting because it shows what frontier models cost when you actually use them as a daily driver, not as a novelty.

The majority of my $13,945 spend went to Anthropic Opus-thinking models. Sonnet handled the work that did not need deeper reasoning: boilerplate, commit messages, small typed refactors. A handful of cheaper non-thinking models picked up trivia. Spend was concentrated in the second half of the build window as the codebases grew and more sessions moved into multi-file agent work.

The honest tradeoff: frontier models cost real money. A Sonnet-only build would have been meaningfully cheaper and noticeably slower on the hard parts: schema redesigns, multi-file refactors, debugging gnarly Prisma and Postgres interactions, security-sensitive code. The Opus-thinking premium paid for itself there.

I would do it again at this ratio. I would not attempt an enterprise-grade build on Sonnet alone.

What I would not skip on a second build

Looking at the two apps with hindsight, the things that mattered most were not the technologies. They were the disciplines.

A CLAUDE.md from day one. Even a 200-word version. The AI is dramatically better when it knows the rules of your repo.
A lib/permissions.ts from day one. Authorization sprinkled across handlers is the single most expensive mistake to clean up later. Centralize it before the second handler exists.
A health endpoint and an audit log from day one. You can ship without them. You cannot operate without them.
OIDC from GitHub to Azure from day one. Switching from static credentials to OIDC later is annoying. Setting up OIDC at the start is twenty minutes.
The exposure-model decision before the first deploy. Internal app: budget for Microsoft Entra Application Proxy. Public app: budget for a paid security review. Either is fine. Skipping both is not.
Report-only Conditional Access policies before any enforcement. I have an entire piece on how we lock down SMB Microsoft 365 tenants for $50 of hardware per user - the same pattern applies to the Entra apps you stand up for your own product.
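For the lib/permissions.ts item in the list above, a minimal sketch of what "centralize it" can look like. The roles, actions, and helper names are hypothetical, not the ones from either app. The point is only that every handler asks one module instead of repeating role checks inline.

```typescript
// Hypothetical roles and actions; a real app would derive these from its own domain.
type Role = "admin" | "underwriter" | "viewer";
type Action = "loan:read" | "loan:approve" | "user:manage";

// The single place where roles are mapped to what they may do.
const ROLE_PERMISSIONS: Record<Role, readonly Action[]> = {
  admin: ["loan:read", "loan:approve", "user:manage"],
  underwriter: ["loan:read", "loan:approve"],
  viewer: ["loan:read"],
};

interface SessionUser {
  id: string;
  roles: Role[];
}

// True when any of the user's roles grants the action.
function can(user: SessionUser, action: Action): boolean {
  return user.roles.some((role) => ROLE_PERMISSIONS[role].indexOf(action) !== -1);
}

class ForbiddenError extends Error {
  constructor(action: Action) {
    super(`Forbidden: missing permission for ${action}`);
    this.name = "ForbiddenError";
  }
}

// Handlers call this first and let the error map to a 403 response.
function assertCan(user: SessionUser, action: Action): void {
  if (!can(user, action)) throw new ForbiddenError(action);
}
```

With this in place, adding a role or tightening a permission is a one-line change to the table, and the `Action` union lets the compiler reject a handler that asks about an action that does not exist.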

What I would skip: most of the bespoke abstractions I built in month two that I deleted in month six. The AI loves building abstractions. Resist until the third place you need them.

The recipe

The whole point of this piece is the recipe below. It is the exact step-by-step plan I would feed into Cursor on day one to bootstrap an app on this stack. Sixty-plus numbered steps, organized into phases, each one a thing you would actually ask Cursor to do.

Download it as a PDF for reading, or as a markdown file to paste straight into your AI editor.

The recipe assumes:

You have an Azure subscription with Owner permissions and a Microsoft Entra ID tenant.
You have a GitHub organization (or personal account) you can configure with OIDC federation to Azure.
You have Node 20+ and Docker Desktop locally.
You are comfortable in PowerShell or bash and have read at least one Next.js App Router project.

The phases cover: the Azure foundation, the Entra registrations, the repository scaffold, authorization and audit, the API layer, the frontend, containerization, the Container Apps rollout (with the App Proxy vs public-ingress decision called out explicitly), GitHub Actions CI/CD with OIDC, observability, cost discipline, the vibe coding loop itself, and hardening before launch. That is the entire path from empty resource group to production-grade enterprise app.

Bottom line

Two production apps. Eight months. $13,945 in AI spend. About 450 hours of focused coding time at an effective rate of $31 per hour. The stack is boring on purpose. The discipline matters more than the tools.

The recipe scales because it is opinionated. Every decision was made once, captured in the stack, captured in the CLAUDE.md, and inherited by every Cursor session afterward. If you are starting an enterprise app today and want a path that does not waste the first month on framework debates, the download above is the path.

If you want the security side of running an app like this in a Microsoft 365 tenant, the tenant lockdown runbook is the companion piece. If you want the disaster-recovery side, the Synology M365 backup setup covers the cheap on-site backup layer that pairs with cloud backup for a real 3-2-1 posture.

Two apps. One stack. One recipe. Yours to run.