How to Build an AI-Powered SaaS with Next.js, OpenAI and Claude in 2026
Nearly every new SaaS product launching in 2026 ships with an AI component. Whether it's a writing tool, a code assistant, a data analyzer, or something entirely new, integrating AI models is no longer optional for most products.
But integrating AI correctly — with proper server-side key management, per-user rate limiting, token usage tracking, and the ability to switch models without rewriting your codebase — is a different challenge from just getting a response from OpenAI.
This guide covers how to build a production-ready AI infrastructure layer in Next.js.
The Problems Most AI Integrations Get Wrong
Exposing API keys to the client. If your API key is in your frontend code, it's public. Anyone can extract it, use it, and run up your bill. Always call AI providers server-side.
No rate limiting. Without per-user limits, a single power user (or a malicious actor) can drain your monthly API budget overnight.
No usage tracking. If you don't know how many tokens each user is consuming, you can't build a fair pricing model or identify abuse.
Hard-coded to a single provider. OpenAI has outages. Anthropic has outages. New models come out. If you're hard-coded to one provider, switching means rewriting your integration layer.
Step 1: Install AI Provider SDKs
npm install openai @anthropic-ai/sdk @google/generative-ai
Add the keys to your environment file (e.g. .env.local in Next.js, which stays out of version control):
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_AI_API_KEY=...
# Default provider: "openai" | "anthropic" | "google"
AI_PROVIDER=openai
AI_MODEL=gpt-4o
Step 2: Build a Unified AI Provider Abstraction
Create lib/ai/providers.ts:
import OpenAI from "openai";
import Anthropic from "@anthropic-ai/sdk";
import { GoogleGenerativeAI } from "@google/generative-ai";
export type AIProvider = "openai" | "anthropic" | "google";
export interface AIMessage {
role: "user" | "assistant" | "system";
content: string;
}
export interface AIResponse {
content: string;
usage: {
promptTokens: number;
completionTokens: number;
totalTokens: number;
};
}
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
const google = new GoogleGenerativeAI(process.env.GOOGLE_AI_API_KEY!);
export async function generateCompletion(
messages: AIMessage[],
options: {
provider?: AIProvider;
model?: string;
maxTokens?: number;
temperature?: number;
} = {}
): Promise<AIResponse> {
const provider = options.provider ?? (process.env.AI_PROVIDER as AIProvider) ?? "openai";
const maxTokens = options.maxTokens ?? 1000;
const temperature = options.temperature ?? 0.7;
switch (provider) {
case "openai": {
const model = options.model ?? process.env.AI_MODEL ?? "gpt-4o";
const response = await openai.chat.completions.create({
model,
messages: messages.map((m) => ({
role: m.role,
content: m.content,
})),
max_tokens: maxTokens,
temperature,
});
return {
content: response.choices[0].message.content ?? "",
usage: {
promptTokens: response.usage?.prompt_tokens ?? 0,
completionTokens: response.usage?.completion_tokens ?? 0,
totalTokens: response.usage?.total_tokens ?? 0,
},
};
}
case "anthropic": {
const model = options.model ?? process.env.AI_MODEL ?? "claude-sonnet-4-20250514";
const systemMessage = messages.find((m) => m.role === "system")?.content;
const userMessages = messages.filter((m) => m.role !== "system");
const response = await anthropic.messages.create({
model,
max_tokens: maxTokens,
system: systemMessage,
messages: userMessages.map((m) => ({
role: m.role as "user" | "assistant",
content: m.content,
})),
});
const content = response.content[0].type === "text" ? response.content[0].text : "";
return {
content,
usage: {
promptTokens: response.usage.input_tokens,
completionTokens: response.usage.output_tokens,
totalTokens: response.usage.input_tokens + response.usage.output_tokens,
},
};
}
case "google": {
const model = options.model ?? process.env.AI_MODEL ?? "gemini-1.5-pro";
const genModel = google.getGenerativeModel({ model });
// Simplification: roles are flattened into a single prompt string here.
const prompt = messages.map((m) => m.content).join("\n");
const result = await genModel.generateContent(prompt);
const response = result.response;
return {
content: response.text(),
usage: {
promptTokens: response.usageMetadata?.promptTokenCount ?? 0,
completionTokens: response.usageMetadata?.candidatesTokenCount ?? 0,
totalTokens: response.usageMetadata?.totalTokenCount ?? 0,
},
};
}
default:
throw new Error(`Unknown AI provider: ${provider}`);
}
}
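The normalization step is the part most likely to drift as SDKs change, and it can be sanity-checked without network access. A minimal sketch: the stub objects below only mimic each SDK's response shape (no real API calls are made), so the dispatch-and-normalize pattern can run in isolation.

```typescript
// Stand-in for the dispatch pattern above: each "provider" returns a
// differently shaped payload, and the switch normalizes both into the
// shared AIResponse shape. The stubs are placeholders for real SDK calls.
type AIProvider = "openai" | "anthropic";

interface AIResponse {
  content: string;
  usage: { promptTokens: number; completionTokens: number; totalTokens: number };
}

// Stub mimicking the OpenAI chat completion response shape.
const stubOpenAI = () => ({
  choices: [{ message: { content: "hello from openai" } }],
  usage: { prompt_tokens: 3, completion_tokens: 4, total_tokens: 7 },
});

// Stub mimicking the Anthropic Messages API response shape.
const stubAnthropic = () => ({
  content: [{ type: "text", text: "hello from claude" }],
  usage: { input_tokens: 3, output_tokens: 4 },
});

function normalize(provider: AIProvider): AIResponse {
  switch (provider) {
    case "openai": {
      const r = stubOpenAI();
      return {
        content: r.choices[0].message.content,
        usage: {
          promptTokens: r.usage.prompt_tokens,
          completionTokens: r.usage.completion_tokens,
          totalTokens: r.usage.total_tokens,
        },
      };
    }
    case "anthropic": {
      const r = stubAnthropic();
      const first = r.content[0];
      return {
        content: first.type === "text" ? first.text : "",
        usage: {
          promptTokens: r.usage.input_tokens,
          completionTokens: r.usage.output_tokens,
          totalTokens: r.usage.input_tokens + r.usage.output_tokens,
        },
      };
    }
  }
}

console.log(normalize("openai").content);              // hello from openai
console.log(normalize("anthropic").usage.totalTokens); // 7
```

The same idea scales to a unit test suite: one fixture per provider shape, asserting that every path produces the same AIResponse contract.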
Step 3: Add Per-User Rate Limiting
Create lib/ai/rate-limit.ts:
import { prisma } from "@/lib/prisma";
const LIMITS = {
free: { requestsPerDay: 10, tokensPerDay: 10000 },
starter: { requestsPerDay: 100, tokensPerDay: 100000 },
pro: { requestsPerDay: 1000, tokensPerDay: 1000000 },
};
export async function checkRateLimit(userId: string): Promise<{
allowed: boolean;
reason?: string;
}> {
const user = await prisma.user.findUnique({
where: { id: userId },
select: { plan: true },
});
const plan = (user?.plan as keyof typeof LIMITS) ?? "free";
const limits = LIMITS[plan];
const today = new Date();
today.setHours(0, 0, 0, 0);
const usage = await prisma.aIUsage.aggregate({
where: {
userId,
createdAt: { gte: today },
},
_count: { id: true },
_sum: { totalTokens: true },
});
const requestCount = usage._count.id ?? 0;
const tokenCount = usage._sum.totalTokens ?? 0;
if (requestCount >= limits.requestsPerDay) {
return {
allowed: false,
reason: `Daily request limit reached (${limits.requestsPerDay} requests/day on ${plan} plan)`,
};
}
if (tokenCount >= limits.tokensPerDay) {
return {
allowed: false,
reason: `Daily token limit reached (${limits.tokensPerDay} tokens/day on ${plan} plan)`,
};
}
return { allowed: true };
}
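The check itself is easy to verify in isolation by swapping the Prisma aggregate for a plain object. A sketch with hypothetical usage numbers (checkLimits is an illustrative helper, not part of the file above):

```typescript
// Same limit logic as checkRateLimit, with the database query replaced
// by an in-memory usage object so the thresholds can be tested directly.
const LIMITS = {
  free: { requestsPerDay: 10, tokensPerDay: 10_000 },
  pro: { requestsPerDay: 1000, tokensPerDay: 1_000_000 },
} as const;

function checkLimits(
  plan: keyof typeof LIMITS,
  usage: { requests: number; tokens: number }
): { allowed: boolean; reason?: string } {
  const limits = LIMITS[plan];
  if (usage.requests >= limits.requestsPerDay) {
    return { allowed: false, reason: "Daily request limit reached" };
  }
  if (usage.tokens >= limits.tokensPerDay) {
    return { allowed: false, reason: "Daily token limit reached" };
  }
  return { allowed: true };
}

console.log(checkLimits("free", { requests: 3, tokens: 500 }).allowed);  // true
console.log(checkLimits("free", { requests: 10, tokens: 500 }).allowed); // false
```

Note that limits are checked before the request, so a single request can still overshoot the token cap by one response; that is usually acceptable for a daily budget.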
Add the usage tracking model to your Prisma schema:
model AIUsage {
id String @id @default(cuid())
userId String
provider String
model String
promptTokens Int
completionTokens Int
totalTokens Int
createdAt DateTime @default(now())
user User @relation(fields: [userId], references: [id], onDelete: Cascade)
@@map("ai_usage")
}
Step 4: Create the AI API Route
Create app/api/ai/generate/route.ts:
import { NextRequest, NextResponse } from "next/server";
import { auth } from "@/auth";
import { generateCompletion, AIMessage } from "@/lib/ai/providers";
import { checkRateLimit } from "@/lib/ai/rate-limit";
import { prisma } from "@/lib/prisma";
export async function POST(req: NextRequest) {
const session = await auth();
if (!session?.user?.id) {
return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
}
// Check rate limit
const { allowed, reason } = await checkRateLimit(session.user.id);
if (!allowed) {
return NextResponse.json({ error: reason }, { status: 429 });
}
const { messages, provider, model } = (await req.json()) as {
messages: AIMessage[];
provider?: "openai" | "anthropic" | "google";
model?: string;
};
if (!Array.isArray(messages) || messages.length === 0) {
return NextResponse.json({ error: "messages is required" }, { status: 400 });
}
try {
const response = await generateCompletion(messages, { provider, model });
// Track usage
await prisma.aIUsage.create({
data: {
userId: session.user.id,
provider: provider ?? process.env.AI_PROVIDER ?? "openai",
model: model ?? process.env.AI_MODEL ?? "gpt-4o",
promptTokens: response.usage.promptTokens,
completionTokens: response.usage.completionTokens,
totalTokens: response.usage.totalTokens,
},
});
return NextResponse.json({ content: response.content, usage: response.usage });
} catch (err) {
console.error("AI generation error:", err);
return NextResponse.json({ error: "AI generation failed" }, { status: 500 });
}
}
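Once the route is running locally, it can be smoke-tested with curl. A sketch: the cookie name below assumes Auth.js v5 defaults, and the session token is a placeholder you'd copy from your browser's dev tools.

```shell
curl -X POST http://localhost:3000/api/ai/generate \
  -H "Content-Type: application/json" \
  -H "Cookie: authjs.session-token=<your-session-token>" \
  -d '{"messages":[{"role":"user","content":"Say hello"}]}'
```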
Step 5: Add Streaming Support
For a better user experience, stream responses instead of waiting for the full completion. Note that this simplified route calls OpenAI directly and skips usage tracking; if you need token counts on streamed responses, OpenAI can report them via stream_options: { include_usage: true }.
// app/api/ai/stream/route.ts
import { NextRequest } from "next/server";
import OpenAI from "openai";
import { auth } from "@/auth";
import { checkRateLimit } from "@/lib/ai/rate-limit";
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
export async function POST(req: NextRequest) {
const session = await auth();
if (!session?.user?.id) {
return new Response("Unauthorized", { status: 401 });
}
const { allowed, reason } = await checkRateLimit(session.user.id);
if (!allowed) {
return new Response(reason, { status: 429 });
}
const { messages } = await req.json();
const stream = await openai.chat.completions.create({
model: "gpt-4o",
messages,
stream: true,
});
const encoder = new TextEncoder();
const readable = new ReadableStream({
async start(controller) {
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content ?? "";
if (content) {
controller.enqueue(encoder.encode(content));
}
}
controller.close();
},
});
return new Response(readable, {
headers: { "Content-Type": "text/plain; charset=utf-8" },
});
}
Consume the stream in your client component:
"use client";
import { useState } from "react";
export function AIChat() {
const [output, setOutput] = useState("");
const [loading, setLoading] = useState(false);
const generate = async (prompt: string) => {
setLoading(true);
setOutput("");
const res = await fetch("/api/ai/stream", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
messages: [{ role: "user", content: prompt }],
}),
});
const reader = res.body?.getReader();
const decoder = new TextDecoder();
if (!reader) return;
while (true) {
const { done, value } = await reader.read();
if (done) break;
setOutput((prev) => prev + decoder.decode(value));
}
setLoading(false);
};
return (
<div>
<button onClick={() => generate("Summarize the benefits of a SaaS boilerplate")} disabled={loading}>
{loading ? "Generating..." : "Generate"}
</button>
<div className="mt-4 whitespace-pre-wrap">{output}</div>
</div>
);
}
Switching Providers Without Rewriting Code
The entire point of the abstraction layer is this: to switch from OpenAI to Claude, you just change two environment variables:
AI_PROVIDER=anthropic
AI_MODEL=claude-sonnet-4-20250514
No code changes. No refactoring. The rest of your app continues working.
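The fallback chain also supports per-request overrides: an explicit option beats the environment variable, which beats the hardcoded default. The precedence can be seen in isolation (a self-contained sketch; resolveProvider is an illustrative helper, not part of the files above):

```typescript
// Precedence used by generateCompletion: explicit option, then the
// AI_PROVIDER environment variable, then the "openai" default.
type AIProvider = "openai" | "anthropic" | "google";

function resolveProvider(
  optionProvider: AIProvider | undefined,
  envProvider: string | undefined
): AIProvider {
  return optionProvider ?? (envProvider as AIProvider) ?? "openai";
}

console.log(resolveProvider(undefined, "anthropic")); // anthropic
console.log(resolveProvider("google", "anthropic"));  // google
console.log(resolveProvider(undefined, undefined));   // openai
```

This is what lets a single feature (say, a code assistant) pin a specific model while the rest of the app follows the environment defaults.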
The Bottom Line
A production-ready AI infrastructure layer — multi-provider abstraction, server-side key security, rate limiting, usage tracking, and streaming — takes about 6–8 hours to build correctly. That's before you write a single line of your actual AI feature.
ZeroDrag Pro ships with this entire layer pre-built. OpenAI, Claude, and Gemini — unified API, server-side execution, per-user rate limiting, token tracking, and streaming out of the box. Pro is $169, one-time.
Written by Utkarsh Singh. Last updated March 2026.