7 min read · By Utkarsh Singh

How to Build an AI-Powered SaaS with Next.js, OpenAI and Claude in 2026

A practical guide to integrating AI providers in a Next.js SaaS — multi-provider abstraction, server-side key management, per-user rate limiting, token tracking, and streaming.

Tags: nextjs, ai, openai, claude, saas, anthropic


Nearly every new SaaS product launching in 2026 ships with an AI component. Whether it's a writing tool, a code assistant, a data analyzer, or something entirely new, integrating AI models is no longer optional for most products.

But integrating AI correctly — with proper server-side key management, per-user rate limiting, token usage tracking, and the ability to switch models without rewriting your codebase — is a different challenge from just getting a response from OpenAI.

This guide covers how to build a production-ready AI infrastructure layer in Next.js.


The Problems Most AI Integrations Get Wrong

Exposing API keys to the client. If your API key is in your frontend code, it's public. Anyone can extract it, use it, and run up your bill. Always call AI providers server-side.

No rate limiting. Without per-user limits, a single power user (or a malicious actor) can drain your monthly API budget overnight.

No usage tracking. If you don't know how many tokens each user is consuming, you can't build a fair pricing model or identify abuse.

Hard-coded to a single provider. OpenAI has outages. Anthropic has outages. New models come out. If you're hard-coded to one provider, switching means rewriting your integration layer.


Step 1: Install AI Provider SDKs

npm install openai @anthropic-ai/sdk @google/generative-ai

Add the keys to your .env file:

OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_AI_API_KEY=...

# Default provider: "openai" | "anthropic" | "google"
AI_PROVIDER=openai
AI_MODEL=gpt-4o
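A misspelled provider name in the env file fails silently at request time. An optional guard can fail fast at boot instead — a sketch, where the lib/ai/env.ts path and the resolveProvider name are my own, not from any SDK:

```typescript
// lib/ai/env.ts (hypothetical) — validate AI_PROVIDER once at startup
const PROVIDERS = ["openai", "anthropic", "google"] as const;
export type AIProvider = (typeof PROVIDERS)[number];

export function resolveProvider(value: string | undefined): AIProvider {
  if (value === undefined) return "openai"; // default when the variable is unset
  if ((PROVIDERS as readonly string[]).includes(value)) {
    return value as AIProvider;
  }
  throw new Error(
    `Invalid AI_PROVIDER "${value}" — expected one of: ${PROVIDERS.join(", ")}`
  );
}
```

Call resolveProvider(process.env.AI_PROVIDER) wherever the default provider is read, so a typo like "antropic" surfaces immediately.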

Step 2: Build a Unified AI Provider Abstraction

Create lib/ai/providers.ts:

import OpenAI from "openai";
import Anthropic from "@anthropic-ai/sdk";
import { GoogleGenerativeAI } from "@google/generative-ai";

export type AIProvider = "openai" | "anthropic" | "google";

export interface AIMessage {
  role: "user" | "assistant" | "system";
  content: string;
}

export interface AIResponse {
  content: string;
  usage: {
    promptTokens: number;
    completionTokens: number;
    totalTokens: number;
  };
}

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
const google = new GoogleGenerativeAI(process.env.GOOGLE_AI_API_KEY!);

export async function generateCompletion(
  messages: AIMessage[],
  options: {
    provider?: AIProvider;
    model?: string;
    maxTokens?: number;
    temperature?: number;
  } = {}
): Promise<AIResponse> {
  const provider = options.provider ?? (process.env.AI_PROVIDER as AIProvider) ?? "openai";
  const maxTokens = options.maxTokens ?? 1000;
  const temperature = options.temperature ?? 0.7;

  switch (provider) {
    case "openai": {
      const model = options.model ?? process.env.AI_MODEL ?? "gpt-4o";
      const response = await openai.chat.completions.create({
        model,
        messages: messages.map((m) => ({ role: m.role, content: m.content })),
        max_tokens: maxTokens,
        temperature,
      });

      return {
        content: response.choices[0].message.content ?? "",
        usage: {
          promptTokens: response.usage?.prompt_tokens ?? 0,
          completionTokens: response.usage?.completion_tokens ?? 0,
          totalTokens: response.usage?.total_tokens ?? 0,
        },
      };
    }

    case "anthropic": {
      const model = options.model ?? "claude-sonnet-4-20250514";
      const systemMessage = messages.find((m) => m.role === "system")?.content;
      const userMessages = messages.filter((m) => m.role !== "system");

      const response = await anthropic.messages.create({
        model,
        max_tokens: maxTokens,
        system: systemMessage,
        messages: userMessages.map((m) => ({
          role: m.role as "user" | "assistant",
          content: m.content,
        })),
      });

      const content = response.content[0].type === "text" ? response.content[0].text : "";

      return {
        content,
        usage: {
          promptTokens: response.usage.input_tokens,
          completionTokens: response.usage.output_tokens,
          totalTokens: response.usage.input_tokens + response.usage.output_tokens,
        },
      };
    }

    case "google": {
      const model = options.model ?? "gemini-1.5-pro";
      const genModel = google.getGenerativeModel({ model });
      // Note: this flattens all roles into one prompt string; for multi-turn
      // chats, prefer genModel.startChat() with structured history.
      const prompt = messages.map((m) => m.content).join("\n");
      const result = await genModel.generateContent(prompt);
      const response = await result.response;

      return {
        content: response.text(),
        usage: {
          promptTokens: response.usageMetadata?.promptTokenCount ?? 0,
          completionTokens: response.usageMetadata?.candidatesTokenCount ?? 0,
          totalTokens: response.usageMetadata?.totalTokenCount ?? 0,
        },
      };
    }

    default:
      throw new Error(`Unknown AI provider: ${provider}`);
  }
}
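The usage numbers in AIResponse become actionable once you attach prices to them. A sketch of a cost estimator — the per-million-token prices below are illustrative placeholders, not current pricing; check each provider's price sheet before relying on numbers like these:

```typescript
// Illustrative USD prices per 1M tokens — placeholders, NOT real pricing
const PRICE_PER_MILLION: Record<string, { input: number; output: number }> = {
  "gpt-4o": { input: 2.5, output: 10 },
  "claude-sonnet-4-20250514": { input: 3, output: 15 },
  "gemini-1.5-pro": { input: 1.25, output: 5 },
};

export function estimateCostUSD(
  model: string,
  usage: { promptTokens: number; completionTokens: number }
): number | null {
  const price = PRICE_PER_MILLION[model];
  if (!price) return null; // unknown model: no estimate rather than a wrong one
  return (
    (usage.promptTokens / 1_000_000) * price.input +
    (usage.completionTokens / 1_000_000) * price.output
  );
}
```

Feeding each AIResponse.usage through a helper like this lets you show per-user spend on a dashboard and sanity-check your plan pricing.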

Step 3: Add Per-User Rate Limiting

Create lib/ai/rate-limit.ts:

import { prisma } from "@/lib/prisma";

const LIMITS = {
  free: { requestsPerDay: 10, tokensPerDay: 10000 },
  starter: { requestsPerDay: 100, tokensPerDay: 100000 },
  pro: { requestsPerDay: 1000, tokensPerDay: 1000000 },
};

export async function checkRateLimit(userId: string): Promise<{
  allowed: boolean;
  reason?: string;
}> {
  const user = await prisma.user.findUnique({
    where: { id: userId },
    select: { plan: true },
  });

  const plan = (user?.plan as keyof typeof LIMITS) ?? "free";
  const limits = LIMITS[plan];

  // Midnight in the server's local timezone; use UTC here if your users span timezones
  const today = new Date();
  today.setHours(0, 0, 0, 0);

  const usage = await prisma.aIUsage.aggregate({
    where: {
      userId,
      createdAt: { gte: today },
    },
    _count: { id: true },
    _sum: { totalTokens: true },
  });

  const requestCount = usage._count.id ?? 0;
  const tokenCount = usage._sum.totalTokens ?? 0;

  if (requestCount >= limits.requestsPerDay) {
    return {
      allowed: false,
      reason: `Daily request limit reached (${limits.requestsPerDay} requests/day on ${plan} plan)`,
    };
  }

  if (tokenCount >= limits.tokensPerDay) {
    return {
      allowed: false,
      reason: `Daily token limit reached (${limits.tokensPerDay} tokens/day on ${plan} plan)`,
    };
  }

  return { allowed: true };
}
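The Prisma check enforces daily quotas but says nothing about bursts — a user can fire an entire day's allowance in one second. A hypothetical in-memory sliding window can sit in front of it; note that this state is per-process and resets on every deploy, so multi-instance deployments would want Redis instead:

```typescript
// Sliding-window burst limiter: allow at most maxRequests per windowMs.
// `now` is injectable so the function is deterministic in tests.
const requestLog = new Map<string, number[]>();

export function checkBurstLimit(
  userId: string,
  maxRequests = 5,
  windowMs = 10_000,
  now = Date.now()
): boolean {
  // Drop timestamps that have aged out of the window
  const recent = (requestLog.get(userId) ?? []).filter((t) => now - t < windowMs);
  if (recent.length >= maxRequests) {
    requestLog.set(userId, recent);
    return false; // over the burst limit
  }
  recent.push(now);
  requestLog.set(userId, recent);
  return true;
}
```

Calling this before the Prisma query also spares the database a round trip on obviously abusive traffic.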

Add the usage tracking model to your Prisma schema:

model AIUsage {
  id               String   @id @default(cuid())
  userId           String
  provider         String
  model            String
  promptTokens     Int
  completionTokens Int
  totalTokens      Int
  createdAt        DateTime @default(now())

  user User @relation(fields: [userId], references: [id], onDelete: Cascade)

  @@map("ai_usage")
}
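checkRateLimit aggregates over userId and createdAt on every single request; a compound index (my suggestion, not part of the original schema) keeps that query fast as the table grows:

```prisma
model AIUsage {
  // ...fields as above...

  @@index([userId, createdAt])
  @@map("ai_usage")
}
```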

Step 4: Create the AI API Route

Create app/api/ai/generate/route.ts:

import { NextRequest, NextResponse } from "next/server";
import { auth } from "@/auth";
import { generateCompletion, AIMessage } from "@/lib/ai/providers";
import { checkRateLimit } from "@/lib/ai/rate-limit";
import { prisma } from "@/lib/prisma";

export async function POST(req: NextRequest) {
  const session = await auth();
  if (!session?.user?.id) {
    return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
  }

  // Check rate limit
  const { allowed, reason } = await checkRateLimit(session.user.id);
  if (!allowed) {
    return NextResponse.json({ error: reason }, { status: 429 });
  }

  const { messages, provider, model } = await req.json() as {
    messages: AIMessage[];
    provider?: string;
    model?: string;
  };

  if (!Array.isArray(messages) || messages.length === 0) {
    return NextResponse.json({ error: "messages must be a non-empty array" }, { status: 400 });
  }

  try {
    const response = await generateCompletion(messages, {
      provider: provider as "openai" | "anthropic" | "google" | undefined,
      model,
    });

    // Track usage
    await prisma.aIUsage.create({
      data: {
        userId: session.user.id,
        provider: provider ?? process.env.AI_PROVIDER ?? "openai",
        model: model ?? process.env.AI_MODEL ?? "gpt-4o",
        promptTokens: response.usage.promptTokens,
        completionTokens: response.usage.completionTokens,
        totalTokens: response.usage.totalTokens,
      },
    });

    return NextResponse.json({ content: response.content, usage: response.usage });
  } catch (err) {
    console.error("AI generation error:", err);
    return NextResponse.json({ error: "AI generation failed" }, { status: 500 });
  }
}

Step 5: Add Streaming Support

For a better user experience, stream responses instead of waiting for the full completion:

// app/api/ai/stream/route.ts
import { NextRequest } from "next/server";
import OpenAI from "openai";
import { auth } from "@/auth";
import { checkRateLimit } from "@/lib/ai/rate-limit";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export async function POST(req: NextRequest) {
  const session = await auth();
  if (!session?.user?.id) {
    return new Response("Unauthorized", { status: 401 });
  }

  const { allowed, reason } = await checkRateLimit(session.user.id);
  if (!allowed) {
    return new Response(reason, { status: 429 });
  }

  const { messages } = await req.json();

  // For brevity this route skips usage recording; in production, pass
  // stream_options: { include_usage: true } so OpenAI reports token usage
  // in the final chunk, then log it like the non-streaming route does.
  const stream = await openai.chat.completions.create({
    model: "gpt-4o",
    messages,
    stream: true,
  });

  const encoder = new TextEncoder();

  const readable = new ReadableStream({
    async start(controller) {
      for await (const chunk of stream) {
        const content = chunk.choices[0]?.delta?.content ?? "";
        if (content) {
          controller.enqueue(encoder.encode(content));
        }
      }
      controller.close();
    },
  });

  return new Response(readable, {
    // The runtime handles chunked transfer; setting Transfer-Encoding manually is unnecessary
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}

Consume the stream in your client component:

"use client";

import { useState } from "react";

export function AIChat() {
  const [output, setOutput] = useState("");
  const [loading, setLoading] = useState(false);

  const generate = async (prompt: string) => {
    setLoading(true);
    setOutput("");

    const res = await fetch("/api/ai/stream", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        messages: [{ role: "user", content: prompt }],
      }),
    });

    const reader = res.body?.getReader();
    const decoder = new TextDecoder();

    if (!reader) {
      setLoading(false);
      return;
    }

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      // stream: true keeps partial multi-byte characters buffered between chunks
      setOutput((prev) => prev + decoder.decode(value, { stream: true }));
    }

    setLoading(false);
  };

  return (
    <div>
      <button disabled={loading} onClick={() => generate("Summarize the benefits of a SaaS boilerplate")}>
        {loading ? "Generating…" : "Generate"}
      </button>
      <div className="mt-4 whitespace-pre-wrap">{output}</div>
    </div>
  );
}
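One subtlety when decoding streamed bytes: a multi-byte UTF-8 character can arrive split across two network chunks. Passing { stream: true } to TextDecoder.decode makes it buffer the partial bytes instead of emitting a U+FFFD replacement character. A standalone illustration — decodeChunks is a hypothetical helper, not part of the component above:

```typescript
// Hypothetical helper showing safe incremental UTF-8 decoding
export function decodeChunks(chunks: Uint8Array[]): string {
  const decoder = new TextDecoder();
  let out = "";
  for (const chunk of chunks) {
    // stream: true buffers any trailing partial character until the next chunk
    out += decoder.decode(chunk, { stream: true });
  }
  return out + decoder.decode(); // final call flushes the buffer
}

// "café" is 5 bytes in UTF-8; slicing at byte 4 splits the 2-byte "é"
const bytes = new TextEncoder().encode("café");
const text = decodeChunks([bytes.slice(0, 4), bytes.slice(4)]);
// text === "café" even though "é" was split across chunks
```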

Switching Providers Without Rewriting Code

The entire point of the abstraction layer is this: to switch from OpenAI to Claude, you just change two environment variables:

AI_PROVIDER=anthropic
AI_MODEL=claude-sonnet-4-20250514

No code changes. No refactoring. The rest of your app continues working.
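Environment variables cover planned switches; provider outages need a runtime fallback. One possible shape — generateWithFallback is my own sketch, not part of the Step 2 code — tries providers in order and rethrows only if every attempt fails:

```typescript
// Try each provider factory in order; return the first success.
// Each entry would typically wrap generateCompletion with a fixed provider.
export async function generateWithFallback<T>(
  attempts: Array<() => Promise<T>>
): Promise<T> {
  let lastError: unknown = new Error("no providers configured");
  for (const attempt of attempts) {
    try {
      return await attempt();
    } catch (err) {
      lastError = err; // remember the failure and try the next provider
    }
  }
  throw lastError;
}
```

Usage would look like generateWithFallback([() => generateCompletion(msgs, { provider: "openai" }), () => generateCompletion(msgs, { provider: "anthropic" })]) — the caller never learns which provider answered.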


The Bottom Line

A production-ready AI infrastructure layer — multi-provider abstraction, server-side key security, rate limiting, usage tracking, and streaming — takes about 6–8 hours to build correctly. That's before you write a single line of your actual AI feature.

ZeroDrag Pro ships with this entire layer pre-built. OpenAI, Claude, and Gemini — unified API, server-side execution, per-user rate limiting, token tracking, and streaming out of the box. Pro is $169, one-time.


Written by Utkarsh Singh. Last updated March 2026.

Skip the setup. Ship your product.

ZeroDrag gives you auth, payments, database, email, and AI — pre-wired. Starter at $119, Pro at $169. One-time payment.

Get ZeroDrag →