# AI Infrastructure
Server-side AI infrastructure with provider abstraction, usage tracking, and entitlement-based access control.
## What This System Does
The AI module provides:
- Unified interface for OpenAI, Anthropic, and Google Gemini
- Server-only execution (API keys never exposed to client)
- Entitlement-based access (`ai_access`)
- Per-user rate limiting
- Usage tracking for cost monitoring
- Normalized error handling
## Why It Exists
AI capabilities are increasingly important for modern SaaS products. This system provides a production-ready abstraction layer that handles multiple providers, tracks usage, enforces rate limits, and integrates with your entitlement system for feature gating.
## When You Need to Care About It
You'll interact with the AI system when:
- Adding AI-powered features to your product
- Switching between AI providers
- Monitoring AI usage and costs
- Configuring rate limits or safety limits
- Adding new AI providers
## Key Concepts
### Provider Abstraction
The system provides a unified interface across OpenAI, Anthropic, and Gemini. You can switch providers or let the system choose automatically based on availability.
### Entitlement Gating
AI access requires the `ai_access` entitlement, which is assigned to the Pro plan by default. This ensures only paying users can access AI features.
### Usage Tracking
All AI calls are tracked in the `ai_usage` table, recording tokens used, provider, model, and user. This enables cost monitoring and usage analytics.
### Rate Limiting
Per-user rate limits prevent abuse. Default limits are 10 requests per minute and 60 requests per hour. These are in-memory for single-instance deployments.
## Supported Providers
| Provider | Models | Default Model | Token Tracking |
|---|---|---|---|
| OpenAI | GPT-4o, GPT-4o-mini, etc. | gpt-4o-mini | Exact |
| Anthropic | Claude 3.5 Sonnet, etc. | claude-3-5-sonnet-20241022 | Exact |
| Gemini | Gemini 1.5 Flash, Pro, etc. | gemini-1.5-flash | Exact or estimated* |
*Gemini provides token counts in most responses. When unavailable, tokens are estimated (~4 chars/token).
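The fallback estimate is simple enough to sketch. The helper below is illustrative (not the actual provider code) and assumes only the documented ~4 chars/token heuristic:

```typescript
// Illustrative sketch of the fallback estimator: when a provider response
// omits token counts, approximate them from character length (~4 chars/token).
const CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  // Round up so short strings still count as at least one token.
  return Math.max(1, Math.ceil(text.length / CHARS_PER_TOKEN));
}

console.log(estimateTokens("Explain quantum computing")); // 25 chars -> 7 tokens
```

Estimates like this are fine for rate limiting and rough cost dashboards, but not for billing; exact counts from the provider response should always take precedence.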
## Quick Start
### 1. Add API Keys
Add one or more provider keys to `.env`:
```bash
# OpenAI
OPENAI_API_KEY=sk-your-openai-key

# Anthropic
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key

# Google Gemini
GEMINI_API_KEY=AIzaSy-your-gemini-key
```

Get your API keys from each provider's developer console.
### 2. Run Database Migration
```bash
pnpm prisma db push
pnpm db:seed
```

This creates the `ai_usage` table and adds the `ai_access` entitlement.
### 3. Use the AI Client
```typescript
import { aiClient, AIError } from "@/lib/ai";

// In an API route
const response = await aiClient.generate(
  {
    prompt: "Explain quantum computing",
    maxTokens: 500,
  },
  { userId: user.id }
);

console.log(response.content);
```

## Important Files
| File | Purpose | Status |
|---|---|---|
| `/lib/ai/ai-client.ts` | Main entry point for AI operations | 🔒 CORE - Do not modify |
| `/lib/ai/ai-types.ts` | Type definitions and limits | 🔒 CORE - Do not modify |
| `/lib/ai/ai-errors.ts` | Error handling and normalization | 🔒 CORE - Do not modify |
| `/lib/ai/ai-usage.ts` | Usage tracking functions | 🔒 CORE - Do not modify |
| `/lib/ai/ai-rate-limit.ts` | Rate limiting logic | 🔒 CORE - Do not modify |
| `/lib/ai/providers/*.ts` | Provider implementations | 🔒 CORE - Do not modify |
| `/api/ai/generate/route.ts` | Example API endpoint | 🏗️ EDITABLE - Use as reference |
## Core Functions / Methods
### `aiClient.generate(request, options)`
Generate text from a prompt. Returns a response with content, provider, model, and usage statistics.
Location: /lib/ai/ai-client.ts
```typescript
interface AIGenerateRequest {
  prompt: string;
  systemPrompt?: string;
  provider?: "openai" | "anthropic" | "gemini";
  model?: string;
  maxTokens?: number;
  temperature?: number;
}

const response = await aiClient.generate(
  {
    prompt: "Hello",
    provider: "openai",
    maxTokens: 100,
  },
  { userId: "user_123" }
);

// Response
// {
//   content: "Hi there!",
//   provider: "openai",
//   model: "gpt-4o-mini",
//   usage: {
//     promptTokens: 5,
//     completionTokens: 10,
//     totalTokens: 15
//   }
// }
```

### `aiClient.generateStream(request, options)`
Generate text with streaming for real-time responses.
Location: /lib/ai/ai-client.ts
```typescript
const stream = aiClient.generateStream(
  { prompt: "Write a story" },
  { userId: "user_123" }
);

for await (const chunk of stream) {
  process.stdout.write(chunk.content);
  if (chunk.done) {
    console.log("\nUsage:", chunk.usage);
  }
}
```

### `aiClient.isAIConfigured()`
Check if any provider is configured. Returns boolean.
Location: /lib/ai/ai-client.ts
### `aiClient.getAvailableProviders()`
Get list of configured providers. Returns array of provider names.
Location: /lib/ai/ai-client.ts
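Conceptually, both checks come down to key presence. The self-contained sketch below is an assumption about how availability is determined (a provider counts as available when its API key env var is set); `availableProviders` and `isConfigured` are hypothetical names, not the actual internals:

```typescript
// Sketch: availability = the provider's API key environment variable is set.
type AIProvider = "openai" | "anthropic" | "gemini";

const PROVIDER_ENV_KEYS: Record<AIProvider, string> = {
  openai: "OPENAI_API_KEY",
  anthropic: "ANTHROPIC_API_KEY",
  gemini: "GEMINI_API_KEY",
};

function availableProviders(env: Record<string, string | undefined>): AIProvider[] {
  // Keep the declared order: openai, anthropic, gemini.
  return (Object.keys(PROVIDER_ENV_KEYS) as AIProvider[]).filter(
    (p) => !!env[PROVIDER_ENV_KEYS[p]]
  );
}

function isConfigured(env: Record<string, string | undefined>): boolean {
  return availableProviders(env).length > 0;
}
```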
### `getUserAIUsage(userId, startDate)`
Get usage statistics for a user within a time period.
Location: /lib/ai/ai-usage.ts
### `getUserDailyTokens(userId)`
Get today's token usage for a user.
Location: /lib/ai/ai-usage.ts
## Switching Providers
To switch providers, pass the `provider` parameter:
```typescript
// Use OpenAI (default if configured)
await aiClient.generate({ prompt: "Hello", provider: "openai" }, { userId });

// Use Anthropic
await aiClient.generate({ prompt: "Hello", provider: "anthropic" }, { userId });

// Use Gemini
await aiClient.generate({ prompt: "Hello", provider: "gemini" }, { userId });
```

If no provider is specified, the system uses the first available in order: OpenAI → Anthropic → Gemini.
## Changing Models
### Default Models
Each provider has a default model configured in its implementation file:
```typescript
// lib/ai/providers/openai.ts
const DEFAULT_CONFIG = {
  defaultModel: "gpt-4o-mini", // ← Change this
  defaultMaxTokens: 1024,
  defaultTemperature: 0.7,
  timeoutMs: AI_LIMITS.DEFAULT_TIMEOUT_MS,
};
```

### Override Models Per Request
Pass the `model` parameter to use a different model for a specific request:
```typescript
// Use a specific model for this request
const response = await aiClient.generate(
  {
    prompt: "Hello",
    model: "gpt-4o", // Override default gpt-4o-mini
  },
  { userId: "user_123" }
);
```

## Entitlement Check
AI access requires the `ai_access` entitlement. Check it in API routes:
```typescript
// Server-side check
import { hasEntitlement } from "@/lib/auth-utils";

export async function POST(request: NextRequest) {
  const hasAI = await hasEntitlement("ai_access");
  if (!hasAI) {
    return NextResponse.json(
      { error: "AI access requires Pro subscription" },
      { status: 403 }
    );
  }
  // ... continue with AI call
}
```

The `ai_access` entitlement is assigned to the Pro plan by default. Modify `/prisma/seed.ts` to change this.
## Usage Tracking
All AI calls with a `userId` are automatically tracked.
```typescript
import { getUserAIUsage, getUserDailyTokens } from "@/lib/ai";

// Get usage for a time period
const usage = await getUserAIUsage(userId, new Date("2024-01-01"));
// {
//   totalRequests: 150,
//   totalTokens: 45000,
//   promptTokens: 15000,
//   completionTokens: 30000,
//   byProvider: {
//     openai: { requests: 100, tokens: 30000 },
//     anthropic: { requests: 50, tokens: 15000 }
//   }
// }

// Get today's token usage
const dailyTokens = await getUserDailyTokens(userId);
// 5000
```

## Rate Limiting
| Limit | Value |
|---|---|
| Requests per minute | 10 |
| Requests per hour | 60 |
Rate limiting is per-user, in-memory. For production with multiple instances, implement Redis-based limiting.
Configure in /lib/ai/ai-rate-limit.ts.
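For orientation, a per-user sliding-window limiter along these lines behaves as described; the names and structure below are illustrative, not the actual `/lib/ai/ai-rate-limit.ts` code:

```typescript
// Sketch of an in-memory per-user sliding-window limiter. Counts reset on
// restart and are not shared across instances (hence the Redis suggestion).
const WINDOWS = [
  { ms: 60_000, max: 10 },    // 10 requests per minute
  { ms: 3_600_000, max: 60 }, // 60 requests per hour
];

const requestLog = new Map<string, number[]>();

function checkRateLimit(userId: string, now = Date.now()): boolean {
  const timestamps = requestLog.get(userId) ?? [];
  // Drop entries older than the largest window to bound memory.
  const fresh = timestamps.filter((t) => now - t < 3_600_000);
  const allowed = WINDOWS.every(
    (w) => fresh.filter((t) => now - t < w.ms).length < w.max
  );
  if (allowed) fresh.push(now);
  requestLog.set(userId, fresh);
  return allowed;
}
```

A Redis-backed version would replace the `Map` with sorted-set operations keyed by user, so all instances share the same counts.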
## Error Handling
All provider errors are normalized to `AIError`:
```typescript
import { AIError } from "@/lib/ai";

try {
  await aiClient.generate({ prompt: "..." }, { userId });
} catch (error) {
  if (error instanceof AIError) {
    // Safe message for client
    const message = error.toClientMessage();

    // Error details for logging
    console.log({
      code: error.code, // "RATE_LIMITED", "TIMEOUT", etc.
      provider: error.provider,
      retryable: error.retryable,
      statusCode: error.statusCode,
    });
  }
}
```

### Error Codes
| Code | Description | Retryable |
|---|---|---|
| PROVIDER_NOT_CONFIGURED | API key missing | No |
| RATE_LIMITED | Too many requests | Yes |
| TIMEOUT | Request timed out | Yes |
| CONTEXT_LENGTH_EXCEEDED | Prompt too long | No |
| CONTENT_FILTERED | Content blocked | No |
| INSUFFICIENT_QUOTA | Billing issue | No |
Errors are automatically captured in Sentry (except rate limits and content filters).
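Because `retryable` is exposed on the error, callers can wrap generation in a small backoff loop. The helper below is a sketch, not part of the library; it assumes only an error object carrying a `retryable` flag, and `call` stands in for an `aiClient.generate(...)` invocation:

```typescript
// Retry only errors flagged retryable (e.g. RATE_LIMITED, TIMEOUT),
// with exponential backoff: 500ms, 1000ms, 2000ms, ...
async function withRetry<T>(
  call: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await call();
    } catch (error) {
      const retryable = (error as { retryable?: boolean }).retryable === true;
      if (!retryable || attempt >= maxAttempts) throw error;
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** (attempt - 1)));
    }
  }
}
```

Non-retryable codes like `CONTEXT_LENGTH_EXCEEDED` or `INSUFFICIENT_QUOTA` are rethrown immediately, since repeating the same request cannot succeed.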
## Safety Limits
Built-in limits prevent runaway costs:
| Limit | Value |
|---|---|
| Max tokens per request | 4,096 |
| Max prompt length | 100,000 chars |
| Request timeout | 30 seconds |
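A pre-flight check against these limits might look like the sketch below; `validateRequest` is an illustrative name (the real enforcement lives inside the client), and it clamps an oversized `maxTokens` rather than rejecting it:

```typescript
// Illustrative enforcement of the documented safety limits.
const AI_LIMITS = {
  MAX_TOKENS_PER_REQUEST: 4096,
  MAX_PROMPT_LENGTH: 100_000,
} as const;

function validateRequest(prompt: string, maxTokens?: number): number {
  if (prompt.length > AI_LIMITS.MAX_PROMPT_LENGTH) {
    throw new Error(`Prompt exceeds ${AI_LIMITS.MAX_PROMPT_LENGTH} characters`);
  }
  // Clamp maxTokens to the per-request ceiling; default to the ceiling if unset.
  return Math.min(
    maxTokens ?? AI_LIMITS.MAX_TOKENS_PER_REQUEST,
    AI_LIMITS.MAX_TOKENS_PER_REQUEST
  );
}
```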
Configure in /lib/ai/ai-types.ts:
```typescript
export const AI_LIMITS = {
  MAX_TOKENS_PER_REQUEST: 4096,
  DEFAULT_TIMEOUT_MS: 30000,
  MAX_PROMPT_LENGTH: 100000,
} as const;
```

## What You Can Customize
- Default models for each provider (in provider files)
- Rate limits (in `/lib/ai/ai-rate-limit.ts`)
- Safety limits (in `/lib/ai/ai-types.ts`)
- Which plan gets the `ai_access` entitlement (in `/prisma/seed.ts`)
- API endpoint implementation (use `/api/ai/generate/route.ts` as reference)
## What NOT to Touch
- Core AI client implementation (`/lib/ai/ai-client.ts`)
- Provider interface definitions (`/lib/ai/ai-types.ts`)
- Error handling logic (`/lib/ai/ai-errors.ts`)
- Usage tracking database operations (`/lib/ai/ai-usage.ts`)
- Provider implementations (`/lib/ai/providers/*.ts`) - unless adding a new provider
## Adding a New Provider
### 1. Create Provider File
Create /lib/ai/providers/newprovider.ts:
```typescript
import {
  AIProviderInterface,
  AIGenerateRequest,
  AIGenerateResponse,
} from "../ai-types";

class NewProvider implements AIProviderInterface {
  readonly name = "newprovider" as const;

  isAvailable(): boolean {
    return !!process.env.NEWPROVIDER_API_KEY;
  }

  async generate(request: AIGenerateRequest): Promise<AIGenerateResponse> {
    // Implementation
  }
}

export const newProvider = new NewProvider();
```

### 2. Register in AI Client
Update /lib/ai/ai-client.ts:
```typescript
import { newProvider } from "./providers/newprovider";

function getProvider(name: AIProvider) {
  switch (name) {
    case "openai":
      return openaiProvider;
    case "anthropic":
      return anthropicProvider;
    case "gemini":
      return geminiProvider;
    case "newprovider":
      return newProvider;
  }
}
```

### 3. Update Type Definition
Update /lib/ai/ai-types.ts:
```typescript
export type AIProvider = "openai" | "anthropic" | "gemini" | "newprovider";
```

## Cost Management
- Set token limits: Use `maxTokens` to cap response length
- Monitor usage: Query `AIUsage` table regularly
- Entitlement gating: Only Pro users get AI access
- Rate limiting: Prevent abuse with per-user limits
- Track by provider: Usage stats broken down by provider
### Provider Cost Comparison
Costs vary significantly between providers and models. As of late 2024:
| Provider | Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|---|
| OpenAI | gpt-4o-mini | ~$0.15 | ~$0.60 |
| Anthropic | claude-3-5-sonnet | ~$3.00 | ~$15.00 |
| Gemini | gemini-1.5-flash | ~$0.075 | ~$0.30 |
Note: Prices change frequently. Check provider pricing pages for current rates. Gemini offers a free tier with rate limits.
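Combined with the usage stats above, these prices allow a rough cost estimate. The sketch below hardcodes the sample prices from the table; verify against current pricing before relying on it:

```typescript
// Back-of-envelope cost estimate from tracked token usage.
// Prices are the illustrative late-2024 figures per 1M tokens.
const PRICE_PER_M = {
  "gpt-4o-mini": { input: 0.15, output: 0.6 },
  "claude-3-5-sonnet": { input: 3.0, output: 15.0 },
  "gemini-1.5-flash": { input: 0.075, output: 0.3 },
} as const;

function estimateCostUSD(
  model: keyof typeof PRICE_PER_M,
  promptTokens: number,
  completionTokens: number
): number {
  const p = PRICE_PER_M[model];
  return (promptTokens * p.input + completionTokens * p.output) / 1_000_000;
}

// e.g. feed in promptTokens/completionTokens from getUserAIUsage()
console.log(estimateCostUSD("gpt-4o-mini", 15_000, 30_000));
```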
## Related Sections
- Entitlements - Understanding feature gating
- Environment Variables - API key configuration
- Database - Understanding the `ai_usage` table