AI Infrastructure

Server-side AI infrastructure with provider abstraction, usage tracking, and entitlement-based access control.

What This System Does

The AI module provides:

  • Unified interface for OpenAI, Anthropic, and Google Gemini
  • Server-only execution (API keys never exposed to client)
  • Entitlement-based access (`ai_access`)
  • Per-user rate limiting
  • Usage tracking for cost monitoring
  • Normalized error handling

Why It Exists

AI capabilities are increasingly important for modern SaaS products. This system provides a production-ready abstraction layer that handles multiple providers, tracks usage, enforces rate limits, and integrates with your entitlement system for feature gating.

When You Need to Care About It

You'll interact with the AI system when:

  • Adding AI-powered features to your product
  • Switching between AI providers
  • Monitoring AI usage and costs
  • Configuring rate limits or safety limits
  • Adding new AI providers

Key Concepts

Provider Abstraction

The system provides a unified interface across OpenAI, Anthropic, and Gemini. You can switch providers or let the system choose automatically based on availability.

Entitlement Gating

AI access requires the `ai_access` entitlement, which is assigned to the Pro plan by default. This ensures only paying users can access AI features.

Usage Tracking

All AI calls are tracked in the `ai_usage` table, recording tokens used, provider, model, and user. This enables cost monitoring and usage analytics.
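
A plausible TypeScript shape for one `ai_usage` row, matching the fields listed above. The field names here are assumptions for illustration; check /prisma/schema.prisma for the actual model:

```typescript
// Hypothetical record shape for the ai_usage table (field names assumed,
// not taken from the actual Prisma schema).
interface AIUsageRecord {
  userId: string;
  provider: "openai" | "anthropic" | "gemini";
  model: string;
  promptTokens: number;
  completionTokens: number;
  totalTokens: number; // promptTokens + completionTokens
  createdAt: Date;
}
```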

Rate Limiting

Per-user rate limits prevent abuse. Default limits are 10 requests per minute and 60 requests per hour. These are in-memory for single-instance deployments.

Supported Providers

| Provider  | Models                      | Default Model              | Token Tracking      |
|-----------|-----------------------------|----------------------------|---------------------|
| OpenAI    | GPT-4o, GPT-4o-mini, etc.   | gpt-4o-mini                | Exact               |
| Anthropic | Claude 3.5 Sonnet, etc.     | claude-3-5-sonnet-20241022 | Exact               |
| Gemini    | Gemini 1.5 Flash, Pro, etc. | gemini-1.5-flash           | Exact or estimated* |

*Gemini provides token counts in most responses. When unavailable, tokens are estimated (~4 chars/token).
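
The fallback heuristic can be sketched in one line — a rough estimate, not the provider's actual tokenizer:

```typescript
// Rough fallback when the provider omits token counts: assume ~4 chars/token.
// This is an approximation, not a real tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}
```

Real token counts depend on the model's tokenizer, so treat this only as a last-resort estimate for usage records.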

Quick Start

1. Add API Keys

Add one or more provider keys to `.env`:

```bash
# OpenAI
OPENAI_API_KEY=sk-your-openai-key

# Anthropic
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key

# Google Gemini
GEMINI_API_KEY=AIzaSy-your-gemini-key
```

Get your API keys from each provider's dashboard: the OpenAI platform, the Anthropic console, and Google AI Studio.

2. Run Database Migration

```bash
pnpm prisma db push
pnpm db:seed
```

This creates the `ai_usage` table and adds the `ai_access` entitlement.

3. Use the AI Client

```typescript
import { aiClient, AIError } from "@/lib/ai";

// In an API route
const response = await aiClient.generate(
  {
    prompt: "Explain quantum computing",
    maxTokens: 500,
  },
  { userId: user.id }
);

console.log(response.content);
```

Important Files

| File | Purpose | Status |
|------|---------|--------|
| /lib/ai/ai-client.ts | Main entry point for AI operations | 🔒 CORE - Do not modify |
| /lib/ai/ai-types.ts | Type definitions and limits | 🔒 CORE - Do not modify |
| /lib/ai/ai-errors.ts | Error handling and normalization | 🔒 CORE - Do not modify |
| /lib/ai/ai-usage.ts | Usage tracking functions | 🔒 CORE - Do not modify |
| /lib/ai/ai-rate-limit.ts | Rate limiting logic | 🔒 CORE - Do not modify |
| /lib/ai/providers/*.ts | Provider implementations | 🔒 CORE - Do not modify |
| /api/ai/generate/route.ts | Example API endpoint | 🏗️ EDITABLE - Use as reference |

Core Functions / Methods

aiClient.generate(request, options)

Generate text from a prompt. Returns a response with content, provider, model, and usage statistics.

Location: /lib/ai/ai-client.ts

```typescript
interface AIGenerateRequest {
  prompt: string;
  systemPrompt?: string;
  provider?: "openai" | "anthropic" | "gemini";
  model?: string;
  maxTokens?: number;
  temperature?: number;
}

const response = await aiClient.generate(
  {
    prompt: "Hello",
    provider: "openai",
    maxTokens: 100,
  },
  { userId: "user_123" }
);

// Response:
// {
//   content: "Hi there!",
//   provider: "openai",
//   model: "gpt-4o-mini",
//   usage: {
//     promptTokens: 5,
//     completionTokens: 10,
//     totalTokens: 15
//   }
// }
```

aiClient.generateStream(request, options)

Generate text with streaming for real-time responses.

Location: /lib/ai/ai-client.ts

```typescript
const stream = aiClient.generateStream(
  { prompt: "Write a story" },
  { userId: "user_123" }
);

for await (const chunk of stream) {
  process.stdout.write(chunk.content);
  if (chunk.done) {
    console.log("\nUsage:", chunk.usage);
  }
}
```

aiClient.isAIConfigured()

Check if any provider is configured. Returns boolean.

Location: /lib/ai/ai-client.ts

aiClient.getAvailableProviders()

Get list of configured providers. Returns array of provider names.

Location: /lib/ai/ai-client.ts

getUserAIUsage(userId, startDate)

Get usage statistics for a user within a time period.

Location: /lib/ai/ai-usage.ts

getUserDailyTokens(userId)

Get today's token usage for a user.

Location: /lib/ai/ai-usage.ts

Switching Providers

To switch providers, pass the `provider` parameter:

```typescript
// Use OpenAI (default if configured)
await aiClient.generate({ prompt: "Hello", provider: "openai" }, { userId });

// Use Anthropic
await aiClient.generate({ prompt: "Hello", provider: "anthropic" }, { userId });

// Use Gemini
await aiClient.generate({ prompt: "Hello", provider: "gemini" }, { userId });
```

If no provider is specified, the system uses the first available in order: OpenAI → Anthropic → Gemini.
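
The selection logic can be sketched as follows — a simplified stand-in for the real client, where `isAvailable` represents the per-provider API-key checks:

```typescript
type AIProvider = "openai" | "anthropic" | "gemini";

// Fallback order described above: OpenAI → Anthropic → Gemini.
const FALLBACK_ORDER: AIProvider[] = ["openai", "anthropic", "gemini"];

function pickProvider(
  requested: AIProvider | undefined,
  isAvailable: (p: AIProvider) => boolean
): AIProvider {
  if (requested) {
    // An explicitly requested provider must be configured.
    if (!isAvailable(requested)) {
      throw new Error(`Provider not configured: ${requested}`);
    }
    return requested;
  }
  // Otherwise take the first configured provider in fallback order.
  const first = FALLBACK_ORDER.find(isAvailable);
  if (!first) throw new Error("No AI provider configured");
  return first;
}
```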

Changing Models

Default Models

Each provider has a default model configured in its implementation file:

```typescript
// lib/ai/providers/openai.ts
const DEFAULT_CONFIG = {
  defaultModel: "gpt-4o-mini", // ← Change this
  defaultMaxTokens: 1024,
  defaultTemperature: 0.7,
  timeoutMs: AI_LIMITS.DEFAULT_TIMEOUT_MS,
};
```

Override Models Per Request

Pass the `model` parameter to use a different model for a specific request:

```typescript
// Use a specific model for this request
const response = await aiClient.generate(
  {
    prompt: "Hello",
    model: "gpt-4o", // Override default gpt-4o-mini
  },
  { userId: "user_123" }
);
```

Entitlement Check

AI access requires the `ai_access` entitlement. Check it in API routes:

```typescript
// Server-side check
import { hasEntitlement } from "@/lib/auth-utils";

export async function POST(request: NextRequest) {
  const hasAI = await hasEntitlement("ai_access");
  if (!hasAI) {
    return NextResponse.json(
      { error: "AI access requires Pro subscription" },
      { status: 403 }
    );
  }
  // ... continue with AI call
}
```

The `ai_access` entitlement is assigned to the Pro plan by default. Modify /prisma/seed.ts to change this.

Usage Tracking

All AI calls with a `userId` are automatically tracked.

```typescript
import { getUserAIUsage, getUserDailyTokens } from "@/lib/ai";

// Get usage for a time period
const usage = await getUserAIUsage(userId, new Date("2024-01-01"));
// {
//   totalRequests: 150,
//   totalTokens: 45000,
//   promptTokens: 15000,
//   completionTokens: 30000,
//   byProvider: {
//     openai: { requests: 100, tokens: 30000 },
//     anthropic: { requests: 50, tokens: 15000 }
//   }
// }

// Get today's token usage
const dailyTokens = await getUserDailyTokens(userId);
// 5000
```

Rate Limiting

| Limit | Value |
|-------|-------|
| Requests per minute | 10 |
| Requests per hour | 60 |

Rate limiting is per-user, in-memory. For production with multiple instances, implement Redis-based limiting.

Configure in /lib/ai/ai-rate-limit.ts.
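
A minimal sliding-window limiter along these lines — a sketch of the idea, not the actual contents of /lib/ai/ai-rate-limit.ts:

```typescript
// Per-user sliding windows matching the defaults above (10/min, 60/hr).
const WINDOWS = [
  { ms: 60_000, max: 10 },
  { ms: 3_600_000, max: 60 },
];
const MAX_WINDOW_MS = Math.max(...WINDOWS.map((w) => w.ms));

// In-memory log of request timestamps per user.
const requestLog = new Map<string, number[]>();

function checkRateLimit(userId: string, now: number = Date.now()): boolean {
  // Drop entries older than the largest window to bound memory.
  const timestamps = (requestLog.get(userId) ?? []).filter(
    (t) => now - t < MAX_WINDOW_MS
  );
  // Reject if any window is already full.
  for (const { ms, max } of WINDOWS) {
    if (timestamps.filter((t) => now - t < ms).length >= max) return false;
  }
  timestamps.push(now);
  requestLog.set(userId, timestamps);
  return true;
}
```

Because the log lives in process memory, limits reset on restart and are not shared across instances — hence the Redis recommendation above.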

Error Handling

All provider errors are normalized to `AIError`:

```typescript
import { AIError } from "@/lib/ai";

try {
  await aiClient.generate({ prompt: "..." }, { userId });
} catch (error) {
  if (error instanceof AIError) {
    // Safe message for client
    const message = error.toClientMessage();

    // Error details for logging
    console.log({
      code: error.code, // "RATE_LIMITED", "TIMEOUT", etc.
      provider: error.provider,
      retryable: error.retryable,
      statusCode: error.statusCode,
    });
  }
}
```

Error Codes

| Code | Description | Retryable |
|------|-------------|-----------|
| PROVIDER_NOT_CONFIGURED | API key missing | No |
| RATE_LIMITED | Too many requests | Yes |
| TIMEOUT | Request timed out | Yes |
| CONTEXT_LENGTH_EXCEEDED | Prompt too long | No |
| CONTENT_FILTERED | Content blocked | No |
| INSUFFICIENT_QUOTA | Billing issue | No |

Errors are automatically captured in Sentry (except rate limits and content filters).
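
Since the `retryable` flag is exposed, callers can wrap requests in a small backoff helper. This is a hedged sketch, not part of the shipped API:

```typescript
// Retries only errors flagged retryable, with exponential backoff.
// The error shape is assumed from the AIError fields shown above.
interface RetryableError {
  retryable: boolean;
}

async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const retryable = (err as RetryableError).retryable === true;
      if (!retryable || attempt >= maxAttempts) throw err;
      // Exponential backoff: base, 2x base, 4x base, ...
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** (attempt - 1)));
    }
  }
}
```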

Safety Limits

Built-in limits prevent runaway costs:

| Limit | Value |
|-------|-------|
| Max tokens per request | 4,096 |
| Max prompt length | 100,000 chars |
| Request timeout | 30 seconds |

Configure in /lib/ai/ai-types.ts:

```typescript
export const AI_LIMITS = {
  MAX_TOKENS_PER_REQUEST: 4096,
  DEFAULT_TIMEOUT_MS: 30000,
  MAX_PROMPT_LENGTH: 100000,
} as const;
```
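
A sketch of how these limits might translate into pre-flight validation — illustrative only; the real checks live in the core client:

```typescript
// Mirrors the AI_LIMITS values above; returns an error string, or null if valid.
const AI_LIMITS = {
  MAX_TOKENS_PER_REQUEST: 4096,
  DEFAULT_TIMEOUT_MS: 30000,
  MAX_PROMPT_LENGTH: 100000,
} as const;

function validateRequest(prompt: string, maxTokens?: number): string | null {
  if (prompt.length > AI_LIMITS.MAX_PROMPT_LENGTH) {
    return "Prompt too long";
  }
  if (maxTokens !== undefined && maxTokens > AI_LIMITS.MAX_TOKENS_PER_REQUEST) {
    return "maxTokens exceeds limit";
  }
  return null;
}
```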

What You Can Customize

  • Default models for each provider (in provider files)
  • Rate limits (in /lib/ai/ai-rate-limit.ts)
  • Safety limits (in /lib/ai/ai-types.ts)
  • Which plan gets `ai_access` entitlement (in /prisma/seed.ts)
  • API endpoint implementation (use /api/ai/generate/route.ts as reference)

What NOT to Touch

  • Core AI client implementation (/lib/ai/ai-client.ts)
  • Provider interface definitions (/lib/ai/ai-types.ts)
  • Error handling logic (/lib/ai/ai-errors.ts)
  • Usage tracking database operations (/lib/ai/ai-usage.ts)
  • Provider implementations (/lib/ai/providers/*.ts) - unless adding a new provider

Adding a New Provider

1. Create Provider File

Create /lib/ai/providers/newprovider.ts:

```typescript
import {
  AIProviderInterface,
  AIGenerateRequest,
  AIGenerateResponse,
} from "../ai-types";

class NewProvider implements AIProviderInterface {
  readonly name = "newprovider" as const;

  isAvailable(): boolean {
    return !!process.env.NEWPROVIDER_API_KEY;
  }

  async generate(request: AIGenerateRequest): Promise<AIGenerateResponse> {
    // Implementation
  }
}

export const newProvider = new NewProvider();
```

2. Register in AI Client

Update /lib/ai/ai-client.ts:

```typescript
import { newProvider } from "./providers/newprovider";

function getProvider(name: AIProvider) {
  switch (name) {
    case "openai":
      return openaiProvider;
    case "anthropic":
      return anthropicProvider;
    case "gemini":
      return geminiProvider;
    case "newprovider":
      return newProvider;
  }
}
```

3. Update Type Definition

Update /lib/ai/ai-types.ts:

```typescript
export type AIProvider = "openai" | "anthropic" | "gemini" | "newprovider";
```

Cost Management

  • Set token limits: Use `maxTokens` to cap response length
  • Monitor usage: Query `AIUsage` table regularly
  • Entitlement gating: Only Pro users get AI access
  • Rate limiting: Prevent abuse with per-user limits
  • Track by provider: Usage stats broken down by provider

Provider Cost Comparison

Costs vary significantly between providers and models. As of late 2024:

| Provider | Model | Input (per 1M tokens) | Output (per 1M tokens) |
|----------|-------|-----------------------|------------------------|
| OpenAI | gpt-4o-mini | ~$0.15 | ~$0.60 |
| Anthropic | claude-3-5-sonnet | ~$3.00 | ~$15.00 |
| Gemini | gemini-1.5-flash | ~$0.075 | ~$0.30 |

Note: Prices change frequently. Check provider pricing pages for current rates. Gemini offers a free tier with rate limits.
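
For back-of-the-envelope budgeting, the table converts to a small helper. The rates below are the approximate late-2024 figures above and will drift:

```typescript
// Approximate USD rates per 1M tokens (illustrative; verify current pricing).
const RATES: Record<string, { input: number; output: number }> = {
  "gpt-4o-mini": { input: 0.15, output: 0.6 },
  "claude-3-5-sonnet": { input: 3.0, output: 15.0 },
  "gemini-1.5-flash": { input: 0.075, output: 0.3 },
};

function estimateCostUSD(
  model: string,
  promptTokens: number,
  completionTokens: number
): number {
  const rate = RATES[model];
  if (!rate) throw new Error(`Unknown model: ${model}`);
  // Rates are quoted per 1M tokens, so scale down accordingly.
  return (promptTokens * rate.input + completionTokens * rate.output) / 1_000_000;
}
```

Pair this with the `ai_usage` stats above to turn token counts into a rough monthly spend estimate per user.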

Related Sections