# AI Infrastructure
Server-side AI infrastructure with provider abstraction, usage tracking, and entitlement-based access control.
## What This System Does
The AI module provides:
- Unified interface for OpenAI, Anthropic, and Google Gemini
- Server-only execution (API keys never exposed to client)
- Entitlement-based access (`ai_access`)
- Per-user rate limiting
- Usage tracking for cost monitoring
- Normalized error handling
## Why It Exists
AI capabilities are increasingly important for modern SaaS products. This system provides a production-ready abstraction layer that handles multiple providers, tracks usage, enforces rate limits, and integrates with your entitlement system for feature gating.
## When You Need to Care About It
You'll interact with the AI system when:
- Adding AI-powered features to your product
- Switching between AI providers
- Monitoring AI usage and costs
- Configuring rate limits or safety limits
- Adding new AI providers
## Key Concepts
### Provider Abstraction
The system provides a unified interface across OpenAI, Anthropic, and Gemini. You can switch providers or let the system choose automatically based on availability.
### Entitlement Gating
AI access requires the `ai_access` entitlement, which is assigned to the Pro plan by default. This ensures only paying users can access AI features.
### Usage Tracking
All AI calls are tracked in the `ai_usage` table, recording tokens used, provider, model, and user. This enables cost monitoring and usage analytics.
### Rate Limiting
Per-user rate limits prevent abuse. Default limits are 10 requests per minute and 60 requests per hour. These are in-memory for single-instance deployments.
## Supported Providers
| Provider | Models | Default Model | Token Tracking |
|---|---|---|---|
| OpenAI | GPT-4o, GPT-4o-mini, etc. | gpt-4o-mini | Exact |
| Anthropic | Claude 3.5 Sonnet, etc. | claude-3-5-sonnet-20241022 | Exact |
| Gemini | Gemini 1.5 Flash, Pro, etc. | gemini-1.5-flash | Exact or estimated* |
*Gemini provides token counts in most responses. When unavailable, tokens are estimated (~4 chars/token).
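The fallback estimate is simple enough to sketch. The helper below is illustrative (not the actual provider code) and assumes only the documented ~4 chars/token heuristic:

```typescript
// Illustrative sketch of the fallback estimator: when a provider response
// omits token counts, approximate them from character length (~4 chars/token).
const CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  // Round up so short strings still count as at least one token.
  return Math.max(1, Math.ceil(text.length / CHARS_PER_TOKEN));
}

console.log(estimateTokens("Explain quantum computing")); // 25 chars -> 7 tokens
```

Estimates like this are fine for rate limiting and rough cost dashboards, but not for billing; exact counts from the provider response should always take precedence.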
## Quick Start
### 1. Add API Keys
Add one or more provider keys to `.env`:
```bash
# OpenAI
OPENAI_API_KEY=sk-your-openai-key

# Anthropic
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key

# Google Gemini
GEMINI_API_KEY=AIzaSy-your-gemini-key
```

Get your API keys from each provider's developer console.
### 2. Run Database Migration
```bash
pnpm prisma db push
pnpm db:seed
```

This creates the `ai_usage` table and adds the `ai_access` entitlement.
### 3. Use the AI Client
```typescript
import { aiClient, AIError } from "@/lib/ai";

// In an API route
const response = await aiClient.generate(
  {
    prompt: "Explain quantum computing",
    maxTokens: 500,
  },
  { userId: user.id }
);

console.log(response.content);
```

## Important Files
| File | Purpose | Status |
|---|---|---|
| `/lib/ai/ai-client.ts` | Main entry point for AI operations | 🔒 CORE - Do not modify |
| `/lib/ai/ai-types.ts` | Type definitions and limits | 🔒 CORE - Do not modify |
| `/lib/ai/ai-errors.ts` | Error handling and normalization | 🔒 CORE - Do not modify |
| `/lib/ai/ai-usage.ts` | Usage tracking functions | 🔒 CORE - Do not modify |
| `/lib/ai/ai-rate-limit.ts` | Rate limiting logic | 🔒 CORE - Do not modify |
| `/lib/ai/providers/*.ts` | Provider implementations | 🔒 CORE - Do not modify |
| `/api/ai/generate/route.ts` | Example API endpoint | 🏗️ EDITABLE - Use as reference |
## Core Functions / Methods
### `aiClient.generate(request, options)`
Generate text from a prompt. Returns a response with content, provider, model, and usage statistics.
Location: /lib/ai/ai-client.ts
```typescript
interface AIGenerateRequest {
  prompt: string;
  systemPrompt?: string;
  provider?: "openai" | "anthropic" | "gemini";
  model?: string;
  maxTokens?: number;
  temperature?: number;
}

const response = await aiClient.generate(
  {
    prompt: "Hello",
    provider: "openai",
    maxTokens: 100,
  },
  { userId: "user_123" }
);

// Response
// {
//   content: "Hi there!",
//   provider: "openai",
//   model: "gpt-4o-mini",
//   usage: {
//     promptTokens: 5,
//     completionTokens: 10,
//     totalTokens: 15
//   }
// }
```

### `aiClient.generateStream(request, options)`
Generate text with streaming for real-time responses.
Location: /lib/ai/ai-client.ts
```typescript
const stream = aiClient.generateStream(
  { prompt: "Write a story" },
  { userId: "user_123" }
);

for await (const chunk of stream) {
  process.stdout.write(chunk.content);
  if (chunk.done) {
    console.log("\nUsage:", chunk.usage);
  }
}
```

### `aiClient.isAIConfigured()`
Check if any provider is configured. Returns boolean.
Location: /lib/ai/ai-client.ts
### `aiClient.getAvailableProviders()`
Get list of configured providers. Returns array of provider names.
Location: /lib/ai/ai-client.ts
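Conceptually, both checks come down to key presence. The self-contained sketch below is an assumption about how availability is determined (a provider counts as available when its API key env var is set); `availableProviders` and `isConfigured` are hypothetical names, not the actual internals:

```typescript
// Sketch: availability = the provider's API key environment variable is set.
type AIProvider = "openai" | "anthropic" | "gemini";

const PROVIDER_ENV_KEYS: Record<AIProvider, string> = {
  openai: "OPENAI_API_KEY",
  anthropic: "ANTHROPIC_API_KEY",
  gemini: "GEMINI_API_KEY",
};

function availableProviders(env: Record<string, string | undefined>): AIProvider[] {
  // Keep the declared order: openai, anthropic, gemini.
  return (Object.keys(PROVIDER_ENV_KEYS) as AIProvider[]).filter(
    (p) => !!env[PROVIDER_ENV_KEYS[p]]
  );
}

function isConfigured(env: Record<string, string | undefined>): boolean {
  return availableProviders(env).length > 0;
}
```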
### `getUserAIUsage(userId, startDate)`
Get usage statistics for a user within a time period.
Location: /lib/ai/ai-usage.ts
### `getUserDailyTokens(userId)`
Get today's token usage for a user.
Location: /lib/ai/ai-usage.ts
## Switching Providers
To switch providers, pass the `provider` parameter:
```typescript
// Use OpenAI (default if configured)
await aiClient.generate({ prompt: "Hello", provider: "openai" }, { userId });

// Use Anthropic
await aiClient.generate({ prompt: "Hello", provider: "anthropic" }, { userId });

// Use Gemini
await aiClient.generate({ prompt: "Hello", provider: "gemini" }, { userId });
```

If no provider is specified, the system uses the first available in order: OpenAI → Anthropic → Gemini.
## Changing Models
### Default Models
Each provider has a default model configured in its implementation file:
```typescript
// lib/ai/providers/openai.ts
const DEFAULT_CONFIG = {
  defaultModel: "gpt-4o-mini", // ← Change this
  defaultMaxTokens: 1024,
  defaultTemperature: 0.7,
  timeoutMs: AI_LIMITS.DEFAULT_TIMEOUT_MS,
};
```

### Override Models Per Request
Pass the `model` parameter to use a different model for a specific request:
```typescript
// Use a specific model for this request
const response = await aiClient.generate(
  {
    prompt: "Hello",
    model: "gpt-4o", // Override default gpt-4o-mini
  },
  { userId: "user_123" }
);
```

## Entitlement Check
AI access requires the `ai_access` entitlement. Check it in API routes:
```typescript
// Server-side check
import { hasEntitlement } from "@/lib/auth-utils";

export async function POST(request: NextRequest) {
  const hasAI = await hasEntitlement("ai_access");
  if (!hasAI) {
    return NextResponse.json(
      { error: "AI access requires Pro subscription" },
      { status: 403 }
    );
  }
  // ... continue with AI call
}
```

The `ai_access` entitlement is assigned to the Pro plan by default. Modify `/prisma/seed.ts` to change this.
## Usage Tracking
All AI calls with a `userId` are automatically tracked.
```typescript
import { getUserAIUsage, getUserDailyTokens } from "@/lib/ai";

// Get usage for a time period
const usage = await getUserAIUsage(userId, new Date("2024-01-01"));
// {
//   totalRequests: 150,
//   totalTokens: 45000,
//   promptTokens: 15000,
//   completionTokens: 30000,
//   byProvider: {
//     openai: { requests: 100, tokens: 30000 },
//     anthropic: { requests: 50, tokens: 15000 }
//   }
// }

// Get today's token usage
const dailyTokens = await getUserDailyTokens(userId);
// 5000
```

## Rate Limiting
| Limit | Value |
|---|---|
| Requests per minute | 10 |
| Requests per hour | 60 |
Rate limiting is per-user, in-memory. For production with multiple instances, implement Redis-based limiting.
Configure in /lib/ai/ai-rate-limit.ts.
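For orientation, a per-user sliding-window limiter along these lines behaves as described; the names and structure below are illustrative, not the actual `/lib/ai/ai-rate-limit.ts` code:

```typescript
// Sketch of an in-memory per-user sliding-window limiter. Counts reset on
// restart and are not shared across instances (hence the Redis suggestion).
const WINDOWS = [
  { ms: 60_000, max: 10 },    // 10 requests per minute
  { ms: 3_600_000, max: 60 }, // 60 requests per hour
];

const requestLog = new Map<string, number[]>();

function checkRateLimit(userId: string, now = Date.now()): boolean {
  const timestamps = requestLog.get(userId) ?? [];
  // Drop entries older than the largest window to bound memory.
  const fresh = timestamps.filter((t) => now - t < 3_600_000);
  const allowed = WINDOWS.every(
    (w) => fresh.filter((t) => now - t < w.ms).length < w.max
  );
  if (allowed) fresh.push(now);
  requestLog.set(userId, fresh);
  return allowed;
}
```

A Redis-backed version would replace the `Map` with sorted-set operations keyed by user, so all instances share the same counts.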
## Error Handling
All provider errors are normalized to `AIError`:
```typescript
import { AIError } from "@/lib/ai";

try {
  await aiClient.generate({ prompt: "..." }, { userId });
} catch (error) {
  if (error instanceof AIError) {
    // Safe message for client
    const message = error.toClientMessage();

    // Error details for logging
    console.log({
      code: error.code, // "RATE_LIMITED", "TIMEOUT", etc.
      provider: error.provider,
      retryable: error.retryable,
      statusCode: error.statusCode,
    });
  }
}
```

### Error Codes
| Code | Description | Retryable |
|---|---|---|
| PROVIDER_NOT_CONFIGURED | API key missing | No |
| RATE_LIMITED | Too many requests | Yes |
| TIMEOUT | Request timed out | Yes |
| CONTEXT_LENGTH_EXCEEDED | Prompt too long | No |
| CONTENT_FILTERED | Content blocked | No |
| INSUFFICIENT_QUOTA | Billing issue | No |
Errors are automatically captured in Sentry (except rate limits and content filters).
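Because `retryable` is exposed on the error, callers can wrap generation in a small backoff loop. The helper below is a sketch, not part of the library; it assumes only an error object carrying a `retryable` flag, and `call` stands in for an `aiClient.generate(...)` invocation:

```typescript
// Retry only errors flagged retryable (e.g. RATE_LIMITED, TIMEOUT),
// with exponential backoff: 500ms, 1000ms, 2000ms, ...
async function withRetry<T>(
  call: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await call();
    } catch (error) {
      const retryable = (error as { retryable?: boolean }).retryable === true;
      if (!retryable || attempt >= maxAttempts) throw error;
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** (attempt - 1)));
    }
  }
}
```

Non-retryable codes like `CONTEXT_LENGTH_EXCEEDED` or `INSUFFICIENT_QUOTA` are rethrown immediately, since repeating the same request cannot succeed.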
## Safety Limits
Built-in limits prevent runaway costs:
| Limit | Value |
|---|---|
| Max tokens per request | 4,096 |
| Max prompt length | 100,000 chars |
| Request timeout | 30 seconds |
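A pre-flight check against these limits might look like the sketch below; `validateRequest` is an illustrative name (the real enforcement lives inside the client), and it clamps an oversized `maxTokens` rather than rejecting it:

```typescript
// Illustrative enforcement of the documented safety limits.
const AI_LIMITS = {
  MAX_TOKENS_PER_REQUEST: 4096,
  MAX_PROMPT_LENGTH: 100_000,
} as const;

function validateRequest(prompt: string, maxTokens?: number): number {
  if (prompt.length > AI_LIMITS.MAX_PROMPT_LENGTH) {
    throw new Error(`Prompt exceeds ${AI_LIMITS.MAX_PROMPT_LENGTH} characters`);
  }
  // Clamp maxTokens to the per-request ceiling; default to the ceiling if unset.
  return Math.min(
    maxTokens ?? AI_LIMITS.MAX_TOKENS_PER_REQUEST,
    AI_LIMITS.MAX_TOKENS_PER_REQUEST
  );
}
```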
Configure in /lib/ai/ai-types.ts:
```typescript
export const AI_LIMITS = {
  MAX_TOKENS_PER_REQUEST: 4096,
  DEFAULT_TIMEOUT_MS: 30000,
  MAX_PROMPT_LENGTH: 100000,
} as const;
```

## What You Can Customize
- Default models for each provider (in provider files)
- Rate limits (in `/lib/ai/ai-rate-limit.ts`)
- Safety limits (in `/lib/ai/ai-types.ts`)
- Which plan gets the `ai_access` entitlement (in `/prisma/seed.ts`)
- API endpoint implementation (use `/api/ai/generate/route.ts` as reference)
## What NOT to Touch
- Core AI client implementation (`/lib/ai/ai-client.ts`)
- Provider interface definitions (`/lib/ai/ai-types.ts`)
- Error handling logic (`/lib/ai/ai-errors.ts`)
- Usage tracking database operations (`/lib/ai/ai-usage.ts`)
- Provider implementations (`/lib/ai/providers/*.ts`) - unless adding a new provider
## Adding a New Provider
### 1. Create Provider File
Create /lib/ai/providers/newprovider.ts:
```typescript
import {
  AIProviderInterface,
  AIGenerateRequest,
  AIGenerateResponse,
} from "../ai-types";

class NewProvider implements AIProviderInterface {
  readonly name = "newprovider" as const;

  isAvailable(): boolean {
    return !!process.env.NEWPROVIDER_API_KEY;
  }

  async generate(request: AIGenerateRequest): Promise<AIGenerateResponse> {
    // Implementation
  }
}

export const newProvider = new NewProvider();
```

### 2. Register in AI Client
Update /lib/ai/ai-client.ts:
```typescript
import { newProvider } from "./providers/newprovider";

function getProvider(name: AIProvider) {
  switch (name) {
    case "openai":
      return openaiProvider;
    case "anthropic":
      return anthropicProvider;
    case "gemini":
      return geminiProvider;
    case "newprovider":
      return newProvider;
  }
}
```

### 3. Update Type Definition
Update /lib/ai/ai-types.ts:
```typescript
export type AIProvider = "openai" | "anthropic" | "gemini" | "newprovider";
```

## Cost Management
- Set token limits: Use `maxTokens` to cap response length
- Monitor usage: Query `AIUsage` table regularly
- Entitlement gating: Only Pro users get AI access
- Rate limiting: Prevent abuse with per-user limits
- Track by provider: Usage stats broken down by provider
### Provider Cost Comparison
Costs vary significantly between providers and models. As of late 2024:
| Provider | Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|---|
| OpenAI | gpt-4o-mini | ~$0.15 | ~$0.60 |
| Anthropic | claude-3-5-sonnet | ~$3.00 | ~$15.00 |
| Gemini | gemini-1.5-flash | ~$0.075 | ~$0.30 |
Note: Prices change frequently. Check provider pricing pages for current rates. Gemini offers a free tier with rate limits.
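Combined with the usage stats above, these prices allow a rough cost estimate. The sketch below hardcodes the sample prices from the table; verify against current pricing before relying on it:

```typescript
// Back-of-envelope cost estimate from tracked token usage.
// Prices are the illustrative late-2024 figures per 1M tokens.
const PRICE_PER_M = {
  "gpt-4o-mini": { input: 0.15, output: 0.6 },
  "claude-3-5-sonnet": { input: 3.0, output: 15.0 },
  "gemini-1.5-flash": { input: 0.075, output: 0.3 },
} as const;

function estimateCostUSD(
  model: keyof typeof PRICE_PER_M,
  promptTokens: number,
  completionTokens: number
): number {
  const p = PRICE_PER_M[model];
  return (promptTokens * p.input + completionTokens * p.output) / 1_000_000;
}

// e.g. feed in promptTokens/completionTokens from getUserAIUsage()
console.log(estimateCostUSD("gpt-4o-mini", 15_000, 30_000));
```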
## Related Sections
- Entitlements - Understanding feature gating
- Environment Variables - API key configuration
- Database - Understanding the `ai_usage` table