Streaming OpenAI completions with the Vercel Edge Runtime

I recently built my first AI-powered app on Vercel. Here's how I did it.

Specifically, I built a couple of AI-powered development tools (Airtest and Refraction) on top of the OpenAI API. I was really impressed with how well OpenAI and Vercel worked together, so I wanted to share how I built both tools in under two days.

One of the key components of this build was the Vercel Edge Runtime and, more specifically, Edge Functions. Edge Functions are serverless functions that run on Vercel's edge network rather than in a traditional serverless environment, and for reasons we'll get into, they're a much better fit for AI-powered applications than traditional Serverless Functions.

The key reason we need the Edge Runtime for this example is the difference in limits. While Serverless Functions have a maximum execution time of 10 seconds, Edge Functions only need to begin sending a response within 30 seconds. This is important because the OpenAI API can take a long time to respond, and we need to be able to stream the response back to the client. In other words, while you need to send an initial response within 30 seconds, you can continue streaming that response well beyond that time.

Additionally, the Edge Runtime supports streaming responses, which, as far as I know, Vercel's Serverless Functions don't. This is important because the OpenAI API supports streaming completions, and we want to take advantage of that.
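To make that concrete, here's a minimal, self-contained Edge Function that streams a few chunks with a ReadableStream. It's a toy sketch of the primitive we're about to lean on (in a hypothetical pages/api/stream-demo.ts), not part of either tool:

pages/api/stream-demo.ts
import type { NextRequest } from 'next/server';

export const config = {
  runtime: 'edge',
};

// A toy Edge Function that streams three chunks to the client.
const handler = async (_req: NextRequest): Promise<Response> => {
  const encoder = new TextEncoder();

  const stream = new ReadableStream({
    async start(controller) {
      for (const chunk of ['Hello', ' from', ' the edge']) {
        controller.enqueue(encoder.encode(chunk));

        // Simulate work between chunks.
        await new Promise((resolve) => setTimeout(resolve, 500));
      }

      controller.close();
    },
  });

  return new Response(stream, {
    headers: { 'Content-Type': 'text/plain; charset=utf-8' },
  });
};

export default handler;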

Anyway, enough preamble. Let's get into it.

Connecting to the OpenAI REST API

The first thing we need to do is connect to the OpenAI API. While you can use the OpenAI JS SDK, its support for streaming responses was a bit of an open question at the time, so I found it easier to hit the REST API directly with fetch. Let's create an Edge Function that connects to the OpenAI API and sends the response back to the client.

pages/api/generate.ts
import type { NextRequest } from 'next/server';
 
if (!process.env.OPENAI_API_KEY) {
  throw new Error('Missing Environment Variable OPENAI_API_KEY');
}
 
export const config = {
  runtime: 'edge',
};
 
const handler = async (req: NextRequest): Promise<Response> => {
  if (req.method !== 'POST') {
    return new Response('Method Not Allowed', { status: 405 });
  }
 
  const { prompt } = (await req.json()) as {
    prompt?: string;
  };
 
  if (!prompt) {
    return new Response('Bad Request', { status: 400 });
  }
 
  const payload = {
    model: 'text-davinci-003',
    prompt,
    temperature: 0.7,
    top_p: 1,
    frequency_penalty: 0,
    presence_penalty: 0,
    max_tokens: 2048,
    n: 1,
  };
 
  const res = await fetch('https://api.openai.com/v1/completions', {
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.OPENAI_API_KEY ?? ''}`,
    },
    method: 'POST',
    body: JSON.stringify(payload),
  });
 
  const data = await res.json();
 
  // Response expects a string or stream, so re-serialize the JSON payload.
  return new Response(JSON.stringify(data), {
    status: 200,
    headers: { 'Content-Type': 'application/json' },
  });
};
 
export default handler;

Too easy! This will connect to the OpenAI API and return the response to the client. The only thing left to do is ask OpenAI to stream the completion and pass that stream straight through to the client.

pages/api/generate.ts
import type { NextRequest } from 'next/server';
 
if (!process.env.OPENAI_API_KEY) {
  throw new Error('Missing Environment Variable OPENAI_API_KEY');
}
 
export const config = {
  runtime: 'edge',
};
 
const handler = async (req: NextRequest): Promise<Response> => {
  if (req.method !== 'POST') {
    return new Response('Method Not Allowed', { status: 405 });
  }
 
  const { prompt } = (await req.json()) as {
    prompt?: string;
  };
 
  if (!prompt) {
    return new Response('Bad Request', { status: 400 });
  }
 
  const payload = {
    model: 'text-davinci-003',
    prompt,
    temperature: 0.7,
    top_p: 1,
    frequency_penalty: 0,
    presence_penalty: 0,
    max_tokens: 2048,
    // Ask OpenAI to stream the completion back as server-sent events.
    stream: true,
    n: 1,
  };
 
  const res = await fetch('https://api.openai.com/v1/completions', {
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.OPENAI_API_KEY ?? ''}`,
    },
    method: 'POST',
    body: JSON.stringify(payload),
  });
 
  // Pass OpenAI's response stream straight through to the client.
  const data = res.body;
 
  return new Response(data, {
    headers: { 'Content-Type': 'text/event-stream; charset=utf-8' },
  });
};
 
export default handler;

Perfect. Now we just need to hit up our Edge Function from the client.

Connecting to the Edge Function from the client

Now that we have our Edge Function, we need to connect to it from the client. I build all my projects on Next.js, so I'll assume you're familiar with the architecture. If not, I recommend checking out the Next.js docs to get started.

Let's create a new page that connects to our Edge Function and handles the response stream. You'll need some React state and event handling to wire everything up, which I've sketched below, but I'll leave the finer details up to you.
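Something like this skeleton, where the handleSubmit name and the bare-bones markup are placeholders of mine rather than code from either tool:

import { useState } from 'react';
import type { FormEventHandler } from 'react';

const Generate = () => {
  const [prompt, setPrompt] = useState('');
  const [text, setText] = useState('');

  const handleSubmit: FormEventHandler<HTMLFormElement> = async (event) => {
    event.preventDefault();

    // Clear the previous output, then run the fetch and parse logic below.
    setText('');
  };

  return (
    <form onSubmit={handleSubmit}>
      <input
        value={prompt}
        onChange={({ target }) => setPrompt(target.value)}
      />
      <button type="submit">Generate</button>
      <p>{text}</p>
    </form>
  );
};

export default Generate;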

Update: I published my solution as a package called parse-json-sse so you don't have to write the crazy SSE parsing code yourself.

pages/generate.tsx
import { useState } from 'react';
import parseJsonSse from '@beskar-labs/parse-json-sse';
 
/* ... */
 
const [text, setText] = useState('');
 
/* ... */
 
const response = await fetch('/api/generate', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    prompt,
  }),
});
 
if (!response.ok) {
  throw new Error(response.statusText);
}
 
const data = response.body;
 
if (!data) {
  return;
}
 
await parseJsonSse<{
  id: string;
  object: string;
  created: number;
  choices?: {
    text: string;
    index: number;
    logprobs: null;
    finish_reason: null | string;
  }[];
  model: string;
}>({
  data,
  onParse: (json) => {
    if (!json.choices?.length) {
      throw new Error('Something went wrong.');
    }
 
    // Append each streamed token to the text state.
    setText((previous) => previous + json.choices[0].text);
  },
  onFinish: () => {
    // The stream is complete; the full text is already in state.
  },
});
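If you're curious what parse-json-sse saves you from, here's a simplified sketch of parsing OpenAI's server-sent events by hand. It's my own rough version, not the package's actual implementation:

// Manually parse server-sent events from a streamed response body.
const parseSse = async (
  body: ReadableStream<Uint8Array>,
  onJson: (json: unknown) => void
): Promise<void> => {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { value, done } = await reader.read();

    if (done) {
      break;
    }

    // Accumulate decoded text until we have complete lines.
    buffer += decoder.decode(value, { stream: true });

    const lines = buffer.split('\n');
    buffer = lines.pop() ?? '';

    for (const line of lines) {
      // Each event looks like `data: {...}`; the stream ends with `data: [DONE]`.
      const message = line.replace(/^data: /, '').trim();

      if (!message || message === '[DONE]') {
        continue;
      }

      onJson(JSON.parse(message));
    }
  }
};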

That's it! Using this approach, we can now generate text in real time and display it to the user as it streams in. I'd love to know if I can improve this in any way, so please let me know on Twitter.