
How to Add Streaming Responses with OpenAI API (Step by Step)

📖 5 min read · 903 words · Updated Apr 9, 2026


In this guide, we’ll add streaming responses with the OpenAI API to create a more dynamic user experience. With the traditional request-response pattern, users stare at a blank screen until the full completion arrives; streaming shows tokens as they are generated, which matters for chat-style and other real-time applications.

Prerequisites

  • Python 3.11+
  • pip install openai
  • Flask 2.0+
  • Basic knowledge of how APIs work

Step 1: Set Up Your Project Environment


mkdir streaming_response_example
cd streaming_response_example
python -m venv venv
source venv/bin/activate # For Windows use `venv\Scripts\activate`
pip install openai flask

These commands create a project directory, set up a virtual environment, and install the necessary packages. Activating the environment is essential to avoid polluting your global Python installation. Trust me, I once tangled myself in a web of conflicting dependencies; never again!

Step 2: Obtain Your OpenAI API Key

To start using the OpenAI API, you need a valid API key. You can get it from the OpenAI website. Once you have it, store it securely, like in an environment variable:


export OPENAI_API_KEY='your-api-key-here' # For Windows use `set OPENAI_API_KEY=your-api-key-here`

Hard-coding this key in your source is a rookie mistake: if your code ever becomes public, consider the key compromised and rotate it immediately.
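If you prefer to fail fast in code rather than discover a missing key at the first API call, a small check like this works. Note that require_api_key is my own helper here, not part of the OpenAI SDK:

```python
import os

def require_api_key(name="OPENAI_API_KEY"):
    """Read the key from the environment and fail fast if it's missing."""
    key = os.getenv(name)
    if not key:
        raise RuntimeError(f"{name} is not set; export it before starting the app")
    return key
```

Call it once at startup so a misconfigured deployment dies with a clear message instead of a cryptic API error mid-request.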

Step 3: Create Your Flask Server to Handle Requests


from flask import Flask, Response
from openai import OpenAI
import os

app = Flask(__name__)
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

@app.route('/stream')
def stream():
    def generate():
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": "Hello, how can I use streaming responses?"}],
            stream=True,
        )
        for chunk in response:
            # The final chunk carries no content, so guard against None
            content = chunk.choices[0].delta.content
            if content:
                yield content

    return Response(generate(), mimetype='text/event-stream')

if __name__ == '__main__':
    app.run(debug=True)

Here, we set up a simple Flask server that streams responses. The chat completions call with stream=True takes care of initiating the streaming, and the generator yields each token fragment as it arrives. If you run this code and see a missing API key error, it’s proof that you forgot to set your key. It happens to the best of us.

Step 4: Test Your Streaming Endpoint


curl http://127.0.0.1:5000/stream

Using curl, you can hit the endpoint you just created. If everything works correctly, you’ll see a streaming output in your terminal. This approach allows you to incrementally process and display data, offering an engaging user experience. Remember, if you run into a Connection Refused error, verify that your Flask app is actually running.
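If you’d rather test from Python than curl, here’s a minimal sketch of consuming a chunked stream. The consume_stream helper is hypothetical; iter_chunks stands in for something like requests.get("http://127.0.0.1:5000/stream", stream=True).iter_content(), so the logic can be exercised without a running server:

```python
import sys

def consume_stream(iter_chunks, out=sys.stdout):
    """Write each chunk as it arrives and return the assembled text."""
    parts = []
    for chunk in iter_chunks:
        # HTTP libraries typically hand you bytes; decode as we go
        text = chunk.decode("utf-8") if isinstance(chunk, bytes) else chunk
        out.write(text)
        out.flush()  # flush so the user sees output incrementally
        parts.append(text)
    return "".join(parts)
```

The flush call is the point of the exercise: without it, buffering can make a streaming endpoint look like a regular blocking one.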

Step 5: Consuming the Stream in a Client Application

<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>Streaming Response Example</title>
</head>
<body>
  <h1>OpenAI Streaming Response</h1>
  <pre id="output"></pre>
  <script>
    async function streamResponse() {
      const response = await fetch('http://127.0.0.1:5000/stream');
      const reader = response.body.getReader();
      const decoder = new TextDecoder();
      const output = document.getElementById('output');
      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        output.textContent += decoder.decode(value, { stream: true });
      }
    }
    streamResponse();
  </script>
</body>
</html>
Here’s a basic HTML page to show how to process the stream in the browser. The JavaScript fetches your route and appends the streamed data to a <pre> tag as chunks arrive; you can enhance the UI as you like. If you don’t see any output, check the browser console for CORS errors. You may need the flask-cors extension if the page is served from a different origin than the Flask app.

The Gotchas

  • Network Issues: Streaming responses can falter over slow connections. If you decide to support users with spotty networks, consider retries or handling timeouts gracefully.
  • Rate Limiting: OpenAI applies rate limits based on your subscription type. Make sure to monitor this, or your app could choke or be throttled.
  • Token Limits: Be aware of the token limits for the models you’re working with. Overrunning these will result in truncated responses, which can seriously hinder the utility of your output.
  • Browser Compatibility: While most modern browsers support the Fetch API, if you still support older browsers, consider fallbacks or polyfills.
  • Debugging Output: Make sure to log any API errors received during the stream. They can be vague and confusing, leading to endless debugging sessions.
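The retry advice in the first gotcha can be sketched like this. The with_retries helper is my own, and the delay values and exception types are illustrative, not a recommendation:

```python
import time

def with_retries(fn, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn, retrying on connection/timeout errors with exponential backoff."""
    last_exc = None
    for attempt in range(attempts):
        try:
            return fn()
        except (ConnectionError, TimeoutError) as exc:
            last_exc = exc
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    raise last_exc
```

Wrap the call that opens the stream, not the per-chunk loop; once a stream is partially consumed, a blind retry would replay tokens the user already saw.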

Full Code


from flask import Flask, Response
from openai import OpenAI
import os

app = Flask(__name__)
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

@app.route('/stream')
def stream():
    def generate():
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": "Hello, how can I use streaming responses?"}],
            stream=True,
        )
        for chunk in response:
            # The final chunk carries no content, so guard against None
            content = chunk.choices[0].delta.content
            if content:
                yield content

    return Response(generate(), mimetype='text/event-stream')

if __name__ == '__main__':
    app.run(debug=True)

<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>Streaming Response Example</title>
</head>
<body>
  <h1>OpenAI Streaming Response</h1>
  <pre id="output"></pre>
  <script>
    async function streamResponse() {
      const response = await fetch('http://127.0.0.1:5000/stream');
      const reader = response.body.getReader();
      const decoder = new TextDecoder();
      const output = document.getElementById('output');
      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        output.textContent += decoder.decode(value, { stream: true });
      }
    }
    streamResponse();
  </script>
</body>
</html>
What’s Next

Consider adding authentication to your Flask app. Securing your OpenAI key and controlling access is crucial if you plan to deploy this app. Resources are available that cover Flask authentication.
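As a starting point, here is a framework-agnostic sketch of a bearer-token check. The is_authorized helper is hypothetical; wire it into a Flask before_request hook or a route decorator as you see fit:

```python
import hmac

def is_authorized(auth_header, secret):
    """Validate an Authorization header against a server-side secret."""
    if not auth_header or not auth_header.startswith("Bearer "):
        return False
    token = auth_header[len("Bearer "):]
    # Constant-time comparison avoids leaking the secret via timing
    return hmac.compare_digest(token, secret)
```

In Flask you would pass request.headers.get("Authorization") as the first argument and return a 401 response when the check fails.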

FAQ

  • Q: Can I use other OpenAI models with streaming?
    A: Yes, as long as the model supports streaming.
  • Q: What if I get an API limit error?
    A: Check your usage in the OpenAI dashboard; rate limits apply per minute (both requests and tokens), so slow down or queue requests rather than retrying immediately.
  • Q: How do I handle JSON in streamed responses?
    A: You’ll need to piece together JSON responses using the chunks, as they’ll often arrive as a concatenated string.
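That last FAQ point in code: buffer the streamed fragments and parse only once the stream ends, since any individual chunk is unlikely to be valid JSON on its own. The chunk list below is illustrative, not real API output:

```python
import json

def assemble_json(chunks):
    """Join streamed text fragments, then parse the completed JSON."""
    return json.loads("".join(chunks))
```

If the model is asked to emit JSON, this means you cannot act on the data mid-stream; you only get a parseable object after the final chunk arrives.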



✍️
Written by Jake Chen

AI technology writer and researcher.
