
How to Add Streaming Responses with OpenAI API (Step by Step)

📖 5 min read · 903 words · Updated Apr 9, 2026


In this guide, we’ll add streaming responses with the OpenAI API to create a more dynamic user experience. With the traditional request-response pattern, users stare at a blank screen until the full completion arrives; streaming shows tokens as they are generated, which matters for chat-style and other real-time applications.

Prerequisites

  • Python 3.11+
  • pip install openai
  • Flask 2.0+
  • Basic knowledge of how APIs work

Step 1: Set Up Your Project Environment


mkdir streaming_response_example
cd streaming_response_example
python -m venv venv
source venv/bin/activate # For Windows use `venv\Scripts\activate`
pip install openai flask

These commands create a project directory, set up a virtual environment, and install the necessary packages. Activating the environment is essential to avoid polluting your global Python installation. Trust me, I once tangled myself in a web of conflicting dependencies; never again!

Step 2: Obtain Your OpenAI API Key

To start using the OpenAI API, you need a valid API key. You can get it from the OpenAI website. Once you have it, store it securely, like in an environment variable:


export OPENAI_API_KEY='your-api-key-here' # For Windows use `set OPENAI_API_KEY=your-api-key-here`

Hard-coding this key in your source is a rookie mistake: if your code ever becomes public, consider the key compromised and rotate it immediately.
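If you prefer to fail fast in code rather than discover a missing key at the first API call, a small check like this works. Note that require_api_key is my own helper here, not part of the OpenAI SDK:

```python
import os

def require_api_key(name="OPENAI_API_KEY"):
    """Read the key from the environment and fail fast if it's missing."""
    key = os.getenv(name)
    if not key:
        raise RuntimeError(f"{name} is not set; export it before starting the app")
    return key
```

Call it once at startup so a misconfigured deployment dies with a clear message instead of a cryptic API error mid-request.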

Step 3: Create Your Flask Server to Handle Requests


from flask import Flask, Response
from openai import OpenAI
import os

app = Flask(__name__)
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

@app.route('/stream')
def stream():
    def generate():
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": "Hello, how can I use streaming responses?"}],
            stream=True,
        )
        for chunk in response:
            # The final chunk carries no content, so guard against None
            content = chunk.choices[0].delta.content
            if content:
                yield content

    return Response(generate(), mimetype='text/event-stream')

if __name__ == '__main__':
    app.run(debug=True)

Here, we set up a simple Flask server that streams responses. The chat completions call with stream=True takes care of initiating the streaming, and the generator yields each token fragment as it arrives. If you run this code and see a missing API key error, it’s proof that you forgot to set your key. It happens to the best of us.

Step 4: Test Your Streaming Endpoint


curl http://127.0.0.1:5000/stream

Using curl, you can hit the endpoint you just created. If everything works correctly, you’ll see a streaming output in your terminal. This approach allows you to incrementally process and display data, offering an engaging user experience. Remember, if you run into a Connection Refused error, verify that your Flask app is actually running.
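If you’d rather test from Python than curl, here’s a minimal sketch of consuming a chunked stream. The consume_stream helper is hypothetical; iter_chunks stands in for something like requests.get("http://127.0.0.1:5000/stream", stream=True).iter_content(), so the logic can be exercised without a running server:

```python
import sys

def consume_stream(iter_chunks, out=sys.stdout):
    """Write each chunk as it arrives and return the assembled text."""
    parts = []
    for chunk in iter_chunks:
        # HTTP libraries typically hand you bytes; decode as we go
        text = chunk.decode("utf-8") if isinstance(chunk, bytes) else chunk
        out.write(text)
        out.flush()  # flush so the user sees output incrementally
        parts.append(text)
    return "".join(parts)
```

The flush call is the point of the exercise: without it, buffering can make a streaming endpoint look like a regular blocking one.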

Step 5: Consuming the Stream in a Client Application

<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>Streaming Response Example</title>
</head>
<body>
  <h1>OpenAI Streaming Response</h1>
  <pre id="output"></pre>
  <script>
    async function streamResponse() {
      const response = await fetch('http://127.0.0.1:5000/stream');
      const reader = response.body.getReader();
      const decoder = new TextDecoder();
      const output = document.getElementById('output');
      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        output.textContent += decoder.decode(value, { stream: true });
      }
    }
    streamResponse();
  </script>
</body>
</html>
Here’s a basic HTML page to show how to process the stream in the browser. The JavaScript fetches your route and appends the streamed data to a <pre> tag as chunks arrive; you can enhance the UI as you like. If you don’t see any output, check the browser console for CORS errors. You may need the flask-cors extension if the page is served from a different origin than the Flask app.

The Gotchas

  • Network Issues: Streaming responses can falter over slow connections. If you decide to support users with spotty networks, consider retries or handling timeouts gracefully.
  • Rate Limiting: OpenAI applies rate limits based on your subscription type. Make sure to monitor this, or your app could choke or be throttled.
  • Token Limits: Be aware of the token limits for the models you’re working with. Overrunning these will result in truncated responses, which can seriously hinder the utility of your output.
  • Browser Compatibility: While most modern browsers support the Fetch API, if you still support older browsers, consider fallbacks or polyfills.
  • Debugging Output: Make sure to log any API errors received during the stream. They can be vague and confusing, leading to endless debugging sessions.
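The retry advice in the first gotcha can be sketched like this. The with_retries helper is my own, and the delay values and exception types are illustrative, not a recommendation:

```python
import time

def with_retries(fn, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn, retrying on connection/timeout errors with exponential backoff."""
    last_exc = None
    for attempt in range(attempts):
        try:
            return fn()
        except (ConnectionError, TimeoutError) as exc:
            last_exc = exc
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    raise last_exc
```

Wrap the call that opens the stream, not the per-chunk loop; once a stream is partially consumed, a blind retry would replay tokens the user already saw.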

Full Code


from flask import Flask, Response
from openai import OpenAI
import os

app = Flask(__name__)
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

@app.route('/stream')
def stream():
    def generate():
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": "Hello, how can I use streaming responses?"}],
            stream=True,
        )
        for chunk in response:
            # The final chunk carries no content, so guard against None
            content = chunk.choices[0].delta.content
            if content:
                yield content

    return Response(generate(), mimetype='text/event-stream')

if __name__ == '__main__':
    app.run(debug=True)

<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>Streaming Response Example</title>
</head>
<body>
  <h1>OpenAI Streaming Response</h1>
  <pre id="output"></pre>
  <script>
    async function streamResponse() {
      const response = await fetch('http://127.0.0.1:5000/stream');
      const reader = response.body.getReader();
      const decoder = new TextDecoder();
      const output = document.getElementById('output');
      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        output.textContent += decoder.decode(value, { stream: true });
      }
    }
    streamResponse();
  </script>
</body>
</html>
What’s Next

Consider adding authentication to your Flask app. Securing your OpenAI key and controlling access is crucial if you plan to deploy this app. Resources are available that cover Flask authentication.
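As a starting point, here is a framework-agnostic sketch of a bearer-token check. The is_authorized helper is hypothetical; wire it into a Flask before_request hook or a route decorator as you see fit:

```python
import hmac

def is_authorized(auth_header, secret):
    """Validate an Authorization header against a server-side secret."""
    if not auth_header or not auth_header.startswith("Bearer "):
        return False
    token = auth_header[len("Bearer "):]
    # Constant-time comparison avoids leaking the secret via timing
    return hmac.compare_digest(token, secret)
```

In Flask you would pass request.headers.get("Authorization") as the first argument and return a 401 response when the check fails.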

FAQ

  • Q: Can I use other OpenAI models with streaming?
    A: Yes, as long as the model supports streaming.
  • Q: What if I get an API limit error?
    A: Check your usage in the OpenAI dashboard; rate limits apply per minute (both requests and tokens), so slow down or queue requests rather than retrying immediately.
  • Q: How do I handle JSON in streamed responses?
    A: You’ll need to piece together JSON responses using the chunks, as they’ll often arrive as a concatenated string.
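That last FAQ point in code: buffer the streamed fragments and parse only once the stream ends, since any individual chunk is unlikely to be valid JSON on its own. The chunk list below is illustrative, not real API output:

```python
import json

def assemble_json(chunks):
    """Join streamed text fragments, then parse the completed JSON."""
    return json.loads("".join(chunks))
```

If the model is asked to emit JSON, this means you cannot act on the data mid-stream; you only get a parseable object after the final chunk arrives.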



✍️
Written by Jake Chen

AI technology writer and researcher.
