
Make Your Application Conversant: Speech-to-Speech Chat with Azure AI Speech and Azure OpenAI
- May 1st, 2025
Imagine being able to talk to your application as naturally as you would with a friend, asking it questions, and getting instant verbal responses. This isn't a scene from a sci-fi movie—it's a reality you can create using Azure AI Speech and Azure OpenAI Service. In this blog post, we'll show you how to harness these powerful tools to build an application that can understand and respond to spoken language. Whether you're developing a customer service bot, a virtual assistant, or an interactive educational tool, integrating Azure Speech-to-Speech can significantly enhance user engagement and experience. Let's dive into how you can make your application conversant, bridging the gap between humans and machines with the seamless integration of Azure's cutting-edge technologies.
Prerequisites
- Python 3.7 or later
- An Azure subscription with an Azure OpenAI resource and a deployed chat model
- An Azure AI Speech resource
- A working microphone and speaker
Implementation in your application
In this step, I am assuming you have already set up the environment for the speech-to-speech chat.
Once you have set up the environment, you can clone the GitHub repo through this link. Alternatively, you can create an app.py file in VS Code and paste in the code below.
import azure.cognitiveservices.speech as speechsdk
from openai import AzureOpenAI

# Replace these placeholders with your Azure OpenAI key, endpoint, and deployment name.
# Your endpoint should look like the following: https://YOUR_OPEN_AI_RESOURCE_NAME.openai.azure.com/
api_key = "YOUR_OPEN_AI_KEY"
client = AzureOpenAI(
    azure_endpoint="YOUR_AZURE_END_POINT",
    api_key=api_key,
    api_version="2024-02-15-preview"
)

# This corresponds to the custom name you chose for your deployment when you deployed a model.
deployment_id = "YOUR_MODEL_NAME"

# Replace this placeholder with your Azure AI Speech key; update the region if your resource is not in eastus.
speech_config = speechsdk.SpeechConfig(subscription='YOUR_SPEECH_SERVICE_KEY', region='eastus')
audio_output_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)
audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)

# Should be the locale of the speaker's language.
speech_config.speech_recognition_language = "en-US"
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

# The language of the voice that responds on behalf of Azure OpenAI.
speech_config.speech_synthesis_voice_name = 'en-US-JennyMultilingualNeural'
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_output_config)

# Punctuation marks that end a sentence for text-to-speech purposes.
tts_sentence_end = [".", "!", "?", ";", "。", "！", "？", "；", "\n"]
# Prompts Azure OpenAI with a request and synthesizes the response sentence by sentence.
def ask_openai(prompt):
    # Ask Azure OpenAI in a streaming way.
    response = client.chat.completions.create(model=deployment_id, max_tokens=200, stream=True, messages=[
        {"role": "user", "content": prompt}
    ])
    collected_messages = []
    last_tts_request = None

    # Iterate through the response stream.
    for chunk in response:
        if len(chunk.choices) > 0:
            chunk_message = chunk.choices[0].delta.content  # extract the message
            if chunk_message is not None:
                collected_messages.append(chunk_message)  # save the message
                if chunk_message in tts_sentence_end:  # sentence end found
                    text = ''.join(collected_messages).strip()  # join the received tokens into a sentence
                    if text != '':  # skip sentences that contain only \n or spaces
                        print(f"Speech synthesized to speaker for: {text}")
                        last_tts_request = speech_synthesizer.speak_text_async(text)
                        collected_messages.clear()
    if last_tts_request:
        last_tts_request.get()
# Continuously listens for speech input, recognizes it, and sends it as text to Azure OpenAI.
def chat_with_open_ai():
    while True:
        print("Azure OpenAI is listening. Say 'Stop' or press Ctrl-Z to end the conversation.")
        try:
            # Get audio from the microphone and send it to the speech service for recognition.
            speech_recognition_result = speech_recognizer.recognize_once_async().get()

            # If speech is recognized, send it to Azure OpenAI and listen for the response.
            if speech_recognition_result.reason == speechsdk.ResultReason.RecognizedSpeech:
                if speech_recognition_result.text == "Stop.":
                    print("Conversation ended.")
                    break
                print("Recognized speech: {}".format(speech_recognition_result.text))
                ask_openai(speech_recognition_result.text)
            elif speech_recognition_result.reason == speechsdk.ResultReason.NoMatch:
                print("No speech could be recognized: {}".format(speech_recognition_result.no_match_details))
                break
            elif speech_recognition_result.reason == speechsdk.ResultReason.Canceled:
                cancellation_details = speech_recognition_result.cancellation_details
                print("Speech Recognition canceled: {}".format(cancellation_details.reason))
                if cancellation_details.reason == speechsdk.CancellationReason.Error:
                    print("Error details: {}".format(cancellation_details.error_details))
        except EOFError:
            break

# Main
try:
    chat_with_open_ai()
except Exception as err:
    print("Encountered exception. {}".format(err))
Once the above step is completed, you will need to install two libraries (openai and azure-cognitiveservices-speech).
Run the following command in the terminal to install them:
pip install openai azure-cognitiveservices-speech
Note: Before running the application, remember to replace the Azure OpenAI key, endpoint, and deployment name, as well as the Azure Speech service key, in the fields marked in the code (all of these are available in the Azure portal on the Azure OpenAI and Speech resources you created); otherwise you will encounter authentication errors.
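If you prefer not to hardcode secrets, you can read them from environment variables instead. The following is a minimal sketch, assuming you have exported variables named OPEN_AI_KEY, OPEN_AI_ENDPOINT, OPEN_AI_DEPLOYMENT_NAME, SPEECH_KEY, and SPEECH_REGION in your shell before launching the app:
import os

# Assumed to be set in the shell before the app starts; not created by the code above.
api_key = os.environ["OPEN_AI_KEY"]
client = AzureOpenAI(
    azure_endpoint=os.environ["OPEN_AI_ENDPOINT"],
    api_key=api_key,
    api_version="2024-02-15-preview"
)
deployment_id = os.environ["OPEN_AI_DEPLOYMENT_NAME"]
speech_config = speechsdk.SpeechConfig(
    subscription=os.environ["SPEECH_KEY"],
    region=os.environ["SPEECH_REGION"]
)
This keeps credentials out of source control and lets you switch resources without editing the code. Once the placeholders (or environment variables) are set, start the app from the terminal with python app.py.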
Conclusion
Bringing Azure Speech-to-Speech into your application is a game-changer for creating more natural and engaging user experiences. With Azure AI Speech and Azure OpenAI Service, your app can understand and respond to spoken language, making interactions seamless and intuitive.
In this guide, we've covered everything from setting up your environment to implementing speech recognition and synthesis in your app. With these tools, you're ready to build applications that can converse with users in real-time.
As you continue developing, think about adding features like multi-language support and personalized responses. The potential is limitless, and the technology keeps improving.
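For instance, multi-language input can be prototyped with the Speech SDK's automatic language detection. The sketch below is one possible approach; the candidate language list is an assumption you would tailor to your audience, and the synthesis side can stay unchanged since the en-US-JennyMultilingualNeural voice already speaks multiple languages:
# Sketch: let the Speech SDK auto-detect the spoken language from a candidate list.
# The languages listed here are illustrative; adjust them for your users.
auto_detect_config = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(
    languages=["en-US", "es-ES", "fr-FR", "de-DE"]
)
speech_recognizer = speechsdk.SpeechRecognizer(
    speech_config=speech_config,
    auto_detect_source_language_config=auto_detect_config,
    audio_config=audio_config
)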
We hope this guide has inspired you to create amazing, interactive applications with Azure Speech-to-Speech. Happy coding!
References
For more information on the step-by-step implementation, see Microsoft's official documentation.