
Bringing AI Conversations to Life

Malvine Owuor · May 26th

Imagine being able to talk to your application as naturally as you would with a friend, asking it questions, and getting instant verbal responses. This isn't a scene from a sci-fi movie—it's a reality you can create using Azure AI Speech and Azure OpenAI Service. In this blog post, we'll show you how to harness these powerful tools to build an application that can understand and respond to spoken language. Whether you're developing a customer service bot, a virtual assistant, or an interactive educational tool, integrating Azure Speech-to-Speech can significantly enhance user engagement and experience. Let's dive into how you can make your application conversant, bridging the gap between humans and machines with the seamless integration of Azure's cutting-edge technologies.

Prerequisites

  • Azure subscription - Create one for free
  • Create a Microsoft Azure OpenAI Service resource in the Azure portal.
  • Deploy a model in your Azure OpenAI resource. For more information about model deployment, see the Azure OpenAI resource deployment guide.
  • Get the Azure OpenAI resource key and endpoint. After your Azure OpenAI resource is deployed, select Go to resource to view and manage keys. For more information about Azure AI services resources, see Get the keys for your resource.
  • Create a Speech resource in the Azure portal.
  • Get the Speech resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Azure AI services resources, see Get the keys for your resource.
  • Install the Microsoft Visual C++ Redistributable for Visual Studio 2015, 2017, 2019, and 2022 for your platform. Installing this package for the first time might require a restart.
  • On Linux, you must use the x64 target architecture.

Note: Install Python 3.7 or later.
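
If you prefer not to hardcode keys in app.py, one option is to export them as environment variables and read them with os.environ. This is a minimal sketch; the variable names (OPEN_AI_KEY, OPEN_AI_ENDPOINT, OPEN_AI_DEPLOYMENT_NAME, SPEECH_KEY, SPEECH_REGION) are assumptions of mine, so use whatever naming convention you prefer:

import os

# Assumed variable names; set these in your shell before launching the app,
# e.g. export OPEN_AI_KEY="..." on Linux/macOS or setx OPEN_AI_KEY "..." on Windows.
api_key = os.environ["OPEN_AI_KEY"]
azure_endpoint = os.environ["OPEN_AI_ENDPOINT"]
deployment_id = os.environ["OPEN_AI_DEPLOYMENT_NAME"]
speech_key = os.environ["SPEECH_KEY"]
speech_region = os.environ["SPEECH_REGION"]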

Implementation in your application

From this point on, I am assuming you have already set up your environment for the speech-to-speech chat.

Once the environment is ready, you can clone the GitHub repo through this link. Alternatively, create an app.py file in VS Code and paste in the code below.

 

import azure.cognitiveservices.speech as speechsdk
from openai import AzureOpenAI

# Replace with your Azure OpenAI key, endpoint, and deployment name
# (or read them from environment variables, as shown earlier).
# Your endpoint should look like: https://YOUR_OPEN_AI_RESOURCE_NAME.openai.azure.com/
api_key = "YOUR_OPEN_AI_KEY"

client = AzureOpenAI(
    azure_endpoint="YOUR_AZURE_END_POINT",
    api_key=api_key,
    api_version="2024-02-15-preview"
)

# This corresponds to the custom name you chose for your deployment when you deployed a model.
deployment_id = "YOUR_MODEL_NAME"

# Replace with your Speech resource key and region.
speech_config = speechsdk.SpeechConfig(subscription='YOUR_SPEECH_SERVICE_KEY', region='eastus')
audio_output_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)
audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)

# Should be the locale for the speaker's language.
speech_config.speech_recognition_language = "en-US"
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

# The language of the voice that responds on behalf of Azure OpenAI.
speech_config.speech_synthesis_voice_name = 'en-US-JennyMultilingualNeural'
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_output_config)

# Punctuation marks that end a sentence for text-to-speech purposes.
tts_sentence_end = [".", "!", "?", ";", "。", "!", "?", ";", "\n"]

# Prompts Azure OpenAI with a request and synthesizes the response.
def ask_openai(prompt):
    # Ask Azure OpenAI in streaming mode.
    response = client.chat.completions.create(
        model=deployment_id,
        max_tokens=200,
        stream=True,
        messages=[{"role": "user", "content": prompt}]
    )
    collected_messages = []
    last_tts_request = None

    # Iterate through the response stream.
    for chunk in response:
        if len(chunk.choices) > 0:
            chunk_message = chunk.choices[0].delta.content  # extract the message
            if chunk_message is not None:
                collected_messages.append(chunk_message)  # save the message
                if chunk_message in tts_sentence_end:  # sentence end found
                    text = ''.join(collected_messages).strip()  # join the received tokens into a sentence
                    if text != '':  # skip fragments that are only whitespace or newlines
                        print(f"Speech synthesized to speaker for: {text}")
                        # Speak each sentence as soon as it is complete, so playback
                        # starts before the model has finished generating.
                        last_tts_request = speech_synthesizer.speak_text_async(text)
                        collected_messages.clear()
    # Wait for the final synthesis request to finish before returning.
    if last_tts_request:
        last_tts_request.get()

# Continuously listens for speech input to recognize and send as text to Azure OpenAI.
def chat_with_open_ai():
    while True:
        print("Azure OpenAI is listening. Say 'Stop' or press Ctrl-Z to end the conversation.")
        try:
            # Get audio from the microphone and send it to the speech service for recognition.
            speech_recognition_result = speech_recognizer.recognize_once_async().get()

            # If speech is recognized, send it to Azure OpenAI and listen for the response.
            if speech_recognition_result.reason == speechsdk.ResultReason.RecognizedSpeech:
                if speech_recognition_result.text == "Stop.":
                    print("Conversation ended.")
                    break
                print("Recognized speech: {}".format(speech_recognition_result.text))
                ask_openai(speech_recognition_result.text)
            elif speech_recognition_result.reason == speechsdk.ResultReason.NoMatch:
                print("No speech could be recognized: {}".format(speech_recognition_result.no_match_details))
                break
            elif speech_recognition_result.reason == speechsdk.ResultReason.Canceled:
                cancellation_details = speech_recognition_result.cancellation_details
                print("Speech Recognition canceled: {}".format(cancellation_details.reason))
                if cancellation_details.reason == speechsdk.CancellationReason.Error:
                    print("Error details: {}".format(cancellation_details.error_details))
        except EOFError:
            break

# Main
try:
    chat_with_open_ai()
except Exception as err:
    print("Encountered exception. {}".format(err))

Once the above step is completed, you will need to install two libraries: openai and azure-cognitiveservices-speech.

Run the following command in your terminal to install them:

pip install openai azure-cognitiveservices-speech
Note: Before running the application, replace the Azure OpenAI key, endpoint, and deployment name, as well as the Azure Speech service key and region, with your own values in the code (all of these are available in the Azure portal on the Azure OpenAI and Speech resources you created); otherwise you will encounter errors.
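
Before launching the full chat loop, you can sanity-check the Speech resource with a short smoke test. This is a minimal sketch (the key and region placeholders are the same ones used in the main code) that synthesizes one sentence to your default speaker:

import azure.cognitiveservices.speech as speechsdk

# Synthesize one sentence to the default speaker to confirm the key and region work.
speech_config = speechsdk.SpeechConfig(subscription="YOUR_SPEECH_SERVICE_KEY", region="eastus")
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
result = synthesizer.speak_text_async("If you can hear this, the Speech resource is working.").get()

if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
    print("Speech synthesis succeeded.")
elif result.reason == speechsdk.ResultReason.Canceled:
    # A cancellation here usually points at a bad key or region.
    print("Synthesis canceled: {}".format(result.cancellation_details.reason))

If this fails, double-check the key and region before debugging the rest of the application.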

Conclusion

Bringing Azure Speech-to-Speech into your application is a game-changer for creating more natural and engaging user experiences. With Azure AI Speech and Azure OpenAI Service, your app can understand and respond to spoken language, making interactions seamless and intuitive.

In this guide, we've covered everything from setting up your environment to implementing speech recognition and synthesis in your app. With these tools, you're ready to build applications that can converse with users in real time.

As you continue developing, think about adding features like multi-language support and personalized responses. The potential is limitless, and the technology keeps improving.
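
For instance, as a first step toward multi-language support, the Speech SDK can auto-detect the spoken language from a small list of candidate locales (and en-US-JennyMultilingualNeural, the voice used above, can already speak several languages). A minimal sketch, assuming the speech_config and audio_config objects from the main code and an illustrative language list:

# Let the recognizer auto-detect the input language from a list of candidate locales.
# Note: do not also set speech_config.speech_recognition_language when using this.
auto_detect_config = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(
    languages=["en-US", "fr-FR", "de-DE", "es-ES"]
)
speech_recognizer = speechsdk.SpeechRecognizer(
    speech_config=speech_config,
    auto_detect_source_language_config=auto_detect_config,
    audio_config=audio_config
)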

We hope this guide has inspired you to create amazing, interactive applications with Azure Speech-to-Speech. Happy coding!

References

For more information regarding the step-by-step implementation, visit Microsoft's official documentation website.

 

