Build an Intelligent Voice Assistant in Python

Welcome back! In today’s video, we’re going to embark on an exciting journey of building our very own intelligent voice assistant in Python. But what makes this assistant special is that it only reacts when we call out its name, acting as a wake word. You have the freedom to choose the name you want to give to your assistant. So, without further ado, let’s jump right into it!

Contents

Getting Started
Implementation

Getting Started

Before we dive into the coding and implementation, let me show you what our final product will look like. When you run the script, a graphical user interface (GUI) will open with a bot icon. The icon will initially be black, indicating that the assistant is inactive and not listening to commands. However, when we say the wake word (let’s say “Hey Jake” for the sake of this example), the icon will turn red, indicating that the assistant is now active and ready to process voice commands.

Let’s run the script now. I’ll keep the introduction short so you can witness the assistant in action. After the wake word, I’ll demonstrate some basic commands that won’t be processed. Then, I’ll use the wake word again to show you that the assistant will process the commands.

(Live demonstration of the assistant.)

As you can see, the assistant is capable of understanding and responding to various commands. But what’s even more interesting is that we can customize its capabilities by providing a JSON file. We’ll discuss this further in a moment. The JSON file I used for the demonstration can be found here.

Now that we have a preview of what we’re going to build, let’s move on to the implementation.

Implementation

For this project, we’ll rely on three external libraries: SpeechRecognition for converting speech to text, NeuralIntents for matching what was said to an intent, and pyttsx3 for text-to-speech. SpeechRecognition also needs PyAudio to read from the microphone, and the GUI uses Tkinter, which ships with most Python installations. Let’s install them.

pip install SpeechRecognition
pip install neuralintents
pip install pyttsx3
pip install pyaudio

Once we have our dependencies installed, let’s start coding. We’ll begin by importing the required modules.

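import sys
import tkinter as tk

import pyttsx3
import speech_recognition as sr
from neuralintents import GenericAssistant

Next, we’ll create a class for our assistant and initialize it.

class Assistant:
    def __init__(self):
        # Speech-to-text recognizer and text-to-speech engine
        self.recognizer = sr.Recognizer()
        self.speaker = pyttsx3.init()
        self.speaker.setProperty('rate', 150)  # speaking rate in words per minute

        # Intent model built from intents.json
        self.assistant = GenericAssistant('intents.json')
        self.assistant.train_model()

        # Minimal GUI: a bot icon that is black while idle and red while active
        self.root = tk.Tk()
        self.label = tk.Label(self.root, text="🤖", font=("Arial", 120, "bold"), foreground="black")
        self.label.pack()
        self.root.update()

In the code above, we initialize the SpeechRecognition recognizer and the pyttsx3 speaker, load our JSON file using the GenericAssistant from NeuralIntents and train its model, and create the Tkinter window with the bot icon, which starts out black.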

The JSON file, called ‘intents.json’, defines the types of requests our assistant can handle. It consists of multiple intents, each with a tag, patterns, and responses. The patterns are sample sentences, and the responses are the assistant’s predefined replies. You can customize this file according to your needs.
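
As a rough illustration of that structure, the sketch below writes a small intents.json from Python. The tags, patterns, and responses are made-up placeholders, not the file used in the demonstration, and the layout (a top-level "intents" list) is what we assume GenericAssistant expects.

```python
import json

# Hypothetical example intents; replace the tags, patterns, and responses with your own.
intents = {
    "intents": [
        {
            "tag": "greeting",
            "patterns": ["Hello", "Hi there", "Good morning"],
            "responses": ["Hello! How can I help?"],
        },
        {
            "tag": "file",
            "patterns": ["Create a file", "Make a new file"],
            "responses": ["Creating the file."],
        },
    ]
}


def write_intents(path="intents.json"):
    """Write the example intents to disk as JSON and return the path."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(intents, f, indent=2)
    return path
```

Calling write_intents() once before starting the assistant would produce a file with one entry per intent, each carrying the tag, patterns, and responses described above.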

Let’s continue with the code:

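    def run_assistant(self):
        with sr.Microphone() as mic:
            self.recognizer.adjust_for_ambient_noise(mic, duration=0.2)

            while True:
                # Idle state: black icon, waiting for the wake word
                self.label.config(foreground="black")
                self.root.update()

                try:
                    audio = self.recognizer.listen(mic)
                    text = self.recognizer.recognize_google(audio).lower()
                    if "hey jake" not in text:
                        continue

                    # Active state: red icon, listen for the actual command
                    self.label.config(foreground="red")
                    self.root.update()
                    audio = self.recognizer.listen(mic)
                    text = self.recognizer.recognize_google(audio).lower()

                    if text == "stop":
                        self.speaker.say("Goodbye!")
                        self.speaker.runAndWait()
                        self.root.destroy()
                        sys.exit(0)

                    response = self.assistant.request(text)
                    if response:
                        self.speaker.say(response)
                        self.speaker.runAndWait()
                except sr.UnknownValueError:
                    # The audio could not be understood; go back to idle
                    continue

In the code above, we utilize SpeechRecognition to listen to the microphone and convert the audio into text, looping so the assistant keeps listening after each command. If the text contains our wake word (“Hey Jake”), the label on the GUI turns red, indicating that the assistant is active. Next, we listen for another input and convert it into text. If an utterance can’t be understood, recognize_google raises UnknownValueError, so we catch it and go back to waiting for the wake word. The root.update() calls redraw the window, since this simplified version never hands control to Tkinter’s main loop.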

If the text is “stop,” the assistant will bid farewell, close the GUI, and exit the program. Otherwise, the assistant will process the input using the neural network model and provide a response if applicable.
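
The wake-word logic described here can also be looked at on its own, without a microphone or GUI. The sketch below is only an illustration: next_action and its return values are hypothetical names, and it assumes the assistant handles one command per wake word before going idle again.

```python
WAKE_WORD = "hey jake"  # same wake word as in the example


def next_action(text, active):
    """Decide what to do with one recognized utterance.

    Returns (action, active): action is "ignore", "wake", "stop", or "respond",
    and active is the listening state for the next utterance.
    """
    text = text.lower().strip()
    if not active:
        # Idle: only the wake word has any effect
        return ("wake", True) if WAKE_WORD in text else ("ignore", False)
    if text == "stop":
        return ("stop", False)
    # Any other command is answered once, then the assistant goes idle again
    return ("respond", False)
```

Keeping the decision in a small function like this makes it easy to try out different wake words or a different "stop" phrase before wiring them into the listening code.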

    def create_file(self):
        with open('output.txt', 'w') as f:
            f.write("Hello, world!")

The create_file method is a simple function that creates a file named ‘output.txt’ with the content “Hello, world!” This function serves as an example of connecting a tag to a functionality. We can associate the “file” tag with the create_file method, allowing us to perform custom actions based on specific tags.
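
To make the idea of connecting a tag to a method more concrete, here is a minimal sketch in plain Python. The dispatch function and the responses dictionary are hypothetical and stand in for what the library does internally. GenericAssistant is understood to accept a similar dictionary through an intent_methods argument, but treat that as an assumption and check the version you have installed.

```python
def create_file(path="output.txt"):
    """Same idea as the create_file method: write a fixed greeting to a file."""
    with open(path, "w") as f:
        f.write("Hello, world!")


# Hypothetical mapping from intent tag to the function that handles it
intent_methods = {"file": create_file}


def dispatch(tag, responses):
    """Run the method mapped to `tag` if there is one, else return a spoken reply."""
    if tag in intent_methods:
        intent_methods[tag]()
        return None  # nothing to say; the action itself was the response
    return responses.get(tag)
```

With this shape, a recognized "file" intent runs create_file and produces no spoken reply, which is one reason the assistant code checks whether a response exists before speaking it.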

Finally, all we need to do is instantiate our assistant and run it:

if __name__ == "__main__":
    assistant = Assistant()
    assistant.run_assistant()

That’s it! We’ve successfully built our voice assistant with a wake word using Python. Feel free to experiment with the JSON file to customize your assistant’s capabilities.

I hope you found this tutorial helpful and enjoyed building your own voice assistant. If you have any questions or comments, feel free to leave them below. Don’t forget to like this video and subscribe to our channel for more exciting content. Until next time, happy coding!

Techal