How to convert a PDF file into speech with Python

Agenda:

Hey, welcome back! Today we are going to automate our reading with Python. We'll build a GUI program that lets you select a PDF file and then plays it as audio. That's right, no more reading with your own eyes: it'll read the file for you, and all you need to do is sit back and enjoy.

Prerequisites:

Happy relationship with basic Python and tkinter.

https://media.giphy.com/media/3o7btNa0RUYa5E7iiQ/giphy.gif

Yeah, that's it. I'll be explaining the rest 😉.

Analysing:

Now that we know what we are going to do, let's break it down into smaller chunks and focus on each of them individually.

First of all, we are going to create a window and a dialog box for opening the desired PDF file. We'll also create a text box for displaying the PDF's text content and a Play button that starts playing it as audio.

Modules used:

tkinter (for dealing with GUI)
gTTS (for converting text into speech)
playsound (for playing the audio file)
PyMuPDF (for reading pdf files)

Before moving ahead, one note: most online tutorials use PyPDF2 for working with PDF files. We are not using it because it does not always work. For example, as of the date I'm writing this post, PyPDF2 is not able to read text from PDFs generated by Google, such as those exported from Google Docs.

gTTS stands for Google Text-to-Speech. It's a Python library as well as a CLI tool for converting text into speech.

playsound is also a Python library for playing audio files like .mp3 or .wav.

We are using playsound just to play the audio file created by gTTS. You can use any other Python library for that, such as pydub, or use the os module to play the file with a native command-line audio player, though I guess that only works on Mac OS X and Linux.
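If you want to try the native player route, here is a minimal sketch rather than a tested recipe. It uses the standard subprocess module instead of os, and the function names are made up for illustration. It assumes afplay on macOS (which ships with the system) and mpg123 on Linux (which you may have to install yourself).

```python
import platform
import shutil
import subprocess


def player_command(audio_path, system=None):
    """Return a command list for a command-line audio player, or None."""
    system = system or platform.system()
    if system == "Darwin":
        # afplay ships with macOS.
        return ["afplay", audio_path]
    if system == "Linux":
        # Assumption: mpg123 is installed; not every distro includes it.
        return ["mpg123", audio_path]
    # No known command-line player for this platform in this sketch.
    return None


def play_with_native_player(audio_path):
    """Play the file and return True, or return False if no player is usable."""
    command = player_command(audio_path)
    if command is None or shutil.which(command[0]) is None:
        return False
    # Blocks until the player process exits.
    subprocess.run(command, check=True)
    return True
```

Calling play_with_native_player("audio.mp3") returns False instead of crashing when no suitable player is found, so you can fall back to playsound in that case.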

Let's dive into the code now ✨

https://media.giphy.com/media/3oAt1TznOzEcx3MssU/giphy.gif

Step 1 :

In this step we'll create our GUI, so open up your favourite code editor, create a file named main.py and import tkinter.

tkinter comes preinstalled with Python so no need to install it from pip.

from tkinter import *
from tkinter import filedialog

# creating main window instance from Tk class defined in tkinter 
window = Tk()
window.title("convert pdf to audiobook")
window.geometry("500x500")   # setting default size of the window  

# creating text box for displaying pdf content
text_box = Text(window, height=30, width=60)
text_box.pack(pady=10)

# creating menu instance from Menu class
menu = Menu(window)
window.config(menu=menu)

# adding `File` tab into menu defined above
file_menu = Menu(menu, tearoff=False)
menu.add_cascade(label="File", menu=file_menu)

# adding drop-downs to `file_menu`
file_menu.add_command(label="Open")
file_menu.add_command(label="clear")
file_menu.add_separator()
file_menu.add_command(label="Exit")

# adding play button for playing audio
play_btn = Button(text="Play")
play_btn.pack(pady=20)

# keeps the window open until we close it manually
window.mainloop()

Now if you run it, you'll see something like this,

Screenshot 2021-05-14 at 9.14.35 PM.png

Step 2 :

In this step we will create the function open_pdf. It opens a dialog box for selecting a PDF file, reads all of its text and shows it inside the text box created earlier. Then it uses gTTS to create an audio file from all the text in the text box.

import fitz  # fitz is actually PyMuPDF
from gtts import gTTS


def open_pdf():

    # creating dialogue box

    open_file = filedialog.askopenfilename(
        initialdir="/Users/swayam/Downloads/",
        title="Open PDF file",
        filetypes=(
            ("PDF Files", "*.pdf"),
            ("All Files", "*.*")
        )
    )

    if open_file:

        #reading pdf file and creating instance of Document class from fitz
        doc = fitz.Document(open_file)

        # getting total number of pages
        total_pages = doc.page_count

        # looping through all the pages, collecting text from each page and showing it on text box
        for n in range(total_pages):
            page = doc.load_page(n)
            page_content = page.get_textpage()
            content = page_content.extractText()
            text_box.insert(END, content)

        # after whole pdf content is stored then retrieving it from textbox and storing it inside variable 
        text = text_box.get(1.0, END)

        # using gTTS to convert that text into audio and storing it inside file named as audio.mp3
        tts = gTTS(text, lang='en')
        tts.save("audio.mp3")

You need to install gTTS and PyMuPDF, so run pip install PyMuPDF and pip install gTTS in your terminal.

The code above is mostly self-explanatory, but I still want to highlight a few points. First, look at the line text_box.insert(END, content). END is a constant defined in tkinter that refers to the position just after the last character in the text box, so inserting at END appends to whatever is already there. Similarly, 1.0 means the very beginning of the text: line 1, character 0.

So when we insert the first page into the empty text box, the start and the end are the same position. After that, each new page is inserted at END, right after the previously stored text.
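The page-by-page collecting can also be pulled out of the GUI code into a small helper, which makes it easier to try on its own. This is only a sketch: collect_text and FakePage are names I made up, and FakePage merely imitates a PyMuPDF page, relying on the fact that a page returns its plain text from get_text().

```python
def collect_text(pages):
    """Join the plain text of every page into one string."""
    parts = []
    for page in pages:
        # PyMuPDF pages return their plain text from get_text().
        parts.append(page.get_text())
    # Drop leading/trailing whitespace so an empty PDF gives "".
    return "".join(parts).strip()


class FakePage:
    """Hypothetical stand-in for a PyMuPDF page, used only for this demo."""

    def __init__(self, text):
        self._text = text

    def get_text(self):
        return self._text


demo = collect_text([FakePage("Hello\n"), FakePage("world\n")])
print(repr(demo))  # prints 'Hello\nworld'
```

With a real file, a PyMuPDF document can be looped over page by page, so collect_text(fitz.Document(open_file)) should give you the same text that ends up in the text box. If the result is an empty string (a scanned PDF that only contains images, for example), it is probably best to skip the gTTS step, since there is nothing to speak.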

Step 3 :

Now that we have the function, it's time to give each widget its own functionality, so that pressing the button and clicking on the menu actually do something.

Go back to the code and add the command attribute to all the file_menu drop-downs and to play_btn, as shown below:

from playsound import playsound  

file_menu.add_command(label="Open", command=open_pdf)
file_menu.add_command(label="clear", command=lambda: text_box.delete(1.0, END))
file_menu.add_command(label="Exit", command=window.quit)

play_btn = Button(text="Play", command=lambda: playsound("audio.mp3"))

You also need to install playsound with pip install playsound. On macOS, playsound relies on pyobjc, so install that too with pip install pyobjc.

The function passed as command runs when you click the widget. For short actions like clearing the text box or playing the audio we used lambda functions, while Exit gets window.quit passed directly.

window.quit stops the main loop, which closes the app, and clear empties the text box. Since audio.mp3 is saved as soon as you open a PDF, playsound("audio.mp3") will play it when you click the button.
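One thing to keep in mind: by default playsound blocks until the audio finishes, and a command callback runs inside tkinter's main loop, so the window will likely stop responding while a long PDF is being read out. A possible workaround is to start playback on a background thread. The sketch below shows the idea with the standard threading module; run_in_background and fake_playback are hypothetical names, and fake_playback only pretends to be playsound so the example runs without any audio.

```python
import threading
import time


def run_in_background(func, *args):
    """Start func(*args) on a daemon thread and return the thread."""
    worker = threading.Thread(target=func, args=args, daemon=True)
    worker.start()
    return worker


def fake_playback(path, log):
    """Hypothetical stand-in for playsound(path): blocks briefly, then records."""
    time.sleep(0.1)
    log.append(path)


played = []
worker = run_in_background(fake_playback, "audio.mp3", played)
# The caller gets control back right away while "playback" is still running.
worker.join()  # only needed in this demo, to wait for the result
print(played)  # prints ['audio.mp3']
```

In the app, the button could then be wired up as command=lambda: run_in_background(playsound, "audio.mp3"). I have not checked how playsound behaves off the main thread on every platform, so treat this as a starting point.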

So if you followed along, your final code will look something like this:

from tkinter import *
from tkinter import filedialog
import fitz
from gtts import gTTS
from playsound import playsound 

window = Tk()
window.title("convert pdf to audiobook")
window.geometry("500x500")


def open_pdf():
    open_file = filedialog.askopenfilename(
        initialdir="/Users/swayam/Downloads/",
        title="Open PDF file",
        filetypes=(
            ("PDF Files", "*.pdf"),
            ("All Files", "*.*")
        )
    )

    if open_file:
        doc = fitz.Document(open_file)
        total_pages = doc.page_count
        for n in range(total_pages):
            page = doc.load_page(n)
            page_content = page.get_textpage()
            content = page_content.extractText()
            text_box.insert(END, content)

        text = text_box.get(1.0, END)
        tts = gTTS(text, lang='en')
        tts.save("audio.mp3")


text_box = Text(window, height=30, width=60)
text_box.pack(pady=10)


menu = Menu(window)
window.config(menu=menu)

file_menu = Menu(menu, tearoff=False)
menu.add_cascade(label="File", menu=file_menu)
file_menu.add_command(label="Open", command=open_pdf)
file_menu.add_command(label="clear", command=lambda: text_box.delete(1.0, END))
file_menu.add_separator()
file_menu.add_command(label="Exit", command=window.quit)

play_btn = Button(text="Play", command=lambda: playsound("audio.mp3"))
play_btn.pack(pady=20)

window.mainloop()

Let's test it

Now it's time to run our code and check whether everything is working. Take a sample PDF file with some text and open it.

https://youtu.be/nfDdssirelM

YAYYYYYY...... 🎉 🥳, WE DID IT GUYS

We just created our own PDF to audiobook converter. If you want to go a few steps further, I recommend reading the official gTTS documentation. You can also convert this Python script into an .exe file and share it with your friends so they can have fun too.

https://media.giphy.com/media/ENagATV1Gr9eg/giphy.gif

Let's keep converting Python scripts to .exe files for the next tutorial 😅.

What's next!

If you are still reading, make sure to follow me on Twitter as I share some cool projects and updates there and yeah don't forget I have some exciting stuff coming up every weekend. See Y'all next time and stay safe ^^ 🌻
