In this tutorial, we will learn how to extract the text from a PDF file and save it to a text file using Python.

Before we dive into the tutorial, you will need to install the PyPDF2 library (pip install PyPDF2).
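
PyPDF2 renamed much of its API between releases (version 3.0 removed the older PdfFileReader-style names), so it can help to confirm which version pip actually installed. The snippet below is a minimal sketch using only the standard library (Python 3.8+); the helper name installed_version is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string, or None if PyPDF2 is not installed.
print(installed_version('PyPDF2'))
```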


Source Code:

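# PyPDF2 3.x API: PdfFileReader, numPages, getPage and extractText were removed in 3.0.
from PyPDF2 import PdfReader

file_path = 'Lecture.pdf'
pdf = PdfReader(file_path)

# The with statement closes the file automatically, so no f.close() is needed.
with open('Lecture Note.txt', 'w', encoding='utf-8') as f:
    for page_num, page in enumerate(pdf.pages, start=1):
        try:
            txt = page.extract_text()
        except Exception as e:
            # Skip pages whose text cannot be extracted, but report them.
            print('Page {0}: extraction failed ({1})'.format(page_num, e))
            continue

        # Page header, a dashed separator line, then the page text.
        f.write('Page {0}\n'.format(page_num))
        f.write('-' * 100 + '\n')
        f.write(txt + '\n')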
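
If you want to reuse this for more than one PDF, the same steps can be wrapped in a function. The following is only a sketch, assuming a recent PyPDF2 release (3.x) that provides PdfReader, reader.pages and extract_text(). The names format_page and pdf_to_text are made up for this example and are not part of PyPDF2.

```python
def format_page(page_num, text, width=100):
    """Return one page's text preceded by a 'Page N' header and a dashed line."""
    return 'Page {0}\n{1}\n{2}\n'.format(page_num, '-' * width, text)


def pdf_to_text(pdf_path, txt_path):
    """Write the text of every page in pdf_path to txt_path; return the page count."""
    # Imported here so format_page can be used without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(pdf_path)
    with open(txt_path, 'w', encoding='utf-8') as f:
        for page_num, page in enumerate(reader.pages, start=1):
            # Pages without a text layer (e.g. scans) may yield nothing,
            # so fall back to an empty string.
            text = page.extract_text() or ''
            f.write(format_page(page_num, text))
    return len(reader.pages)
```

Calling pdf_to_text('Lecture.pdf', 'Lecture Note.txt') would then write the same kind of output file as the script above. Keeping the formatting in its own function makes it easy to change the page header without touching the extraction loop.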