Using Google’s Vision API cloud service, we can detect and extract many kinds of information from an image or file. In this tutorial, we are going to learn how to extract text from a PDF (or TIFF) file using the DOCUMENT_TEXT_DETECTION feature.

Extracting text from a PDF/TIFF file using the Vision API is not as straightforward as I initially thought it would be. For instance, you cannot reference a file stored on your PC. Instead, you first have to store the PDF/TIFF file in Google Cloud Storage (this is a different product from Google Drive) and access the file using the Cloud Storage API.
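
To make the Cloud Storage requirement concrete, here is a minimal sketch of the JSON body that the Vision API's REST method files:asyncBatchAnnotate expects. It only builds and prints the request; it does not call the API, and it is not necessarily how the script in this tutorial does it. The function name build_async_request and the bucket paths are hypothetical, and the extension check is an assumption of this sketch, not something the API does for you.

```python
import json

# MIME types relevant to this tutorial, keyed by file extension.
# (Assumption of this sketch: the type is guessed from the extension.)
MIME_TYPES_BY_EXTENSION = {
    ".pdf": "application/pdf",
    ".tif": "image/tiff",
    ".tiff": "image/tiff",
}


def build_async_request(source_uri, destination_uri, batch_size=1):
    """Build the JSON body for a files:asyncBatchAnnotate request.

    Both arguments must be Cloud Storage URIs (gs://...); a path on
    your PC will not work.
    """
    for uri in (source_uri, destination_uri):
        if not uri.startswith("gs://"):
            raise ValueError(f"expected a gs:// Cloud Storage URI, got {uri!r}")

    lowered = source_uri.lower()
    mime_type = next(
        (mime for ext, mime in MIME_TYPES_BY_EXTENSION.items() if lowered.endswith(ext)),
        None,
    )
    if mime_type is None:
        raise ValueError("this sketch only handles .pdf, .tif and .tiff files")

    return {
        "requests": [
            {
                # Where the Vision API reads the PDF/TIFF from.
                "inputConfig": {
                    "gcsSource": {"uri": source_uri},
                    "mimeType": mime_type,
                },
                "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
                # Where the Vision API writes the JSON results,
                # batch_size pages per output file.
                "outputConfig": {
                    "gcsDestination": {"uri": destination_uri},
                    "batchSize": batch_size,
                },
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical bucket and object names.
    body = build_async_request(
        "gs://my-bucket/sample.pdf", "gs://my-bucket/ocr-output/"
    )
    print(json.dumps(body, indent=2))
```

Note that both the input and the output live in Cloud Storage: the results are written as JSON files under the destination prefix, which is why the Cloud Storage API is needed to read them back.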


The Vision API only accepts PDF/TIFF files of up to 2,000 pages.





Script used in the tutorial