Overview
Using Google’s Cloud Vision API, we can detect and extract many kinds of information from an image or file. In this tutorial, we are going to learn how to extract text from a PDF (or TIFF) file using the DOCUMENT_TEXT_DETECTION feature.
Extracting text from a PDF/TIFF file with the Vision API is not as straightforward as I initially thought it would be. For instance, you cannot reference a file stored on your PC. Instead, you first have to upload the PDF/TIFF file to Google Cloud Storage (a different product from Google Drive). The Vision API then processes the file asynchronously and writes its results back to Cloud Storage as JSON files, which you retrieve using the Cloud Storage API.
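Here is a minimal sketch of that workflow. It is modeled on the Python sample in the documentation linked below, but it is not that sample. It assumes that the google-cloud-vision and google-cloud-storage client libraries are installed, that credentials are configured, and that the PDF is already uploaded to a bucket. The helper names (detect_pdf_text, parse_gcs_uri, page_texts, join_pages) are hypothetical. The JSON field names it reads (responses, fullTextAnnotation.text, context.pageNumber) reflect my understanding of the output format and are worth checking against a real output file. The sketch also joins every page into a single string, so it works for documents of any length rather than only the first page.

```python
import json
import re


def parse_gcs_uri(uri: str) -> tuple[str, str]:
    """Split 'gs://bucket/some/prefix' into ('bucket', 'some/prefix')."""
    match = re.match(r"gs://([^/]+)/(.*)", uri)
    if match is None:
        raise ValueError(f"not a gs:// URI with a path: {uri!r}")
    return match.group(1), match.group(2)


def page_texts(output_json: str) -> list[tuple[int, str]]:
    """Return (page number, text) pairs from one Vision API output JSON file."""
    data = json.loads(output_json)
    pages = []
    for response in data.get("responses", []):
        # Assumption: each per-page response records its page in context.pageNumber
        page_number = response.get("context", {}).get("pageNumber", 0)
        # Pages with no detected text have no fullTextAnnotation entry
        text = response.get("fullTextAnnotation", {}).get("text", "")
        pages.append((page_number, text))
    return pages


def join_pages(output_jsons: list[str]) -> str:
    """Merge the pages from all output files, in page order, into one string."""
    pages = [page for doc in output_jsons for page in page_texts(doc)]
    pages.sort(key=lambda page: page[0])
    return "\n".join(text for _, text in pages)


def detect_pdf_text(gcs_source_uri: str, gcs_destination_uri: str, batch_size: int = 2) -> str:
    """Run DOCUMENT_TEXT_DETECTION on a PDF in Cloud Storage and return all of its text."""
    # Imported here so the pure helpers above can be used without the client libraries
    from google.cloud import storage, vision

    request = vision.AsyncAnnotateFileRequest(
        features=[vision.Feature(type_=vision.Feature.Type.DOCUMENT_TEXT_DETECTION)],
        input_config=vision.InputConfig(
            gcs_source=vision.GcsSource(uri=gcs_source_uri),
            mime_type="application/pdf",  # use "image/tiff" for TIFF files
        ),
        output_config=vision.OutputConfig(
            gcs_destination=vision.GcsDestination(uri=gcs_destination_uri),
            batch_size=batch_size,  # number of pages stored in each output JSON file
        ),
    )
    operation = vision.ImageAnnotatorClient().async_batch_annotate_files(requests=[request])
    operation.result(timeout=420)  # block until the asynchronous job finishes

    # Read every JSON file the job wrote under the destination prefix
    bucket_name, prefix = parse_gcs_uri(gcs_destination_uri)
    bucket = storage.Client().bucket(bucket_name)
    outputs = [
        blob.download_as_bytes().decode("utf-8")
        for blob in bucket.list_blobs(prefix=prefix)
        if not blob.name.endswith("/")
    ]
    return join_pages(outputs)
```

A call might look like `detect_pdf_text("gs://my-bucket/report.pdf", "gs://my-bucket/vision-output/report/")`, where both URIs are placeholders. Use a fresh output prefix for each job, because the sketch reads every file under that prefix, including output left over from earlier runs.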
Limitations
The Vision API only accepts PDF/TIFF files of up to 2,000 pages; larger files return an error.
Documentation
https://cloud.google.com/vision/docs/pdf#vision_text_detection_pdf_gcs-python