AI Blog MS Rajput: How to Extract Text From Images

This blog post will show you how to extract text from a PDF or an image with Tesseract OCR.

Optical Character Recognition (OCR) is the process of electronically extracting text from images or documents such as PDFs so that it can be reused in a variety of ways, such as full-text search.

OCR technology is used to convert virtually any kind of image containing written text (typed, handwritten or printed) into machine-readable text data.

Pytesseract is a Python wrapper around Tesseract that recognizes and reads the text present in images. It can read common image types such as PNG and JPEG.

It is widely used to process scanned documents, among other kinds of input.

One more point: before running OCR, we need to smooth our input images using some OpenCV functions.

Noise is random variation of brightness or colour in an image that can make the text in the image more difficult to read. Certain types of noise cannot be removed by Tesseract in the binarisation step, which can cause accuracy rates to drop.
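To make the point about binarisation concrete, here is a minimal sketch in plain Python (no OpenCV needed). It assumes a simple fixed global threshold, which is a simplification and not how Tesseract binarises internally. The names `binarise` and `THRESHOLD` are made up for this illustration.

```python
# Toy illustration: a fixed global threshold keeps a dark noise speck.
# THRESHOLD and binarise are hypothetical names, not Tesseract or OpenCV APIs.
THRESHOLD = 128


def binarise(gray_rows, threshold=THRESHOLD):
    """Map each grayscale value (0-255) to 0 (black) or 255 (white)."""
    return [[255 if value >= threshold else 0 for value in row] for row in gray_rows]


# A light background with one dark noise pixel (value 40) in the middle.
noisy = [
    [220, 210, 230],
    [215, 40, 225],
    [200, 220, 210],
]

binary = binarise(noisy)
for row in binary:
    print(row)
```

The dark pixel in the middle is still black after thresholding, so an OCR engine could mistake it for part of a character. That is why we clean the image first.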

Converting the image to grayscale:

import cv2  # img is assumed to be a BGR image, e.g. loaded with cv2.imread
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
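For intuition, this is roughly what the grayscale conversion computes for each pixel. The weights are the standard ones OpenCV documents for colour-to-gray conversion (Y = 0.299 R + 0.587 G + 0.114 B). The helper name `bgr_to_gray` exists only in this sketch, and the rounding may differ slightly from OpenCV's own implementation.

```python
def bgr_to_gray(pixel):
    """Convert one (B, G, R) pixel to a single gray value (sketch only)."""
    blue, green, red = pixel
    return round(0.299 * red + 0.587 * green + 0.114 * blue)


# Note the BGR channel order, which is how OpenCV stores colour images.
print(bgr_to_gray((255, 255, 255)))  # white
print(bgr_to_gray((0, 0, 0)))        # black
print(bgr_to_gray((255, 0, 0)))      # pure blue
print(bgr_to_gray((0, 255, 0)))      # pure green
```

Green contributes the most to the gray value and blue the least, so text that differs from its background mainly in the blue channel can end up with low contrast after conversion.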

Applying dilation and erosion to remove the noise (you may need to experiment with the kernel size depending on your image):

import numpy as np

# 2x2 kernel: dilation followed by erosion removes small dark specks
kernel = np.ones((2, 2), np.uint8)
img = cv2.dilate(img, kernel, iterations=1)
img = cv2.erode(img, kernel, iterations=1)

Don't worry, I will provide the full code so you can easily integrate it into your own text extraction projects, or build a Flask API around it.

You need to clone my project from GitHub.

Before that, we will create a virtual environment.

cmd :- virtualenv local

cmd :- source local/bin/activate

cmd :- git clone

cmd :- cd text_extract

cmd :- pip install -r req.txt

Note :- If you get an error during this installation, installing Tesseract itself with the following commands may fix it.

cmd :- sudo apt install tesseract-ocr

cmd :- sudo apt install libtesseract-dev

Now we will run this project.

This is a simple script for a simple input image.

cmd :- python simple.py --input input_image.png
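As a rough idea of what a script like this could look like, here is a minimal sketch. It is not the actual `simple.py` from the repository. It assumes that Pillow and pytesseract are installed and that the Tesseract binary is on your PATH, and the function names are only illustrative.

```python
import argparse


def parse_args(argv=None):
    """Read the --input option, mirroring the command shown above."""
    parser = argparse.ArgumentParser(description="Extract text from an image with Tesseract.")
    parser.add_argument("--input", required=True, help="path to the input image")
    return parser.parse_args(argv)


def extract_text(image_path):
    """Run Tesseract on the image and return the recognised text."""
    # Imported here so that argument parsing still works when the OCR
    # dependencies are not installed yet (an assumption of this sketch).
    from PIL import Image
    import pytesseract

    return pytesseract.image_to_string(Image.open(image_path))


def main(argv=None):
    args = parse_args(argv)
    print(extract_text(args.input))
```

To use it as a script, call `main()` under the usual `if __name__ == "__main__":` guard. Calling `main(["--input", "input_image.png"])` from Python does the same thing.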

Now we will use some more complex images, such as colour images, and apply some OpenCV functions to them first.

cmd :- python after_smoth.py --input 1.jpg

In this image I have extracted the scores from a football scoreboard. You can also try different scoreboards, but you will need to adjust the OpenCV functions according to your input images.

Thanks.

Wednesday, July 15, 2020