Optical Character Recognition Engine to extract Food-items and Prices from Grocery Receipt Images via Templating and Dictionary-Traversal Technique

Ali Sohani; Rafi Ullah; Faraz Ali; Athual Rao; Richard Messier

doi:10.51153/kjcis.v2i1.21

Vol. 2 No. 1 (2019), Articles

Published 2019-01-01

Ali Sohani

Data Science Department, Cubix, Pakistan

Rafi Ullah

Data Science Department, Cubix, Pakistan

Faraz Ali

Data Science Department, Cubix, Pakistan

Athual Rao

Data Science Department, Cubix, Pakistan

Richard Messier

Data Science Department, Cubix, Pakistan

Keywords

Accurate image to text converter
Receipt parsing using template matching
OCR using receipts template
Text retrieval from receipts images

How to Cite

Ali Sohani, Rafi Ullah, Faraz Ali, Athual Rao, & Richard Messier. (2019). Optical Character Recognition Engine to extract Food-items and Prices from Grocery Receipt Images via Templating and Dictionary-Traversal Technique. KIET Journal of Computing and Information Sciences, 2(1), 15. https://doi.org/10.51153/kjcis.v2i1.21

Abstract

This paper proposes a mix of established and novel techniques to address the fundamental problem of recognizing and extracting food items and prices from grocery receipts. In our research we did not find any existing OCR engine that meets this standard, let alone one specialized for this purpose. Since the goal was a specialized OCR system, we began with the idea of building wrappers around a basic OCR system to give it the context of a grocery receipt. To this end, we built pre-function and post-function wrappers over an existing system, Tesseract-OCR. Our system follows a specific workflow to enhance the basic OCR output. First, it passes the provided image through image filters to make it suitable for section-level extraction. It then splits the image into sections (for example, Price, Item-Names, and Quantity are handled separately from one another) according to given template layouts. These sections of the image are forwarded to the Tesseract engine for basic OCR. The extracted text is then forwarded to a contextual pattern matcher, which interprets it in the context of a grocery receipt. After testing the system on receipts from particular grocery stores, we conclude that our techniques significantly improve both the accuracy of overall context-based text recognition and close-match detection compared with unassisted (vanilla) Tesseract OCR. The proposed system will empower food and kitchen assistance mobile apps in the market.
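The post-OCR stage described above (matching noisy section text against a dictionary and reading prices in context) can be illustrated with a minimal sketch. This is not the authors' implementation: the dictionary `FOOD_ITEMS`, the helper names, the character-repair table, and the use of Python's standard `difflib` as a stand-in for the paper's dictionary-traversal technique are all assumptions made for illustration. The inputs are assumed to be text lines that an OCR engine has already produced for the item-name and price sections.

```python
import difflib
import re

# Hypothetical dictionary of known food items (illustrative, not from the paper).
FOOD_ITEMS = ["milk", "bread", "butter", "eggs", "banana", "tomato"]

# A price is assumed to be digits, a '.' or ',' separator, and two decimals.
PRICE_RE = re.compile(r"\d+[.,]\d{2}")

# Assumed common OCR confusions inside a numeric section.
DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5"})


def match_item(raw, dictionary=FOOD_ITEMS, cutoff=0.6):
    """Return the closest dictionary entry for a noisy OCR token, or None."""
    token = raw.strip().lower()
    hits = difflib.get_close_matches(token, dictionary, n=1, cutoff=cutoff)
    return hits[0] if hits else None


def parse_price(raw):
    """Repair digit look-alikes, then extract a decimal price, or None."""
    found = PRICE_RE.search(raw.translate(DIGIT_FIXES))
    return float(found.group().replace(",", ".")) if found else None


def pair_sections(item_lines, price_lines):
    """Pair item and price lines row by row, keeping only rows where both parse."""
    rows = []
    for item_raw, price_raw in zip(item_lines, price_lines):
        item, price = match_item(item_raw), parse_price(price_raw)
        if item is not None and price is not None:
            rows.append((item, price))
    return rows


if __name__ == "__main__":
    print(pair_sections(["M1lk", "Bred", "???"], ["2.5O", "1,20", "9.99"]))
```

Because the price and item-name sections are handled separately, the digit repairs are applied only to price text, where a letter such as `O` is almost certainly a misread `0`. The similarity cutoff of 0.6 is the `difflib` default and would need tuning against real receipts.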
