Week 11 - Application Example: Photo OCR

Problem Description and Pipeline

Photo OCR (Photo Optical Character Recognition): getting a computer to read the text that appears in a photograph.
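Photo OCR pipeline (a machine learning pipeline):
1. Text detection: find the regions of the image that contain text.
2. Character segmentation: split each text region into individual characters.
3. Character recognition: classify each character.

"Pipeline" is a common term in machine learning:
- A system built from separate modules, each of which may be a machine learning component or a data processing component.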
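To make the idea of a pipeline concrete, here is a minimal sketch in which each module is just a function whose output feeds the next one. The function names (`run_pipeline`, `detect_text`, `segment_characters`, `recognize_characters`) are hypothetical, and the "image" is a plain string so the sketch runs without any image library; real modules would be trained classifiers.

```python
from typing import Any, Callable, List

# Each stage takes the previous stage's output and returns its own output.
Stage = Callable[[Any], Any]


def run_pipeline(stages: List[Stage], data: Any) -> Any:
    """Feed the output of each module into the next one, in order."""
    for stage in stages:
        data = stage(data)
    return data


# Hypothetical toy stand-ins: the "image" is a plain string and each
# function mimics one Photo OCR module on it.
def detect_text(image: str) -> List[str]:
    # Pretend every whitespace-separated token is a detected text region.
    return image.split()


def segment_characters(regions: List[str]) -> List[List[str]]:
    # Split each text region into single characters.
    return [list(region) for region in regions]


def recognize_characters(segments: List[List[str]]) -> str:
    # "Recognize" each character (here: upper-case it) and rebuild the words.
    return " ".join("".join(ch.upper() for ch in chars) for chars in segments)


ocr = [detect_text, segment_characters, recognize_characters]
print(run_pipeline(ocr, "antique mall"))  # ANTIQUE MALL
```

Keeping the modules as separate functions is what later makes it possible to swap one of them out (for example, for ground truth) without touching the others.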

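Sliding windows: to talk about detecting objects in images, start with a simpler example, pedestrian detection. Pedestrians have a roughly constant aspect ratio, which makes them easier to detect than text. Train a classifier on fixed-size image patches labeled "pedestrian" or "no pedestrian", then slide a window of that size across the image, moving a few pixels (the step size) at a time, and classify each patch; repeat with larger windows, resizing each patch to the classifier's input size.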
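A sliding-window detector for pedestrian detection can be sketched as follows. This is a minimal single-scale version under simple assumptions: the image is a list of rows of grayscale values, and `bright_patch` is a hypothetical stand-in for a trained pedestrian classifier. Handling larger windows (resizing patches to the classifier's input size) is left out.

```python
from typing import Callable, Iterator, List, Tuple

Image = List[List[float]]  # grayscale image stored as rows of pixel values


def sliding_windows(image: Image, win_h: int, win_w: int,
                    step: int) -> Iterator[Tuple[int, int, Image]]:
    """Yield (row, col, patch) for every win_h x win_w patch,
    moving `step` pixels at a time across and then down the image."""
    height, width = len(image), len(image[0])
    for r in range(0, height - win_h + 1, step):
        for c in range(0, width - win_w + 1, step):
            patch = [row[c:c + win_w] for row in image[r:r + win_h]]
            yield r, c, patch


def detect(image: Image, classifier: Callable[[Image], bool],
           win_h: int, win_w: int, step: int) -> List[Tuple[int, int]]:
    """Return the top-left corners of the patches the classifier labels positive."""
    return [(r, c) for r, c, patch in sliding_windows(image, win_h, win_w, step)
            if classifier(patch)]


# Hypothetical classifier: a patch is "positive" if its mean brightness > 0.5.
def bright_patch(patch: Image) -> bool:
    return sum(map(sum, patch)) / (len(patch) * len(patch[0])) > 0.5


img = [[0.0] * 6 for _ in range(4)]
for r in (2, 3):
    for c in (4, 5):
        img[r][c] = 1.0  # a bright 2x2 "object" in the lower-right corner

print(detect(img, bright_patch, win_h=2, win_w=2, step=2))  # [(2, 4)]
```

A smaller step finds objects more precisely but multiplies the number of patches the classifier must evaluate.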

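Character segmentation: look at a fixed-size image patch and decide whether it shows a split between two characters.
Train a classifier to distinguish positive examples (patches centered on the gap between two characters) from negative examples (patches that are not a split, e.g. centered on a single character).
Use a 1-dimensional sliding window to move along each detected text region:
- Does the window snapshot look like the split between two characters?
  - If yes, insert a split.
  - If not, move on.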
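The 1-dimensional sliding window for character segmentation can be sketched like this. As an assumption for illustration, a text region is summarized by the amount of ink in each pixel column, and `gap_between_inked_columns` is a hypothetical hand-written rule standing in for the trained split classifier.

```python
from typing import Callable, List, Sequence


def find_splits(columns: Sequence[float], window: int,
                is_split: Callable[[Sequence[float]], bool],
                step: int = 1) -> List[int]:
    """Slide a 1-D window along a text region and return the centre column
    of every window the classifier labels as a split between two characters."""
    splits = []
    for start in range(0, len(columns) - window + 1, step):
        if is_split(columns[start:start + window]):
            splits.append(start + window // 2)  # yes: insert a split here
        # no: move on to the next window position
    return splits


# Hypothetical classifier: the middle column has no ink, but both edges do.
def gap_between_inked_columns(w: Sequence[float]) -> bool:
    mid = len(w) // 2
    return w[mid] == 0 and w[0] > 0 and w[-1] > 0


ink_per_column = [3, 4, 0, 5, 2, 0, 4]  # toy text region with two gaps
print(find_splits(ink_per_column, 3, gap_between_inked_columns))  # [2, 5]
```

Unlike the pedestrian detector, the window only moves horizontally, because the text region has already been located.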

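Artificial data synthesis for character recognition:
1. Create new data from scratch: take characters in different fonts from your computer's font library or from online font libraries, and paste them onto random backgrounds.
2. Amplify an existing data set by introducing distortions, e.g. warping real character images.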
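Amplifying an existing data set by distortion can be sketched with a very simple distortion: small random shifts of each character image. This is an assumption chosen so the sketch needs no image library; real systems typically use richer warps. The names `shift` and `amplify` are hypothetical.

```python
import random
from typing import List, Tuple

Image = List[List[int]]


def shift(image: Image, dr: int, dc: int, fill: int = 0) -> Image:
    """Translate the image by dr rows and dc columns; uncovered pixels get `fill`."""
    h, w = len(image), len(image[0])
    out = [[fill] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w:
                out[nr][nc] = image[r][c]
    return out


def amplify(examples: List[Tuple[Image, str]], copies: int,
            max_shift: int = 1, seed: int = 0) -> List[Tuple[Image, str]]:
    """Return `copies` randomly shifted versions of every (image, label) pair.
    The label is kept: a slightly moved "T" is still a "T"."""
    rng = random.Random(seed)
    new_examples = []
    for image, label in examples:
        for _ in range(copies):
            dr = rng.randint(-max_shift, max_shift)
            dc = rng.randint(-max_shift, max_shift)
            new_examples.append((shift(image, dr, dc), label))
    return new_examples


letter_t = [[1, 1, 1],
            [0, 1, 0],
            [0, 1, 0]]
augmented = amplify([(letter_t, "T")], copies=4)
print(len(augmented))  # 4
```

The key property is that every distortion preserves the label, so each original example yields several new labeled ones.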

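Synthesizing data by introducing distortions, speech recognition example:
- Add background noise (e.g. crowd noise, machinery, a bad cellphone connection) to clean original audio to create new, noisier training examples.
- The distortions should represent the kind of noise found in the test set; adding purely random, meaningless noise usually does not help.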
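Mixing background noise into clean audio can be sketched as below, controlling how loud the noise is through a target signal-to-noise ratio (SNR) in decibels. The SNR parameterization and the helper names are assumptions for illustration. Gaussian noise stands in for a real background recording only so the sketch runs without audio files; in practice you would use recordings of the noise you expect at test time.

```python
import math
import random
from typing import List, Sequence


def power(signal: Sequence[float]) -> float:
    """Mean squared amplitude of a signal."""
    return sum(x * x for x in signal) / len(signal)


def add_background_noise(clean: Sequence[float], noise: Sequence[float],
                         snr_db: float) -> List[float]:
    """Mix `noise` into `clean` so the result has the requested
    signal-to-noise ratio in decibels."""
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    # Choose the scale so that power(clean) / power(scale * noise) == 10 ** (snr_db / 10).
    target_noise_power = power(clean) / 10 ** (snr_db / 10)
    scale = math.sqrt(target_noise_power / power(noise))
    return [s + scale * n for s, n in zip(clean, noise)]


rate = 8000  # samples per second
clean = [math.sin(2 * math.pi * 440 * t / rate) for t in range(rate)]  # 1 s tone
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(rate)]  # stand-in for a real recording
noisy = add_background_noise(clean, noise, snr_db=10.0)
```

Generating several copies of each clip at different SNRs gives a range of noise levels from one clean recording.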

When do we need to get more data?
- First make sure the classifier has low bias (plot learning curves). If it has high bias, keep increasing the number of features or hidden units until it does not; more data does little for a high-bias classifier.
When more data is really needed, ask: "How much work would it be to get 10x as much data as we currently have?" Options:
- Artificial data synthesis
- Collect/label it yourself
- "Crowd source" (e.g. Amazon Mechanical Turk)
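The "10x" question is often a quick back-of-the-envelope calculation. The sketch below assumes hypothetical numbers (1,000 labeled examples today, 10 seconds to label each by hand) purely to show the arithmetic.

```python
def hours_to_label(num_examples: int, seconds_per_example: float) -> float:
    """Hand-labeling effort in hours for one person working without breaks."""
    return num_examples * seconds_per_example / 3600


# Hypothetical numbers: m = 1,000 labeled examples today,
# and labeling one example by hand takes 10 seconds.
m = 1_000
extra = 10 * m - m  # new examples needed to reach 10x the current data set
print(f"{hours_to_label(extra, 10):.1f} hours")  # 25.0 hours
```

Under these assumptions, 10x the data is a few days of one person's work, which is often cheaper than expected.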

Ceiling analysis: estimating the errors due to each component
Decide which part of the pipeline is most worth spending time on improving.
Take the Photo OCR pipeline as an example:
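- The overall system has 72% accuracy on the test set.
- Steps:
  1. Start with the first module, text detection. Manually tell the algorithm where the text is, simulating a text detection system that is 100% accurate, and check how this affects the accuracy of the overall system. Accuracy goes up to 89%.
  2. Next, do the same for character segmentation (keeping text detection perfect). Accuracy goes up to 90%.
  3. Finally, do the same for character recognition. Accuracy goes up to 100%.
- Based on this analysis, a perfect text detector would add 17 percentage points (72% to 89%), perfect character segmentation only 1 point (89% to 90%), and perfect character recognition 10 points (90% to 100%). Text detection is the module most worth improving; character segmentation, the least.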
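The bookkeeping behind ceiling analysis can be sketched as a small function over the measured accuracies. It assumes, as in the Photo OCR example, that each module is made perfect on top of the modules before it, so the gain of a module is the difference between consecutive accuracies. The function name `ceiling_gains` is hypothetical.

```python
from typing import List, Tuple


def ceiling_gains(baseline: float,
                  perfected: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """`perfected` lists (module, overall accuracy) in pipeline order, where each
    accuracy is measured after that module AND all earlier ones are replaced
    by ground truth. Returns the accuracy gained at each step."""
    gains = []
    previous = baseline
    for module, accuracy in perfected:
        gains.append((module, accuracy - previous))
        previous = accuracy
    return gains


# Numbers from the Photo OCR example (overall test accuracy, in %).
perfected = [("text detection", 89.0),
             ("character segmentation", 90.0),
             ("character recognition", 100.0)]
gains = ceiling_gains(72.0, perfected)
for module, gain in gains:
    print(f"{module}: +{gain:.0f} percentage points")
best_module, best_gain = max(gains, key=lambda g: g[1])
print("most promising:", best_module)  # most promising: text detection
```

Each gain is an upper bound ("ceiling") on what improving that module alone could buy, which is why it is useful for prioritizing work.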