Week 11 - Application Example: Photo OCR

Problem Description and Pipeline

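  • Photo OCR (Photo Optical Character Recognition): getting a computer to read the text that appears in photographs.
  • Photo OCR pipeline (a machine learning pipeline):
    1. Text detection
    2. Character segmentation
    3. Character classification
  • Pipelines are common in machine learning
    • A complex system is split into separate modules, each of which may be a machine learning component or a data processing component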
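A pipeline like this can be sketched as an ordered list of stage functions, each consuming the previous stage's output. This is only an illustration: `run_pipeline` is a hypothetical helper, and the toy string stages below merely stand in for text detection, character segmentation and character classification.

```python
from typing import Any, Callable, Sequence

Stage = Callable[[Any], Any]  # maps one module's output to the next module's input


def run_pipeline(data: Any, stages: Sequence[Stage]) -> Any:
    """Run the modules in order, feeding each module's output into the next."""
    for stage in stages:
        data = stage(data)
    return data


# Toy stand-ins (hypothetical) for the three Photo OCR modules,
# operating on a string instead of an image.
toy_stages = [str.strip, list, lambda chars: "".join(chars).upper()]
print(run_pipeline("  photo ocr  ", toy_stages))  # → PHOTO OCR
```

Because each module only sees its predecessor's output, one module can be swapped or improved without touching the others.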

Sliding Windows Classifier

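  • To discuss detecting things in images, let’s start with a simpler example: pedestrian detection. It is simpler because pedestrians have a fairly consistent aspect ratio, while text regions vary widely in shape.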

Supervised Learning for Pedestrian Detection

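  • Given a training set of labeled image patches:

    • $x$ = pixels in 82x36 image patches
    • $y = 1$ for patches that contain a pedestrian (positive examples), $y = 0$ for patches that do not (negative examples)
  • Given a new image, how do we find the pedestrians in it?

      • Start by taking a rectangular 82x36 patch in the top-left corner of the image and running it through the classifier
      • Keep stepping the rectangle to the right, a few pixels at a time, all the way to the edge. The shift is a parameter called the step size (or stride): a step of 1 pixel usually works best but is computationally expensive, so about 4 or 8 pixels is more common.
      • Then move back to the left-hand side, step down, and scan the next row.
      • Repeat until the window has covered the whole image.
      • Then repeat the whole scan with larger patches of the same aspect ratio.
      • Each larger patch is resized down to 82x36 before it is run through the classifier, so the same classifier can find pedestrians of different sizes.
    • Hopefully, the windows the classifier accepts end up covering every pedestrian in the image.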

  • Back to Text Detection

    • Like pedestrian detection, we generate a labeled training set with
      • Positive examples (patches containing text)
      • Negative examples (patches without text)
    • Having trained the classifier, we apply it to an image
      • Run a sliding window classifier at a fixed rectangle size
      • The result is a map of the image showing the classifier's output at each location
        • Black: no text
        • White: text (shades of gray show intermediate confidence)
      • For text detection, we want to draw rectangles around all the regions of the image that contain text
      • So take the classifier output and apply an expansion algorithm
        • Each white region is expanded: a pixel is colored white if it lies within some distance of a white pixel
      • Then find the connected white regions and draw a bounding box around each one
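The sliding-window steps above (multi-scale windows for pedestrians, then the fixed-size windows, expansion and connected white regions for text) can be sketched in pure Python. This is a minimal sketch under several assumptions. Images are lists of rows of grayscale values. `classifier` is any hypothetical callable that takes a base-size patch and returns `True`/`False`. Nearest-neighbour resizing stands in for whatever resizing a real system would use. `hits_to_mask` simply paints accepted windows white instead of shading them by confidence. All function names are illustrative.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Grid = List[List[float]]         # grayscale image as a list of rows (assumption)
Box = Tuple[int, int, int, int]  # (top, left, height, width)


def window_positions(height: int, width: int, win_h: int, win_w: int,
                     step: int) -> Iterator[Tuple[int, int]]:
    """Top-left corners: step right across a row, then step down to the next row."""
    for top in range(0, height - win_h + 1, step):
        for left in range(0, width - win_w + 1, step):
            yield top, left


def resize_nearest(patch: Grid, out_h: int, out_w: int) -> Grid:
    """Nearest-neighbour resize so a larger window matches the classifier's input size."""
    in_h, in_w = len(patch), len(patch[0])
    return [[patch[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def detect(image: Grid, classifier: Callable[[Grid], bool], base_h: int, base_w: int,
           step: int, scales: Sequence[float] = (1.0, 1.5, 2.0)) -> List[Box]:
    """Multi-scale sliding window: return every window the classifier accepts."""
    height, width = len(image), len(image[0])
    hits: List[Box] = []
    for s in scales:
        win_h, win_w = round(base_h * s), round(base_w * s)  # same aspect ratio
        for top, left in window_positions(height, width, win_h, win_w, step):
            patch = [row[left:left + win_w] for row in image[top:top + win_h]]
            if classifier(resize_nearest(patch, base_h, base_w)):
                hits.append((top, left, win_h, win_w))
    return hits


def hits_to_mask(hits: List[Box], height: int, width: int) -> List[List[bool]]:
    """White (True) wherever an accepted window lies, black (False) elsewhere."""
    mask = [[False] * width for _ in range(height)]
    for top, left, h, w in hits:
        for r in range(top, top + h):
            for c in range(left, left + w):
                mask[r][c] = True
    return mask


def expand(mask: List[List[bool]], radius: int) -> List[List[bool]]:
    """A pixel becomes white if any pixel within `radius` rows/columns is white."""
    h, w = len(mask), len(mask[0])
    return [[any(mask[rr][cc]
                 for rr in range(max(0, r - radius), min(h, r + radius + 1))
                 for cc in range(max(0, c - radius), min(w, c + radius + 1)))
             for c in range(w)] for r in range(h)]


def connected_boxes(mask: List[List[bool]]) -> List[Box]:
    """Bounding box of each 4-connected white region (iterative flood fill)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for r0 in range(h):
        for c0 in range(w):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            seen[r0][c0] = True
            stack = [(r0, c0)]
            top, left, bottom, right = r0, c0, r0, c0
            while stack:
                r, c = stack.pop()
                top, bottom = min(top, r), max(bottom, r)
                left, right = min(left, c), max(right, c)
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if 0 <= nr < h and 0 <= nc < w and mask[nr][nc] and not seen[nr][nc]:
                        seen[nr][nc] = True
                        stack.append((nr, nc))
            boxes.append((top, left, bottom - top + 1, right - left + 1))
    return boxes
```

For pedestrians one would call `detect` with several scales. For text detection one would use a single scale and then `connected_boxes(expand(hits_to_mask(hits, H, W), radius))` to get one rectangle per candidate text region.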

Character Segmentation

  • Look at an image patch and decide: is there a split between two characters in the middle of the patch?
  • Train a classifier on positive examples (patches centered on a gap between two characters) and negative examples (patches that are not, e.g. centered on a single character)
  • Use a 1-dimensional sliding window to move along each detected text region
    • Does each window snapshot look like the split between two characters?
      • If yes, insert a split
      • If not, move on
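The 1-dimensional sliding window can be sketched as below. It assumes the detected text region has already been turned into a left-to-right sequence of pixel columns and that `is_split` is a hypothetical trained classifier that takes a window of `win_w` columns. Both names are illustrative, not from the course.

```python
from typing import Any, Callable, List, Sequence


def find_splits(columns: Sequence[Any], is_split: Callable[[Sequence[Any]], bool],
                win_w: int, step: int = 1) -> List[int]:
    """1-D sliding window along a text line.

    `columns` holds the pixel columns of a detected text region, left to right.
    Returns the centre column of every window the classifier labels as a gap
    between two characters, i.e. where a split would be inserted.
    """
    splits: List[int] = []
    for left in range(0, len(columns) - win_w + 1, step):
        if is_split(columns[left:left + win_w]):  # looks like a split: record it
            splits.append(left + win_w // 2)
        # otherwise just move on to the next window
    return splits
```

For example, take per-column ink counts `[3, 2, 0, 0, 4, 1, 0, 0, 0, 5]`, `win_w=2`, and a toy `is_split` that fires on all-blank windows. The function then returns `[3, 7, 8]`. The two neighbouring hits over the wider gap show why a real system would merge nearby splits.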

Character Classification

  • A multi-class classification problem: given the image of a single segmented character, predict which character it is
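The notes do not say which multi-class method to use. One common choice is one-vs-all logistic regression, sketched below under that assumption. `thetas` (one parameter vector per character) and `predict_character` are hypothetical names, and `x` is assumed to already include the bias feature $x_0 = 1$.

```python
import math
from typing import Dict, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_character(x: Sequence[float], thetas: Dict[str, Sequence[float]]) -> str:
    """One-vs-all prediction: return the character whose classifier h_theta(x) is largest.

    `thetas` maps each character to its own logistic-regression parameters;
    `x` must already include the bias feature x0 = 1.
    """
    def h(theta: Sequence[float]) -> float:
        return sigmoid(sum(t * xi for t, xi in zip(theta, x)))  # h_theta(x) = g(theta^T x)

    return max(thetas, key=lambda ch: h(thetas[ch]))
```

With toy parameters, `predict_character([1.0, 0.2, 0.9], {"A": [0.0, 1.0, 0.0], "B": [0.0, 0.0, 1.0]})` returns `"B"`, because $g(0.9) > g(0.2)$.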

Getting Lots of Data: Artificial Data Synthesis

Artificial Data Synthesis for Photo OCR

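  1. Create new data from scratch: take characters in many different fonts (from the computer’s font library or free online font libraries) and paste them onto random backgrounds
  2. Amplify an existing data set by introducing distortions into real examples (e.g. warping character images)
  • The same idea of synthesizing data through distortions applies to speech recognition
    • Mix background noise (e.g. crowd or machinery sounds) into clean audio, or simulate a bad phone connection, to create new training examples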
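Both kinds of synthesis can be sketched with the standard library alone. This is a minimal illustration under assumptions:

- `glyph` stands in for a character already rendered from some font as a small grid of pixel values (0 = no ink).
- `background` is a larger grid.
- The audio and noise are plain lists of samples.

The function names are hypothetical. In practice the distortions should resemble the noise seen in the test set. Adding purely random, meaningless noise usually does not help.

```python
import random
from typing import List

Grid = List[List[int]]  # image as a list of rows of pixel values (assumption)


def paste_on_background(glyph: Grid, background: Grid, rng: random.Random) -> Grid:
    """Return a copy of `background` with the glyph's ink pasted at a random position.

    Assumes the glyph fits inside the background and that 0 means "no ink".
    """
    gh, gw = len(glyph), len(glyph[0])
    top = rng.randint(0, len(background) - gh)   # randint includes both ends
    left = rng.randint(0, len(background[0]) - gw)
    out = [row[:] for row in background]         # leave the original untouched
    for r in range(gh):
        for c in range(gw):
            if glyph[r][c]:                      # copy only ink pixels
                out[top + r][left + c] = glyph[r][c]
    return out


def add_background_noise(audio: List[float], noise: List[float], gain: float) -> List[float]:
    """Mix a (looped) background-noise recording into clean audio, sample by sample."""
    return [a + gain * noise[i % len(noise)] for i, a in enumerate(audio)]


rng = random.Random(0)  # fixed seed so the synthetic data is reproducible
sample = paste_on_background([[0, 9, 0], [9, 9, 9]], [[1] * 6 for _ in range(4)], rng)
```

Calling these functions repeatedly with different glyphs, backgrounds and noise recordings produces as many synthetic examples as needed.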

Getting More Data

  • When do we need to get more data?
    • First make sure we have a low-bias classifier (plot learning curves); more data mainly helps a low-bias, high-variance model
  • If more data would help, ask ourselves: “How much work would it be to get 10x as much data as we currently have?” Common options:
    • Artificial data synthesis
    • Collect/label it yourself
    • “Crowd source” it (e.g. Amazon Mechanical Turk)
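The “10x” question is usually a back-of-the-envelope calculation. The helper below is hypothetical. The numbers in the example (1,000 current examples, 10 seconds to label each) are made up for illustration.

```python
def hours_to_label(current_examples: int, factor: int, seconds_per_example: float) -> float:
    """Hours of manual labelling needed to grow a data set to `factor` times its size."""
    extra_examples = current_examples * (factor - 1)   # examples still to be labelled
    return extra_examples * seconds_per_example / 3600  # 3600 seconds per hour


# Hypothetical numbers: 1,000 examples today, 10x target, 10 seconds per label.
print(hours_to_label(1000, 10, 10))  # → 25.0
```

Here the extra 9,000 labels cost about 25 hours, roughly three working days.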

Ceiling Analysis

  • Ceiling analysis estimates how much of the overall error is due to each component of the pipeline.
  • This helps decide which part of the pipeline is most worth spending time improving.
  • Take the Photo OCR pipeline as an example:
    • Suppose the overall system has 72% accuracy on the test set.
    • Steps:
      1. Go to the first module, text detection, and manually tell the algorithm where the text is (feed it the ground-truth text regions).
        • This simulates a text detection system that is 100% accurate.
        • Check how this change affects the accuracy of the overall system: accuracy goes up to 89%.
      2. Next, do the same for character segmentation, keeping text detection perfect: accuracy goes up to 90%.
      3. Finally, do the same for character recognition: accuracy goes up to 100%.
    • Based on this analysis, we know which module to improve: a perfect text detector would add 17 percentage points, perfect segmentation only 1, and perfect recognition 10, so text detection is the most promising place to spend effort.
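The ceiling-analysis bookkeeping amounts to taking differences between consecutive overall accuracies. The sketch below uses the accuracies from the Photo OCR example. `ceiling_gains` is a hypothetical helper name.

```python
from typing import Dict, List, Tuple


def ceiling_gains(baseline: float, cumulative: List[Tuple[str, float]]) -> Dict[str, float]:
    """Accuracy gained by making each component perfect.

    `cumulative` lists (component, overall accuracy after this component and all
    earlier ones are replaced by ground truth), in pipeline order.
    """
    gains: Dict[str, float] = {}
    previous = baseline
    for component, accuracy in cumulative:
        gains[component] = accuracy - previous  # improvement attributable to this component
        previous = accuracy
    return gains


# Overall accuracies from the Photo OCR example, in percent.
gains = ceiling_gains(72, [("text detection", 89),
                           ("character segmentation", 90),
                           ("character recognition", 100)])
most_promising = max(gains, key=gains.get)  # component with the largest potential gain
```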
