Dr. Bill Barrett
Students working with Dr. Bill Barrett in the Computer Graphics, Vision, and Image Processing Laboratory are striving to make handwritten text documents more searchable and readable.
The task of creating searchable indexes for handwritten documents is painstaking. In order for handwriting recognition, also known as "word spotting," to work, systems must be given training data. This data consists of example words from the documents, transcribed and labeled according to their meanings. The great variability in handwriting means that a significant amount of training is necessary for each document, even one written by a single author.
Doug Kennard, a doctoral candidate working with Dr. Barrett in the lab, is researching ways to decrease the amount of training necessary in the handwriting recognition process, aiming to speed it up and reduce its associated costs.
Kennard is also working on word separation, striving to make it easier for the computer to distinguish individual words by recognizing the gaps between them in lines of text.
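The gap-based idea can be illustrated with a projection profile: count the ink pixels in each column of a binarized text line and split wherever a run of empty columns is wide enough. This is only a minimal sketch of the concept, not Kennard's actual method; the function name and the `min_gap` parameter are illustrative assumptions.

```python
import numpy as np

def split_words(line_mask, min_gap=5):
    """Split a binarized text line (True = ink) into word spans.

    Computes a vertical projection profile and breaks the line wherever
    consecutive ink-bearing columns are separated by more than min_gap
    empty columns. (Hypothetical sketch; min_gap is an assumed parameter.)"""
    cols = np.flatnonzero(line_mask.sum(axis=0) > 0)  # columns containing ink
    if cols.size == 0:
        return []
    breaks = np.flatnonzero(np.diff(cols) > min_gap)  # wide gaps between ink
    starts = np.concatenate(([cols[0]], cols[breaks + 1]))
    ends = np.concatenate((cols[breaks], [cols[-1]])) + 1
    return list(zip(starts, ends))
```

Each returned pair is a half-open column span for one word; a real system would also have to cope with slanted baselines and gaps that vary with handwriting style.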
The technology on which Kennard is focused makes it possible for genealogists to conduct automatic searches for individual words, such as names and locations, without having to read through pages of text that are often faded and difficult to decipher.
Kennard is also working on applying this technology to the translation of documents, which will break down the language barrier that so often plagues genealogists.
Using Kennard's program, if a genealogist were to search for the name "Jacobs," the software would automatically scan through the electronic document for the word. The search results, sorted by relevance, would then be listed on the right-hand side of the screen.
If the user were to then click on a result, the section of the document in which the word is used would appear on the left-hand side of the screen, allowing the user to see his or her search word in the context of the original document.
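The search-and-rank step described above can be sketched in miniature: if each candidate word image on a page is reduced to a feature vector, matches can be ranked by their distance to the query. The feature representation, the candidate names, and the distance measure here are all illustrative assumptions, not the program's actual internals.

```python
import numpy as np

def rank_matches(query_vec, word_vecs):
    """Rank candidate word images by Euclidean distance between their
    feature vectors and the query; smaller distance = more relevant.
    (Illustrative only: the real system's features and scoring are unknown.)"""
    scored = [(name, float(np.linalg.norm(query_vec - vec)))
              for name, vec in word_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1])
```

The sorted list plays the role of the relevance-ordered results shown on the right-hand side of the screen.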
The program's simple user interface makes it easy for anyone, regardless of his or her level of technological savvy, to have a successful experience with it. It also reduces the number of hours it normally takes to find important information in digitized family history documents.
Another of Dr. Barrett's students, Oliver Nina, is conducting his research on the thresholding of text documents. Thresholding, also known as binarization, is the process of separating an image into two parts, the object of interest and the background (as pictured below).
Binarization is used when trying to achieve optimal character recognition in scanned microfilm and scanned text documents.
Typically, thresholding algorithms are effective at isolating the targeted object (text, in the case of microfilm and text documents). Problems arise, however, when the object of interest is similar in color to the background. This is often the case with faded documents and light pen strokes. In many cases, important pixels from the image are lost as the text is separated from the background. Faced with this problem, Nina came up with an innovative solution known as Rotsu, or Recursive Otsu. Using this approach, Nina takes the original image and applies a variety of thresholding techniques, each one capturing a different part of the object of interest. He then "adds up" the results of each thresholding technique to achieve an astonishingly clear final product.
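One plausible reading of the recursive idea can be sketched as follows: apply Otsu's classic threshold, set aside the pixels it classifies as ink, run Otsu again on what remains so a second pass can capture fainter strokes, and OR the ink masks together. This is a hedged reconstruction of how such a scheme might look, not Nina's actual Rotsu implementation.

```python
import numpy as np

def otsu_threshold(pixels):
    """Otsu's method on a 1-D array of grayscale values (0-255):
    pick the threshold maximizing between-class variance."""
    hist, _ = np.histogram(pixels, bins=256, range=(0, 256))
    total = pixels.size
    sum_all = float(np.dot(np.arange(256), hist))
    w_b, sum_b = 0, 0.0            # background weight and intensity sum
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_b += hist[t]
        if w_b == 0:
            continue
        w_f = total - w_b          # foreground weight
        if w_f == 0:
            break
        sum_b += t * hist[t]
        m_b = sum_b / w_b
        m_f = (sum_all - sum_b) / w_f
        var_between = w_b * w_f * (m_b - m_f) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def recursive_otsu(image, depth=2):
    """Sketch of a recursive Otsu scheme: threshold, remove the ink found,
    re-threshold the remaining pixels to pick up fainter strokes, and OR
    the ink masks together. Dark pixels count as ink.
    (Hypothetical reconstruction, not Nina's actual code.)"""
    ink = np.zeros(image.shape, dtype=bool)
    remaining = np.ones(image.shape, dtype=bool)
    for _ in range(depth):
        vals = image[remaining]
        if vals.size == 0:
            break
        t = otsu_threshold(vals)
        newly = remaining & (image <= t)
        ink |= newly
        remaining &= ~newly        # re-examine only what was left as background
    return ink
```

On a synthetic page with both dark and faint strokes, a single pass keeps only the dark ink, while the second pass recovers the faint strokes without pulling in the background.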
Although Nina still considers Rotsu a work in progress, it has shown promising potential thus far. Rotsu preserves softer strokes and text on faded documents, elements that would otherwise be indecipherable. It is also surprisingly easy to implement, making it a practical solution for a wide range of users. Furthermore, Rotsu has opened the door to new ideas for improving thresholding techniques.
To the right are examples of the Rotsu process on text documents.
Image 1 shows an original scanned text document.
Image 2 shows a scanned text document after traditional thresholding techniques have been applied. Because the lighter pen strokes in this image are similar in color to the background, they are lost when subjected to traditional thresholding techniques.
Image 3 shows a scanned text document after the Rotsu algorithm has been applied. Note that the lighter pen strokes, which were lost in the second image, have been preserved with the Rotsu technique.
Work in graphics at BYU delves into realms beyond family history. Dr. Bill Barrett and Chris Armstrong, a master's student working in the lab, created a tool known as "Live Surface," which allows surgeons to instantly visualize any part of a patient's anatomy by extracting a 3D computer image from an MRI, CT scan, or similar data with just a few clicks of the mouse.
According to Dr. Barrett, the main goal in the development of Live Surface was to give a powerful, practical, and interactive tool to physicians that could be used to look at a patient's anatomy in 3D detail.
Live Surface distinguishes itself from existing software by its speed, its high level of interaction, and the amount of detail it gives back to the user. It also allows users to easily isolate "tricky" areas of the anatomy, such as soft tissue (blood vessels, organs, and muscle), that most other programs are unable to extract. In addition to the breakthrough with soft tissues, Live Surface also improves viewing of the "hard stuff" (bones). Simpler techniques often overestimate, underestimate, or fuse joints together, while Live Surface neatly and accurately separates them, letting physicians interact with these areas.
Live Surface works by extracting information from data collected in 3D volumes: CT scans, MRIs, or 3D ultrasounds. With a click and drag of the mouse, a user identifies the object he or she wishes to extract. Next, the user identifies those portions of the data that surround the object. Immediately, the desired object is extracted from the data.
The program is able to work rapidly because it extracts the object of interest using a hierarchical algorithm, a set of mathematical rules that tells the computer to eliminate irrelevant information in broad, coarse cuts. Once the bulk of unwanted data is gone, the computer is free to make more refined calculations quickly and isolate the object of interest.
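The coarse-then-fine strategy can be illustrated with a toy stand-in: examine the volume in large blocks, classify unambiguous blocks wholesale, and do per-voxel work only where a block straddles the decision boundary. Live Surface's actual hierarchical algorithm is far more sophisticated than this simple thresholding sketch; the function and its parameters are assumptions for illustration only.

```python
import numpy as np

def coarse_to_fine(volume, threshold, factor=4):
    """Coarse-to-fine extraction sketch: blocks that lie entirely inside
    or outside the object are settled in one step; only boundary blocks
    get per-voxel refinement. (Illustrative stand-in, not Live Surface.)"""
    mask = np.zeros(volume.shape, dtype=bool)
    stats = {"uniform": 0, "refined": 0}
    D, H, W = volume.shape
    for i in range(0, D, factor):
        for j in range(0, H, factor):
            for k in range(0, W, factor):
                block = volume[i:i+factor, j:j+factor, k:k+factor]
                if block.min() > threshold:        # entirely inside the object
                    mask[i:i+factor, j:j+factor, k:k+factor] = True
                    stats["uniform"] += 1
                elif block.max() <= threshold:     # entirely background
                    stats["uniform"] += 1
                else:                              # boundary block: refine
                    mask[i:i+factor, j:j+factor, k:k+factor] = block > threshold
                    stats["refined"] += 1
    return mask, stats
```

On a synthetic volume containing a solid sphere, most blocks are settled wholesale and only the blocks crossing the sphere's surface need fine-grained work, which is the source of the speedup the paragraph describes.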
After a surgeon extracts a 3D image of a person's heart or brain, for example, the image can be projected onto the patient's body, fitted to create a road map for the surgeon as he or she operates. Additionally, doctors can use the tool to make better diagnoses after visualizing a patient's organs from multiple angles. It also allows them to be more accurate and efficient in locating cancerous tumors.
Research for Live Surface was partially funded by Adobe, makers of the popular image-editing program Photoshop. Dr. Barrett's lab has had a long-running relationship with Adobe: Live Surface builds on his development of Intelligent Scissors, a program that allows users to quickly pull 2D objects out of images. The Intelligent Scissors design, renamed "Magnetic Lasso," was incorporated into Photoshop 5.0 and subsequent versions. It is currently used by millions of designers, artists, and photographers.
Brian Price, a doctoral candidate, is another one of Dr. Barrett’s students who is making headlines. Price is exploring image vectorization, the process of converting a scanned or digital photo into a two-dimensional computer graphic.
Currently, when a computer “looks” at a photograph, it sees only pixels and has no understanding of the larger image. However, image vectorization makes the computer aware of the larger context.
With image vectorization, the computer understands that in a picture of a human, the hand is connected to the arm, which is connected to the body. This level of comprehension allows a person using the program to select a part of the image and automatically turn it into a graphic. Once in graphic form, the photo can be edited, made to look cartoon-like, or enhanced to look more realistic. Image vectorization also allows for editing layers within the graphic. Working with a picture of a person’s face in a standard graphics program, users are unable to alter the size of the eye without changing the size of the entire face. However, with image vectorization, the image is split into layers, so users can change the size or shape of the eyes without altering the rest of the face.
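The layered-editing idea can be sketched with a minimal data structure: a tree of named layers, each holding its own outline, so that a transform applied to one layer (the eye) leaves its siblings and parent (the face) untouched. The `Layer` class and `scale_layer` helper are hypothetical names invented for this sketch, not part of Price's system.

```python
from dataclasses import dataclass, field

@dataclass
class Layer:
    """One layer of a vector graphic: a named 2-D outline plus sub-layers.
    (Hypothetical structure for illustration.)"""
    name: str
    points: list                       # outline vertices [(x, y), ...]
    children: list = field(default_factory=list)

def scale_layer(layer, target, sx, sy, cx, cy):
    """Scale only the layer named `target` about the point (cx, cy),
    leaving every other layer untouched. Returns True if found."""
    if layer.name == target:
        layer.points = [(cx + (x - cx) * sx, cy + (y - cy) * sy)
                        for x, y in layer.points]
        return True
    return any(scale_layer(c, target, sx, sy, cx, cy) for c in layer.children)
```

Because the eye lives in its own layer, doubling it about its own center changes only that outline, which is exactly the face-versus-eye editing scenario described above.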
Another example is given with a series of baseball mitt images that Price created (pictured below). Using the image vectorization process, he was able to turn a photograph of a mitt into a graphic and then alter it to look like the baseball was ripping through the leather of the glove.
The end goal of Price’s project with image vectorization is to have it incorporated into a program like Adobe Illustrator, to be used by artists and graphic designers.
Computer Science Department
Brigham Young University
3361 TMCB PO Box 26576
Provo, Utah 84602