Tuesday, December 1, 2009

Text Detection Web Service

The Text Detection Web Service went live this week. The new service returns the MPEG-7/XML document produced by the Text Detector application. The document indicates the location of the text found in the input image.

The next step is for the service to return the detected text as well.

The following figure shows the results of the Text Detector application. For each input image it produces an MPEG-7/XML document, one image per detected text box containing the corresponding part of the input image, and a few TIFF images containing the text found.

The resulting TIFF images are the input to the OCR application, which extracts the detected text. It returns one text file per text box, containing the detected text, as the next figure shows.

This week's work will consist of reading these text files and including the text found in the MPEG-7/XML file that the Web Service returns.
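As a rough sketch of that last step, assuming Java and the standard DOM API: the class below reads one OCR text file per box and appends its content to the matching region element. The class and method names are hypothetical. StillRegion, TextAnnotation and FreeTextAnnotation are MPEG-7 element names, but where the Text Detector output actually stores its boxes is an assumption here, and XML namespaces are ignored for brevity.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: not part of the Text Detector or the web service.
final class OcrTextMerger {

    // Parse an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Read the OCR output, one text file per detected box, in box order.
    static List<String> readTexts(List<Path> textFiles) {
        List<String> texts = new ArrayList<>();
        for (Path file : textFiles) {
            try {
                texts.add(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return texts;
    }

    // Append the i-th text to the i-th element named regionTag.
    // Assumption: boxes and text files are listed in the same order.
    // Returns the number of regions that received a text.
    static int attachTexts(Document doc, String regionTag, List<String> texts) {
        NodeList regions = doc.getElementsByTagName(regionTag);
        int n = Math.min(regions.getLength(), texts.size());
        for (int i = 0; i < n; i++) {
            Element annotation = doc.createElement("TextAnnotation");
            Element freeText = doc.createElement("FreeTextAnnotation");
            freeText.setTextContent(texts.get(i).trim());
            annotation.appendChild(freeText);
            regions.item(i).appendChild(annotation);
        }
        return n;
    }
}
```

A caller would parse the document returned by the Text Detector, call readTexts on the OCR output files and then attachTexts with the element name that the document really uses for its boxes.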

Friday, November 27, 2009

Switching from global to local annotations

The semantics contained in an image can be annotated at two visual scales: global or local. In the global case the area of support is the whole image, whereas in the local case the area of support is a subset of the image pixels. The area of support depends on the nature of the semantic class. For example, abstract semantic concepts such as landscape or sport event may be represented by all the image pixels, while objects such as a car or a football player may be represented by a specific subset of the image pixels.

Before this summer, GAT allowed local annotation of an image, either with a region-based approach or by drawing points, lines, rectangles and polygons. Khristina's first task this summer was to implement a new functionality for GAT: global annotation, that is, the annotation of a sequence of images at the same time. Next, I integrated the sequence annotation with the local annotation, so that the user chooses between the Sequence Annotator tab and the Image Annotator tab in the graphical user interface. The last step has been to allow switching from global to local annotation.

The mechanism is simple. While annotating a sequence of images, the user selects the desired thumbnail by double-clicking it. This action opens the Image Annotator tab, which displays the image. At that point the user can annotate the image at region or graphic level.

This new functionality saves the user time, since only one directory has to be opened instead of each file separately.

The screenshots below show an example. In the first one, the selected thumbnail is marked with a red frame. When the user double-clicks on it, the local annotation tab comes to the front with its associated tools.

Thursday, November 19, 2009

Image retrieval querying with text and images

During the last month Ramon and I have been working hard on writing about Digimatge, a Rich Internet Application to search videos in the CCMA Multimedia Asset Manager. He is planning to defend his master's thesis in a month, while I have been writing a paper for the demo session at ACM MIR next March in Philadelphia. During this period we have studied many image search engines that are based on textual descriptors, visual descriptors, or both at the same time. I would like to comment on a few that we found interesting.

Searching for images with text is the most common method nowadays. Firstly, because humans are really good at explaining what we are looking for with words and, secondly, because the database technology in text retrieval is pretty mature and fast... at least, more so than the multimedia-based one. Nevertheless, this approach requires generating tags associated with the images. A first option is the manual annotation of the content, a very time-consuming task when performed by a human. The solution is to automate the process. The most popular search engines nowadays, such as Google Images or Microsoft's Bing Images, find the tags in the image filenames or in the surrounding web text. Alternatively, other solutions try to extract textual data by looking at the images themselves. The first option is simply to look for text appearing in the image and then read it, as in the DIRS system. A second option is to teach the computer to understand certain objects or concepts that appear in images. As this second option is my PhD thesis topic, I am really excited about it!

A growing trend in commercial search engines is the introduction of visual similarity criteria. In this case, signal processing algorithms automatically generate the metadata, so the annotation and tagging problem disappears. There are different options for users to express their queries visually. On the one hand, they can just choose which visual features (color or texture) they are looking for, as is the case with Idée Multicolr. On the other hand, the user can provide examples or sketches of what he/she wants, as proposed by the versatile Simplicity from Stanford University or by GOS, the tool implemented at our lab, which already supports similarity at a global scale and will very soon do the same at a region scale.

Two modalities to search. Which is the best one? We have text, which lets you express your ideas quite precisely, but there are also visual cues, which are immediately available because they do not require annotation. The current trend is to combine both in multimodal interfaces that can process text and visual queries. In most cases, text is used for a fast initial search and the results are then filtered by visual similarity. This is the enhancement that Google Similar Images brought a few months ago, and also the strategy of Sapir, Xcavator and Picitup Shop.
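The text-first, visual-second strategy can be sketched in a few lines of Java. This is only an illustration under simple assumptions, not how any of the engines named above work internally: every image carries a list of tags and a small feature vector, the text query is a single tag, and visual similarity is plain Euclidean distance to the features of an example image. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of a two-stage (text, then visual) search.
final class HybridSearchSketch {

    // An indexed image: identifier, textual tags and a visual feature vector.
    static final class Item {
        final String id;
        final List<String> tags;
        final double[] features;

        Item(String id, List<String> tags, double[] features) {
            this.id = id;
            this.tags = tags;
            this.features = features;
        }
    }

    // Euclidean distance between two feature vectors of equal length.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Stage 1: keep only the items tagged with the query word.
    // Stage 2: order the survivors by visual distance to the example.
    static List<String> search(List<Item> items, String tag, double[] example) {
        List<Item> hits = new ArrayList<>();
        for (Item item : items) {
            if (item.tags.contains(tag)) {
                hits.add(item);
            }
        }
        hits.sort(Comparator.comparingDouble((Item item) -> distance(item.features, example)));
        List<String> ids = new ArrayList<>();
        for (Item item : hits) {
            ids.add(item.id);
        }
        return ids;
    }
}
```

The cheap text filter runs over the whole collection, while the more expensive visual comparison only touches the items that survive it.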

Digimatge falls into this last category. Let's cross our fingers and hope the conference reviewers are interested in knowing how these hybrid videotextual queries will be applied in a broadcaster domain such as CCMA.

Tuesday, November 10, 2009

Finalizing UPload

Over the last few weeks we have been finalizing the UPload graphical interface. The directory view in the tree has been optimized, and directories can now be shown in the navigation panel and uploaded from it, as shown below.

The left panel, which displays the directory tree of the local disk, now starts at the user's own folder, so that directories of other users are not displayed and the tree is simpler. Moreover, the branches of the tree no longer show the full path of each directory, only the name of each folder.

When the user selects a directory in the tree, the navigation panel (right) now shows the images and folders it contains, as shown in the image.


These directories can be selected for upload to the server. Finally, it is now possible to enter the URL of an image or directory in the text field, so that this location is shown in the image panel.

In the coming days we are going to write the user guide for the application, which will be tested by different users.

Saturday, November 7, 2009

Hot tech topics... according to my students

During the last two months, the students in my course "Content Management and Delivery" have proposed online readings to their classmates. The texts, always in English, were found on blogs or news websites. Each article has been graded by the rest of the class, which gives a measure of which topics most interest these future young engineers. This post provides an overview of the top-rated websites.

The winner, by far, was a research description from the ICT Graphics Lab at the University of Southern California (USC). Just as the first 3D flat-screen devices are knocking at the door of everybody's home, this work by Jones et al. provides a solution to keep eye contact in teleconferencing using a next-generation 3D visualization device.

The second most voted article refers to augmented reality for mobile devices. This app, called Layar, has been developed by the Dutch company SPRXMobile and visualizes data overlaid on an image captured by a phone. The student also found a video demo that provides a better understanding of its possibilities. The application can be downloaded for free from the iTunes app store... if your phone includes a compass, as is the case with the iPhone 3GS.

The third winner belonged to the acquisition category and also dealt with 3D, a growing trend everywhere. This is a research paper by Sheng, Balakrishnan and Singh from the Department of Computer Science. As this video shows, they designed an interface for virtual 3D sculpting that tries to simulate the traditional interaction of fingers with clay or foam.

These three articles were the best ranked among a total of sixty. I am surprised that two of them come from university research, which represented a small fraction of the total. Most students had presented technologies developed and run by industry. I am also pleased that the two hot topics they chose, 3D and mobile apps, correspond to the subjects that have caused the greatest buzz in the tech blogs and magazines I follow. I hope these readings inspire them to steer their careers as engineers in a promising direction.

Monday, October 26, 2009

New Web Services for Text Detection and MPEG-7 Dominant Color

This week we are defining the objectives of two new web services that will be created over the coming weeks. These applications will make it possible to obtain certain features of the images previously loaded onto the Upseek server.

The first web service will be used to detect text in images. It will use the Text Detector, an application developed by Miriam Leon in her PhD, supervised by Antoni Gasull. The Text Detector returns, on the one hand, images of the text areas that have been found and, on the other hand, images with the text found, as the following figure shows.


The application will return a file in MPEG-7/XML format indicating how many areas have been found and the text they contain.

The second web service returns the same type of file, indicating the three most dominant colors in the image and the percentage of the image that each one occupies. This application will use a visual image descriptor, the MPEG-7 Dominant Color, implemented by Carlos Ventura in his Master Thesis.
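To give an idea of what "the three most dominant colors and their percentages" means, here is a deliberately simplified Java sketch. It is not the MPEG-7 Dominant Color extraction and not Carlos Ventura's implementation: it just groups pixels into coarse color bins (keeping the top three bits of each RGB channel) and counts them. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in: coarse colour histogram, not the MPEG-7 descriptor.
final class DominantColorSketch {

    // A quantized colour and the percentage (0..100) of pixels in its bin.
    static final class ColorShare {
        final int rgb;
        final double percent;

        ColorShare(int rgb, double percent) {
            this.rgb = rgb;
            this.percent = percent;
        }
    }

    // Returns up to k bins, most populated first. Pixels are 0xRRGGBB ints.
    static List<ColorShare> topColors(int[] rgbPixels, int k) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int pixel : rgbPixels) {
            int bin = pixel & 0xE0E0E0; // top 3 bits per channel, alpha dropped
            counts.merge(bin, 1, Integer::sum);
        }
        List<Map.Entry<Integer, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            // Break ties by bin value so the result is deterministic.
            return byCount != 0 ? byCount : Integer.compare(a.getKey(), b.getKey());
        });
        List<ColorShare> result = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            Map.Entry<Integer, Integer> e = entries.get(i);
            result.add(new ColorShare(e.getKey(), 100.0 * e.getValue() / rgbPixels.length));
        }
        return result;
    }
}
```

Calling topColors(pixels, 3) on the pixels of an image yields the three values and percentages that such a service would then have to write into its XML output.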

Thursday, October 8, 2009

FedoraCommons repository

Digital Objects

Assets are managed inside the FedoraCommons repository as Digital Objects.

The basic components of a Fedora digital object are:

  • PID: A Persistent, unique IDentifier for the object.
  • Object Properties: A set of system-defined descriptive properties that are necessary to manage and track the object in the repository.
  • Datastream(s): The element in a Fedora digital object that represents a content item (image, thumbnail, metadata...)

Searching the repository

We can search the repository to see which assets have already been ingested in Fedora:

For example:
  • Go to: http://IP:8080/fedora/search
  • Search all fields for i3mam:389
  • Click the i3mam:389 link below and choose to view the item index for this object
You can see that there are 3 datastreams:
  • DC: Dublin Core which contains metadata about the ingested image.
  • RELS-EXT: Contains information about relationships with other assets.
  • image-in: the ingested image itself.


There is also an advanced search function:
http://IP:8080/fedora/risearch

We can also retrieve the datastreams directly from the URL by typing the name of the desired datastream like this:

Dublin Core Datastream :
http://IP:8080/fedora/get/i3mam:389/DC

image-in datastream:
http://IP:8080/fedora/get/i3mam:389/image-in
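
The same URLs can be built and fetched from code. The following Java sketch only assumes the URL pattern shown above; the class and method names are hypothetical, and "IP" stays a placeholder for the server address.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;

// Hypothetical helper around the URL pattern used in this post.
final class FedoraUrls {

    // Builds http://HOST:8080/fedora/get/PID/DATASTREAM.
    static String datastreamUrl(String host, String pid, String datastreamId) {
        return "http://" + host + ":8080/fedora/get/" + pid + "/" + datastreamId;
    }

    // Downloads the content behind a URL (requires Java 9+ for readAllBytes).
    static byte[] fetch(String url) {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For instance, fetch(datastreamUrl(host, "i3mam:389", "DC")) would download the Dublin Core metadata of the object used in the examples, provided host is replaced by the real address.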

Fedora Web Administrator


FedoraCommons provides a graphical interface to manage digital objects in a friendlier way. It is called the Fedora Web Admin and it is a Flash-based application, available at an address like this:
http://IP:8080/fedora/admin/


You can try to search the same digital object by going to:
Object > open object > PID > i3mam:389 > open

How other applications interact with the Upseek Server


  • UPload is a graphical interface which will be used to ingest digital assets, as well as related metadata, into FedoraCommons. Internally, the UPload application calls Fedora's Java methods to accomplish this.
  • GAT will retrieve an image from Fedora, annotate it, and then send the XML/MPEG-7 annotation file to the Fedora repository, where it will be saved as an additional datastream of the digital object.
  • GOS will be able to ingest the image that is used to perform the QueryByExample search.

How to ingest assets in Fedora

We have developed a class called ingest.java (edu.upc.tsc.gps.gpi.upseek.client.comms.client) which has Java methods for ingesting, deleting and updating digital objects and datastreams.

For example, if we would like to ingest an image we could use the following method:
public String ingestImage (URI uri, String creator, String title, String subject, String identifier, String assetID).
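
A call could look like the sketch below. Since the real class is not reproduced in this post, ImageIngester is a hypothetical interface that only mirrors the signature above, and FakeIngester is an in-memory stand-in so that the example runs without a server. The meaning of the returned String (assumed here to be an identifier of the new object) and all argument values are illustrative assumptions.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface mirroring the ingestImage signature quoted above.
interface ImageIngester {
    String ingestImage(URI uri, String creator, String title, String subject,
                       String identifier, String assetID);
}

// In-memory stand-in for the real client: it records the call and
// returns a made-up identifier instead of talking to Fedora.
final class FakeIngester implements ImageIngester {
    final List<URI> ingested = new ArrayList<>();

    @Override
    public String ingestImage(URI uri, String creator, String title, String subject,
                              String identifier, String assetID) {
        ingested.add(uri);
        return "demo:" + ingested.size();
    }
}

final class IngestExample {
    // Ingests one sample image; every argument value is a placeholder.
    static String ingestSample(ImageIngester ingester) {
        return ingester.ingestImage(
                URI.create("file:///tmp/example.jpg"), // location of the image
                "Example Creator",                      // creator
                "Example title",                        // title
                "example subject",                      // subject
                "example-identifier",                   // identifier
                "example-asset");                       // assetID
    }
}
```

The creator, title, subject and identifier arguments have the same names as Dublin Core elements, so they presumably end up in the DC datastream described earlier, although that mapping is a guess.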

(Note: we have omitted the Upseek IP address for security reasons.)