product analyzed

Seeing AI app


What does it consist of:

Seeing AI is a Microsoft application for iOS devices that brings together, in a single app, several functions useful for people who are blind or have low vision. This ongoing research project harnesses the power of AI to open up the visual world by describing nearby people, text, and objects, as well as routes.

Optimized for use with VoiceOver, the application allows you to recognize:

  • Short Text: speaks text as soon as it appears in front of the camera.
  • Documents: provides audio guidance for capturing a printed page and recognizes the text along with its original formatting.
  • Products: scans barcodes, using audio cues to guide you, and reads out the product name and package information when available. (Works with iPhone 6 and later.)
  • People: saves people's faces so you can recognize them, and gives an estimate of their age, gender, and emotions.
  • Scenes (early preview): gives an overview of the captured scene.
  • Currency: recognizes banknotes. (Requires iOS 11.)
  • Color: identifies colors.
  • Handwriting: reads handwritten text, such as that on greeting cards.
  • Light: generates an audible tone corresponding to the brightness of the surroundings.
  • Pictures in other apps: just tap "Share" and "Recognize with AI" to have pictures from Mail, Photos, Twitter and more described.
  • Photo browsing experience: describes the photos on your phone.


Seeing AI is designed to help you achieve more by harnessing the power of the cloud and artificial intelligence. As the research progresses, more channels may be added.

Forms of acquisition:

Seeing AI is free and available only for iOS. It can be downloaded from the following link:

App Store


Technical assessment:

Each of the application's features is called a channel. More channels may be added as new features are developed.

Among other things, the application can recognize text in documents and images, detect light intensity, identify colors and describe scenes.

When the application is opened, it displays the camera viewfinder along with the menu and quick help buttons, the channel selector, and a button to pause and resume automatic detection.

All menus, buttons and information are in English, although the recognition language can be changed to other languages, including Spanish, and the currency type can be preset.

Some of the channels can work with automatic detection. Recognition accuracy can be affected by how steadily the user holds the device, the orientation of the document, and the distance to the document.



Application menu

The application menu allows you to access the application settings, the device's photo gallery and various information.

Search photos

This option allows you to access the device's photo gallery and recognize the content of the photo, be it a text or a scene.

In the tests carried out, this option successfully recognized the scenes in different photos stored on the device.


Application help content

This option allows you to access the help of the application.


This option allows you to contact the developers by sending an email with the aim of providing suggestions or communicating any type of incident.


App settings

This option allows you to configure different aspects of the application such as the type of currency, the ordering of the channels or voice settings, among others.


This option provides information about the application and the developers.


  • Short text
Photograph of the labeling text of a Samsung Galaxy J3 mobile phone case

This channel identifies short texts in real time, such as the text on product labels.

In the tests carried out, the application identified text on packaging, on product surfaces and even on the screens of electronic devices, with very good results.

  • Document
Photograph of a document
Text of a document recognized by the application

This channel focuses on a text, captures it and recognizes it. The application then displays a screen with the recognized text of the document.

In the tests carried out, recognition was very good, although it is influenced by factors such as the orientation of the document, the font and its size, and the type of document, among others.

The image on the left shows a photo of a document. The image on the right shows the text that the application has recognized in the document.

  • Products
Photograph of the barcode of a Bezoya mineral water bottle identified by the application

This channel identifies products through their barcode, provided that their information is available. To do this, the user points the camera at the barcode, and the application captures and identifies it.

In the tests carried out, the application read the barcode correctly. However, identifying the product depends on its information being available in the database, as was the case with the Bezoya mineral water bottle, which the application identified correctly.
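
The difference between reading a barcode and identifying the product can be illustrated with a minimal sketch. It assumes a simple in-memory dictionary as the product database; the function name `describe_barcode`, the sample code and the sample entry are hypothetical and are not taken from the application.

```python
# Hypothetical product database: barcode digits -> product description.
# The entry below is an invented example, not real product data.
PRODUCT_DB = {
    "0000000000017": "Example mineral water, 1.5 L bottle",
}


def describe_barcode(code: str, db: dict[str, str]) -> str:
    """Return what would be spoken after a barcode has been read.

    Reading the code and finding the product are separate steps:
    the code may be read correctly and still have no database entry.
    """
    name = db.get(code)
    if name is None:
        return f"Barcode {code} read, but no product information is available."
    return name


print(describe_barcode("0000000000017", PRODUCT_DB))
print(describe_barcode("9999999999999", PRODUCT_DB))
```

In this sketch the second call still reports the code that was read, which mirrors the behavior described in the tests: a correct scan with no product name when the database has no entry.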

  • People
Photograph of a young woman identified and described by the application
Photograph of two young people identified and described by the application

This channel identifies how many people are in the image captured by the camera, how they are dressed, their facial features and their age. For this channel to work properly, people must not be too far away.

In the tests carried out, the application correctly identified people's sex and clothing, although its age estimates varied.

In the image on the left, a young woman appears next to an English text provided by the app that reads "30 years old woman with black hair looking happy." The image on the right shows a young man and woman with text provided by the app that reads "2 people detected. 30 years old man with brown hair looking happy. 36 years old woman with brown hair looking happy."

  • Currency
Photograph of a € 20 banknote

This channel identifies the value of banknotes in the preset currency, in real time.
In the tests carried out, the application correctly identified banknotes, such as the € 20 banknote shown in the image. Once the application has identified the value of the banknote, it speaks that value aloud.

  • World

This channel lets you create and record routes and later follow them with headphones, using the sound emitted by beacons as a reference that guides the user through the reference points established when the route was saved. The functionality is still under development, so the developers themselves tell users that it is being tested in order to gather feedback and continue working on it.

On first opening the application, an update notice explained this functionality, which helps the user understand what "World" is when reaching it through the channel selector.

From here, the next button is the one for creating and following routes, announced as “internal navigation, button”. It gives access to the routes the user has saved and to the option to add a new route. An explanatory help button is also available at this point.

Routes can be inside or outside a building, or combine both, and can include changes of floor or room within a building. During the tests, the compatibility of the Apple (VoiceOver) or Android (TalkBack) screen readers with the app's own narration was verified: there are no overlaps or inconsistencies in the narration.

Help page with information about what World is
Window to add routes


– Create a route: This is done from the “add” button. The route must be given a name, and a starting point and a destination must be set. Voice prompts then instruct the user to turn in place at the starting point while pointing the phone around. The app announces the percentage of the capture completed and suggests tilting the phone up or down when an area other than the one in view needs to be captured. These prompts make the application easier to understand.

To record the route, the user simply walks to the final destination while pointing the phone in the direction of travel.

Route creation information
Set a starting point and give it a name


– Follow a route: Standing at the starting point, the user can load a saved route. Once loading is complete, the user can start walking to the first waypoint (waypoints are always points that were walked through when the route was saved). At each of these reference points, Seeing AI places a beacon whose sound is louder or quieter depending on how far away the user is. The sound is in stereo, so it is also louder in the right or left earphone depending on the direction the user has to take to reach the beacon. When the user passes a reference point, the next one starts to sound. For these reasons it is essential to use headphones.
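
The way distance and direction could shape the sound at each reference point can be sketched as follows. This is only an illustration under stated assumptions, not the application's actual audio engine: the function `beacon_gains`, the 20 m audible range, the linear loudness falloff and the constant-power panning law are all assumptions chosen for clarity.

```python
import math


def beacon_gains(user_xy, heading_deg, beacon_xy, max_range_m=20.0):
    """Return (left_gain, right_gain) for a reference-point sound.

    Assumptions (not taken from the app): positions are in metres on a
    flat plane, heading is measured clockwise from the +y axis, loudness
    falls off linearly to zero at max_range_m, and stereo balance uses
    constant-power panning.
    """
    dx = beacon_xy[0] - user_xy[0]
    dy = beacon_xy[1] - user_xy[1]
    distance = math.hypot(dx, dy)

    # Closer reference points sound louder.
    loudness = max(0.0, 1.0 - distance / max_range_m)

    # Angle of the reference point relative to where the user is facing,
    # normalised to the range [-180, 180).
    bearing = math.degrees(math.atan2(dx, dy))
    relative = (bearing - heading_deg + 180.0) % 360.0 - 180.0

    # pan = -1 means fully left, +1 means fully right.
    pan = math.sin(math.radians(relative))
    angle = (pan + 1.0) * math.pi / 4.0
    return loudness * math.cos(angle), loudness * math.sin(angle)


# Reference point 5 m to the user's right: sound mostly in the right ear.
left, right = beacon_gains((0.0, 0.0), 0.0, (5.0, 0.0))
print(round(left, 3), round(right, 3))
```

With these assumptions, a reference point straight ahead produces equal gains in both ears, and one beyond the assumed range is silent, so turning until the sound is centered and walking until it gets louder leads the user toward the point.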

Follow a route, share it or more.
Follow a route

Notable points are how polished the feature is for a first version and the option for users to give feedback to improve it. The option to share routes, currently in development, could in future let users walk routes without having to create them.

On the downside, battery consumption is very high. This is aggravated by the fact that, in the tests carried out, after losing the connection or the route it was not possible to resume from the point where it was lost, so the user would have to return to the starting point (without sound references) and start over. In addition, a blind person wearing headphones can lose awareness of their surroundings; it is very important for these people to hear what is happening around them, especially out in the street. Bone-conduction headphones make it possible to listen without covering the ears, and the application could recommend their use.

  •  Scene
Photograph of a scene
Scene with a text provided by the application that describes it

This channel describes the scene in the image captured by the camera after the take picture button is pressed. The application speaks aloud what is shown in the image.
The image on the left shows a woman sitting at a desk with a computer in front of her. The image on the right shows the same scene after being recognized by the application, with a text in English that says "A Person sitting at a desk with a computer in an office chair."

  • Color
Photograph showing the color identification of an object

This channel detects the main color or colors of an object or surface. Color identification can be affected by factors such as the hue of the color itself or the ambient lighting. Generally, under proper conditions, the application correctly identifies the colors of the surface in view.
In the tests carried out, the application successfully identified the colors of the objects the camera was pointed at.
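
Why lighting affects the result can be shown with a minimal sketch that names a color by finding the closest entry in a small palette. The palette, the function `nearest_color_name` and the use of squared distance in RGB space are assumptions made for illustration; the review does not describe how the application actually names colors.

```python
# Hypothetical palette of named colors as (red, green, blue) values.
NAMED_COLORS = {
    "black": (0, 0, 0),
    "white": (255, 255, 255),
    "gray": (128, 128, 128),
    "red": (255, 0, 0),
    "green": (0, 128, 0),
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
}


def nearest_color_name(rgb):
    """Return the palette name closest to rgb (squared RGB distance)."""
    def squared_distance(name):
        return sum((a - b) ** 2 for a, b in zip(rgb, NAMED_COLORS[name]))

    return min(NAMED_COLORS, key=squared_distance)


print(nearest_color_name((250, 10, 10)))  # well-lit red surface
print(nearest_color_name((40, 0, 0)))     # the same hue in poor light
```

Under these assumptions the well-lit reading is named red, while the dim reading of the same hue lands closer to black, which is one way poor lighting can lead to a wrong color name.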

  • Handwriting
Photograph of a notebook with a handwritten text
Handwritten text from a notebook recognized by the app

This channel recognizes handwritten text. When the application recognizes the text, it speaks it aloud.

The image on the left shows a photograph of a notebook with the following handwritten text: "At Orientatech we tested the handwriting recognition of the Seeing AI application." On the right is the screenshot with the text recognized by the application, which, as can be seen, has been recognized correctly.

  • Light
Photograph showing a light source detected by the application

This channel detects light intensity. To do this, it uses a musical scale in which the greater the intensity of the light, the higher the pitch of the notes played.
In the tests carried out, the application played the highest notes when the camera was pointed at light-emitting objects, such as a computer screen or the light source shown in the image.
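
The relationship between brightness and pitch can be sketched as below. The sketch assumes a brightness reading normalized to the range 0 to 1 and one octave of a C major scale; both the scale and the function name `brightness_to_frequency` are illustrative assumptions, since the review does not say which notes the application plays.

```python
# One octave of a C major scale, from C4 to C5, in hertz.
C_MAJOR_HZ = [261.63, 293.66, 329.63, 349.23, 392.00, 440.00, 493.88, 523.25]


def brightness_to_frequency(brightness: float) -> float:
    """Map a brightness reading in [0, 1] to a note of the scale.

    Brighter readings select higher notes; readings outside the
    range are clamped to the lowest or highest note.
    """
    clamped = min(max(brightness, 0.0), 1.0)
    index = min(int(clamped * len(C_MAJOR_HZ)), len(C_MAJOR_HZ) - 1)
    return C_MAJOR_HZ[index]


for level in (0.0, 0.5, 1.0):
    print(level, brightness_to_frequency(level))
```

Quantizing to discrete notes rather than a continuous tone is a design choice in this sketch: steps on a scale are easy to tell apart by ear, which suits checks such as whether a lamp is on or off.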


Microsoft's Seeing AI app is a great tool for people with some form of visual impairment, especially those with very low vision or total blindness. This application brings together in a single app different functionalities that contribute to improving the activities of daily life and favor greater personal autonomy for the group with visual functional diversity.

The highly accurate recognition of handwritten text deserves special mention, as does the identification of scenes and people. OCR (Optical Character Recognition) is also very useful, both for short texts such as packaging and for documents. Of special relevance for people with total blindness is the detection of light intensity, since it lets them know, for example, whether a lamp is on or off.

As for the 'World' route creation and tracking feature, it is a very powerful idea that is on the right track. Although the tests carried out (both the technical tests and those with our volunteer Andrés) detected accessibility and usability problems, the conclusion is that this functionality already supports people with reduced or no vision and could increase its potential further with some improvements.

As mentioned above, it is an application of great interest to people with visual functional diversity. However, the fact that the interface is only available in English and the high battery consumption are points to take into account when using it.


  • Handwriting recognition with high precision
  • Accurate identification of scenes and people in photographs
  • Real-time OCR for short texts
  • High-precision OCR for documents
  • Light intensity detection
  • Free of charge
  • In the "World" section, the option to share routes, currently in development, could in future let users follow routes without having to create them.
  • The "World" function stands out for how polished it is for a first version and for the option for users to give feedback to improve it.

Improvement points

  • Reducing battery consumption could be studied for future versions, especially in the "World" function.
  • Increasing the number of products the application can identify by barcode could be studied.
  • Developing a version for Android devices could be considered, since at the moment it is only available on iOS.
  • The "World" option requires headphones, and a blind person wearing them may lose awareness of their surroundings. It is very important for these people to hear what is happening around them, especially out in the street. Bone-conduction headphones make it possible to listen without covering the ears, and the application could recommend their use.
  • Improve the context given in alt texts for screen readers.


Technical evaluation scores.

Design and manufacturing:
Refers to the physical aspects and details of the manufacturing of the technological product
Technical benefits:
Description of the quality of the technical specifications of the technological solution
User experience:
This criterion is linked to the user's assessment of the product or application
Accessibility:
It is the degree to which people can use or access a product, technological solution or service, regardless of their technical, cognitive or physical capabilities.

Social valuation:

Seeing AI has been tested with our volunteer Andrés, with the aim of providing some details about its operation from the point of view of the end user of the application.

The first major difficulty encountered when starting to use it is that it is not translated into Spanish, so a person who does not know English faces a language barrier. An attempt was made to work around this in the iOS settings menu by adding Siri shortcuts for the different functions of the application. A short phrase in Spanish was recorded to identify each desired function, for example, “recognize text”. After saying the phrase "Hey Siri, recognize text", the application opens in the foreground in its text recognition function. This solves the problem of navigating the menus in English. It works quite well with the text functions, since the result is read in Spanish. With other functions, such as recognizing scenes or objects, it is not useful, since the results are spoken in English.

Text identification seemed very good and reliable, especially with printed text in several columns, which the application is able to detect and read in the correct order. With handwriting, however, the application is not very reliable, particularly with text written in lowercase letters.

Colors and banknotes are identified with good accuracy, although the result is spoken in English. Faces are also identified with good accuracy.

Identifying products by barcode presented some drawbacks, probably because not all supermarket products are registered in the database, so only some of the products could be identified by their barcodes.

An attempt was made to create a route in a building and later follow it with the help provided by the "World" functionality. First of all, our volunteer was not convinced by the idea of wearing headphones, since he stressed that he does not feel safe with them, although he mentioned that he has bone-conduction headphones that make him feel safer because they do not prevent him from hearing the noise around him.

Creating the route was simple, and Andrés notes that, even though it was the first time he had used the option, he found it relatively pleasant and easy to set a starting point and create the route by following the prompts in the app. During testing, however, problems with the functionality were detected due to excessive battery consumption.

Although he found it difficult to orient himself by sound alone, in a situation like this one, where Andrés is walking through an unfamiliar environment, the functionality could support him if he wore bone-conduction headphones. On routes he already knows, its usefulness increases. When creating new routes, it is advisable for Andrés to be accompanied by a guide or to use routes that have already been created.

In general, our volunteer Andrés found it to be a reference application to keep installed at all times, although he is looking forward to an update that translates the application into Spanish and so makes it easier to use in that language.

Social valuation scores.

Impact and utility:
Describes to what extent the functions of the product are useful and improve the user's life
Design and Ergonomics:
Assessment of how the design of the technological solution adapts to the person to achieve greater comfort and effectiveness when using it
Usability and accessibility:
The extent to which the device can be used and understood under equal conditions by any person
Ease of acquisition:
It refers to the possibilities of accessing and acquiring a technological solution by the user
