Seeing AI is a Microsoft application for iOS devices that brings together, in a single app, several functions useful for people who are blind or have low vision. Each function is called a channel, and more channels may be added as new functions are developed.
Among other things, the application can recognize text in documents and images, detect light intensity, identify colors, and describe scenes.
When the application is opened, it displays the camera viewfinder along with the menu and quick help buttons, the channel selector, and a button to pause and resume automatic detection.
All menus, buttons and information are in English, although the recognition language can be changed to other languages, including Spanish, and the currency type can be preset.
Some of the channels can work with automatic detection. Recognition accuracy can be affected by how steadily the user holds the device, the orientation of the document, and the distance to the document.
Menu

The application menu allows you to access the application settings, the device's photo gallery and various information.
Search photos
This option lets you open the device's photo gallery and recognize the content of a photo, whether it is text or a scene.
During testing, this option successfully recognized the scenes in several photos stored on the device.
Help

This option opens the application's help.
Feedback
This option lets you contact the developers by email to make suggestions or report problems.
Settings

This option lets you configure different aspects of the application, such as the currency type, the order of the channels, or the voice settings, among others.
About

This option provides information about the application and the developers.
Channels
Short text

This channel identifies short texts in real time, such as the text on product labels.
During testing, the application read the text on packaging, product surfaces and even the screens of electronic devices with very good results.
Document


This channel lets you frame, capture and recognize the text of a document. The application then displays a screen with the recognized text.
During testing, recognition was very good, although it is influenced by factors such as the orientation of the document, the size or font of the text, and the type of document, among others.
The image on the left shows a photo of a document. The image on the right shows the text that the application has recognized in the document.
Products

This channel identifies products through their barcode, provided that their information is available. To do this, point the camera at the barcode, and the application captures and identifies it.
During testing, the application read the barcodes correctly. However, identifying the product depends on its information being available in the database, as was the case with the Bezoya mineral water bottle, which the application correctly identified.
Person


This channel identifies how many people are in the image captured with the camera, what they are wearing, their facial features and their age. For this channel to work properly, people must not be too far from the camera.
During testing, the application correctly identified people in terms of their sex and clothing, although its age estimates varied.
In the image on the left, a young woman appears next to a text in English provided by the application that says "30 years old woman with black hair looking happy". The image on the right shows a young man and a young woman with a text provided by the application that says "2 people detected. 30 years old man with brown hair looking happy. 36 years old woman with brown hair looking happy".
Currency

This channel identifies, in real time, the value of banknotes in the preset currency.
During testing, the application correctly identified banknotes, such as the €20 banknote that can be seen in the image. Once the application has identified the value of the banknote, it speaks that value aloud.
Scene


This channel describes the scene in the image captured by the camera after you press the take picture button. The application speaks aloud what is shown in the image.
The image on the left shows a woman sitting at a desk with a computer in front of her. The image on the right shows the same scene after being recognized by the application, with a text in English that says "A Person sitting at a desk with a computer in an office chair."
Color

This channel detects the main color or colors of an object or surface. Color identification can be affected by factors such as the hue itself or the ambient lighting. Generally, under proper conditions, the application correctly identifies the colors of the surface the camera is pointed at.
During testing, the application successfully identified the colors of the objects the camera was pointed at.
Handwriting


This channel recognizes handwritten text. When the application recognizes the text, it reads it aloud.
The image on the left shows a photograph of a notebook with the following handwritten text: "At Orientatech we tested the handwriting recognition of the Seeing AI application." On the right is the screenshot with the text recognized by the application, which, as you can see, has been correctly recognized.
Light

This channel detects light intensity. To do this, it uses a musical scale in which the more intense the light, the higher-pitched the notes that are played.
During testing, the application played the highest notes when the camera was pointed at light-emitting objects, such as the computer screen or the light source that can be seen in the image.
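The review does not describe how Seeing AI turns light into sound, so the following is only a minimal sketch of the general idea: a brightness reading is mapped onto the steps of a musical scale so that brighter light yields a higher note. The function names, the two-octave C major scale and the assumption that brightness is normalized between 0 and 1 are all illustrative choices, not details of the application.

```python
# Hypothetical illustration only: none of these names or values come from
# Seeing AI. Brightness is assumed to be normalized to the range [0, 1].

C_MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]  # semitone offsets within one octave
BASE_MIDI_NOTE = 60  # middle C, an assumed starting pitch
OCTAVES = 2  # assumed range of the scale


def build_scale(base=BASE_MIDI_NOTE, octaves=OCTAVES):
    """Return ascending MIDI note numbers covering the given number of octaves."""
    notes = [base + 12 * octave + step
             for octave in range(octaves)
             for step in C_MAJOR_STEPS]
    notes.append(base + 12 * octaves)  # close the scale on the top tonic
    return notes


def brightness_to_note(brightness, scale=None):
    """Map a brightness in [0, 1] to a note: brighter light gives a higher note."""
    if scale is None:
        scale = build_scale()
    clamped = min(max(brightness, 0.0), 1.0)  # tolerate out-of-range readings
    index = round(clamped * (len(scale) - 1))
    return scale[index]


def midi_to_frequency(note):
    """Convert a MIDI note number to its frequency in hertz (A4 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)
```

With these assumptions, a dark reading such as `brightness_to_note(0.0)` selects the lowest note of the scale and a reading of `1.0` selects the highest, which matches the behavior observed when the camera is pointed at a lamp or a screen.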
Conclusion
Microsoft's Seeing AI app is a great tool for people with some form of visual disability, especially those with very low vision or total blindness. It brings together in a single app different functions that help with the activities of daily life and promote greater personal autonomy for people with visual functional diversity.
Particularly noteworthy is the highly accurate recognition of handwritten text, as well as the identification of scenes and people.
OCR (Optical Character Recognition) is also very useful, both for short texts, such as those on packaging, and for documents.
Of special relevance for people with total blindness is the detection of light intensity, since it lets them know, for example, whether a lamp is on or off.
As mentioned above, it is an application of great interest to people with visual functional diversity. However, the fact that the interface is only available in English and the application's high battery consumption are points to take into account when using it.
Highlights
- Handwriting recognition with high precision
- Accurate identification of scenes and people in photographs
- Real-time OCR for short texts
- High-precision OCR for documents
- Light intensity detection
- Free of charge
Improvement points
- Translating the interface into other languages, since it is currently available only in English
- Reducing battery consumption in future versions
- Increasing the number of products the application can identify through their barcode
- Developing a version for Android devices, since it is currently available only on iOS