Visual search is one of the most complex and fiercely contested sectors of the search industry. In June, Bing announced its new visual search mode, right on the heels of similar developments from Pinterest and Google.
Ours is a culture dominated by images, so it stands to reason that visual search has assumed such importance for the world’s largest technology companies. The pace of progress is quickening, but there is no clear visual search ‘leader’, nor is one likely to emerge soon.
The search industry has developed significantly over the past decade, through advances in personalization, natural language processing, and multimedia results. And yet, one could argue that the power of the image remains untapped.
This is not due to a lack of attention or investment. Quite the contrary. Cracking visual search will require a combination of technological skills, psychological insight, and neuroscientific know-how. This makes it a fascinating area of development, but also one that will not be mastered easily.
Therefore, in this article, we will begin with an outline of the visual search industry and the challenges it poses, before analyzing the recent progress made by Google, Microsoft and Pinterest.
What is visual search?
We all partake in visual search every day. Whenever we need to locate our keys among a range of other items, for example, our brains are engaged in a visual search.
We learn to recognize certain targets and we can locate them within a busy landscape with increasing ease over time.
This is a trickier task for a computer, however.
Image search, in which a search engine takes a text-based query and tries to find the best visual match, is subtly distinct from modern visual search. Visual search can take an image as its ‘query’, rather than text. To perform an accurate visual search, search engines require far more sophisticated processes than they do for traditional image search.
Typically, as part of this process, deep neural networks are put through their paces in various types of tests, with the hope that they will mimic the functioning of the human brain in identifying targets.
The decisions (or inherent ‘biases’, as they are known) that allow us to make sense of these patterns are more difficult to integrate into any machine. When processing an image, should a machine prioritize shape, color, or size? How does a person do this? Do we even know for sure, or do we only know the output?
As such, search engines still struggle to process images in the way we expect them to. We simply don’t understand our own biases well enough to be able to reproduce them in another system.
There has nonetheless been a lot of progress in this field. Google image search has improved drastically in its handling of text queries, and tools like TinEye also allow us to run reverse image searches. This is a useful feature, but its limits are self-evident.
For years, Facebook has been able to identify individuals in photos, in the same way a person would immediately recognize a friend’s face. This example is a closer approximation of the holy grail for visual search; however, it still falls short. In this instance, Facebook has set up its networks to search for faces, giving them a clear target.
At its zenith, online visual search allows us to use an image as an input and receive another, related image as an output. This would mean we could take a smartphone picture of a chair, for example, and have the technology return pictures of rugs that suit the chair’s style.
The typical ‘human’ process in the middle, where we would decipher the component parts of an image and decide what it is about, then conceptualize and categorize related items, is undertaken by deep neural networks. These networks are ‘unsupervised’, meaning that there is no human intervention as they alter their functioning based on feedback signals and work to deliver the desired output.
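To make the mechanics concrete, here is a minimal sketch of how such a pipeline is often assembled in practice: a pretrained convolutional network turns each image into a feature vector, and ‘related’ images are simply the nearest neighbors of the query’s vector. The model choice (torchvision’s ResNet-50) and the file names are illustrative assumptions, not details disclosed by any of the companies discussed here.

```python
# Minimal visual-search sketch: embed images with a pretrained CNN,
# then return the catalog images closest to the query embedding.
# The model choice and file paths are illustrative assumptions.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Standard ImageNet preprocessing expected by the pretrained network.
preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Drop the classification layer so the network outputs a 2048-dim
# feature vector rather than class scores.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

def embed(path: str) -> torch.Tensor:
    """Map one image file to a unit-length feature vector."""
    img = Image.open(path).convert("RGB")
    with torch.no_grad():
        vec = backbone(preprocess(img).unsqueeze(0)).squeeze(0)
    return vec / vec.norm()

# Hypothetical catalog; a production system would hold these vectors
# in an approximate-nearest-neighbor index, not a Python list.
catalog = ["rug1.jpg", "rug2.jpg", "lamp1.jpg"]
catalog_vecs = torch.stack([embed(p) for p in catalog])

query_vec = embed("chair.jpg")         # the photo the user took
scores = catalog_vecs @ query_vec      # cosine similarity of unit vectors
best = scores.argsort(descending=True)
print([catalog[i] for i in best])      # catalog items, most related first
```

The nearest-neighbor lookup itself is trivial; the difficulty described above lives almost entirely in training the embedding to reflect human-like judgments of relatedness.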
The result can be mesmerizing.
This is just one approach to answering a delicate question, however.
There are no right or wrong answers in this field as it stands; simply more or less effective ones in a given context.
We should therefore assess the progress of a few technology giants to observe the significant strides they have made thus far, but also the obstacles left to overcome before visual search is truly mastered.
Bing visual search
In June, Microsoft announced that Bing would now allow users to “search by picture.”
This is notable for a number of reasons. First of all, although Bing image search has been present for quite some time, Microsoft actually removed its original visual search product in 2012. People simply hadn’t used it much since its 2009 launch (demoed at TechCrunch50), as it wasn’t accurate enough.
Furthermore, it would be fair to say that Microsoft is running a little behind in this race. Rival search engines and social media platforms have provided visual search functions for some time now.
As a result, it seems reasonable to surmise that Microsoft must have something compelling if they have chosen to re-enter the fray with such a public announcement. While it is not quite revolutionary, the new Bing visual search is still a useful tool that builds significantly on their image search product.
A Bing search for “kitchen decor ideas” which showcases Bing’s new visual search capabilities
What sets Bing visual search apart is the ability to search within images and then expand this out to related objects that might complement the user’s selection.
A user can select specific objects, home in on them, and purchase similar items if they desire. The opportunities for retailers are both obvious and plentiful.
It’s worth mentioning that Pinterest’s visual search has been able to do this for some time. But the important difference between Pinterest’s capability and Bing’s in this regard is that Pinterest can only redirect users to Pins that businesses have made available on Pinterest – and not all of them might be shoppable. Bing, on the other hand, can index a retailer’s website and use visual search to direct the user to it, with no extra effort required on the part of either party.
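For developers who want to experiment with this flow, a sketch of the HTTP round trip is below: it posts a local image to the Bing Visual Search endpoint and prints any similar-product matches. The endpoint, header, and response layout follow Microsoft’s published API, but treat the exact field names as assumptions to verify against the current documentation.

```python
# Sketch: send a local image to the Bing Visual Search endpoint and
# list similar-product results. Verify field names against the docs;
# the subscription key and image file are placeholders.
import requests

ENDPOINT = "https://api.cognitive.microsoft.com/bing/v7.0/images/visualsearch"
KEY = "YOUR_SUBSCRIPTION_KEY"  # hypothetical Cognitive Services key

with open("chair.jpg", "rb") as f:  # the image the user is searching within
    response = requests.post(
        ENDPOINT,
        headers={"Ocp-Apim-Subscription-Key": KEY},
        files={"image": f},
        timeout=10,
    )
response.raise_for_status()

# Results arrive grouped into tags, each carrying a list of actions
# (visually similar images, similar products, pages including the image).
for tag in response.json().get("tags", []):
    for action in tag.get("actions", []):
        if action.get("actionType") == "ProductVisualSearch":
            for item in action.get("data", {}).get("value", []):
                print(item.get("name"), item.get("hostPageUrl"))
```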
This should lead to a much more refined approach to searching within images. Microsoft provided the following visualization of how their query processing system works for this product:
Microsoft combines this system with the structured data it owns to provide a much richer, more informative search experience. Although restricted to a few search categories, such as homeware, travel, and sports, we should expect to see this rolled out to more areas throughout this year.
The next step will be to automate parts of this process, so that the user no longer needs to draw a box to select objects. It is still some distance from delivering on the promise of perfect visual search, but these updates should at least see Microsoft eke out a few more sellable searches via Bing.
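Automating that step is essentially an object-detection problem: a model proposes the bounding boxes instead of the user drawing them. Here is a rough sketch of the idea using torchvision’s off-the-shelf Faster R-CNN as a stand-in; nothing about it reflects Microsoft’s actual pipeline.

```python
# Sketch: let a detector propose the boxes a user would otherwise draw.
# The pretrained Faster R-CNN is a stand-in, not Microsoft's pipeline.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    weights=torchvision.models.detection.FasterRCNN_ResNet50_FPN_Weights.DEFAULT
)
model.eval()

img = Image.open("living_room.jpg").convert("RGB")  # hypothetical photo
with torch.no_grad():
    (pred,) = model([to_tensor(img)])  # one prediction dict per input image

# Each confident box could feed straight into a similarity search,
# with no manual region selection required.
for box, score in zip(pred["boxes"], pred["scores"]):
    if score > 0.8:
        print([round(v) for v in box.tolist()], f"confidence={score:.2f}")
```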
Google Lens
Google announced its Lens product at the I/O conference in May 2017. The aim of Lens is to turn your smartphone into a visual search engine.
Google Lens logo, which looks like a simplified camera with a red and yellow outline, blue lens and green flash.
Take a picture of anything and Google will tell you what the object is, along with any related entities. Point your smartphone at a restaurant, for example, and Google will tell you its name, show whether your friends have visited it before, and highlight reviews for the restaurant as well.
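Lens itself has no public API, but Google’s Cloud Vision API approximates the underlying capability. A minimal sketch, assuming the google-cloud-vision client library, configured credentials, and a hypothetical photo file:

```python
# Sketch: a Lens-style "what is this?" lookup via the Cloud Vision API.
# Web detection ties a photo back to named entities and web pages.
# Assumes google-cloud-vision is installed and credentials are set up.
from google.cloud import vision

client = vision.ImageAnnotatorClient()

with open("restaurant.jpg", "rb") as f:  # hypothetical photo
    image = vision.Image(content=f.read())

annotations = client.web_detection(image=image).web_detection

for entity in annotations.web_entities:
    print(entity.description, f"score={entity.score:.2f}")
for page in annotations.pages_with_matching_images[:3]:
    print("seen at:", page.url)
```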