The Rise of Vector Search

In the past couple of years, more and more companies have been shouting about Vector Search and how this technology is the next big thing when it comes to finding and discovering information.

Microsoft is now using Vector Search in Bing (https://www.microsoft.com/en-GB/ai/ai-lab-vector-search), Home Depot uses it to deliver a better shopping experience on its website (https://www.datanami.com/2022/03/15/home-depot-finds-diy-success-with-vector-search/), and startups like Pinecone (https://www.pinecone.io/) provide developers with toolkits to access it.

However, Vector Search as a concept has been around for a while. Gerard Salton, Professor of Computer Science at Cornell University (https://en.wikipedia.org/wiki/Gerard_Salton), may be considered the father of the technique, having published many books and articles on it going back to the 1960s.

The power of Vector Search that Salton posited was undeniable. It allowed information to be retrieved from highly unstructured sources; it enabled documents to be vectorised on their entire content rather than just a few keywords, improving retrieval relevance; and it never gave that annoying “no results found” message, since there is always a closest match to a query by vector distance (even if that distance is some way off).

Back then, though, outside of theoretical models and small-scale implementations, Vector Search (also sometimes called Neural Search) was impractical to implement because of the enormous computing cost involved.

Several things have changed in the last few years to make Vector Search practical.

First is the continuing development and maturity of the supporting ecosystem. Vital to the success of Vector Search and its wide-scale application is an understanding of the language used in documents. This has been achieved through the growth of neural networks and word-embedding systems such as Google’s word2vec (https://arxiv.org/abs/1301.3781) and BERT (https://arxiv.org/abs/1810.04805v2), Facebook’s fastText (https://research.facebook.com/downloads/fasttext/) and Stanford’s GloVe (https://nlp.stanford.edu/pubs/glove.pdf), which have been trained on massive amounts of text.
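
As a concrete illustration, here is a minimal sketch using the open-source gensim library to load a small pretrained GloVe model and compare words by vector similarity. The model name and example words are illustrative only, and the script assumes gensim is installed and can download the vectors on first run:

    # A minimal sketch of word embeddings in action, using the open-source
    # gensim library and a small pretrained GloVe model (downloaded on first run).
    import gensim.downloader as api

    # 50-dimensional GloVe vectors trained on Wikipedia and Gigaword text.
    model = api.load("glove-wiki-gigaword-50")

    # Each word maps to a dense vector; semantically similar words
    # end up close together in the vector space.
    print(model["search"][:5])                  # first 5 of the 50 dimensions
    print(model.similarity("car", "truck"))     # relatively high similarity
    print(model.similarity("car", "banana"))    # much lower similarity
    print(model.most_similar("query", topn=3))  # nearest words by vector distance

It is exactly this kind of learned vector space, scaled up from single words to whole documents, that makes modern Vector Search possible.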

Then we have the internet: with more information stored in the cloud, and enterprises moving away from traditional on-premises servers and applications, access to source information is considerably easier.

The most obvious change since the 1960s is, of course, that computers are now significantly more powerful (if Moore’s law held true, today’s computers would be more than 2 million times more powerful!). However, this rise in computing power alone is not enough to enable Vector Search at enterprise scale. Vector Search relies on converting documents to high-dimensional vectors (typically around 300 dimensions) and comparing these against all other documents using various mathematical and statistical techniques (e.g. k-Nearest Neighbours, Word Mover’s Distance) to find the closest matches. When dealing with millions of documents, this computing overhead is considerable.
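
To make that overhead concrete, here is a minimal NumPy sketch of the brute-force approach, with a made-up corpus of 100,000 random 300-dimensional vectors standing in for real documents. Every query must be compared against every document, so the cost per query grows linearly with the size of the corpus:

    # A minimal sketch of brute-force k-nearest-neighbour search over
    # document vectors. The corpus here is random data for illustration.
    import numpy as np

    n_docs, n_dims = 100_000, 300
    rng = np.random.default_rng(0)
    docs = rng.standard_normal((n_docs, n_dims)).astype(np.float32)
    docs /= np.linalg.norm(docs, axis=1, keepdims=True)  # unit-normalise rows

    def brute_force_knn(query: np.ndarray, k: int = 5) -> np.ndarray:
        """Compare the query against EVERY document: O(n_docs * n_dims) work."""
        query = query / np.linalg.norm(query)
        scores = docs @ query                  # cosine similarity via dot product
        return np.argsort(scores)[-k:][::-1]   # indices of the k closest documents

    query = rng.standard_normal(n_dims).astype(np.float32)
    print(brute_force_knn(query))  # always returns k matches, never "no results"

Note that the search always returns the k closest documents, however distant, which is why Vector Search never produces an empty result set. Scale the corpus to millions of documents, though, and the per-query arithmetic runs into the hundreds of millions of floating-point operations.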

At iKVA we’ve solved this problem by developing our own technology (MRPT), which means we can respond to information queries across millions of documents in milliseconds.
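
iKVA’s implementation is proprietary, but the general family of techniques involved, random projections, can be sketched in a few lines. The toy index below (a generic illustration, not iKVA’s MRPT) buckets vectors by the signs of their dot products with random hyperplanes, so a query is only scored against the documents in its own bucket rather than the whole corpus:

    # A toy random-projection index: a generic sketch of how random
    # projections cut the number of comparisons per query. It is NOT
    # iKVA's MRPT, and it trades some accuracy for speed.
    import numpy as np
    from collections import defaultdict

    class RandomProjectionIndex:
        def __init__(self, docs, n_planes=8, seed=0):
            rng = np.random.default_rng(seed)
            self.docs = docs
            self.planes = rng.standard_normal((n_planes, docs.shape[1]))
            self.buckets = defaultdict(list)
            for i, key in enumerate(self._hash(docs)):
                self.buckets[key].append(i)

        def _hash(self, vecs):
            # A vector's bucket key is the pattern of signs of its dot
            # products with the random hyperplanes.
            bits = np.atleast_2d(vecs) @ self.planes.T > 0
            return [tuple(row) for row in bits]

        def query(self, q, k=5):
            # Score only the documents sharing the query's bucket.
            candidates = self.buckets.get(self._hash(q)[0], [])
            scores = self.docs[candidates] @ q
            return [candidates[i] for i in np.argsort(scores)[::-1][:k]]

    rng = np.random.default_rng(1)
    docs = rng.standard_normal((100_000, 300)).astype(np.float32)
    index = RandomProjectionIndex(docs)
    print(index.query(docs[0]))  # scores roughly 400 candidates, not all 100,000

Production systems typically build many such random projections (or trees of them) and merge the candidate sets, tuning the trade-off between recall and query time.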

For more information on how Vector Search can help your business, please contact us.
