⛏ “Picks and shovels” strategy: how everyone wants to outsmart the competition by building vector dbs 😀
“Picks and shovels” is an investment strategy named after the California Gold Rush: instead of prospecting for gold with no guarantee of finding any, it is often more profitable to sell equipment such as picks and shovels to the prospectors, and thus build a more stable, yet still prosperous, business.
Vector databases store the vectors produced by a specific type of AI model called an embedding model. They are often considered the shovels needed to run AI (even though many AI systems don’t need them at all).
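At its core, a vector database does one thing: given a query vector, find the stored vectors closest to it. A minimal sketch of that idea, with hand-written toy vectors standing in for real embedding-model output (the documents and vectors below are made up for illustration):

```python
import math

# Toy in-memory "vector store". A real embedding model would
# produce these vectors; here they are hand-written 3-d stand-ins.
store = {
    "intro to gold mining":  [0.9, 0.1, 0.0],
    "shovel maintenance":    [0.1, 0.9, 0.1],
    "vector database guide": [0.0, 0.2, 0.9],
}

def cosine(a, b):
    # Cosine similarity: dot product normalized by vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(query_vec, k=2):
    # Rank documents by cosine similarity to the query vector.
    ranked = sorted(store, key=lambda d: cosine(query_vec, store[d]), reverse=True)
    return ranked[:k]

print(search([0.05, 0.15, 0.95]))
# → ['vector database guide', 'shovel maintenance']
```

Real vector databases add persistence, filtering, and approximate nearest-neighbor indexes so this lookup stays fast over millions of vectors, but the core operation is the same.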
Vector databases existed before, but they gained a lot of attention after the meteoric success of ChatGPT. Now there’s an abundance of such dbs. To quote LlamaIndex’s documentation:
LlamaIndex supports over 20 different vector store options. We are actively adding more integrations and improving feature coverage for each.
Is there enough demand for all of them? Probably not. In the tech industry, there’s usually only one winner who takes it all. 🏆
But vector dbs are evolving too.
Vector search alone is often not a complete solution, so vector databases are pivoting to hybrid search: keyword search alongside vector search, plus filtering and other features.
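One common way to combine the two is a weighted blend of a keyword score and a vector score. A rough sketch under simplifying assumptions (naive word overlap instead of BM25, hand-written 2-d vectors instead of real embeddings, all documents made up):

```python
import math

# Hypothetical documents with hand-written 2-d embeddings standing
# in for real embedding-model output.
docs = {
    "pricing of picks and shovels": [0.9, 0.1],
    "vector search deep dive":      [0.1, 0.9],
    "hybrid search explained":      [0.5, 0.5],
}

def keyword_score(query: str, doc: str) -> float:
    # Naive word-overlap score; a real system would use BM25.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q)

def vector_score(query_vec, doc_vec) -> float:
    # Cosine similarity between query and document embeddings.
    dot = sum(a * b for a, b in zip(query_vec, doc_vec))
    norm = math.sqrt(sum(a * a for a in query_vec)) * math.sqrt(sum(b * b for b in doc_vec))
    return dot / norm

def hybrid_search(query, query_vec, alpha=0.5, k=2):
    # alpha balances keyword relevance against vector relevance.
    scored = {
        d: alpha * keyword_score(query, d) + (1 - alpha) * vector_score(query_vec, emb)
        for d, emb in docs.items()
    }
    return sorted(scored, key=scored.get, reverse=True)[:k]

print(hybrid_search("hybrid search", [0.2, 0.8]))
# → ['hybrid search explained', 'vector search deep dive']
```

The keyword side catches exact terms (names, codes, jargon) that embeddings can blur, while the vector side catches paraphrases the keywords miss, which is why the blend tends to beat either alone.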
On the other hand, existing databases have also jumped on the AI bandwagon, adding support for vectors or moving toward hybrid search themselves (e.g. Elasticsearch).
Worth mentioning: vector databases aren’t AI themselves. Future LLMs might actually be able to query existing systems and find what they need on their own.
Vector search is a form of optimization: it improves the results an LLM retrieves and cuts costs, especially when the domain is known and the embedding model can be fine-tuned on it.
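The cost argument can be made concrete with a back-of-the-envelope sketch (all numbers and the word-count "tokenizer" below are hypothetical stand-ins): retrieving only the top-k relevant chunks keeps the prompt, and thus the token bill, far smaller than stuffing the whole corpus into the LLM's context.

```python
# Hypothetical corpus of 100 chunks, ~100 filler words each.
corpus = [f"chunk {i}: " + "lorem ipsum " * 50 for i in range(100)]

def token_estimate(text: str) -> int:
    # Crude stand-in for a real tokenizer: ~1 token per word.
    return len(text.split())

# Pretend a vector search has already ranked the chunks; keep the top 3.
retrieved = corpus[:3]

full_prompt = "\n".join(corpus)   # everything, no retrieval
rag_prompt = "\n".join(retrieved)  # only what vector search surfaced

print(token_estimate(full_prompt), "tokens vs.", token_estimate(rag_prompt))
```

Since LLM APIs typically bill per token, sending ~3% of the corpus instead of all of it translates almost directly into the cost reduction the paragraph above describes.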
- The future of vector databases isn’t obvious, especially with so many players in the field.
- “Picks and shovels” alone is no longer enough to outsmart the market.