As early-stage investors in the rapidly evolving Generative AI market, a field projected to reach US$66.62bn by the end of 2024, we have a duty to constantly refine our understanding and push the boundaries of our knowledge. “Cutting Through the Noise in the Generative AI Landscape as an Early-Stage Investor” represents our early hypothesis for understanding and investing in this field, given our current stance as learners.
With the market expected to grow at a 20.8% CAGR, reaching a volume of $207 billion by 2030 (Statista, 2023), this mini market deep-dive seeks to identify and understand key areas of investment opportunity within the Generative AI tech stack, navigating through prevailing uncertainty and noise.
Generative AI, also known as GenAI, is a type of artificial intelligence that specializes in creating new content. According to CB Insights (2023), these models learn by recognizing patterns in the data they are trained on and can then create new data, like text, images, videos, and audio, that is similar to what they’ve learned.
To begin with, my approach to understanding this field involved a simple but effective method: taking a blank sheet of paper and noting down fundamental, high-level questions. These questions were instrumental in demystifying the complexities inherent in this domain.
It can take a considerable amount of time to untangle the difference between traditional AI and Generative AI. According to the Massachusetts Institute of Technology (2023), “Generative AI can be thought of as a machine-learning model that is trained to create new data, rather than making a prediction about a specific dataset”; in other words, where a traditional model learns to predict or classify, a generative AI system is “one that learns to generate more objects that look like the data it was trained on”.
Put simply, traditional AI models mainly rely on supervised learning, while Generative AI employs both supervised and unsupervised techniques. Where traditional AI typically depends on labeled datasets to train its algorithms, Generative AI can learn from both structured (labeled) and unstructured (unlabeled) data and then generate new data instances. This capability, especially when combined with semi-supervised learning, enables models to learn from a broader spectrum of data. As a result, they become increasingly proficient over time, acquiring the ability to synthesize and interpret complex patterns that might not be immediately apparent in strictly labeled datasets.
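To make the distinction concrete, here is a minimal, illustrative sketch in Python (a toy example of our own, using scikit-learn, rather than anything drawn from the sources above). A supervised model learns from human-provided labels; an unsupervised one has to discover structure on its own.

```python
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Supervised learning: the model is shown labeled pairs (X, y).
X = [[0.1], [0.2], [0.9], [1.0]]
y = [0, 0, 1, 1]                      # labels supplied by humans
clf = LogisticRegression().fit(X, y)
print(clf.predict([[0.15]]))          # -> [0]

# Unsupervised learning: no labels at all; the model groups the data
# by itself, the kind of pattern discovery generative models exploit at scale.
km = KMeans(n_clusters=2, n_init=10).fit(X)
print(km.labels_)                     # cluster assignments, e.g. [0 0 1 1]
```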
Delving deeper into the foundations of Generative AI, we can see that it is primarily built upon two key technologies.
First, there are Large Language Models (LLMs), like GPT-3. These models are trained on vast amounts of text and learn to produce coherent sentences, much as they can complete the phrase “peanut butter and ____” with “jelly”.
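At its core, this is next-word prediction. The toy Python sketch below (a simple bigram count model of our own devising, orders of magnitude simpler than a real LLM) shows the underlying idea: count which words tend to follow which, then pick the most likely continuation.

```python
from collections import Counter, defaultdict

# A tiny 'training corpus'; a real LLM is trained on trillions of words.
corpus = ("peanut butter and jelly . peanut butter and jelly . "
          "peanut butter and jam").split()

# Count how often each word follows each other word (a bigram model).
next_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_counts[prev][nxt] += 1

def complete(word):
    """Return the most likely next word after `word`."""
    return next_counts[word].most_common(1)[0][0]

print(complete("and"))  # -> 'jelly', the most frequent continuation seen
```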
The second technology is Generative Adversarial Networks (GANs). According to MIT (2018), GANs use two models that work against one another: one, the generator, is trained to produce a specific output (such as an image of a flower), while the other, the discriminator, is trained to distinguish real examples from generated ones. The generator tries to fool the discriminator and, in the process, learns to produce increasingly realistic output.
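The adversarial loop itself is compact. Below is a schematic Python/PyTorch sketch on toy one-dimensional data; the network sizes, learning rates, and data distribution are placeholder assumptions of ours, not details from MIT. The discriminator is trained to separate real samples from generated ones, then the generator is trained to fool it.

```python
import torch
import torch.nn as nn

# Generator: maps random noise to a 1-D 'sample'. Discriminator: outputs
# the probability that its input is real rather than generated.
generator = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
discriminator = nn.Sequential(nn.Linear(1, 16), nn.ReLU(),
                              nn.Linear(16, 1), nn.Sigmoid())

loss_fn = nn.BCELoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

for step in range(2000):
    real = torch.randn(32, 1) + 4.0       # 'real' data: samples near 4.0
    fake = generator(torch.randn(32, 8))  # generated samples from noise

    # 1) Train the discriminator to tell real from fake.
    d_opt.zero_grad()
    d_loss = (loss_fn(discriminator(real), torch.ones(32, 1)) +
              loss_fn(discriminator(fake.detach()), torch.zeros(32, 1)))
    d_loss.backward()
    d_opt.step()

    # 2) Train the generator to fool the discriminator.
    g_opt.zero_grad()
    g_loss = loss_fn(discriminator(fake), torch.ones(32, 1))
    g_loss.backward()
    g_opt.step()
```

After enough steps, the generator’s outputs drift toward the real data’s distribution, exactly the dynamic described above.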
Is Generative AI an entirely new phenomenon? The short answer is no. Interestingly, the ideas behind today’s Generative AI, for all its recent technical advancements, date back as far as the 1960s. Its roots can be traced to machine learning, which emerged in the late 1950s, when scientists started using algorithms to generate new data.
A significant early contribution in the field of Natural Language Processing (NLP) was ELIZA, one of the first NLP programs, developed by Joseph Weizenbaum in the mid-1960s and later discussed in his 1976 book; it was a groundbreaking step in the evolution of AI. In the 1990s and 2000s, machine learning underwent a major transformation, spurred by advancements in hardware and increasing data availability. By the 2000s and 2010s, the explosion of data and computational power significantly boosted the feasibility and practicality of deep learning.
Based on my current understanding and interpretation, the recent boom in Generative AI can be attributed to three primary reasons, particularly evident when reflecting on the developments over the past five years:
As an NLP data scientist at KDnuggets (2023) points out, there has been a significant scale-up in GPT models over the last five years, growing approximately 8,500 times from GPT-1 to GPT-4. This is due to improved training data size and quality, greater data source diversity, better training methodologies, and an increase in model parameters.
Data creation has seen a three-fold increase in the past five years, according to Statista (2023). This burgeoning volume of data serves as a vital resource for training and refining GenAI models. IDC’s global datasphere forecast (2023) suggests a yearly 23% increase in global data creation, growing from 64.2 zettabytes in 2020 to an estimated 181 zettabytes by 2025, a significant rise from the 6.5 zettabytes recorded in 2012. To contextualize, one zettabyte equals approximately a trillion gigabytes.
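As a quick sanity check on those figures (our own back-of-the-envelope arithmetic, not IDC’s):

$$
1\ \text{ZB} = 10^{21}\ \text{bytes} = 10^{12}\ \text{GB},
\qquad
64.2\ \text{ZB} \times 1.23^{5} \approx 64.2 \times 2.82 \approx 181\ \text{ZB}.
$$

That is, a 23% annual growth rate compounded over the five years from 2020 does indeed land at roughly 181 zettabytes by 2025.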
According to a Pitchbook analyst report (Q4 2023), the past few years have seen significant growth in the valuations of Generative AI companies. Late-stage companies have experienced a 171.7% increase in median valuation, which now sits at around $250 million. Early-stage companies, including pre-seed and seed stages, also show notable valuation growth, with a median of around $14.6 million. This trend reflects increased investor confidence and funding in the Generative AI sector.
At Scale Capital, we delve into the GenAI landscape by closely examining various sub-markets and utilizing insights from experts and recent studies. In our journey, akin to ‘peeling the onion’, we have developed a high-level overview of the GenAI tech-stack, drawing upon the comprehensive tech-stack mapping from experts at A16z (2023).
According to A16z, the GenAI tech-stack can be divided into three overarching layers — the infrastructure layer, the model layer, and the application layer.
The first layer is composed of providers that supply the necessary infrastructure for running inference workloads and training AI models. This layer can, in turn, be categorized into three overall areas.
The second layer enables AI products and applications. It offers both proprietary models accessed via APIs, such as large language models, and open-source models. This layer is pivotal for AI innovation, providing diverse model access and development approaches, and can be further categorized into two high-level areas.
The third layer merges AI models into end-user products, enhancing user experience. This segment has grown substantially, driven by novel applications across various domains. In 2023, companies in this layer raised over $14 billion, signifying strong market momentum. The layer includes companies focused on domain-specific applications and those offering comprehensive, end-to-end solutions.
In our role as early-stage investors delving into the Generative AI tech stack, we have developed a preliminary hypothesis for deal sourcing. Currently, the software infrastructure layer and the end-to-end vertical application layer appear to offer the most attractive investment opportunities (I’ll expand on the reasoning for this later). However, it’s crucial to acknowledge the inherent uncertainty in this rapidly evolving sector. The rapid pace of change means that our current investment strategies could soon be outdated, underscoring this market’s dynamic and unpredictable nature.
In the software infrastructure layer, the market is less dominated by large players, presenting a unique opportunity in its fragmented state. This area is ripe for innovative companies to address unmet needs. However, entering this sector requires significant capital for technological development.
At Scale Capital, we assess long-term viability by focusing on the team and technology behind startups. Patience is key, as overcoming entry barriers can eventually lead to market stability. Rapid technological evolution remains a risk, necessitating a strategy that diversifies investments and stays adaptable to emerging trends. Our goal is to identify companies that offer substantial value, ensuring their long-term defensibility and customer loyalty.
The end-to-end vertical application layer is seeing significant growth, driven by diverse use cases. Companies that integrate proprietary models with their applications, utilizing unique data for value-adds and AI enhancement, are particularly interesting. These companies stand out by continually refining user experiences and keeping technology in-house, which is key to long-term market success.
Challenges include high development costs, lengthy R&D, and effective data management. At Scale Capital, we mitigate these risks by focusing on companies with clear monetization strategies, strong data infrastructure, and a culture of innovation, essential for maintaining competitive advantage.
As an early-stage investor, effectively cutting through the noise is about being well-informed on technological advancements and trends. Though early in my VC career, I’ve learned the importance of not being swept away by every innovative idea within the field. Focusing our energy efficiently is key to identifying tomorrow’s leaders, especially those expanding from Europe to the US market.
At Scale Capital, our current lens when looking for companies in the Generative AI landscape focuses on end-to-end vertical application solutions and software infrastructure solutions. Still, we remain open to adapting our approach as the market evolves. The landscape is dynamic, and while we prioritize certain tech stack areas, we’re always ready to explore emerging opportunities, adapting as new layers of the ‘onion’ unfold and shape the future of sustainable business development.