Not to state the obvious, but video content is increasingly prevalent; it’s around us nonstop. In fact, 80% of the world’s data is in video, according to NEA and Twelve Labs.
There’s a catch, though: current technologies struggle to process and understand that vast amount of visual information effectively.
Think about it. Traditional search engines excel at indexing and retrieving text-based data. Videos, however, present a unique challenge because of their complexity: they combine visual, auditory and often textual elements, which makes it difficult for machines to accurately interpret and categorize their content. As a result, most video content remains inaccessible to search engines.
This limitation has far-reaching implications. For individuals, it means missing out on potentially valuable information or entertainment. For businesses, it can hinder marketing efforts and customer engagement. And for researchers, it slows down scientific progress by making it harder to analyze and learn from video data.
Addressing this challenge requires advancements in artificial intelligence, computer vision and natural language processing to develop technologies that can effectively understand and index video content.
Enter Twelve Labs, a startup that uses multimodal AI to bring human-like understanding to video content.
Twelve Labs will address the challenge by using AWS technologies to accelerate the development of its foundation models, which map natural language to what’s happening in a video, including actions, objects and background sounds. This lets developers create applications that can search through videos, classify scenes, generate summaries and split video clips into chapters.
Twelve Labs' Marengo and Pegasus models, for instance, generate text summaries, translate audio into over 100 languages and analyze the interplay of words, images and sounds within videos.
These foundation models, available on AWS Marketplace, enable developers to build applications for semantic video search and text generation, serving media, entertainment, gaming, sports and other industries that rely on large volumes of video.
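To give a sense of what that developer workflow can look like, here is a minimal sketch in Python. It is illustrative only: the base URL, endpoints, field names and response shape are assumptions for demonstration, not the documented Twelve Labs API. The idea is simply to submit a video for indexing and then run a natural-language query against the resulting index.

```python
# Illustrative sketch only: the endpoints, field names and response shape below
# are hypothetical, not the actual Twelve Labs or AWS Marketplace API.
import requests

API_BASE = "https://api.example-video-understanding.com/v1"  # hypothetical base URL
API_KEY = "YOUR_API_KEY"
HEADERS = {"x-api-key": API_KEY}


def index_video(index_id: str, video_url: str) -> str:
    """Submit a video to be indexed across visual, audio and on-screen text signals."""
    resp = requests.post(
        f"{API_BASE}/tasks",
        headers=HEADERS,
        json={"index_id": index_id, "video_url": video_url},
    )
    resp.raise_for_status()
    return resp.json()["task_id"]


def search_videos(index_id: str, query: str) -> list[dict]:
    """Run a natural-language query against the index, e.g. 'goal celebration in the rain'."""
    resp = requests.post(
        f"{API_BASE}/search",
        headers=HEADERS,
        json={"index_id": index_id, "query_text": query, "options": ["visual", "audio"]},
    )
    resp.raise_for_status()
    # Each hit is assumed to carry the video id plus start/end timestamps of the matching clip.
    return resp.json()["results"]


if __name__ == "__main__":
    task_id = index_video("demo-index", "https://example.com/game-footage.mp4")
    print("indexing task:", task_id)
    for hit in search_videos("demo-index", "buzzer-beater three-pointer"):
        print(hit["video_id"], hit["start_sec"], hit["end_sec"], hit["confidence"])
```

The key design point this sketch tries to convey is that results come back as timestamped clip segments rather than whole files, which is what makes use cases like highlight reels and frame retrieval for live broadcasts practical.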
For example, sports leagues can use the technology to catalog vast libraries of game footage and quickly retrieve specific frames for live broadcasts. Coaches can also use these foundation models to analyze a swimmer’s stroke technique or a sprinter’s starting-block position and make adjustments that lead to better performance.
Media and entertainment companies can use Twelve Labs technology to create highlight reels from TV programs tailored to each viewer’s interests, such as compiling all action sequences in a thriller series featuring a favorite actor.
“AWS has given us the compute power and support to solve the challenges of multimodal AI and make video more accessible, and we look forward to a fruitful collaboration over the coming years as we continue our innovation and expand globally,” said Jae Lee, co-founder and CEO of Twelve Labs. “We can accelerate our model training, deliver our solution safely to thousands of developers globally, and control compute costs — all while pushing the boundaries of video understanding and creation using generative AI.”
Twelve Labs will also use Amazon SageMaker HyperPod to accelerate and reduce the cost of training its foundation models, and it aims to expand its video understanding models into new industries and enhance training capabilities. AWS Activate has supported Twelve Labs' global growth and enabled the analysis of vast amounts of video data with split-second accuracy.
“Twelve Labs is using cloud technology to turn vast volumes of multimedia data into accessible and useful content, driving improvements in a wide range of industries,” said Jon Jones, Vice President and Global Head of Startups at AWS. “Video is a treasure trove of valuable information that has, until now, remained unavailable to most viewers. AWS has helped Twelve Labs build the tools needed to better understand and rapidly produce more relevant content.”
Be part of the discussion about the latest trends and developments in the Generative AI space at Generative AI Expo, taking place February 11-13, 2025, in Fort Lauderdale, Florida. Generative AI Expo covers the evolution of GenAI and will feature conversations on its potential across industries and on how the technology is already being used to improve operations, enhance customer experiences and create new growth opportunities for businesses.
Edited by Alex Passett