Firassa: Building AI That Truly Converses With Your Videos

Written by Firassa AI Team

Post date: 28 Oct, 2025

The future of video understanding isn't just about search; it's about having meaningful conversations with your content.

Beyond Basic Video Search

When Maroune Lamharzi Alaoui and Rachid Hakmi first looked at the video intelligence landscape, they noticed a fundamental problem: existing solutions treated video as a passive medium to be indexed rather than understood.

"Traditional video search has always been limited to tags, titles, and basic transcriptions," explains Lamharzi Alaoui. "But the richness of video (visual cues, emotional context, cultural references) remained locked away, inaccessible through conventional search methods."

This insight led to the creation of Firassa, a revolutionary platform that enables users to have natural conversations with their video content. Unlike systems that merely tag and catalog videos, Firassa understands videos holistically, comprehending the complex interplay between visuals, speech, and implied meaning.

Conversation, Not Just Search

Firassa's approach differs fundamentally from that of competitors in the video AI space. While companies like Twelve Labs focus on search capabilities, Firassa has pioneered true conversational interaction with video content.

"We're not just helping users find moments in videos; we're enabling them to interact with their entire video library as if each video were an intelligent entity capable of discussing its own content," says Hakmi. "You can ask a Firassa-powered video about visual elements that were never mentioned in speech, or inquire about cultural nuances that traditional systems would miss entirely."

This conversational paradigm transforms how organizations access their video knowledge. Rather than constructing complex search queries, users can simply ask questions in natural language: "Show me all instances where our packaging design appears but isn't explicitly discussed," or "Find moments across our customer interviews where people express confusion about our pricing model."

Multilingual Excellence

A particularly powerful aspect of Firassa's technology is its exceptional multilingual capability, with special strength in Arabic and its regional dialects.

Video content doesn't exist in a single-language world. Organizations operate globally and create content in multiple languages. Firassa not only understands content in any language but also bridges language barriers, letting users query Arabic content in English or discuss French videos in Arabic.

This multilingual excellence opens up vast libraries of previously siloed content, making knowledge accessible regardless of language barriers. A marketing team in London can seamlessly extract insights from customer testimonials recorded in Moroccan Arabic, while executives in Dubai can analyze product demonstrations created by their German engineering team.

Enterprise Applications

Firassa's conversational approach to video intelligence is finding applications across industries:

Media organizations use Firassa to rapidly identify specific segments within vast archives, enabling efficient content repurposing and monetization.
Educational institutions leverage the platform to make lecture content interactively searchable, improving learning outcomes.
Market research firms use Firassa to extract nuanced consumer insights from hours of interview footage.
Manufacturing companies employ the technology to analyze training videos and improve process documentation.

The impact is concrete: conversational video intelligence replaces workflows that previously required intensive manual review. Tasks that once took days now happen in minutes, with greater depth of understanding.

Automating Professional Deliverables

Beyond conversational intelligence, Firassa AI delivers tangible value by automating essential production deliverables required by broadcasters and streaming platforms. From a single video ingest, the platform automatically generates studio-grade outputs that traditionally require hours of manual work.

The system produces accurate Continuity & Spotting Lists: detailed logs that capture the scene and dialogue timing crucial for dubbing, subtitling, and compliance workflows. These logs include exact in and out timecodes for dialogue, actions, sound effects, and music cues, and are exportable as CSV files ready for editorial review and quality control.
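To make the shape of such a log concrete, here is a minimal sketch of a spotting list rendered as CSV with in and out timecodes. It is an illustration only, not Firassa's actual export format: the `SpottingEvent` fields, the column names, and the choice of non-drop-frame `HH:MM:SS:FF` timecode at 25 fps are all assumptions made for this example.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class SpottingEvent:
    """One row of a hypothetical spotting list (field names are assumptions)."""
    start_s: float  # event start, in seconds
    end_s: float    # event end, in seconds
    kind: str       # e.g. "dialogue", "action", "sfx", "music"
    text: str       # spoken line or short description of the event


def to_timecode(seconds: float, fps: int = 25) -> str:
    """Format seconds as non-drop-frame HH:MM:SS:FF at an integer frame rate."""
    total_frames = round(seconds * fps)
    frames = total_frames % fps
    total_seconds = total_frames // fps
    hh, rem = divmod(total_seconds, 3600)
    mm, ss = divmod(rem, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{frames:02d}"


def spotting_list_csv(events: list[SpottingEvent], fps: int = 25) -> str:
    """Render events, sorted by start time, as CSV text with in/out timecodes."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["in", "out", "type", "text"])  # assumed column names
    for ev in sorted(events, key=lambda e: e.start_s):
        writer.writerow(
            [to_timecode(ev.start_s, fps), to_timecode(ev.end_s, fps), ev.kind, ev.text]
        )
    return buf.getvalue()


if __name__ == "__main__":
    demo = [
        SpottingEvent(2.0, 3.0, "music", "Theme"),
        SpottingEvent(0.0, 1.2, "dialogue", "Hello, world"),
    ]
    print(spotting_list_csv(demo))
```

Sorting by start time keeps the log in playback order regardless of the order in which events were detected, and the `csv` module quotes any field that contains a comma, so dialogue text survives a round trip through a spreadsheet.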

Firassa also generates precise SDH Subtitles (Subtitles for the Deaf and Hard of Hearing) that clearly label dialogue, music cues, and sound effects with accurate speaker identification. These timed text files serve viewers who cannot hear, conveying all audible information in synchronized text format. Complementing this, the platform creates comprehensive Audio Descriptions that narrate visual details to make content accessible to blind or visually impaired audiences. These descriptions are carefully timed to fit within the gaps between dialogue, ready for text-to-speech generation or studio recording.
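Fitting descriptions around dialogue can be pictured as an interval problem. The sketch below is a generic illustration and not Firassa's method: it finds the silent gaps left by dialogue intervals and estimates whether a narration line fits a gap. The function names, the 1.5-second minimum gap, and the speaking rate of 2.5 words per second are assumptions chosen for the example.

```python
def find_gaps(
    dialogue: list[tuple[float, float]],
    duration_s: float,
    min_gap_s: float = 1.5,
) -> list[tuple[float, float]]:
    """Return (start, end) spans of at least min_gap_s not covered by dialogue."""
    gaps = []
    cursor = 0.0  # end of the latest dialogue seen so far
    for start, end in sorted(dialogue):
        if start - cursor >= min_gap_s:
            gaps.append((cursor, start))
        cursor = max(cursor, end)  # handles overlapping dialogue intervals
    if duration_s - cursor >= min_gap_s:
        gaps.append((cursor, duration_s))
    return gaps


def fits(description: str, gap: tuple[float, float], words_per_second: float = 2.5) -> bool:
    """Estimate whether narration fits the gap at an assumed speaking rate."""
    start, end = gap
    needed_s = len(description.split()) / words_per_second
    return needed_s <= end - start


if __name__ == "__main__":
    dialogue = [(2.0, 5.0), (4.0, 6.0), (10.0, 12.0)]
    line = "A woman opens the door slowly"
    for gap in find_gaps(dialogue, duration_s=15.0):
        print(gap, fits(line, gap))
```

A word-count estimate is deliberately crude; a production workflow would more plausibly measure the duration of the synthesized or recorded narration and shorten the description when it overruns the gap.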

What makes this capability notable is that all three deliverables are generated automatically and accurately from a single video file through the API. This automation is powered by two complementary engines: Roc performs rapid metadata extraction to structure scenes and timing, while Bahamut applies deep contextual understanding to produce time-aligned text assets with speaker and event attribution.

For media companies, this automation addresses a critical pain point. Continuity & Spotting Lists remain required for dubbing, localization, and delivery compliance, while SDH Subtitles and Audio Descriptions are increasingly mandated by platforms and regulations. By automating first-pass generation of these deliverables, Firassa enables studios to raise accessibility coverage at library scale, streamlining both accessibility and localization processes while reducing production timelines and costs.

The Road Ahead

As Firassa continues to develop its technology, the team is exploring new frontiers in conversational video intelligence. The vision extends far beyond building better video search. Firassa is creating a new paradigm where video content itself becomes an interactive, queryable knowledge base, with potential applications extending beyond anything currently available in the market.

The bootstrapped startup is currently focused on refining its core technology and establishing strategic partnerships with key clients across industries. While operating lean, the company remains committed to expanding its engineering capabilities and research initiatives to meet the growing demand for conversational video intelligence.

Video is humanity's richest medium for communication, yet until now, we've interacted with it in primitive ways. Firassa is changing that fundamentally, making video content truly conversational for the first time.