Ask the Expert about the Potential of LLMs in AI with Mobius Labs
The rise of large language models (LLMs) like GPT-4 and Llama has sparked a wave of curiosity. These advanced AI systems are blurring the lines between machines and humans in their ability to understand and generate human-like text.
This article summarizes a fascinating conversation with Mobius Labs, an APEX Ventures portfolio company at the forefront of LLM technology. We explore the inner workings of these models, the challenges they present, and the exciting possibilities they hold for the future.
Appu Shaji, CEO and Chief Scientist at Mobius Labs, offers his perspective on the potential of large language models and shares how Mobius Labs is approaching them.
Mobius Labs makes GenAI models cheaper, faster to use, and economically viable. Its product is a highly efficient, affordable, open-source, fully multimodal AI stack that cuts end-to-end costs by a factor of five.
The current unit economics of Generative AI (GenAI) solutions are unsustainable at scale for both providers and clients due to high compute costs. Providers face low to negative margins, while costs become prohibitive at scale for clients.
Mobius Labs' core innovation reduces AI computing costs by 10x, driven by pioneering quantization algorithms and efficient kernels that can be applied to open-weight models. Their open-source serving stack (aana-sdk) enables wide-scale enterprise deployment.
Appu Shaji, CEO and Chief Scientist at Mobius Labs
How do large language models, such as GPT-4, differ from traditional AI models in their ability to understand and generate human-like text?
Appu Shaji (AS): Large language models, such as Llama, GPT-4, etc., differ from traditional AI models primarily in scale. These models are trained on vastly larger datasets and with significantly more computational resources than traditional AI models.
Additionally, most of their training is self-supervised, meaning they do not rely on annotated training sets. Instead, the model learns by masking random parts of the training data and trying to predict the missing pieces.
This process is akin to learning a new topic by doing numerous fill-in-the-blank exercises with an answer key. Over time, the model builds a nuanced understanding and representation of various topics, enabling it to generate more human-like text.
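As a toy illustration of that fill-in-the-blank setup (hypothetical code, not an actual training pipeline), each training example pairs a sentence with one token blanked out against the token that belongs there:

```python
import random

def make_fill_in_the_blank(tokens, mask_token="[MASK]", seed=0):
    """Mask one random token; the model must predict it from the surrounding context."""
    rng = random.Random(seed)
    i = rng.randrange(len(tokens))
    masked = list(tokens)
    target = masked[i]          # the "answer key"
    masked[i] = mask_token      # the blank the model has to fill
    return masked, target

tokens = "the cat sat on the mat".split()
masked, target = make_fill_in_the_blank(tokens)
```

A real self-supervised setup generates millions of such pairs automatically from raw text, which is why no human annotation is needed.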
Mobius Labs’ Vision
What are the primary ethical concerns associated with the deployment of LLMs, and how can we address issues such as bias, privacy, and misinformation?
AS: George Box, a British statistician, once said, "All models are wrong, but some are useful." This quote aptly applies to large language models (LLMs), which are incredibly useful but have limitations and ethical concerns that need addressing.
One of the primary ethical concerns is that it's unclear whether LLMs truly achieve real-world understanding and awareness during training. Their main function is to generate plausible output based on input, prioritizing coherence and fluency over factual accuracy. This can lead to issues such as bias and hallucinations, similar to the mistakes humans sometimes make.
To mitigate bias, it is crucial to ensure that the training data, especially in the later stages of training (often called alignment), explicitly addresses potential biases. This involves careful curation and augmentation of the data to reduce inherent biases.
Grounding is a key technique for areas where factual accuracy is paramount. It requires the model to cross-check and verify information against a database of trusted sources. This approach helps maintain accuracy while balancing the model's creativity and tendency to generate new ideas, which can benefit tasks like storytelling or generating non-photorealistic images.
Privacy is another critical concern, as LLMs can process and potentially expose sensitive data on a massive scale. At Mobius Labs, we address this by delivering software that can be installed on-premises without incurring large computational costs. By shipping code to where the data resides, we ensure privacy by design, safeguarding sensitive information while leveraging the capabilities of LLMs.
Mobius Labs’ Aana SDK addresses key challenges in multimodal AI development: managing diverse inputs, scaling Generative AI apps, and ensuring extensibility.
How can AI bridge the gap between different types of data, like text and images, to create a more comprehensive understanding for tasks like machine translation or sentiment analysis?
AS: Our understanding of the world is multi-perceptual; we hear, see, sense, and smell to comprehend our surroundings. Similarly, AI can bridge the gap between different types of data, such as text and images, by utilizing machine learning models from various fields, including computer vision, audio processing, and large language models, to collaborate and achieve a multimodal understanding of data.
Different models can work together by integrating information from various modalities to create a coherent understanding. For example, when analyzing a video in which the actors are sarcastic or glib, a literal interpretation of the text alone may lead to a misunderstanding. However, by incorporating cues from facial expressions and tone of voice, AI can arrive at a more nuanced interpretation of the emotions and content.
By leveraging the strengths of these diverse models and integrating their outputs, AI can achieve a richer, more holistic understanding of complex tasks, much like how humans use multiple senses to navigate and comprehend the world.
What are your biggest concerns or questions about the development and deployment of LLMs?
AS: One of my biggest concerns about the development and deployment of large language models (LLMs) is the immense computational resources they require. Current generative AI solutions demand an extraordinary amount of compute power, both for training and inference. Typically, they rely on powerful data center-grade GPUs capable of teraflops of computation, which are not only extremely expensive but also consume a tremendous amount of energy. This situation limits AI development to well-financed companies and makes deployment unaffordable for many.
The unit economics of current generative AI models are also fundamentally flawed. While companies rush to capture market share, they often incur losses for each customer they serve, primarily due to the high cost of compute. To create a more equitable and sustainable future, it's crucial that we significantly reduce the compute requirements of these models.
What is Half-Quadratic Quantization (HQQ), and how does it differ from other quantization techniques?
AS: Modern generative AI algorithms predominantly rely on an architecture called transformer networks. These transformer models involve a significant amount of matrix multiplications with floating-point numbers (i.e., numbers that include decimal points). However, by removing the decimal points, we can reduce the computational load—much like how multiplying 3 by 100 is easier than multiplying 3.1415926535 by 100.4123414. This process is known as quantization. The most extreme form of quantization is reducing numbers to binary (1-bit) representation, which is just 0s and 1s. Typically, large language models (LLMs) use 16 bits to store a number, so using 1-bit numbers reduces the storage requirement by 16 times. Interestingly, binary multiplication is effectively just addition, which can be up to 70 times faster.
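The claim that binary multiplication is effectively addition can be sketched in a few lines (a hypothetical illustration, not Mobius Labs' actual kernels): with weights restricted to ±1, a dot product is just the sum of the inputs with positive weights minus the sum of those with negative weights.

```python
import numpy as np

def binary_dot(signs, x):
    """Dot product with 1-bit (+1/-1) weights: no multiplications, only additions."""
    return x[signs > 0].sum() - x[signs < 0].sum()

rng = np.random.default_rng(0)
x = rng.normal(size=16)                          # activations
signs = np.sign(rng.normal(size=16)).astype(np.int8)  # binarized weights
result = binary_dot(signs, x)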
Half-Quadratic Quantization (HQQ) is a specific method that optimizes the quantization process by intelligently determining how to segment the number line along various axes to maintain accuracy. HQQ employs an algorithm that efficiently estimates the best quantization points, preserving model performance while significantly reducing computational demands. A more in-depth technical treatment is available on our blog: https://mobiusml.github.io/hqq_blog/
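To make the general idea concrete, here is a toy sketch of fitting quantization parameters by alternating optimization. This is not HQQ itself (HQQ uses a half-quadratic solver with a sparsity-promoting loss; see the blog post for details), only the generic flavor: round weights onto an integer grid, then refit the zero-point to reduce reconstruction error.

```python
import numpy as np

def toy_alternating_quantize(w, bits=4, iters=20):
    """Hypothetical sketch: alternate between rounding to a 4-bit grid and
    refitting a per-row zero-point to minimize plain squared error."""
    qmax = 2**bits - 1
    # Initial per-row scale and zero-point from the min/max range.
    scale = (w.max(axis=1, keepdims=True) - w.min(axis=1, keepdims=True)) / qmax
    zero = -w.min(axis=1, keepdims=True) / scale
    for _ in range(iters):
        q = np.clip(np.round(w / scale + zero), 0, qmax)    # quantize step
        zero = (q - w / scale).mean(axis=1, keepdims=True)  # refit zero-point
    return q.astype(np.uint8), scale, zero

def dequant(q, scale, zero):
    """Map integer codes back to approximate floating-point weights."""
    return (q - zero) * scale

w = np.random.default_rng(0).normal(size=(16, 64)).astype(np.float32)
q, scale, zero = toy_alternating_quantize(w)
w_approx = dequant(q, scale, zero)
```

The 4-bit codes take a quarter of the storage of 16-bit weights while the dequantized matrix stays close to the original, which is the trade-off HQQ pushes much further with a more robust objective.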
What are the benefits of using HQQ for large machine learning models?
AS: One of the key benefits of using Half-Quadratic Quantization (HQQ) for large machine learning models is its ability to retain the original model's accuracy to a remarkable degree—often more than 99%. For instance, while a model like the recently released Llama-70b requires multiple data center-grade GPUs to operate, our HQQ-quantized version can run on a single GPU. This offers significant computational savings, ranging from 4 to 6 times, without compromising on model quality.
Moreover, HQQ helps democratize access to large machine-learning models, making them available to individuals and organizations with limited GPU resources and smaller budgets. Thanks to their low compute footprint, we anticipate that such models will soon be able to run on the next generation of standard hardware, including desktops, phones, and other edge devices. This will effectively unshackle the technology from data center environments, bringing advanced AI capabilities to a broader audience worldwide.
What advancements in the development of large language models can we expect over the next few years, and how might these changes impact their role in AI and society?
AS: We are truly on an exponential curve of AI development. AI will become much cheaper and easier to train and run in the coming years—a bet we are very confident in at Mobius Labs. Alongside this, the capabilities of AI will also increase remarkably. For example, AI embedded in devices will develop a sense of embodied intelligence, enabling more intuitive interactions with the physical world. We will also see the rise of AI systems that collaborate with each other, breaking down complex problems into different domains and sharing knowledge between specialized AI systems to deliver coherent solutions. These are often termed "agentic systems."
In terms of societal impact, we need to be particularly aware of the transition from AI performing tasks to AI taking on full jobs. The implications of this shift are difficult to predict. Personally, I believe that humans will always find new opportunities with these advanced tools, but this will require strong social, political, and technological alignment. It necessitates a robust dialogue and the democratization of technology access.
I often feel that people get distracted by the artifacts of the technology, such as the size of a model or the types of inputs it can handle. What’s more important is understanding the new use cases that AI opens up.
We need to effectively map out AI's impact in the positive, gray, and negative zones. The good news is that there's often a precedent—AI may not exhibit entirely new behaviors, but it will perform certain actions at scale, like generating imagined stories and scenarios far beyond human capability.
In such cases, societal understanding of what is acceptable is paramount. If we can achieve this, we can build systems that provide effective filtering and a regulatory environment to maximize AI's benefits while mitigating its potential downsides.
An example of AI understanding video using Mobius Labs Aana SDK
What are Mobius Labs’ future plans with AI?
AS: We are at a pivotal moment in AI, where open-source models are, for the first time, outpacing proprietary ones. At Mobius Labs, our roots in academic research instill a deep commitment to the principles of openness and reproducibility. We value the democratic and meritocratic nature of the open-source community, where ideas are transparently exchanged and rigorously tested.
The AI landscape has shifted rapidly—just a year ago, closed-source systems like ChatGPT were far ahead, but now the gap is almost non-existent. In this context, open-source software offers significant benefits, such as transparency, no vendor lock-in, customizability, and full ownership. We are focused on democratizing AI by building highly efficient, open-source solutions that are easy to run and deploy.
As we pursue this mission, Mobius Labs is actively fundraising to accelerate our efforts, expand our team, and solidify our position as a major player in AI infrastructure. If you share our vision and are interested in being part of this exciting journey, we invite you to connect with us and explore potential collaboration or investment opportunities.
—
Mobius Labs develops cost-effective and scalable AI metadata solutions for applications, devices, and processes.
APEX Ventures is a European venture capital firm focusing on deep tech companies. The team acts not only as investors but also as company builders, with a mission to support the most talented startup teams in building global market leaders. The team is based in Vienna and Munich.
—
Keep an eye on our blog posts and sign up for our monthly newsletter to get the latest updates on deep tech innovations.