Decentralized LLM Economics & Tech

Distributed LLM - an economically better option

For any AI model, the two most crucial resources are data and compute. These models have been trained on vast amounts of data to achieve their current performance and require an increasing amount of computational resources to serve inference to end users. To provide some benchmarks:

Compute

  1. GPT-3: 175 billion parameters

GPT-3, released by OpenAI in 2020, remains one of the largest language models available for public testing and use.

  2. Megatron-Turing NLG 530B: 530 billion parameters

Developed jointly by NVIDIA and Microsoft, this model was described as the largest monolithic language model ever trained when it was released in 2021. It is focused specifically on natural language generation rather than tasks such as translation.

  3. GPT-3.5 Turbo: estimated 20 billion parameters

An extension of GPT-3 built by OpenAI, GPT-3.5 Turbo is currently one of the most widely used language models in the world.

Training datasets

The exact training dataset sizes for the largest language models are generally not disclosed publicly. However, researchers have made some estimates based on the models' parameters and published details:

  1. GPT-3: Estimated to have been trained on 300-400 billion words in total, drawn from web documents and books. Some analysts estimate the training dataset included hundreds of millions of webpages and tens of thousands of books.

  2. Megatron-Turing NLG: Likely trained on a dataset comparable to or larger than GPT-3's, potentially totalling 270-340 billion words across 15 combined datasets.

  3. GPT-4: The model, reported to have roughly 1.76 trillion parameters, is estimated to have been trained on trillions of tokens. These estimates suggest the training data included content from platforms such as Reddit and YouTube.

Storage costs for increasingly large datasets and parameterized models are significantly lower in distributed systems.

Source: Filecoin | Messari

Decentralized compute platforms such as Render and io.net are also demonstrating that it is more cost-effective to tap the compute power of idle GPUs than to pay large sums to centralized cloud service providers.

Decentralized AI offers cost benefits by leveraging economies of scale in energy usage. GPUs operating in regions with lower energy costs can handle a larger computational load cheaply and efficiently. Existing idle infrastructure, such as personal computers and servers, can be put to work, reducing infrastructure setup and maintenance costs. The P2P layers that have shaped blockchains and file-sharing networks also allow efficient use of the available GPUs. A distributed system further avoids the pricing inelasticity of Web2 service providers and their overbearing safety-compliance measures. It also materially improves reliability, since redundant nodes can scale up their workload whenever another node suffers downtime.

We believe that these overall cost savings, coupled with the high likelihood of users paying an equivalent amount or even a premium to access moderation- and censorship-free AI models (starting with LLMs), make this a well-functioning economic system.

MONAI Tech Stack

Our model uses the Transformer architecture, which is at the root of all currently available open- and closed-source LLMs. It performs the core function of an LLM: predicting the next output token from a user's input, or, in other words, producing natural-language output understandable by humans.
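
To make this concrete, the sketch below shows greedy next-token generation with a Transformer. It uses the open GPT-2 checkpoint from the Hugging Face transformers library purely as a stand-in; the model name, prompt, and generation length are illustrative and do not describe MONAI's own checkpoints or serving stack.

```python
# Minimal sketch of autoregressive next-token generation with a Transformer.
# GPT-2 is used here only as a publicly available stand-in model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "Decentralized inference lets idle GPUs"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(20):  # generate 20 tokens, one at a time
        logits = model(input_ids).logits            # [batch, seq_len, vocab_size]
        next_id = logits[:, -1, :].argmax(dim=-1)   # greedy choice of the next token
        input_ids = torch.cat([input_ids, next_id.unsqueeze(-1)], dim=-1)

print(tokenizer.decode(input_ids[0]))
```

Each forward pass scores every token in the vocabulary; picking the highest-scoring one and appending it to the context is exactly the "prediction of the next token" described above (production systems typically sample from the distribution rather than taking the argmax).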

Monai comes with several checkpoints tailored to different uses, from general-purpose language modeling to specialized chat and instruction-following capabilities. Its versatility makes it suitable for a variety of applications, including chatbots, content generation, and complex problem-solving tasks.

The barriers to a truly uncensored LLM are twofold, and we propose to tackle each of them.

The first barrier relates to LLM training. To use an analogy: if a human being is exposed to only one point of view, they are extremely likely to adhere to it. Transposed to an LLM, which is ultimately "just" probabilistic token (word) generation, the bias of the training dataset is reflected in the billions of parameters of the model and is then reproduced by the model itself, without any human intervention. Our first response was to make sure our dataset extends to the areas of knowledge left out of the training datasets of other models, a process driven by a thorough analysis of the knowledge gaps in the currently available models. Of course, no model can be exhaustive at a given point in time (the so-called "cutoff date" of the LLM training dataset), so the LLM will need to be retrained continuously to incorporate the knowledge released over time, whether in the form of articles, books, scientific papers, copyright-free material, or proprietary datasets provided by people or organizations willing to help us in our mission. This is why we plan to retrain our LLM frequently on newly available datasets and, of course, on user feedback, as sketched below.
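
As an illustration of that periodic retraining, the following sketch runs one round of continued pretraining on a newly collected text corpus using the Hugging Face Trainer API. The GPT-2 checkpoint, the new_corpus.txt file, and the hyperparameters are placeholders chosen for the example, not a description of MONAI's actual training pipeline.

```python
# Illustrative sketch of periodic continued pretraining on newly collected text.
# Model name, corpus file, and hyperparameters are placeholders for the example.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# new_corpus.txt: articles, papers, and user feedback gathered since the last cutoff
raw = load_dataset("text", data_files={"train": "new_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="checkpoint-next",
                           num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # produces the next periodic checkpoint
```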

The second barrier relates to hard-coded biases. To continue our analogy, this would be a censoring department reviewing what someone wants to publish and removing or rephrasing parts of the text. These hard-coded biases do not come from the LLM itself but from post-processing applied to the LLM's output before it is sent to the user. To tackle this issue, we simply do not implement any hard-coded bias.
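
To illustrate the distinction, the sketch below contrasts a typical post-processing moderation step with a pass-through pipeline. The blocklist and filter logic are invented for the example and do not describe any particular provider's implementation.

```python
# Contrast between a moderation post-processing step and a pass-through pipeline.
# The blocklist and the refusal message are invented for illustration only.
BLOCKLIST = {"example_banned_topic"}

def moderated_reply(raw_output: str) -> str:
    # A centralized service often rewrites or suppresses the model's output here.
    if any(term in raw_output.lower() for term in BLOCKLIST):
        return "I'm sorry, I can't help with that."
    return raw_output

def unfiltered_reply(raw_output: str) -> str:
    # The approach described above: no post-processing layer between model and user.
    return raw_output
```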
