Merovingian I & II

Our current model, which can be tested here, is called Merovingian I. It is a powerful, uncensored, multilingual LLM with 8 billion parameters, trained on a dataset of trillions of tokens covering a wide variety of texts, including academic journals, popular websites such as Reddit and Wikipedia, and a diverse range of books and articles. We used the Leipzig Corpora Collection for linguistic diversity, WebText for contemporary online text, and domain-specific datasets such as PubMed for medical literature. This extensive training ensures a broad understanding across different subjects and languages. Merovingian is proficient in over 30 languages, including Spanish, Mandarin, and Arabic. Its multilingual capabilities stem from training on diverse language datasets, including those from the Tatoeba Project and Europarl.

Merovingian leverages the Llama 2 framework together with the Grouped-Query Attention and Sliding Window Attention techniques popularized by Mistral.
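
For readers unfamiliar with these attention variants, here is a minimal sketch of grouped-query attention combined with a sliding-window causal mask in PyTorch. The head counts, window size, and tensor shapes are illustrative assumptions, not Merovingian's actual configuration.

```python
# Sketch of grouped-query attention (GQA) with a sliding-window causal mask.
# All sizes below are illustrative, not Merovingian's real hyperparameters.
import torch
import torch.nn.functional as F

def gqa_sliding_window(q, k, v, window):
    """q: (batch, n_heads, seq, head_dim); k, v: (batch, n_kv_heads, seq, head_dim)."""
    b, n_heads, seq, d = q.shape
    n_kv_heads = k.shape[1]
    group = n_heads // n_kv_heads
    # GQA: each key/value head is shared by `group` query heads.
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5
    # Sliding-window causal mask: token i attends only to tokens (i - window, i].
    pos = torch.arange(seq)
    dist = pos[:, None] - pos[None, :]
    mask = (dist < 0) | (dist >= window)
    scores = scores.masked_fill(mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Example: 8 query heads sharing 2 key/value heads, attention window of 4 tokens.
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 2, 16, 64)
v = torch.randn(1, 2, 16, 64)
print(gqa_sliding_window(q, k, v, window=4).shape)  # torch.Size([1, 8, 16, 64])
```

Sharing key/value heads across groups of query heads shrinks the KV cache, while the sliding window bounds the attention span so memory and compute stay roughly linear in sequence length.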

Our upcoming Merovingian II will be a mixture-of-experts (MoE) model, enhancing the model's capacity and efficiency by combining the strengths of multiple specialized sub-networks, known as "experts," under a single framework. The key idea behind an MoE model is to have different experts handle different types of data or tasks, with a gating mechanism that decides which expert(s) to use for a given input.
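
The sketch below illustrates the general MoE idea with a small top-k gated layer in PyTorch. The expert count, hidden sizes, and top-2 routing are illustrative assumptions and do not describe Merovingian II's actual design.

```python
# Minimal mixture-of-experts layer with top-k gating; sizes are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model, d_hidden, n_experts=8, top_k=2):
        super().__init__()
        # Each "expert" is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )
        self.gate = nn.Linear(d_model, n_experts)  # gating network scores every expert
        self.top_k = top_k

    def forward(self, x):
        # x: (tokens, d_model). The gate routes each token to its top-k experts.
        logits = self.gate(x)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                sel = idx[:, k] == e  # tokens whose k-th choice is expert e
                if sel.any():
                    out[sel] += weights[sel, k:k + 1] * expert(x[sel])
        return out

tokens = torch.randn(10, 512)
layer = MoELayer(d_model=512, d_hidden=2048)
print(layer(tokens).shape)  # torch.Size([10, 512])
```

Because only the selected experts run for each token, total parameter count can grow well beyond what a dense model of the same per-token compute cost could afford.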

We recommend a GPU with at least 6 GB of VRAM to run the models.
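
As an illustration of how an 8-billion-parameter model can fit into roughly 6 GB of VRAM, the snippet below loads a checkpoint in 4-bit precision with Hugging Face transformers and bitsandbytes. The repository id `example-org/merovingian-i` is a placeholder, not a published checkpoint.

```python
# Illustrative 4-bit loading to fit an 8B model on a ~6 GB GPU.
# "example-org/merovingian-i" is a placeholder repo id, not a real checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "example-org/merovingian-i"  # placeholder
quant = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=quant, device_map="auto"
)

inputs = tokenizer("Hello, Merovingian!", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```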
