# Query Routing

Unlike other decentralized LLM systems, which require all network participants to compute and return results for every user query, the protocol optimises for three parameters when providing inferences: uptime, speed and stake. In a distributed LLM network we approach the query routing problem with multiple goals:

- The quality of inferences requested by users should meet a threshold (i.e. detailed, moderation-free and insightful).

- The system should be capable of handling a high volume of queries.

- The network should be computationally efficient (all nodes do not need to run the same computation).

- All node operators should be appropriately incentivised to run nodes.

- Node operators should compete with each other on performance parameters: uptime, response time and amount staked.

We utilise a simple model that monitors these global parameters for the network in the previous epoch and calculates new weights for them in the routing process. The process is structured as follows:

Initial Weights: Set initial weights for each parameter, determining their relative importance in the probability of a node being assigned user queries.

Observation and Comparison: At the end of each epoch, the network observes the values of performance parameters and then compares these values with the corresponding values from the previous epoch.

Adjust Weights: If the median value for a parameter increases, we keep its weight unchanged, or, in the case of stake concentration, decrease it to avoid centralisation risks. If the median value decreases, we increase its weight, incentivising node operators to improve their performance on that parameter in order to receive a higher share of queries.

Smoothing Mechanism: To avoid abrupt and extreme changes in weights, we apply a moving-average smoothing technique, then normalize the weights so they sum to 1.
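The per-epoch weight update described above can be sketched as follows. Parameter names, the smoothing factor, and the adjustment step are illustrative assumptions, not protocol-specified values; higher medians are assumed to mean better performance for every parameter.

```python
# Illustrative sketch of the per-epoch weight update (the smoothing
# factor `alpha` and step size `step` are assumptions, not specified
# by the protocol).

def update_weights(prev_weights, prev_medians, curr_medians, alpha=0.3, step=0.1):
    """Adjust, smooth (moving average), and normalize parameter weights.

    All three arguments are dicts keyed by parameter name:
    "uptime", "speed", "stake".
    """
    raw = {}
    for param, w in prev_weights.items():
        if curr_medians[param] >= prev_medians[param]:
            # Median improved: keep the weight unchanged, except stake,
            # which is reduced to limit centralisation risk.
            raw[param] = w * (1 - step) if param == "stake" else w
        else:
            # Median fell: raise the weight to incentivise operators
            # to improve on this parameter.
            raw[param] = w * (1 + step)
    # Moving-average smoothing against last epoch's weights.
    smoothed = {p: alpha * raw[p] + (1 - alpha) * prev_weights[p] for p in raw}
    # Normalize so the weights sum to 1.
    total = sum(smoothed.values())
    return {p: v / total for p, v in smoothed.items()}

weights = update_weights(
    prev_weights={"uptime": 0.4, "speed": 0.4, "stake": 0.2},
    prev_medians={"uptime": 0.99, "speed": 120.0, "stake": 5_000},
    curr_medians={"uptime": 0.97, "speed": 130.0, "stake": 6_000},
)
# Uptime's median fell, so its weight rises; the stake median rose,
# so its weight is reduced.
```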

Each node receives an allocation score based on epoch weights and normalized parameters for uptime, speed of inference and staked capital.

$A_i = \alpha_{\text{uptime}} \, N_{i,\text{uptime}} + \alpha_{\text{speed}} \, N_{i,\text{speed}} + \alpha_{\text{stake}} \, N_{i,\text{stake}}, \quad N_{i,\text{stake}} \propto \max(S_f, \min(S_c, S_{\text{node}_i}))$

$A_i: \text{Allocation score for node i}$

$\alpha: \text{Weights assigned to each parameter based on network performance in the previous epoch}$

$N_{i,param}: \text{Normalized value of node } i\text{'s performance for the respective parameter}$

$S_f: \text{Minimum stake required to be part of the node operator set}$

$S_c: \text{Protocol enforced cap on staked capital per node.}$
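A minimal sketch of the allocation score for one node, using the definitions above. The weight values, the share-based stake normalization, and all numbers are assumptions for illustration only.

```python
# Illustrative allocation-score computation. The weights and the
# stake-normalization scheme are assumptions for this sketch.

def allocation_score(weights, normalized, stake, s_floor, s_cap, stake_total):
    """A_i = sum over parameters of alpha_param * N_{i,param},
    with the stake clamped to [s_floor, s_cap] (i.e. S_f and S_c)
    before normalization."""
    clamped_stake = max(s_floor, min(s_cap, stake))
    # Simple share-of-total normalization (assumed, not specified).
    n_stake = clamped_stake / stake_total
    return (
        weights["uptime"] * normalized["uptime"]
        + weights["speed"] * normalized["speed"]
        + weights["stake"] * n_stake
    )

score = allocation_score(
    weights={"uptime": 0.41, "speed": 0.40, "stake": 0.19},
    normalized={"uptime": 0.98, "speed": 0.85},
    stake=12_000,        # above the cap, so it is clamped to s_cap
    s_floor=1_000,
    s_cap=10_000,
    stake_total=50_000,
)
```

Clamping the stake between $S_f$ and $S_c$ means capital above the cap earns no additional routing advantage, which limits stake concentration.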

A single node operator might provide the best performance across all parameters. We want that operator to receive the largest number of queries, while still incentivising all node operators to remain continually operational on the network. To balance these objectives, we use the allocation score to calculate each node's probability of being allocated the next query:

$P_i(A_i) = \frac{A_i}{\sum_{j=1}^{n} A_j}$
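This allocation rule amounts to weighted random selection over allocation scores; a sketch, with hypothetical node names and scores:

```python
import random

def pick_node(scores, rng=random):
    """Select the next node with probability P_i = A_i / sum_j A_j."""
    nodes = list(scores)
    return rng.choices(nodes, weights=[scores[n] for n in nodes], k=1)[0]

# Hypothetical allocation scores for three nodes.
scores = {"node_a": 0.78, "node_b": 0.55, "node_c": 0.31}
probs = {n: s / sum(scores.values()) for n, s in scores.items()}
chosen = pick_node(scores)
# node_a is the most likely recipient of the next query, but node_c
# still receives queries, keeping every operator incentivised.
```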
