Optimizer Methodology

Coframe utilizes the Multi-Armed Bandit (MAB) framework to optimize conversion rate. We solve the MAB formulation to dynamically optimize website content. Our approach extends the traditional MAB formulation to enable the addition and removal of variants over time which allows to adapt to user behavior over time.

Key Enhancements

Maintaining an Adaptive Holdout Baseline: A control variant is retained for to estimate the baseline performance, the holdout allocation is dynamically adjusted to optimize for maximal conversions once enough data has been collected to establish a baseline.
Dropping Underperforming Arms: Variants that consistently underperform are removed based on their allocations which is proportional to the belief of their performance.
Intelligent Variant Generation: Our AI models generate new variants by analyzing and incorporating insights from ongoing experimental data.
Incubation Period for New Arms: New variants are given a protection period to gather sufficient data before being evaluated for potential removal.

Standard Multi-Armed Bandit Overview

In the classical MAB problem, an agent selects among

K

arms (variants), each with an unknown reward distribution. The objective is to maximize the cumulative reward over time by balancing exploration (trying out less-known arms) and exploitation (selecting the best-known arms).

Mathematical Formulation

Arms: $\mathcal{A} = \{1, 2, \dots, K\}$
**At each time **:
- Select an arm $a_t \in \mathcal{A}$ .
- Receive a reward $r_t$ drawn from the probability distribution associated with $a_t$ .

In our context, the agent aims to minimize regret

R(T)

, defined as:

R(T) = T \mu^* - \sum_{t=1}^T r_t

where

\mu^*

is the expected reward of the optimal arm. In the context of website optimization, regret is closely tied to conversion rate. The regret represents the difference between the cumulative reward (conversions) obtained by always choosing the optimal arm and the actual cumulative reward obtained by the algorithm. A lower regret implies that the algorithm is performing closer to the optimal strategy, which in turn means it’s achieving a higher overall conversion rate. By minimizing regret, the algorithm effectively maximizes the conversion rate over time.

Coframe’s Approach

Algorithm Steps

Initialization

Initial Set of Arms: $\mathcal{A} = \{0, 1, 2, \dots, K\}$ , where $0$ represents the baseline or control variant.
Initial Traffic Allocation: Distribute traffic equally among all arms, i.e., each arm receives $\frac{1}{K+1}$ of the traffic.
Initial Belief: For each arm, we initialize its belief with the same prior.
Iteration Start: The algorithm begins its iterations, with each iteration representing a one-hour time period.

At Each Time Step $t$ (Hourly Iteration)

Arm Selection:
- Use a selection policy (Thompson Sampling) to choose an arm $a_t$ .
Reward Observation:
- Observe reward $r_t$ ( $r_t = 1$ if a conversion occurs, $r_t = 0$ otherwise).
Parameter Update:
- Update the success ( $S_i$ ) and failure ( $F_i$ ) counts for arm $a_t$ .

Note on Experimental Unit: Our experimental unit is “unique visitor hours”. Within each one-hour iteration, exposures are deduplicated by visitor using a cookie. This means that the same person within the same iteration will see the same variant/arm, ensuring consistency in the user experience and data collection.

Dropping Underperforming Arms

Criteria for Dropping:
- An arm is considered for removal if its traffic allocation falls below a predefined threshold (e.g., 2%) and its incubation period has elapsed.
Rationale:
- This allows for the addition of new variants to replace underperforming ones.

Introducing New Arms

Generation:
- New variants are continuously created by our generative AI models and can be added to an ongoing MAB when other variants are dropped.
Conditioning Factors:
- Historical rewards $\{ r_i \}$
- Content features $\{ v_i \}$
- The model leverages this information to continuously incorporate learnings and produce variants with a higher expected reward.

Traffic Reallocation

Adjust Traffic Allocation:
- Traffic allocation for each variant is automatically adjusted every iteration.

Selection Policy: Thompson Sampling

Thompson Sampling is used because it balances exploration and exploitation in a regret minimizing fashion.

Posterior Sampling

For each arm

i

Successes: $S_i$
Failures: $F_i$
Beta Distribution: Sample $\theta_i \sim \text{Beta}(S_i + alpha_0, F_i + \beta_0)$
Arm Selection: Choose arm $a_t$ $a_{t}$ by sampling from all arms and selecting the arm associated to the maximum value
- Batch processing: We process observations in batches and sample from the posterior distribution after all the observations in one iteration have been accounted for.

Traffic Allocation Constraints

Minimum Allocation for Holdout Baseline:
- Ensure the holdout baseline arm $a_0$ retains at least a fraction $\alpha$ of the traffic (e.g., $\alpha = 0.05$ ).

Advantages Over Standard MAB

Continuous Evolution: New variants are regularly introduced, keeping content fresh and aligned with traffic behavioral patterns.
Efficient Resource Utilization: Underperforming variants are removed, reallocating resources to better-performing ones.
Accelerated Optimization: AI-generated variants, informed by past performance, may lead to higher rewards more quickly.
Robust Benchmarking: Maintaining a constant baseline variant allows for consistent performance tracking over time.

Mathematical Details

Reward Estimation

For each arm

i

Estimated Conversion Rate:

\hat{\mu}_i = \frac{S_i}{n_i}

where

n_i = S_i + F_i

is the total number of times arm

i

has been selected.

Credibility Intervals:

We use Bayesian methods to estimate the credibility interval for

\hat{\mu}_i

. This provides a probabilistic range of plausible values for the mean of the underlying conversion rate, given the observed data and prior beliefs. For each arm

i

, we can calculate the posterior distribution:

P(\mu_i | \text{data}) \propto P(\text{data} | \mu_i) \cdot P(\mu_i)

where

P(\mu_i)

is the prior distribution and

P(\text{data} | \mu_i)

is the likelihood. The 95% credibility interval can then be derived from the posterior distribution.

Dropping Criterion Justification

Balancing Exploration and Exploitation:

Thompson Sampling addresses the exploration-exploitation dilemma by minimizing regret which allows it to optimally explore arms that are statistically promising, while still exploiting those that been shown to be good.

AI-Conditioned Variant Generation

Objective:

Generate a new variant

v_{\text{new}}

expected to have a higher reward than the average of the dropped arms.

Model Inputs:
- Features of previous variants $\{ v_i \}$
- Performance metrics $\{ \hat{\mu}_i \}$
Model Output:
- A new variant $v_{\text{new}}$ predicted to perform better.
Expected Reward of New Variant: The expected reward of the new variant is: $\mu_{\text{new}} = \mathbb{E}\left[ r_{\text{new}} \mid v_{\text{new}} \right]$ Our goal is for $\mu_{\text{new}} > \hat{\mu}_{\text{avg}}$ , where $\hat{\mu}_{\text{avg}}$ is the average performance of previous arms.

Implementation Considerations

Comparison with Standard MAB

Aspect	Standard MAB	Coframe’s Modified MAB
Arm Pool	Fixed	Dynamic (arms can be added/dropped)
Underperforming Arms	Continue to receive some traffic	Dropped when below performance thresholds
New Variant Introduction	Not typically included	Introduces new arms
Baseline Variant	Not necessarily maintained	Constant holdout baseline is maintained
Adaptability	Limited to initial arm set	Continuously adapts with new variants

Practical Example

Assume we start with:

Holdout Baseline: $v_0$
Initial Variants: $v_1$ , $v_2$ , $v_3$

Process Flow

Initialization:
- Equal traffic allocation.
- Collect observations over time for each variant.
Evaluation:
- After sufficient data, suppose $v_2$ falls below the 2% allocation threshold.
Dropping Underperformer:
- Drop $v_2$ from the active arm set.
Variant Generation:
- Generate a new variant $v_{\text{new}}$ conditioned on past variants and their performance.
Reallocation:
- Redistribute traffic among $v_0$ , $v_1$ , $v_3$ , and $v_{\text{new}}$ .
Iteration:
- Repeat the process, continuously refining the variant pool.

In summary…

Coframe’s multi-armed bandit approach offers a robust and efficient method for website content optimization by:

Leveraging Past Performance: AI-generated variants are informed by historical data, increasing the likelihood of success.
Dynamic Adaptation: The ability to drop and introduce arms keeps the optimization process responsive to changing user behaviors.
Maintaining a Baseline: A constant control variant ensures that performance improvements are measured accurately.

References

Auer, P., Cesa-Bianchi, N., & Fischer, P. (2002). Finite-Time Analysis of the Multiarmed Bandit Problem. Machine Learning.
Kleinberg, R., & Slivkins, A. (2010). Multi-Armed Bandits in Metric Spaces. Proceedings of the 41st Annual ACM Symposium on Theory of Computing (STOC).
Bubeck, S., Munos, R., & Stoltz, G. (2011). Pure Exploration in Finitely-Armed and Continuous-Armed Bandits. Theoretical Computer Science.

Introduction

Getting Started

Using The Editor

Optimization

Advanced Features

SEO and Performance

Coframe Labs

Optimizer Methodology

Key Enhancements

Standard Multi-Armed Bandit Overview

Mathematical Formulation

Coframe’s Approach

Algorithm Steps

Initialization

At Each Time Step $t$ (Hourly Iteration)

Dropping Underperforming Arms

Introducing New Arms

Traffic Reallocation

Selection Policy: Thompson Sampling

Posterior Sampling

Traffic Allocation Constraints

Advantages Over Standard MAB

Mathematical Details

Reward Estimation

Dropping Criterion Justification

AI-Conditioned Variant Generation

Implementation Considerations

Comparison with Standard MAB

Practical Example

Process Flow

In summary…

References

Introduction

Getting Started

Using The Editor

Optimization

Advanced Features

SEO and Performance

Coframe Labs

​Key Enhancements

​Standard Multi-Armed Bandit Overview

​Mathematical Formulation

​Coframe’s Approach

​Algorithm Steps

​Initialization

​At Each Time Step ttt (Hourly Iteration)

​Dropping Underperforming Arms

​Introducing New Arms

​Traffic Reallocation

​Selection Policy: Thompson Sampling

​Posterior Sampling

​Traffic Allocation Constraints

​Advantages Over Standard MAB

​Mathematical Details

​Reward Estimation

​Dropping Criterion Justification

​AI-Conditioned Variant Generation

​Implementation Considerations

​Comparison with Standard MAB

​Practical Example

​Process Flow

​In summary…

​References

Key Enhancements

Standard Multi-Armed Bandit Overview

Mathematical Formulation

Coframe’s Approach

Algorithm Steps

Initialization

At Each Time Step $t$ (Hourly Iteration)

Dropping Underperforming Arms

Introducing New Arms

Traffic Reallocation

Selection Policy: Thompson Sampling

Posterior Sampling

Traffic Allocation Constraints

Advantages Over Standard MAB

Mathematical Details

Reward Estimation

Dropping Criterion Justification

AI-Conditioned Variant Generation

Implementation Considerations

Comparison with Standard MAB

Practical Example

Process Flow

In summary…

References