AbstractBayesOpt Tutorial: Basic 2D Optimisation

Setup

Loading the necessary packages.

using AbstractBayesOpt
using AbstractGPs
using ForwardDiff
using Plots

Define the objective function

f(x) = (x[1]^2 + x[2] - 11)^2 + (x[1] + x[2]^2 - 7)^2
d = 2
domain = ContinuousDomain([-6.0, -6.0], [6.0, 6.0])

This is the Himmelblau function, whose four global minima are known:

minima = [[3.0, 2.0], [-2.805118, 3.131312], [-3.77931, -3.283186], [3.584428, -1.848126]]
4-element Vector{Vector{Float64}}:
 [3.0, 2.0]
 [-2.805118, 3.131312]
 [-3.77931, -3.283186]
 [3.584428, -1.848126]

Scatter the minima on a contour plot of the objective, as sketched below
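
A minimal sketch of such a plot (grid resolution and styling are arbitrary choices; f, domain, and minima are defined above):

xs = range(domain.lower[1], domain.upper[1]; length=200)
ys = range(domain.lower[2], domain.upper[2]; length=200)
contour(xs, ys, (x, y) -> f([x, y]); levels=50, xlabel="x₁", ylabel="x₂")
scatter!(first.(minima), last.(minima); color=:red, label="Global minima")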

[Figure: contour plot of the Himmelblau function with its four global minima marked]

Standard GPs

We'll use a standard Gaussian process surrogate with a squared-exponential kernel, adding a small jitter term of $10^{-9}$ for numerical stability.

noise_var = 1e-9
surrogate = StandardGP(SqExponentialKernel(), noise_var)
StandardGP{Float64}(AbstractGPs.GP{AbstractGPs.ZeroMean{Float64}, KernelFunctions.ScaledKernel{KernelFunctions.TransformedKernel{KernelFunctions.SqExponentialKernel{Distances.Euclidean}, KernelFunctions.ScaleTransform{Float64}}, Float64}}(AbstractGPs.ZeroMean{Float64}(), Squared Exponential Kernel (metric = Distances.Euclidean(0.0))
	- Scale Transform (s = 1.0)
	- σ² = 1.0), 1.0e-9, nothing)

Generate uniform random samples x_train in the domain

n_train = 5
x_train = [domain.lower .+ (domain.upper .- domain.lower) .* rand(d) for _ in 1:n_train]

y_train = f.(x_train)
5-element Vector{Float64}:
 110.33292690679502
  73.82945209082831
 238.53003395234026
  53.19981955218784
 175.66046578818757

Choose an acquisition function

We'll use the Expected Improvement acquisition function with an exploration parameter ξ = 0.0.
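
For reference, the textbook form of Expected Improvement for minimisation is $\mathrm{EI}(x) = (f^{*} - \mu(x) - \xi)\,\Phi(z) + \sigma(x)\,\varphi(z)$ with $z = (f^{*} - \mu(x) - \xi)/\sigma(x)$, where $f^{*}$ is the incumbent best value (here minimum(y_train)), $\mu(x)$ and $\sigma(x)$ are the surrogate's posterior mean and standard deviation, and $\Phi$ and $\varphi$ are the standard normal CDF and PDF. The package's internal parameterisation may differ slightly.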

ξ = 0.0
acq = ExpectedImprovement(ξ, minimum(y_train))
ExpectedImprovement{Float64}(0.0, 53.19981955218784)

Set up the Bayesian Optimisation structure

We use BOStruct to bundle all components needed for the optimisation. Here, we set the number of iterations to 50 and the actual noise level to 0.0 (since our function is noiseless). We then run the optimize function to perform the Bayesian optimisation.

bo_struct = BOStruct(
    f,
    acq,
    surrogate,
    domain,
    x_train,
    y_train,
    50,  # number of iterations
    0.0,  # Actual noise level (0.0 for noiseless)
)

@info "Starting Bayesian Optimisation..."
result, acq_list, standard_params = AbstractBayesOpt.optimize(
    bo_struct; standardize="mean_only"
);
[ Info: Starting Bayesian Optimisation...
[ Info: Standardization choice: mean_only
[ Info: Standardization parameters: μ=130.3105396580678, σ=1.0
[ Info: Optimizing GP hyperparameters at iteration 1...
[ Info: New parameters: ℓ=[4.330178382332697], variance =[7070.708091406502]
[ Info: Iteration #1, current min val: 53.19981955218784
[ Info: Acquisition optimized, new candidate point: [5.115857350378444, 3.214323837552329]
[ Info: Iteration #2, current min val: 53.19981955218784
[ Info: Acquisition optimized, new candidate point: [0.8454242390350472, 1.917348396869295]
[ Info: Iteration #3, current min val: 53.19981955218784
[ Info: Acquisition optimized, new candidate point: [-1.3377544368268106, 5.9999999999999964]
[ Info: Iteration #4, current min val: 53.19981955218784
[ Info: Acquisition optimized, new candidate point: [-5.999999999999992, 0.751233739390227]
[ Info: Iteration #5, current min val: 53.19981955218784
[ Info: Acquisition optimized, new candidate point: [4.613860393603205, -0.9168855566851378]
[ Info: Iteration #6, current min val: 53.19981955218784
[ Info: Acquisition optimized, new candidate point: [3.335637797136044, -0.0640945363812162]
[ Info: Iteration #7, current min val: 13.401351912692927
[ Info: Acquisition optimized, new candidate point: [-3.0374403448295, -5.999999999999934]
[ Info: Iteration #8, current min val: 13.401351912692927
[ Info: Acquisition optimized, new candidate point: [3.3351811061607934, 0.5905661712219521]
[ Info: Iteration #9, current min val: 11.505985692561753
[ Info: Acquisition optimized, new candidate point: [3.3582799848767384, 0.4216635577214649]
[ Info: Iteration #10, current min val: 11.505985692561753
[ Info: Acquisition optimized, new candidate point: [2.874086192260681, 1.8059310514084217]
[ Info: Optimizing GP hyperparameters at iteration 11...
[ Info: New parameters: ℓ=[2.888774239256242], variance =[197318.7524667607]
[ Info: Iteration #11, current min val: 1.619197702776436
[ Info: Acquisition optimized, new candidate point: [5.999999999999034, -5.9999999999744995]
[ Info: Iteration #12, current min val: 1.619197702776436
[ Info: Acquisition optimized, new candidate point: [-5.999999999999992, 5.9999999999785345]
[ Info: Iteration #13, current min val: 1.619197702776436
[ Info: Acquisition optimized, new candidate point: [-2.262321877949737, 0.6775335103112982]
[ Info: Iteration #14, current min val: 1.619197702776436
[ Info: Acquisition optimized, new candidate point: [-5.999999999999916, -3.7685110601455247]
[ Info: Iteration #15, current min val: 1.619197702776436
[ Info: Acquisition optimized, new candidate point: [2.8211280036402067, 5.9999999999999964]
[ Info: Iteration #16, current min val: 1.619197702776436
[ Info: Acquisition optimized, new candidate point: [2.2464986759457464, -2.087369661371338]
[ Info: Iteration #17, current min val: 1.619197702776436
[ Info: Acquisition optimized, new candidate point: [-2.094949550414847, 2.27184045535781]
[ Info: Iteration #18, current min val: 1.619197702776436
[ Info: Acquisition optimized, new candidate point: [2.773975454554396, 1.5571391326709423]
[ Info: Iteration #19, current min val: 1.619197702776436
[ Info: Acquisition optimized, new candidate point: [0.35369691430818306, -5.9999999999999964]
[ Info: Iteration #20, current min val: 1.619197702776436
[ Info: Acquisition optimized, new candidate point: [-5.999999999964585, -5.999999999999827]
[ Info: Optimizing GP hyperparameters at iteration 21...
[ Info: New parameters: ℓ=[3.4174128278523264], variance =[999999.9998908886]
[ Info: Iteration #21, current min val: 1.619197702776436
[ Info: Acquisition optimized, new candidate point: [-3.358889693972501, -2.9264795452021004]
[ Info: Iteration #22, current min val: 1.619197702776436
[ Info: Acquisition optimized, new candidate point: [3.435825423768197, -2.236980426327851]
[ Info: Iteration #23, current min val: 1.619197702776436
[ Info: Acquisition optimized, new candidate point: [-2.804320008102409, -2.252847689896239]
[ Info: Iteration #24, current min val: 1.619197702776436
[ Info: Acquisition optimized, new candidate point: [-3.2072056416918326, -3.4127141474169034]
[ Info: Iteration #25, current min val: 1.619197702776436
[ Info: Acquisition optimized, new candidate point: [3.466806530395123, -1.70556516770397]
[ Info: Iteration #26, current min val: 0.8613952188994597
[ Info: Acquisition optimized, new candidate point: [-3.7743131392745863, -3.406913818364359]
[ Info: Iteration #27, current min val: 0.7195441744407131
[ Info: Acquisition optimized, new candidate point: [-3.497185124308461, 2.5844800384711317]
[ Info: Iteration #28, current min val: 0.7195441744407131
[ Info: Acquisition optimized, new candidate point: [-3.683965513584456, -3.326398420136565]
[ Info: Iteration #29, current min val: 0.7148490135240131
[ Info: Acquisition optimized, new candidate point: [-3.331126686329616, 1.880500663930502]
[ Info: Iteration #30, current min val: 0.7148490135240131
[ Info: Acquisition optimized, new candidate point: [-2.7143870586442604, 3.216110794310689]
[ Info: Optimizing GP hyperparameters at iteration 31...
[ Info: New parameters: ℓ=[3.223936585066202], variance =[999999.9999298849]
[ Info: Iteration #31, current min val: 0.5686672598523428
[ Info: Acquisition optimized, new candidate point: [-2.7646080694024215, 3.0187340921839985]
[ Info: Iteration #32, current min val: 0.5392964865329588
[ Info: Acquisition optimized, new candidate point: [-2.731379622679194, 3.1165750839756217]
[ Info: Iteration #33, current min val: 0.17925710335636205
[ Info: Acquisition optimized, new candidate point: [-2.833319334998223, 3.142470320860991]
[ Info: Iteration #34, current min val: 0.030704683939779147
[ Info: Acquisition optimized, new candidate point: [2.8618340372626023, 2.2229650867921706]
[ Info: Iteration #35, current min val: 0.030704683939779147
[ Info: Acquisition optimized, new candidate point: [3.6347134791029716, -1.8965074268495437]
[ Info: Iteration #36, current min val: 0.030704683939779147
[ Info: Acquisition optimized, new candidate point: [3.5766066344826295, -1.8603719817934272]
[ Info: Iteration #37, current min val: 0.0060720622129323675
[ Info: Acquisition optimized, new candidate point: [5.999999999999674, 5.9999999999959535]
[ Info: Iteration #38, current min val: 0.0060720622129323675
[ Info: Acquisition optimized, new candidate point: [-2.8035594815693305, 3.129693147243312]
[ Info: Iteration #39, current min val: 0.0001809734064675151
[ Info: Acquisition optimized, new candidate point: [3.0084120138973773, 2.0023371065045668]
[ Info: Iteration #40, current min val: 0.0001809734064675151
[ Info: Acquisition optimized, new candidate point: [-3.7928626226611875, -3.2743955874635176]
[ Info: Optimizing GP hyperparameters at iteration 41...
[ Info: New parameters: ℓ=[3.1703080927742104], variance =[999999.9999998468]
[ Info: Iteration #41, current min val: 0.0001809734064675151
[ Info: Acquisition optimized, new candidate point: [-3.785863141547552, -3.2961915487676525]
[ Info: Iteration #42, current min val: 0.0001809734064675151
[ Info: Acquisition optimized, new candidate point: [3.578216843559911, -1.8927385014362743]
[ Info: Iteration #43, current min val: 0.0001809734064675151
[ Info: Acquisition optimized, new candidate point: [3.4315548519721384, -5.999999999999996]
[ Info: Iteration #44, current min val: 0.0001809734064675151
[ Info: Acquisition optimized, new candidate point: [-3.7777440695741715, -3.2819313426329875]
[ Info: Iteration #45, current min val: 0.00015645807418455764
[ Info: Acquisition optimized, new candidate point: [2.9993363311766927, 1.997711260538149]
[ Info: Iteration #46, current min val: 0.00013561940265094394
[ Info: Acquisition optimized, new candidate point: [-3.7805999999999997, 5.9718]
[ Info: Iteration #47, current min val: 0.00013561940265094394
[ Info: Acquisition optimized, new candidate point: [5.9514, -3.2502]
[ Info: Iteration #48, current min val: 0.00013561940265094394
[ Info: Acquisition optimized, new candidate point: [0.8981999999999992, 4.173]
[ Info: Iteration #49, current min val: 0.00013561940265094394
[ Info: Acquisition optimized, new candidate point: [-5.9898, 3.551400000000001]
[ Info: Iteration #50, current min val: 0.00013561940265094394
[ Info: Acquisition optimized, new candidate point: [5.9826000000000015, 1.7682000000000002]

Results

The optimisation result is stored in result. We can print the best input found and its corresponding function value.

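A minimal sketch of the printing step; the field names xs and ys on result are assumptions (hypothetical accessors) and may differ between package versions:

best_idx = argmin(result.ys)                 # index of the best observed value
println("Optimal point: ", result.xs[best_idx])
println("Optimal value: ", result.ys[best_idx])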
Optimal point: [2.9993363311766927, 1.997711260538149]
Optimal value: 0.00013561940265094394

Plotting the running minimum over iterations

The running minimum is the best function value found up to each iteration.
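
A sketch of how it can be computed and plotted, again assuming the hypothetical field result.ys holds all observed values in evaluation order:

running_min = accumulate(min, result.ys)   # best value seen up to each evaluation
plot(running_min; xlabel="Function evaluation", ylabel="Running minimum",
     yscale=:log10, legend=false)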

[Figure: running minimum of the objective over function evaluations]

Gradient-enhanced GPs

Now, let's see how to use gradient information to improve the optimisation. We'll use the same function but now also provide its gradient. We define a new surrogate model that can handle gradient information: a GradientGP.

grad_surrogate = GradientGP(SqExponentialKernel(), d + 1, noise_var)

ξ = 0.0
acq = ExpectedImprovement(ξ, minimum(y_train))

∇f(x) = ForwardDiff.gradient(f, x)
f_val_grad(x) = [f(x); ∇f(x)];

Evaluate the function value and gradient at the random samples

y_train_grad = f_val_grad.(x_train)
5-element Vector{Vector{Float64}}:
 [110.33292690679502, -67.222233853589, -6.233898817174085]
 [73.82945209082831, 6.547041429739899, -29.940074643607936]
 [238.53003395234026, -66.57395175082759, -200.2478970556137]
 [53.19981955218784, -41.927777146518736, -26.836469804291816]
 [175.66046578818757, 23.666930366717853, -9.728624459781422]

Set up the Bayesian Optimisation structure

bo_struct_grad = BOStruct(
    f_val_grad,
    acq,
    grad_surrogate,
    domain,
    x_train,
    y_train_grad,
    20,  # number of iterations
    0.0,  # Actual noise level (0.0 for noiseless)
)

result_grad, acq_list_grad, standard_params_grad = AbstractBayesOpt.optimize(bo_struct_grad);
[ Info: Starting Bayesian Optimisation...
[ Info: Standardization choice: mean_scale
[ Info: Standardization parameters: μ=[130.3105396580678, 0.0, 0.0], σ=[76.32718415574614, 76.32718415574614, 76.32718415574614]
[ Info: Optimizing GP hyperparameters at iteration 1...
[ Info: New parameters: ℓ=[2.3127458100313087], variance =[5.347060062551795]
[ Info: Iteration #1, current min val: 53.19981955218784
[ Info: Acquisition optimized, new candidate point: [4.180763951409505, -2.1829760800740345]
[ Info: Iteration #2, current min val: 22.24148738011653
[ Info: Acquisition optimized, new candidate point: [3.032115941339768, 5.0274036465106455]
[ Info: Iteration #3, current min val: 22.24148738011653
[ Info: Acquisition optimized, new candidate point: [3.9525690987083792, 2.629549834695538]
[ Info: Iteration #4, current min val: 22.24148738011653
[ Info: Acquisition optimized, new candidate point: [-3.013677218491583, 4.696040639448896]
[ Info: Iteration #5, current min val: 22.24148738011653
[ Info: Acquisition optimized, new candidate point: [-4.288618635163485, 2.729584822217246]
[ Info: Iteration #6, current min val: 22.24148738011653
[ Info: Acquisition optimized, new candidate point: [-2.7715870864021768, 3.187143265298191]
[ Info: Iteration #7, current min val: 0.1664273163445334
[ Info: Acquisition optimized, new candidate point: [-3.254825216143795, -0.09378970580819537]
[ Info: Iteration #8, current min val: 0.1664273163445334
[ Info: Acquisition optimized, new candidate point: [-4.573327611321231, -3.3787865642873753]
[ Info: Iteration #9, current min val: 0.1664273163445334
[ Info: Acquisition optimized, new candidate point: [-3.484647413815764, -3.0977908128447456]
[ Info: Iteration #10, current min val: 0.1664273163445334
[ Info: Acquisition optimized, new candidate point: [-3.890423414577708, -5.999999999999367]
[ Info: Optimizing GP hyperparameters at iteration 11...
[ Info: New parameters: ℓ=[3.2704286935209064], variance =[171.64909515285146]
[ Info: Iteration #11, current min val: 0.1664273163445334
[ Info: Acquisition optimized, new candidate point: [5.999999999999999, -5.99999999999986]
[ Info: Iteration #12, current min val: 0.1664273163445334
[ Info: Acquisition optimized, new candidate point: [3.5476249017894217, -1.8199818167174178]
[ Info: Iteration #13, current min val: 0.07452650321346022
[ Info: Acquisition optimized, new candidate point: [2.9193776444020947, 2.2087072463063113]
[ Info: Iteration #14, current min val: 0.07452650321346022
[ Info: Acquisition optimized, new candidate point: [3.0576371621724068, 1.8574977220240607]
[ Info: Iteration #15, current min val: 0.07452650321346022
[ Info: Acquisition optimized, new candidate point: [-3.7860466193080486, -3.3096052124739983]
[ Info: Iteration #16, current min val: 0.0286385657239309
[ Info: Acquisition optimized, new candidate point: [-2.8065285821642942, 3.130025010894712]
[ Info: Iteration #17, current min val: 0.00013364535647463822
[ Info: Acquisition optimized, new candidate point: [-5.9958, 5.776200000000001]
[ Info: Iteration #18, current min val: 0.00013364535647463822
[ Info: Acquisition optimized, new candidate point: [-5.9946, -5.8122]
[ Info: Iteration #19, current min val: 0.00013364535647463822
[ Info: Acquisition optimized, new candidate point: [5.9466, -0.3533999999999997]
[ Info: Iteration #20, current min val: 0.00013364535647463822
[ Info: Acquisition optimized, new candidate point: [5.8842, 5.953800000000001]

Results

The optimisation result is stored in result_grad. We can print the best input found and its corresponding function value, just as in the standard case above.

Optimal point (GradBO): [-2.8065285821642942, 3.130025010894712]
Optimal value (GradBO): 0.00013364535647463822

Plotting the running minimum over iterations

The running minimum is the best function value found up to each iteration. Since each evaluation returns the function value together with its 2D gradient (three scalar outputs per query), we repeat each running-minimum value three times so that the x-axis counts scalar function evaluations rather than iterations; a sketch follows.
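
A sketch under the same assumptions as before (result_grad.ys is a hypothetical field whose entries are the stacked outputs [f; ∇f]), overlaying the curve on the previous running-minimum plot:

vals = first.(result_grad.ys)                    # function values only
running_min_grad = accumulate(min, vals)
expanded = repeat(running_min_grad; inner=3)     # 3 scalar outputs per evaluation
plot!(expanded; yscale=:log10, label="Gradient-enhanced BO")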

[Figure: running minimum, with the x-axis counting scalar function evaluations]

We observe that the gradient information does not necessarily lead to a better optimisation path in terms of function evaluations.

Plotting the surrogate model

We can visualize the surrogate model's mean and uncertainty along with the true function and the evaluated points.
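
A self-contained sketch that refits a plain AbstractGPs posterior on the initial training data and plots its mean; reusing the surrogate stored in the result would be preferable if its internals are accessible:

gp = GP(SqExponentialKernel())
post = posterior(gp(x_train, noise_var), y_train)

gx = range(-6, 6; length=100)
gy = range(-6, 6; length=100)
μ = [mean(post, [[x, y]])[1] for y in gy, x in gx]   # posterior mean on the grid
heatmap(gx, gy, μ; xlabel="x₁", ylabel="x₂", title="Surrogate posterior mean")
scatter!(first.(x_train), last.(x_train); color=:white, label="Evaluated points")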

[Figure: surrogate posterior mean and uncertainty with the evaluated points]

This page was generated using Literate.jl.