Sensitivity Functions

Hyperparameter sensitivity linear approximation

class vittles.sensitivity_lib.HyperparameterSensitivityLinearApproximation(objective_fun, opt_par_value, hyper_par_value, validate_optimum=False, hessian_at_opt=None, cross_hess_at_opt=None, hyper_par_objective_fun=None, grad_tol=1e-08)[source]

Linearly approximate dependence of an optimum on a hyperparameter.

Suppose we have an optimization problem in which the objective depends on a hyperparameter:

\[\hat{\theta} = \mathrm{argmin}_{\theta} f(\theta, \lambda).\]

The optimal parameter, \(\hat{\theta}\), is a function of \(\lambda\) through the optimization problem. In general, this dependence is complex and nonlinear. To approximate this dependence, this class uses the linear approximation:

\[\hat{\theta}(\lambda) \approx \hat{\theta}(\lambda_0) + \frac{d\hat{\theta}}{d\lambda}|_{\lambda_0} (\lambda - \lambda_0).\]

In terms of the arguments to this function, \(\theta\) corresponds to opt_par, \(\lambda\) corresponds to hyper_par, and \(f\) corresponds to objective_fun.
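
For concreteness, here is a minimal usage sketch (not taken from the vittles documentation), assuming an autograd-differentiable objective. It uses a quadratic objective, for which the linear approximation is exact:

    import autograd.numpy as np
    from vittles.sensitivity_lib import HyperparameterSensitivityLinearApproximation

    A = np.array([[2.0, 0.3], [0.3, 1.0]])

    def objective_fun(opt_par, hyper_par):
        # Quadratic objective whose optimum is theta_hat(lambda) = A^{-1} lambda.
        return 0.5 * np.dot(opt_par, np.dot(A, opt_par)) - np.dot(opt_par, hyper_par)

    lam0 = np.array([1.0, -0.5])
    theta0 = np.linalg.solve(A, lam0)   # the exact optimum at lam0

    sens = HyperparameterSensitivityLinearApproximation(
        objective_fun, theta0, lam0, validate_optimum=True)

    lam_new = lam0 + np.array([0.1, 0.2])
    theta_pred = sens.predict_opt_par_from_hyper_par(lam_new)
    # For this quadratic objective, theta_pred matches np.linalg.solve(A, lam_new).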

Methods

set_base_values: Set the base values, \(\lambda_0\) and \(\theta_0 := \hat\theta(\lambda_0)\), at which the linear approximation is evaluated.
get_dopt_dhyper: Return the Jacobian matrix \(\frac{d\hat{\theta}}{d\lambda}|_{\lambda_0}\) in flattened space.
get_hessian_at_opt: Return the Hessian of the objective function in the flattened space.
predict_opt_par_from_hyper_par: Use the linear approximation to predict the value of opt_par from a value of hyper_par.
__init__(self, objective_fun, opt_par_value, hyper_par_value, validate_optimum=False, hessian_at_opt=None, cross_hess_at_opt=None, hyper_par_objective_fun=None, grad_tol=1e-08)[source]
Parameters:
objective_fun : callable

The objective function, which takes two positional arguments and returns a real value to be minimized:

- opt_par: The parameter to be optimized (numpy.ndarray (N,))
- hyper_par: A hyperparameter (numpy.ndarray (M,))

opt_par_value : numpy.ndarray (N,)

The value of opt_par at which objective_fun is optimized for the given value of hyper_par_value.

hyper_par_value : numpy.ndarray (M,)

The value of hyper_par at which opt_par optimizes objective_fun.

validate_optimum : bool, optional

When setting the values of opt_par and hyper_par, check that opt_par is, in fact, a critical point of objective_fun.

hessian_at_opt : numpy.ndarray (N,N), optional

The Hessian of objective_fun at the optimum. If not specified, it is calculated using automatic differentiation.

cross_hess_at_opt : numpy.ndarray (N, M), optional

The second derivative of the objective with respect to opt_par and then hyper_par, evaluated at the optimum. If not specified, it is calculated at initialization.

hyper_par_objective_fun : callable, optional

The part of objective_fun depending on both opt_par and hyper_par. The arguments must be the same as those of objective_fun:

- opt_par: The parameter to be optimized (numpy.ndarray (N,))
- hyper_par: A hyperparameter (numpy.ndarray (M,))

This can be useful if only a small part of the objective function depends on both opt_par and hyper_par. If not specified, objective_fun is used. See the sketch after this parameter list.

grad_tol : float, optional

The tolerance used to check that the gradient is approximately zero at the optimum.
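
Continuing the quadratic sketch above, only the term -opt_par · hyper_par involves both arguments, so it could be supplied separately as hyper_par_objective_fun (an illustration of typical usage, not a requirement):

    def hyper_objective(opt_par, hyper_par):
        # The only term of the objective that involves both opt_par and hyper_par.
        return -np.dot(opt_par, hyper_par)

    sens = HyperparameterSensitivityLinearApproximation(
        objective_fun, theta0, lam0,
        hyper_par_objective_fun=hyper_objective)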

get_opt_par_function(self)[source]

Return a differentiable function returning the optimal value.

predict_opt_par_from_hyper_par(self, new_hyper_par_value)[source]

Predict opt_par using the linear approximation.

Parameters:
new_hyper_par_value: `numpy.ndarray` (M,)

The value of hyper_par at which to approximate opt_par.

Hyperparameter sensitivity Taylor series approximation

class vittles.sensitivity_lib.ParametricSensitivityTaylorExpansion(estimating_equation, input_val0, hyper_val0, order, hess_solver, forward_mode=True, max_input_order=None, max_hyper_order=None, force=False)[source]

Evaluate the Taylor series of the dependence of an optimum on a hyperparameter.

This class computes the Taylor series of eta(eps) = argmax_eta objective(eta, eps) using forward-mode automatic differentiation.

Note

This class is experimental and should be used with caution.
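
As a hedged sketch (not from the documentation), the optimization_objective constructor documented below can build a second-order expansion for the same kind of quadratic objective used earlier; the names objective, eta0, and eps0 are illustrative:

    import autograd.numpy as np
    from vittles.sensitivity_lib import ParametricSensitivityTaylorExpansion

    A = np.array([[2.0, 0.3], [0.3, 1.0]])

    def objective(eta, eps):
        # Quadratic objective with optimum eta_hat(eps) = A^{-1} eps.
        return 0.5 * np.dot(eta, np.dot(A, eta)) - np.dot(eta, eps)

    eps0 = np.array([1.0, -0.5])
    eta0 = np.linalg.solve(A, eps0)   # the optimum at eps0

    taylor = ParametricSensitivityTaylorExpansion.optimization_objective(
        objective, eta0, eps0, order=2)

    eps_new = eps0 + np.array([0.1, 0.2])
    eta_approx = taylor.evaluate_taylor_series(eps_new)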

__init__(self, estimating_equation, input_val0, hyper_val0, order, hess_solver, forward_mode=True, max_input_order=None, max_hyper_order=None, force=False)[source]
Parameters:
estimating_equation : callable

A vector-valued function of two arguments, (input, hyper), where the length of the returned vector is the same as the length of input, and which is (approximately) the zero vector when evaluated at (input_val0, hyper_val0).

input_val0 : numpy.ndarray (N,)

The value of input_par at the optimum.

hyper_val0 : numpy.ndarray (M,)

The value of hyper_par at which input_val0 was found.

order : int

The maximum order of the Taylor series to be calculated.

hess_solver : function

A function that takes a single argument, v, and returns

\[\frac{\partial G}{\partial \eta}^{-1} v,\]

where \(G(\eta, \epsilon)\) is the estimating equation, and the partial derivative is evaluated at \((\eta, \epsilon) =\) (input_val0, hyper_val0). See the sketch after this parameter list for one way to construct such a solver.

forward_mode : bool

Optional. If True (the default), use forward-mode automatic differentiation. Otherwise, use reverse-mode.

max_input_order : int

Optional. The maximum order of nonzero partial derivatives of the objective function gradient with respect to the input parameter. If None, calculate partial derivatives of all orders.

max_hyper_order : int

Optional. The maximum order of nonzero partial derivatives of the objective function gradient with respect to the hyperparameter. If None, calculate partial derivatives of all orders.

force : bool

Optional. If True, force the instantiation of potentially expensive reverse mode derivative arrays. Default is False.
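
One way to construct a hess_solver, sketched under the assumption that a dense Hessian is affordable, is to Cholesky-factorize the Hessian of the objective at the optimum (reusing objective, eta0, and eps0 from the sketch above):

    import autograd
    from scipy.linalg import cho_factor, cho_solve

    # Hessian of the objective with respect to eta at (eta0, eps0),
    # i.e. the Jacobian of the estimating equation G = grad(objective).
    hess0 = autograd.hessian(lambda eta: objective(eta, eps0))(eta0)
    hess_chol = cho_factor(hess0)

    def hess_solver(v):
        # Returns (dG / d eta)^{-1} v using the cached Cholesky factor.
        return cho_solve(hess_chol, v)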

evaluate_input_derivs(self, dhyper, max_order=None)[source]

Return a list of the directional derivatives \(\frac{d^k \mathrm{input}}{d \mathrm{hyper}^k} \mathrm{dhyper}^k\) evaluated in the direction dhyper.

evaluate_taylor_series(self, new_hyper_val, add_offset=True, max_order=None, sum_terms=True)[source]

Evaluate the Taylor series approximation at new_hyper_val.

Parameters:
new_hyper_val: `numpy.ndarray` (M,)

The new hyperparameter value at which to evaluate the Taylor series.

add_offset: `bool`

Optional. Whether to add the initial constant input_val0 to the Taylor series.

max_order: `int`

Optional. The order of the Taylor series. Defaults to the order argument to __init__.

sum_terms: `bool`

If True, add the terms in the Taylor series. If False, return the terms as a list.

Returns:
The Taylor series approximation to input_val(new_hyper_val) if add_offset is True, or to input_val(new_hyper_val) - input_val0 if False. If sum_terms is True, then a vector of the same length as input_val is returned. Otherwise, an array of shape (max_order + 1, len(input_val)) is returned containing the terms of the Taylor series approximation.
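
Continuing the sketch above, passing sum_terms=False returns the individual contributions; per the Returns description, summing them along the first axis recovers the full approximation:

    terms = taylor.evaluate_taylor_series(eps_new, sum_terms=False)
    eta_approx_again = np.sum(terms, axis=0)   # equals the sum_terms=True result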

evaluate_taylor_series_terms(self, new_hyper_val, add_offset=True, max_order=None)[source]

Return the terms in a Taylor series approximation.

classmethod optimization_objective(objective_function, input_val0, hyper_val0, order, hess0=None, forward_mode=True, max_input_order=None, max_hyper_order=None, force=False)[source]
Parameters:
objective_function : callable

The optimization objective as a function of two arguments (eta, eps), where eta is the parameter that is optimized and eps is a hyperparameter.

hess0 : numpy.ndarray (N, N)

Optional. The Hessian of the objective at (input_val0, hyper_val0). If not specified it is calculated at initialization.

The remaining arguments are the same as for the `__init__` method.
print_terms(self, k=None)[source]

Print the derivative terms in the Taylor series.

Parameters:
k: integer

Optional. Which term to print. If unspecified, all terms are printed.

Optimum checking

class vittles.bivariate_sensitivity_lib.OptimumChecker(estimating_equation, solver, input_base, hyper_base)[source]
__init__(self, estimating_equation, solver, input_base, hyper_base)[source]

Estimate the error in sensitivity due to incomplete optimization.

Parameters:
estimating_equation : callable

A function taking arguments (input, hyper) and returning a vector, typically the same length as the input. The idea is that estimating_equation(input_base, hyper_base) = [0, …, 0].

solver : callable

A function of a single vector variable v returning \(H^{-1} v\), where \(H\) is the Hessian of the estimating equation with respect to the input variable, evaluated at (input_base, hyper_base).

input_base : numpy.ndarray

The base value of the parameter to be optimized.

hyper_base : numpy.ndarray

The base value of the hyperparameter.

correction(self, hyper_new, dinput_dhyper=None, newton_step=None)[source]

Return the first-order correction to the change in dinput_dhyper as you take a Newton step.

evaluate(self, hyper_new, dinput_dhyper=None, newton_step=None)[source]

Return the first-order approximation to the change in dinput_dhyper as you take a Newton step.

get_dinput_dhyper(self, dhyper)[source]

Return the first directional derivative of the optimum with respect to the hyperparameter in the direction dhyper.

get_newton_step(self)[source]

Return a Newton step towards the optimum.
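
A hedged usage sketch (an assumption about typical usage, not from the documentation), reusing objective, hess_solver, eta0, and eps0 from the sketches above:

    import autograd
    import autograd.numpy as np
    from vittles.bivariate_sensitivity_lib import OptimumChecker

    # For an optimization problem, the estimating equation is the gradient of
    # the objective with respect to the input parameter.
    estimating_equation = autograd.grad(objective)

    checker = OptimumChecker(estimating_equation, hess_solver, eta0, eps0)
    newton_step = checker.get_newton_step()
    # Per the docs above, evaluate() returns the first-order approximation to
    # the change in dinput_dhyper as a Newton step is taken.
    change = checker.evaluate(eps0 + np.array([0.1, 0.2]))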

class vittles.bivariate_sensitivity_lib.CrossSensitivity(estimating_equation, solver, input_base, hyper1_base, hyper2_base, term_ii=True, term_i1=True, term_i2=True, term_12=True)[source]

Calculate a second-order derivative of an optimum with respect to two hyperparameters.

Given an estimating equation \(G(\theta, \epsilon_1, \epsilon_2)\), with \(G(\hat\theta(\epsilon_1, \epsilon_2), \epsilon_1, \epsilon_2) = 0\), this class evaluates the directional derivative

\[\frac{d^2\hat{\theta}}{d\epsilon_1 d\epsilon_2} \Delta \epsilon_1 \Delta \epsilon_2.\]
__init__(self, estimating_equation, solver, input_base, hyper1_base, hyper2_base, term_ii=True, term_i1=True, term_i2=True, term_12=True)[source]

Initialize self. See help(type(self)) for accurate signature.

Linear response covariances

class vittles.lr_cov_lib.LinearResponseCovariances(objective_fun, opt_par_value, validate_optimum=False, hessian_at_opt=None, factorize_hessian=True, grad_tol=1e-08)[source]

Calculate linear response covariances of a variational distribution.

Let \(q(\theta | \eta)\) be a class of probability distributions on \(\theta\) where the class is parameterized by the real-valued vector \(\eta\). Suppose that we wish to approximate a distribution \(q(\theta | \eta^*) \approx p(\theta)\) by solving an optimization problem \(\eta^* = \mathrm{argmin}_{\eta} f(\eta)\). For example, \(f\) might be a measure of distance between \(q(\theta | \eta)\) and \(p(\theta)\). This class uses the sensitivity of the optimal \(\eta^*\) to estimate the covariance \(\mathrm{Cov}_p(g(\theta))\). This covariance estimate is called the “linear response covariance”.

In this notation, the arguments to the class methods are as follows. \(f\) is objective_fun, \(\eta^*\) is opt_par_value, and the function calculate_moments evaluates \(\mathbb{E}_{q(\theta | \eta)}[g(\theta)]\) as a function of \(\eta\).
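
A minimal usage sketch (not from the documentation, and assuming an autograd-differentiable objective), with a toy quadratic objective and an illustrative identity moment map:

    import autograd.numpy as np
    from vittles.lr_cov_lib import LinearResponseCovariances

    H = np.array([[2.0, 0.3], [0.3, 1.0]])
    mu = np.array([0.5, -1.0])

    def objective_fun(opt_par):
        # Toy single-argument objective whose optimum is mu.
        diff = opt_par - mu
        return 0.5 * np.dot(diff, np.dot(H, diff))

    lrc = LinearResponseCovariances(objective_fun, mu, validate_optimum=True)

    def calculate_moments(opt_par):
        # In a real variational problem this would return E_q[g(theta)] under
        # q(theta | opt_par); the identity map is used here for illustration.
        return opt_par

    lr_cov = lrc.get_lr_covariance(calculate_moments)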

Methods

set_base_values: Set the base value, \(\eta^*\), that optimizes the objective function.
get_hessian_at_opt: Return the Hessian of the objective function evaluated at the optimum.
get_hessian_cholesky_at_opt: Return the Cholesky decomposition of the Hessian of the objective function evaluated at the optimum.
get_lr_covariance: Return the linear response covariance of a given moment.
__init__(self, objective_fun, opt_par_value, validate_optimum=False, hessian_at_opt=None, factorize_hessian=True, grad_tol=1e-08)[source]
Parameters:
objective_fun: Callable function

A callable function whose optimum parameterizes an approximate Bayesian posterior. The function must take as a single argument a numeric vector, opt_par.

opt_par_value:

The value of opt_par at which objective_fun is optimized.

validate_optimum: Boolean

When setting the values of opt_par, check that opt_par is, in fact, a critical point of objective_fun.

hessian_at_opt: Numeric matrix (optional)

The Hessian of objective_fun at the optimum. If not specified, it is calculated using automatic differentiation.

factorize_hessian: Boolean

If True, solve the required linear system using a Cholesky factorization. If False, use the conjugate gradient algorithm to avoid forming or inverting the Hessian.

grad_tol: Float

The tolerance used to check that the gradient is approximately zero at the optimum.

get_lr_covariance(self, calculate_moments)[source]

Get the linear response covariance of a vector of moments.

Parameters:
calculate_moments: Callable function

A function that takes the folded opt_par as a single argument and returns a numeric vector containing posterior moments of interest.

Returns:
Numeric matrix

If calculate_moments(opt_par) returns \(\mathbb{E}_q[g(\theta)]\) then this returns the linear response estimate of \(\mathrm{Cov}_p(g(\theta))\).

get_lr_covariance_from_jacobians(self, moment_jacobian1, moment_jacobian2)[source]

Get the linear response covariance between two vectors of moments.

Parameters:
moment_jacobian1: 2d numeric array.

The Jacobian matrix of a map from a value of opt_par to a vector of moments of interest. Following standard notation for Jacobian matrices, the rows should correspond to moments and the columns to elements of a flattened opt_par.

moment_jacobian2: 2d numeric array.

Like moment_jacobian1 but for the second vector of moments.

Returns:
Numeric matrix

If moment_jacobian1(opt_par) is the Jacobian of \(\mathbb{E}_q[g_1(\theta)]\) and moment_jacobian2(opt_par) is the Jacobian of \(\mathbb{E}_q[g_2(\theta)]\) then this returns the linear response estimate of \(\mathrm{Cov}_p(g_1(\theta), g_2(\theta))\).

get_moment_jacobian(self, calculate_moments)[source]

Get the Jacobian matrix of a map from opt_par to a vector of moments of interest.

Parameters:
calculate_moments: Callable function

A function that takes the folded opt_par as a single argument and returns a numeric vector containing posterior moments of interest.

Returns:
Numeric matrix

The Jacobian of the moments.
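
As a brief follow-up sketch continuing the lrc example above, the two Jacobian-based methods can be combined; the moment map here is illustrative:

    moment_jac = lrc.get_moment_jacobian(calculate_moments)
    lr_cov_again = lrc.get_lr_covariance_from_jacobians(moment_jac, moment_jac)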