Class mcmc_para_base (o2scl)

O2scl : Class List

template<class func_t, class measure_t, class data_t, class vec_t = ubvector>
class o2scl::mcmc_para_base

A generic MCMC simulation class.

This class performs a Markov chain Monte Carlo simulation of a user-specified function using OpenMP and/or MPI. Either the Metropolis-Hastings algorithm with a user-specified proposal distribution or the affine-invariant sampling method of Goodman and Weare can be used.

By default, the Metropolis-Hastings algorithm is executed with a simple walk, with steps in each dimension of size \( (\mathrm{high} - \mathrm{low})/\mathrm{step\_fac} \) with the denominator specified in step_fac.

The function type is a template type, func_t, which should be of the form

int f(size_t num_of_parameters, const vec_t &parameters,
double &log_pdf, data_t &dat)
which computes log_pdf, the natural logarithm of the function value, for any point in parameter space (any point between low and high ).

If the function being simulated returns mcmc_skip then the point is automatically rejected. After each acceptance or rejection, a user-specified “measurement” function (of type measure_t ) is called, which can be used to store the results. In order to stop the simulation, either this function or the probability distribution being simulated should return the value mcmc_done .

A generic proposal distribution can be specified in set_proposal(). To go back to the default random walk method, one can call the function unset_proposal().

If aff_inv is set to true, then affine-invariant sampling is used. For affine-invariant sampling, the variable step_fac represents the value of \( a \), the limits of the distribution for \( z \).

In order to store data at each point, the user can store this data in any object of type data_t . If affine-invariant sampling is used, then each chain has it’s own data object. The class keeps twice as many copies of these data object as would otherwise be required, in order to avoid copying of data objects in the case that the steps are accepted or rejected.

Verbose output: If verbose is 0, no output is generated (the default). If verbose is 1, then output to cout occurs only if the settings are somehow misconfigured and the class attempts to recover from them, for example if not enough functions are specified for the requested number of OpenMP threads, or if more than one thread was requested but O2SCL_OPENMP was not defined, or if a negative value for step_fac was requested. When verbose is 1, a couple messages are written to scr_out : a summary of the number of walkers, chains, and threads at the beginning of the MCMC simulation, a message indicating why the MCMC simulation stopped, a message when the warm up iterations are completed, a message every time files are written to disk, and a message at the end counting the number of acceptances and rejections. If verbose is 2, then the file prefix is output to cout during initialization.

Idea for Future:

The copy constructor for the data_t type is used when the user doesn’t specify enough initial points for the corresponding number of threads and walkers. This requirement for a copy constructor could be removed by allowing threads and walkers to share the data_t for the initial point as needed.

Note

This class is experimental.

Note

Currently, this class requires that the data_t has a good copy constructor.

MPI properties

int mpi_rank

The MPI processor rank.

int mpi_size

The MPI number of processors.

std::ofstream scr_out

The screen output file.

std::vector<rng<>> rg

Random number generators.

std::vector<o2scl::prob_cond_mdim<vec_t>*> prop_dist

Pointer to proposal distribution for each thread.

bool pd_mode

If true, then use the user-specified proposal distribution.

bool warm_up

If true, we are in the warm up phase.

std::vector<vec_t> current

Current points in parameter space for each walker and each OpenMP thread.

This is an array of size n_threads times n_walk initial guesses, indexed by thread_index*n_walk+walker_index .

std::vector<data_t> data_arr

Data array.

This is an array of size 2 times n_threads times n_walk . The two copies of data objects are indexed by i_copy*n_walk*n_threads+thread_index*n_walk+walker_index

std::vector<bool> switch_arr

Data switch array for each walker and each OpenMP thread.

This is an array of size n_threads times n_walk initial guesses, indexed by thread_index*n_walk+walker_index .

std::vector<std::vector<size_t>> ret_value_counts

Return value counters, one vector independent chain.

Interface customization

std::vector<size_t> curr_walker

Index of the current walker.

This quantity has to be a vector because different threads may have different values for the current walker during the initialization phase for the affine sampling algorithm.

bool meas_for_initial

If true, call the measurement function for the initial point.

static const int mcmc_done = -10

Integer to indicate completion.

static const int mcmc_skip = -20

Integer to indicate rejection.

inline virtual int mcmc_init()

Initializations before the MCMC.

inline virtual void mcmc_cleanup()

Cleanup after the MCMC.

inline virtual void best_point(vec_t &best, double w_best, data_t &dat)

Function to run for the best point.

When this function is in a OpenMP parallel region it is marked critical to allow the user to store data which is shared across threads.

Output quantities

std::vector<size_t> n_accept

The number of Metropolis steps which were accepted in each independent chain (summed over all walkers)

This vector has a size equal to n_threads .

std::vector<size_t> n_reject

The number of Metropolis steps which were rejected in each independent chain (summed over all walkers)

This vector has a size equal to n_threads .

Settings

double mpi_start_time

The MPI starting time (defaults to 0.0)

This can be set by the user before mcmc() is called, so that the time required for initializations before the MCMC starts can be counted.

size_t max_iters

If non-zero, the maximum number of MCMC iterations (default 0)

If both max_iters and max_time are nonzero, the MCMC will stop when either the number of iterations exceeds max_iters or the time exceeds max_time, whichever occurs first.

double max_time

Time in seconds (default is 0)

If both max_iters and max_time are nonzero, the MCMC will stop when either the number of iterations exceeds max_iters or the time exceeds max_time, whichever occurs first.

std::string prefix

Prefix for output filenames (default “mcmc”)

bool aff_inv

If true, use affine-invariant Monte Carlo.

double step_fac

Stepsize factor (default 10.0)

std::vector<double> step_vec

Optionally specify step sizes for each parameter.

bool couple_threads

If true, couple the walkers across threads during affine-invariant sampling (default false)

size_t n_warm_up

Number of warm up steps (successful steps not iterations) (default 0)

Note

Not to be confused with warm_up, which is a protected boolean local variable in some functions which indicates whether we’re in warm up mode or not.

int user_seed

If non-zero, use as the seed for the random number generator (default 0)

The random number generator is modified so that each thread and each rank has a different set of random numbers.

If this value is zero, then the random number generators are seeded by the clock time in seconds, so that if two separate simulations begin at the same time (to within 1 second) they will produce identical results. This can be avoided simply by ensuring that user_seed is different between the two simulations.

int verbose

Output control (default 0)

size_t max_bad_steps

Maximum number of failed steps when generating initial points with affine-invariant sampling (default 1000)

size_t n_walk

Number of walkers (per openMP thread) for affine-invariant MC or 1 otherwise (default 1)

bool err_nonconv

If true, call the error handler if msolve() or msolve_de() does not converge (default true)

bool always_accept

If true, accept all steps.

double ai_initial_step

Initial step fraction for affine-invariance sampling walkers (default 0.1)

size_t n_threads

Number of OpenMP threads.

std::vector<ubvector> initial_points

Initial points in parameter space.

To fully specify all of the initial points, this should be a vector of size n_walk times n_threads . Initial points are used for multiple threads and/or walkers if the full number of initial points is not specified.

inline mcmc_para_base()

Basic usage

inline virtual int mcmc(size_t n_params, vec_t &low, vec_t &high, std::vector<func_t> &func, std::vector<measure_t> &meas)

Perform a MCMC simulation.

Perform a MCMC simulation over n_params parameters starting at initial point init, limiting the parameters to be between low and high, using func as the objective function and calling the measurement function meas at each MC point.

inline virtual int mcmc(size_t n_params, vec_t &low, vec_t &high, func_t &func, measure_t &meas)

Perform a MCMC simulation with a thread-safe function or with only one OpenMP thread.

Proposal distribution

template<class prob_vec_t>
inline void set_proposal(prob_vec_t &pv)

Set the proposal distribution.

Note

This function automatically sets aff_inv to false and n_walk to 1.

template<class prob_ptr_vec_t>
inline void set_proposal_ptrs(prob_ptr_vec_t &pv)

Set pointers to proposal distributions.

Note

This function automatically sets aff_inv to false and n_walk to 1.

inline virtual void unset_proposal()

Go back to random-walk Metropolis with a uniform distribution.