Variance

Definition of the Problem

Variance is the average of the squared deviations from the mean. Assuming the input list is [a1, a2, ..., an], the mathematical definition is as follows:

  \[ \mathit{ave} = \frac{1}{n} \sum_{i=1}^{n} a_i, \qquad \mathit{var} = \frac{1}{n} \sum_{i=1}^{n} (a_i - \mathit{ave})^2 \]

Using the definitions of the skeletons provided in the library (see the reference manual), a program that calculates the variance can be written in Haskell as follows:

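  -- reduce is the skeleton defined in the reference manual
  var as = sqSum / n
    where
      n        = fromIntegral (length as)   -- number of elements
      sum      = reduce (+) as
      ave      = sum / n
      square x = x * x
      sqSum    = reduce (+) (map square (map (subtract ave) as))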

Thus, we can obtain a parallel program simply by translating this definition into C++ code that uses the library.

Programming in C++ with the Library

The C++ code for computing the variance is listed below:

  dist_list<double> *as = new dist_list<double>(gen, SIZE);
  double ave = list_skeletons::reduce(add, add_unit, as) / SIZE;
  Sub sub(ave);
  list_skeletons::map_ow(sub, as);
  list_skeletons::map_ow(sqr, as);
  double var = list_skeletons::reduce(add, add_unit, as) / SIZE;

This code is part of the source file "variance.cpp" in the samples directory. Each step is explained below.

  1. First, we generate the distributed list "as" with the constructor of the dist_list class. Its arguments are a function object "gen" and the size of the list "SIZE". We must define the function object that generates the data; it takes an index into the list as an integer argument.
      dist_list<double> *as = new dist_list<double>(gen, SIZE);
    
  2. Applying the reduce skeleton to "as" and dividing the result by "SIZE" gives the average of "as". The reduce skeleton is provided as a static member function of the list_skeletons class. "add" is the addition operator (in fact, a binary function object), and "add_unit" is its unit, zero.
      double ave = list_skeletons::reduce(add, add_unit, as) / SIZE;
    
  3. Applying the map skeleton to "as" with "sub" and then with "sqr" gives the distributed list of (ai - ave)^2. "map_ow" is a variant of the map skeleton that overwrites its argument list with the result. "Sub sub(ave)" creates a new function object that subtracts ave from its argument, and "sqr" is a function object that squares its argument.
      Sub sub(ave);
      list_skeletons::map_ow(sub, as);
      list_skeletons::map_ow(sqr, as);
    
  4. Finally, applying the reduce skeleton to "as" again and dividing the result by "SIZE" gives the variance "var".
      double var = list_skeletons::reduce(add, add_unit, as) / SIZE;
    

Function Object

Each function object must inherit from a base function object class. An example is as follows:

struct Gen : public unary_function<int, double> {
  double operator()(int index) const { return static_cast<double>(index); }
} gen;

"Gen" inherits "unary_function", since it is one argument function object. "unary_function" is template class, we must give it type of argument and type of return value. "operator()" needs "const" qualifier, because we want to keep referential transparency of function object. "binary_function" is also similar.

Compilation and Execution

The complete code, including a sample main function, is listed below ("samples/list/variance.cpp"):

#include <iostream>
#include "list_skeletons.h"
using namespace std;

const int SIZE = 1000;

struct Gen : public unary_function<int, double> {
  double operator()(int index) const { return static_cast<double>(index); }
} gen;

struct Add : public binary_function<double, double, double> {
  double operator()(double x, double y) const { return x + y; }
} add;
const double add_unit = 0.0;

struct Sqr : public unary_function<double, double> {
  double operator()(double x) const { return x * x; }
} sqr;

struct Sub : public unary_function<double, double> {
  double val;
  Sub(double val_) : val(val_){ }
  double operator()(double x) const { return x - val; }
};

int SketoMain(int argc, char **argv)
{
  dist_list<double> *as = new dist_list<double>(gen, SIZE);
  double ave = list_skeletons::reduce(add, add_unit, as) / SIZE;
  Sub sub(ave);
  list_skeletons::map_ow(sub, as);
  list_skeletons::map_ow(sqr, as);
  double var = list_skeletons::reduce(add, add_unit, as) / SIZE;

  if(skeleton::rank == 0){
    cout << "average:" << ave << "\n" << "variance:" << var << "\n";
  }
  return 0;
}

The program begins at the function "SketoMain" instead of the ordinary entry point "main". All programs that use our skeleton library must have a "SketoMain" function as their entry point.

Compilation

Compilation is done with the following command.
(Replace path_to_skeleton_library with a suitable path to the skeleton library, and change the current directory to 'samples/list/'.)

> mpiCC -Wall -O3 -Ipath_to_skeleton_library -o variance variance.cpp

NOTE: The name of the MPI C++ compiler, "mpiCC", may differ in some environments.

If you don't have MPI's C++ compiler "mpiCC" (or the C++ bindings for MPI), you can compile the program by passing some extra arguments to an ordinary C++ compiler such as g++:

> g++ -Wall -O3 -Ipath_to_skeleton_library -Ipath_to_mpi_include_files -Lpath_to_mpi_libraries -o variance variance.cpp -lmpich

NOTE: You can also compile the source code with the make (GNU make) command if the 'base makefile' was generated successfully during installation.

> make -f ../../makefile.base variance

Other samples and tests in the directory 'samples/list/' can be compiled in the same way (or just type 'make' in the directory).

Execution

We can then execute the program with the following command:

> mpirun -np n variance

Here n is the number of processors involved in the execution.
NOTE: The launcher for MPI programs, "mpirun", may differ in some environments.
