eo::mpi Namespace Reference

MPI parallelization helpers for EO. More...


namespace  Channel

Tags used in MPI messages for framework communication.

namespace  Message

Simple orders used by the framework.


struct  SharedDataFunction
 Base class for the 4 algorithm functors. More...
struct  SendTaskFunction
 Functor (master side) used to send a task to the worker. More...
struct  HandleResponseFunction
 Functor (master side) used to indicate what to do when receiving a response. More...
struct  ProcessTaskFunction
 Functor (worker side) implementing the processing to do. More...
struct  IsFinishedFunction
 Functor (master side) indicating whether the job is done or not. More...
struct  JobStore
 Contains all the required data and the functors to launch a job. More...
class  Job
 Class implementing the centralized job algorithm. More...
class  OneShotJob
 Job that will be launched only once. More...
class  MultiJob
 Job that will be launched an unknown number of times, on the worker side. More...
struct  AssignmentAlgorithm
 Contains information on the available workers and allows finding assignees for jobs. More...
struct  DynamicAssignmentAlgorithm
 Assignment (scheduling) algorithm which handles workers in a queue. More...
struct  StaticAssignmentAlgorithm
 Assignment algorithm which gives each worker a fixed number of tasks to do, in a round-robin fashion. More...
class  Node
 Global object used to reach boost::mpi::communicator everywhere. More...
struct  ParallelApplyAssignment
 Structure used to save an assignment to a worker, i.e. which slice of the table it has to process. More...
struct  ParallelApplyData
 Data useful for a parallel apply (map). More...
class  SendTaskParallelApply
 Send task functor implementation for the parallel apply (map) job. More...
class  HandleResponseParallelApply
 Handle response functor implementation for the parallel apply (map) job. More...
class  ProcessTaskParallelApply
 Process task functor implementation for the parallel apply (map) job. More...
class  IsFinishedParallelApply
 Is finished functor implementation for the parallel apply (map) job. More...
struct  ParallelApplyStore
 Store containing all the data and the functors for the parallel apply (map) job. More...
class  ParallelApply
 Parallel apply job. More...
struct  DummySendTaskFunction
 Send task functor which does nothing. More...
struct  DummyHandleResponseFunction
 Handle response functor which does nothing. More...
struct  DummyProcessTaskFunction
 Process task functor which does nothing. More...
struct  DummyIsFinishedFunction
 Is finished functor which returns true every time. More...
struct  DummyJobStore
 Job store containing all the dummy functors and no data. More...
struct  EmptyJob
 Job to run after a multi job, so as to indicate that every worker should terminate. More...


eoTimerStat timerStat
 A timer which allows the user to generate statistics about computation times.
const int DEFAULT_MASTER = 0
 If the job only has one master, the user can use this constant, so as not to worry about integer ids.
const int REST_OF_THE_WORLD = -1
 Constant indicating that all the remaining available workers should be used, in assignment algorithm constructors taking an interval.

Detailed Description

MPI parallelization helpers for EO.

This namespace contains parallelization functions which help to parallelize computations in EO. It is based on a generic algorithm, which is then customized with functors corresponding to the algorithm's main steps. These computations are centralized, i.e. there is one central host whose role is to drive the steps of the algorithm; we call it the "master". The other hosts simply perform the computation they are given, which may be any kind of processing; we call them the "slaves" or, less pejoratively, the "workers". Workers can communicate with each other, but they receive their orders from the master and send their results back to it. A worker can also be the master of a different parallelization process, if that is part of its work. Machines of the network, also called hosts, are identified by a unique number: their rank. At any time during the execution of the program, all the hosts know the total number of hosts.

A parallelized Job is a set of independent tasks (i.e. tasks that can be executed in any order without modifying the result), each taking an input and computing an output to be sent back to the master. The data can be of any type; however, it has to be serializable so that it can be sent over the network. Being serializable through Boost is sufficient.

For serialization purposes, it is not strictly necessary to depend upon Boost: it would be easy to use only eoserial and send strings via MPI.

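As a concrete illustration of the string-based alternative mentioned above, a payload can be packed into a plain string before being sent as an MPI message and rebuilt on the other side. This is a minimal sketch; the `Task`, `pack` and `unpack` names are hypothetical and are not part of the EO or eoserial API:

```cpp
#include <sstream>
#include <string>

// A sample task payload: what the master might send to a worker.
struct Task {
    int id;
    double value;
};

// Serialize the task into a plain string that can travel as an MPI message.
std::string pack(const Task& t) {
    std::ostringstream out;
    out << t.id << ' ' << t.value;
    return out.str();
}

// Rebuild the task on the receiving side.
Task unpack(const std::string& s) {
    std::istringstream in(s);
    Task t;
    in >> t.id >> t.value;
    return t;
}
```

With this approach, only strings cross the network, so no Boost serialization support is required on the payload type.
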
The main steps of the algorithm are the following:

  • For the master:
    • Are we done with the current computation?
    • If so, we can quit.
    • Otherwise, send input data to some available worker.
    • If there's no available worker, wait for a worker to be free.
    • When receiving the response, handle it (possibly compute something on the output data, store it...).
    • Go back to the first step.
  • For the worker, it is even simpler:
    • Wait for an order.
    • If there's nothing to do, just quit.
    • Otherwise, retrieve the input data if necessary and do the work.
    • Go back to the first step.

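The two loops above can be sketched in plain C++. This is a single-process mock, assuming one worker and in-memory queues in place of MPI messages; none of the real eo::mpi classes are used:

```cpp
#include <queue>
#include <vector>

enum class Order { Continue, Finish };

// In-process stand-ins for the MPI channels between the master and one worker.
struct Link {
    std::queue<int> toWorker;    // task inputs sent by the master
    std::queue<int> fromWorker;  // task outputs sent back by the worker
    std::queue<Order> orders;    // framework orders
};

// Worker side: wait for an order; quit on Finish, otherwise process one task.
void workerLoop(Link& link) {
    while (true) {
        Order o = link.orders.front(); link.orders.pop();
        if (o == Order::Finish) return;       // nothing left to do: quit
        int in = link.toWorker.front(); link.toWorker.pop();
        link.fromWorker.push(in * in);        // the "processing": square the input
    }
}

// Master side: send all tasks, tell the worker to quit, then handle responses.
std::vector<int> masterLoop(const std::vector<int>& inputs, Link& link) {
    std::vector<int> results;
    for (int in : inputs) {
        link.orders.push(Order::Continue);   // "there is work to do"
        link.toWorker.push(in);              // the SendTask step
    }
    link.orders.push(Order::Finish);         // IsFinished: tell the worker to quit
    workerLoop(link);                        // run the (single, simulated) worker
    while (!link.fromWorker.empty()) {       // the HandleResponse step
        results.push_back(link.fromWorker.front());
        link.fromWorker.pop();
    }
    return results;
}
```

In the real framework the master and workers are separate MPI processes, responses arrive asynchronously, and the "wait for a free worker" step is handled by an AssignmentAlgorithm.
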
There are of course some network adjustments to make and details to fill in, but the main ideas are there. As the job is fully centralized, it is the master that tells the workers when to work and when to quit.

The idea behind these MPI helpers is to be as generic as possible. Looking back at the steps of the algorithm, we find that they can be split into two parts: the steps common to any parallelization algorithm, and the parts specific to a given job. Ideally, the user should only have to implement the specific parts. We identified these parts to be:

  • For the master:
    • What does it mean to be finished? There are only two alternatives in our binary world: either it is finished, or it is not. Hence we only need a function returning a boolean to know whether we're done with the computation: we'll call it IsFinished.
    • What do we have to do when we send a task? We make no assumption about the form of the sent data, or about how many pieces of data are sent. Moreover, as the tasks are all independent, we don't care who does the computation, as long as it gets done. Knowing the rank of the worker is sufficient to send it data. We have identified another function, taking a single argument which is the rank of the worker: we'll call it SendTask.
    • What do we have to do when we receive a response from a worker? Once again, we don't know what form or structure the received data may have; only the user can know. We also leave it to the user to retrieve the data; they just need to know from whom the master will retrieve it. Here is another function, taking a rank (the sender's) as argument: this will be HandleResponse.
  • For the worker:
    • What is the processing? It can be of any nature. We just need to be sure that data is sent back to the master, but it seems difficult to check that automatically: it is the user's responsibility to ensure that the worker sends data back at the end of an execution. We have identified our last function: ProcessTask.

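The four roles identified above map naturally onto four abstract functors. The sketch below is illustrative only: the real SendTaskFunction, HandleResponseFunction, ProcessTaskFunction and IsFinishedFunction declarations in EO may differ, and the concrete CountdownIsFinished example is hypothetical:

```cpp
// Master side: is the whole computation finished?
struct IsFinished {
    virtual bool operator()() = 0;
    virtual ~IsFinished() {}
};

// Master side: send the next input to the worker of the given rank.
struct SendTask {
    virtual void operator()(int workerRank) = 0;
    virtual ~SendTask() {}
};

// Master side: handle the response coming from the worker of the given rank.
struct HandleResponse {
    virtual void operator()(int workerRank) = 0;
    virtual ~HandleResponse() {}
};

// Worker side: receive the input, compute, and send the output back.
struct ProcessTask {
    virtual void operator()() = 0;
    virtual ~ProcessTask() {}
};

// Example concrete functor: the job is finished once `total` tasks are done.
struct CountdownIsFinished : IsFinished {
    int remaining;
    explicit CountdownIsFinished(int total) : remaining(total) {}
    bool operator()() override { return remaining-- <= 0; }
};
```

The generic part of the framework only ever calls these four operations; everything else about the job stays in user code.
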
In terms of implementation, it would be cumbersome to have only abstract classes with these 4 methods to implement: altering just one of the 4 functions would require a new subclass, with a new constructor duplicating the same signature. Besides, this approach doesn't let you add dynamic functionality, using the Decorator design pattern for instance, without implementing a class for each kind of decoration you want to add. For these reasons, we decided to turn each function into a functor; the user can then wrap the existing, basic behaviours into more sophisticated computations, whenever they want and in any combination. We thus regain the extensibility offered by the Decorator design pattern.

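For instance, with functors a user can decorate an existing response handler, here adding a call counter, without touching the original class. This is a schematic example with made-up `Handler` types, not the EO API:

```cpp
// A minimal handler interface, standing in for a HandleResponse-like functor.
struct Handler {
    virtual void operator()(int rank) = 0;
    virtual ~Handler() {}
};

// A trivial concrete handler that records the last rank it saw.
struct RecordingHandler : Handler {
    int lastRank = -1;
    void operator()(int rank) override { lastRank = rank; }
};

// Decorator: counts how many responses were handled, then forwards
// the call to the wrapped functor, which stays unchanged.
struct CountingHandler : Handler {
    Handler& wrapped;
    int count = 0;
    explicit CountingHandler(Handler& h) : wrapped(h) {}
    void operator()(int rank) override {
        ++count;
        wrapped(rank);
    }
};
```

Decorators like this can be stacked in any order, which is precisely the flexibility that plain virtual-method overriding does not give.
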
Our 4 functors may have a large amount of data in common (see eoParallelApply for an idea). To make it easy for the user to implement them, these functors share a common data structure, referenced (as a pointer) in each of the 4 functors, so the user doesn't need to pass many parameters to each functor constructor.

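The shared-data idea can be pictured as follows: each functor keeps a pointer to one common structure, set once after construction. The types below (`JobData`, `SharedDataFn`, `SendNext`, `StoreResult`) are illustrative and do not match the real SharedDataFunction or ParallelApplyData declarations:

```cpp
#include <cstddef>
#include <vector>

// Data shared by all the functors of one job: inputs, outputs, progress.
struct JobData {
    std::vector<int> inputs;
    std::vector<int> outputs;
    std::size_t next = 0;  // index of the next task to send
};

// Base for the functors: holds a pointer to the common data.
struct SharedDataFn {
    JobData* data = nullptr;
    void setData(JobData* d) { data = d; }
};

// Two functors reading and writing the same JobData through the pointer.
struct SendNext : SharedDataFn {
    int operator()() { return data->inputs[data->next++]; }
};
struct StoreResult : SharedDataFn {
    void operator()(int out) { data->outputs.push_back(out); }
};
```

Because the data is shared by pointer, each functor constructor stays trivial, and the whole job state lives in one place.
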
There are two kinds of jobs:

  • Jobs which are launched a fixed and well-known number of times, i.e. both master and workers know how many times they will be launched. These are "one shot jobs".
  • Jobs which are launched an unknown number of times, for instance embedded in a while loop whose number of repetitions we don't know (typically, the eoEasyEA loop is a good example, as we don't know when the continuator will stop). These are called "multi jobs". As it is the master that tells the workers to quit, we have to differentiate these two kinds of jobs. In a "multi job", the workers have to perform a while(true) loop so as to receive their orders; if the master simply quit after its last job, the workers would begin another job and wait for another order that never comes: this would cause a deadlock, with the worker processes blocked waiting for an order.
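The worker side of a multi job can be pictured as a loop over whole jobs, where an explicit final termination order, the role played by EmptyJob, is what lets the worker exit cleanly. This is an in-process sketch with a message queue standing in for MPI, not the actual EmptyJob code:

```cpp
#include <queue>

enum class Order { RunTask, Terminate };

// Worker loop for a multi job: keeps accepting work until an explicit
// Terminate order (the role of EmptyJob) arrives.
int multiJobWorker(std::queue<Order>& orders) {
    int tasksDone = 0;
    while (true) {
        if (orders.empty()) break;           // mock only: a real worker would block here
        Order o = orders.front(); orders.pop();
        if (o == Order::Terminate) break;    // EmptyJob received: shut the worker down
        ++tasksDone;                         // otherwise process the task
    }
    return tasksDone;
}
```

Without the final Terminate order, the worker would block forever on the next receive after the master has exited, which is exactly the deadlock described above.
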