Goto Chapter: Top 1 2 3 4 5 6 7 8 9 10 11 Ind
 [Top of Book]  [Contents]   [Previous Chapter]   [Next Chapter] 

8 Background jobs using fork
 8.1 Background jobs
 8.2 Parallel programming skeletons
 8.3 Worker farms

8 Background jobs using fork

This chapter describes a way to use multi-processor or multi-core machines from within GAP. In its current version the GAP system is a single threaded and single process system. However, modern operating systems allow, via the fork system call, to replicate a complete process on the same machine relatively efficiently. That is, at first after a fork the two processes actually use the same physical memory such that not much copying needs to be done. The child process is in exactly the same state as the parent process, sharing open files, network connections and the complete status of the workspace. However, whenever a page of memory is written, it is then automatically copied using new, additional physical memory, such that it behaves like a completely separate process. This method is called copy-on-write.

Thus this is a method to parallelise certain computations. Note however, that from the point of time when the fork has occurred, all further communication between the two processes has to be realised via pipes or even files.

The operations and methods described in this chapter help to use GAP in this way and implement certain skeletons of parallel programming to make these readily available in GAP. Note that this implementation has its severe limitations and should probably eventually be replaced by a proper multi-threaded version of GAP.

8.1 Background jobs

One creates a background job with the following operation:

8.1-1 BackgroundJobByFork
‣ BackgroundJobByFork( fun, args[, opt] )( operation )

Returns: a background job object or fail

This operation creates a background job using IO_fork (3.2-19) which starts up as an identical copy of the currently running GAP process. In this child process the function fun is called with the argument list args. The third argument opt must be a record for options. The operation returns either an object representing the background job or fail if the startup did not work.

This operation automatically sets up two pipes for communication with the child process. This is in particular used to report the result of the function call to fun back to the parent. However, if called without the option TerminateImmediately (see below) the child process stays alive even after the completion of fun and one can submit further argument lists for subsequent calls to fun. Of course, these additional argument lists will have to be sent over a pipe to the child process. A special case is if the argument args is equal to fail, in this case the child process is started but does not automatically call fun but rather waits in an idle state until an argument list is submitted via the pipe using the Submit (8.1-6) operation described below.

There are two components defined which can be bound in the options record opt. One is TerminateImmediately, if this is bound to true then the child process immediately terminates after the function fun returns its result. In this case, no pipe for communication from parent to child is created since it would never be used. Note that in this case one can still get the result of the function fun using the Pickup (8.1-5) operation described below, even when the child has already terminated, since the result is first transmitted back to the parent before termination.

The following operations are available to deal with background job objects:

8.1-2 IsIdle
‣ IsIdle( job )( operation )

Returns: true, false or fail

This operation checks whether or not the background job represented by the object job has already finished the function call to its worker function and is now idle. If so, true is returned. If it is still running and working on the worker function, false is returned. If the background job has already terminated altogether, this operation returns fail. Note that if a child process terminates automatically after the first completion of its worker function and sending the result, then the first call to IsIdle after completion will return true to indicate successful completion and all subsequent calls will return fail.

8.1-3 HasTerminated
‣ HasTerminated( job )( operation )

Returns: true or false

This operation checks whether or not the background job represented by the object job has already terminated. If so, true is returned, if not, false is returned.

8.1-4 WaitUntilIdle
‣ WaitUntilIdle( job )( operation )

Returns: the result of the worker function or fail

This operation waits until the worker function of the background job job has finished and the job is idle. It then returns the result of the worker function, which has automatically been transmitted to the parent process. If the child process has died before completion fail is returned.

8.1-5 Pickup
‣ Pickup( job )( operation )

Returns: the result of the worker function or fail

This operation does the same as WaitUntilIdle (8.1-4).

8.1-6 Submit
‣ Submit( job, args )( operation )

Returns: true or fail

This submits another argument list args for another call to the worker function in the background job job. It is an error if either the background job has already terminated or if it is still busy working on the previous argument list. That is, one must only submit another argument in a situation when IsIdle (8.1-2) would return true. This is for example the case directly after a successful call to Pickup (8.1-5) or i WaitUntilIdle (8.1-4) which did not return fail, unless the background job was created with the TerminateImmediately option set to true.

This operation returns immediately after submission, when the new argument list has been sent to the child process through a pipe. In particular, it does not await completion of the worker function for the new argument list.

8.1-7 Kill
‣ Kill( job )( operation )

Returns: nothing

This kills the background job represented by the object job with immediate effect. No more results can be expected from it. Note that unless one has created the background job with the TerminateImmediately option set to true one always has to call Kill on a background job eventually for cleanup purposes. Otherwise, the background job and the connecting pipes remain alive until the parent GAP process terminates.

8.2 Parallel programming skeletons

In this section we document the operations for the available skeletons. For a general description of these ideas see other sources.

8.2-1 ParTakeFirstResultByFork
‣ ParTakeFirstResultByFork( jobs, args[, opt] )( operation )

Returns: a list of results or fail

The argument jobs must be a list of GAP functions and the argument args a list of the same length containing argument lists with which the job functions can be called. This operation starts up a background job using fork for each of the functions in jobs, calls it with the corresponding argument list in args. As soon as any of the background jobs finishes with a result, ParTakeFirstResultByFork terminates all other jobs and reports the results found so far. Note that it can happen that two jobs finish at the same time in the sense that both results are received before all other jobs could be terminated. Therefore the result of ParTakeFirstResultByFork is a list, in which position \(i\) is bound if and only if job number \(i\) returned a result. So in the result at least one entry is bound but it is possible that more than one entry is bound.

You can specify an overall timeout to give up the whole computation if no job finishes by setting the TimeOut component of the options record opt. In this case you have to set it to a record with two components tv_sec and tv_usec which are seconds and microseconds respectively, exactly as returned by the IO_gettimeofday (3.2-29) function. In the case of timeout an empty list is returned.

8.2-2 ParDoByFork
‣ ParDoByFork( jobs, args[, opt] )( operation )

Returns: a list of results or fail

The argument jobs must be a list of GAP functions and the argument args a list of the same length containing argument lists with which the job functions can be called. This operation starts up a background job using fork for each of the functions in jobs, calls it with the corresponding argument list in args. As soon as all of the background jobs finish with a result, ParDoByFork reports the results found. Therefore the result of ParDoByFork is a list, in which position \(i\) is bound to the result that job number \(i\) returned.

You can specify an overall timeout to stop the whole computation if not all jobs finish in time by setting the TimeOut component of the options record opt. In this case you have to set it to a record with two components tv_sec and tv_usec which are seconds and microseconds respectively, exactly as returned by the IO_gettimeofday (3.2-29) function. In the case of timeout a list is returned in which the positions corresponding to those jobs that have already finished are bound to the respective results and the other positions are unbound.

8.2-3 ParListByFork
‣ ParListByFork( l, worker[, opt] )( operation )

Returns: a list of results or fail

This is a parallel version of the List (Reference: list and non-list difference) function. It applies the function worker to all elements of the list l and returns a list containing the results in corresponding positions. You have to specify the component NumberJobs in the options record opt which indicates how many background processes to start. You can optionally use the TimeOut option exactly as for ParDoByFork (8.2-2), however, if a timeout occurs, ParListByFork returns fail.

Note that the usefulness of this operation is relatively limited, since every individual result has to be sent back over a pipe from the child process to the parent process. Therefore this only makes sense if the computation time for the worker function dominates the communication time.

8.2-4 ParMapReduceByFork
‣ ParMapReduceByFork( l, map, reduce[, opt] )( operation )

Returns: a value or fail

This is a parallel version implementation of the classical MapReduce pattern. It applies the function map to all elements of the list l and then reduces the result using the reduce function which accepts two return values of map and returns one of them. Thus, the final result is one return value or fail if the startup of the jobs fails. You have to specify the component NumberJobs in the options record opt which indicates how many background processes to start. You can optionally use the TimeOut option exactly as for ParDoByFork (8.2-2), however, if a timeout occurs, ParMapReduceByFork returns fail.

Note that this can be very useful because quite often the cumulated computation time for all the worker function calls dominates the communication time for a single result.

8.2-5 IO_CallWithTimeout
‣ IO_CallWithTimeout( timeout, func, ... )( function )
‣ IO_CallWithTimeoutList( timeout, func, arglist )( function )

IO_CallWithTimeout and IO_CallWithTimeoutList allow calling a function with a limit on length of time it will run. The function is run inside a copy of the current GAP session, so any changes it makes to global variables are thrown away when the function finishes or times out. The return value of func is passed back to the current GAP session via IO_Pickle. Note that IO_Pickle may not be available for all objects.

IO_CallWithTimeout is variadic. Any arguments to it beyond the first two are passed as arguments to func. IO_CallWithTimeoutList in contrast takes exactly three arguments, of which the third is a list (possibly empty) of arguments to pass to func.

If the call completes within the allotted time and returns a value res, the result of IO_CallWithTimeout[List] is a list of the form [ true, res ].

If the call completes within the allotted time and returns no value, the result of IO_CallWithTimeout[List] is the list [ true ].

If the call does not complete within the timeout, the result of IO_CallWithTimeout[List] is the list [ false ]. If the call causes GAP to crash or exit, the result is the list [ fail ].

The timer is suspended during execution of a break loop and abandoned when you quit from a break loop.

The limit timeout is specified as a record. At present the following components are recognised nanoseconds, microseconds, milliseconds, seconds, minutes, hours, days and weeks. Any of these components which is present should be bound to a positive integer, rational or float and the times represented are totalled to give the actual timeout. As a shorthand, a single positive integers may be supplied, and is taken as a number of microseconds. Further components are permitted and ignored, to allow for future functionality.

The precision of the timeouts is not guaranteed, and there is a system dependent upper limit on the timeout which is typically about 8 years on 32 bit systems and about 30 billion years on 64 bit systems. Timeouts longer than this will be reduced to this limit.

Note that the next parallel skeleton is a worker farm which is described in the following section.

8.3 Worker farms

The parallel skeleton of a worker farm is basically nothing but a bunch of background jobs all with the same worker function and all eagerly waiting for work. The only additional concepts needed are an input and an output queue. The input queue contains argument lists and the output queue pairs of argument lists and results.

One creates a worker farm with the following operation:

8.3-1 ParWorkerFarmByFork
‣ ParWorkerFarmByFork( fun, opt )( operation )

Returns: an object representing the worker farm or fail

This operation creates a worker farm with the worker function fun and sets up its input and output queue. An object representing the farm is returned unless not all jobs could be started up in which case fail is returned. After startup all background jobs in the farm are idle. The only valid option in the options record opt is NumberJobs and it must be bound to the number of worker jobs in the farm, a positive integer.

The following operations are for worker farm objects:

8.3-2 DoQueues
‣ DoQueues( wf, block )( operation )

Returns: nothing

This operation called on a worker farm object wf administrates the input and output queues of the worker farm. In particular it checks whether new results are available from the workers and if so it appends them to the output queue. If jobs are idle and the input queue is non-empty, argument lists from the input queue are sent to the idle jobs and removed from the input queue.

This operation must be called regularly to keep up the communication with the clients. It uses select and so does not block if the boolean argument block is set to false. However, if larger chunks of data has to be sent or received this operation might need some time to return.

If the boolean argument block is set to true then the DoQueues blocks until at least one job has returned a result. This can be used to wait for the termination of all tasks without burning CPU cycles in the parent job. One would repeatedly call DoQueues with block set to true and after each such call check with IsIdle (8.3-4) whether all tasks are done. Note that one should no longer call DoQueues with block set to true once this is the case since then it would block forever.

This operation is called automatically by most of the following operations.

8.3-3 Kill
‣ Kill( wf )( operation )

Returns: nothing

This operation terminates all background jobs in the farm wf, which cannot be used subsequently. One should always call this operation when the worker farm is no longer needed to free resources.

8.3-4 IsIdle
‣ IsIdle( wf )( operation )

Returns: true or false

This operation returns true if all background jobs in the worker farm wf are idle. This means, that all tasks which have previously been submitted using Submit (8.3-5) have been completed and their result been appended to the output queue. The operation DoQueues (8.3-2) is automatically called before the execution of IsIdle.

8.3-5 Submit
‣ Submit( wg, arglist )( operation )

Returns: nothing

This operation submits a task in the form of an argument list for the worker function to the worker farm. It is appended at the end of the input queue. The operation DoQueues (8.3-2) is automatically called after the execution of Submit, giving the farm a chance to actually send the work out to the worker background jobs.

8.3-6 Pickup
‣ Pickup( wg, arglist )( operation )

Returns: nothing

This operation collects all results from the output queue of the worker farm. The output queue is empty after this function returns. The results are reported as a list of pairs, each pair has the input argument list as first component and the output object as second component.

The operation DoQueues (8.3-2) is automatically called before the execution of Pickup, giving the farm a chance to actually receive some more results from the worker background jobs.

 [Top of Book]  [Contents]   [Previous Chapter]   [Next Chapter] 
Goto Chapter: Top 1 2 3 4 5 6 7 8 9 10 11 Ind

generated by GAPDoc2HTML