The following recovery actions take place automatically when failures occur in the Apple Qmaster distributed processing system. As the administrator, you do not need to enable or configure these features.
If a Service Stops Unexpectedly
If either the cluster controller service or the processing service enabled on a service node stops unexpectedly, the Apple Qmaster distributed processing system restarts it. To avoid an endless cycle of stopping and restarting, the system restarts a failed service a maximum of four times. The first two times, it restarts the service immediately. If the service stops abruptly a third or fourth time, the system restarts it only if it had been running for at least 10 seconds before it stopped.
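The restart rule above can be modeled as a small decision function. This is a minimal sketch, not Apple Qmaster code: the names `RestartPolicy`, `should_restart`, and the constants are hypothetical, and it assumes the system knows how long the service ran before each stop.

```python
# Hypothetical model of the automatic restart rule; RestartPolicy and
# should_restart are illustrative names, not part of Apple Qmaster.
from dataclasses import dataclass

MAX_RESTARTS = 4           # a failed service is restarted at most four times
IMMEDIATE_RESTARTS = 2     # the first two restarts happen right away
MIN_UPTIME_SECONDS = 10.0  # third and fourth restarts need >= 10 s of uptime


@dataclass
class RestartPolicy:
    restarts: int = 0

    def should_restart(self, uptime_seconds: float) -> bool:
        """Decide whether to restart a service that just stopped unexpectedly.

        uptime_seconds is how long the service ran before this stop.
        """
        if self.restarts >= MAX_RESTARTS:
            return False  # limit reached: avoid an endless stop/restart loop
        if self.restarts >= IMMEDIATE_RESTARTS and uptime_seconds < MIN_UPTIME_SECONDS:
            return False  # third or fourth stop came too soon after starting
        self.restarts += 1
        return True


if __name__ == "__main__":
    quick = RestartPolicy()
    # Crashes 1 second after each start: only the first two are restarted.
    print([quick.should_restart(1.0) for _ in range(3)])  # [True, True, False]
```

With a service that runs 30 seconds before each stop, the same policy allows four restarts and refuses the fifth.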
If a Batch Is Interrupted
When a service stops suddenly in the middle of processing an Apple Qmaster batch, the cluster controller resubmits the interrupted batch in a way that avoids reprocessing any batch segments completed before the service stopped. The cluster controller waits about a minute after it loses contact with the service before resuming the batch.
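The resume behavior described above amounts to two decisions: when to resume and which segments still need work. The sketch below is hypothetical; `Segment`, `Batch`, and `resume_plan` are illustrative names, and the fixed 60-second delay is an assumption standing in for the documented "about a minute."

```python
# Hypothetical model of resuming an interrupted batch; these names are
# illustrative and do not correspond to an Apple Qmaster API.
from dataclasses import dataclass, field

RESUME_DELAY_SECONDS = 60.0  # assumed value for "about a minute"


@dataclass
class Segment:
    index: int
    complete: bool = False


@dataclass
class Batch:
    name: str
    segments: list[Segment] = field(default_factory=list)


def resume_plan(batch: Batch, lost_contact_at: float) -> tuple[float, list[int]]:
    """Return the resume time and the indices of segments still to process.

    Segments completed before the service stopped are skipped, so they are
    not reprocessed when the batch is resubmitted.
    """
    resume_at = lost_contact_at + RESUME_DELAY_SECONDS
    pending = [s.index for s in batch.segments if not s.complete]
    return resume_at, pending


if __name__ == "__main__":
    batch = Batch("movie", [Segment(0, True), Segment(1, True), Segment(2), Segment(3)])
    print(resume_plan(batch, lost_contact_at=1000.0))  # (1060.0, [2, 3])
```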
If a Batch Fails
When the service is running but a batch fails to process, a service exception occurs, and the cluster controller resubmits the batch immediately. The cluster controller resubmits the batch a maximum of two times. If the job still fails on the third submission (the original submission plus two resubmissions), the distributed processing system stops resubmitting it, and Share Monitor sets the job's status to Failed.
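The retry limit above can be expressed as a bounded loop: one original submission plus up to two immediate resubmissions. This is a sketch under stated assumptions: `submit_with_retries` and the `submit` callable are hypothetical, and the "Succeeded" label is invented for illustration; only the "Failed" status comes from the documentation.

```python
# Hypothetical sketch of the failed-batch retry rule; submit_with_retries
# is an illustrative name, not an Apple Qmaster API.
from typing import Callable

MAX_RESUBMISSIONS = 2  # the batch is resubmitted at most two times


def submit_with_retries(submit: Callable[[], bool]) -> tuple[str, int]:
    """Call submit() until it succeeds or the resubmission limit is reached.

    submit returns True on success and False when a service exception occurs.
    Returns the final status and the number of submissions made.
    """
    attempts = 0
    for _ in range(1 + MAX_RESUBMISSIONS):  # submissions 1, 2, and 3
        attempts += 1
        if submit():
            return "Succeeded", attempts  # hypothetical label for success
        # On failure, loop again and resubmit immediately (no delay).
    return "Failed", attempts  # third submission failed: stop resubmitting


if __name__ == "__main__":
    print(submit_with_retries(lambda: False))  # ('Failed', 3)
    outcomes = iter([False, False, True])
    print(submit_with_retries(lambda: next(outcomes)))  # ('Succeeded', 3)
```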