public class CheckpointCoordinator extends Object
Depending on the configured RecoveryMode, the behaviour of the CompletedCheckpointStore and CheckpointIDCounter change. The default standalone
implementations don't support any recovery.
| Modifier and Type | Field and Description |
|---|---|
protected CheckpointIDCounter |
checkpointIdCounter
Checkpoint ID counter to ensure ascending IDs.
|
protected Object |
lock
Coordinator-wide lock to safeguard the checkpoint updates
|
protected int |
numberKeyGroups |
| Constructor and Description |
|---|
CheckpointCoordinator(org.apache.flink.api.common.JobID job,
long baseInterval,
long checkpointTimeout,
int numberKeyGroups,
ExecutionVertex[] tasksToTrigger,
ExecutionVertex[] tasksToWaitFor,
ExecutionVertex[] tasksToCommitTo,
ClassLoader userClassLoader,
CheckpointIDCounter checkpointIDCounter,
CompletedCheckpointStore completedCheckpointStore,
RecoveryMode recoveryMode) |
CheckpointCoordinator(org.apache.flink.api.common.JobID job,
long baseInterval,
long checkpointTimeout,
long minPauseBetweenCheckpoints,
int maxConcurrentCheckpointAttempts,
int numberKeyGroups,
ExecutionVertex[] tasksToTrigger,
ExecutionVertex[] tasksToWaitFor,
ExecutionVertex[] tasksToCommitTo,
ClassLoader userClassLoader,
CheckpointIDCounter checkpointIDCounter,
CompletedCheckpointStore completedCheckpointStore,
RecoveryMode recoveryMode,
CheckpointStatsTracker statsTracker) |
| Modifier and Type | Method and Description |
|---|---|
ActorGateway |
createActivatorDeactivator(akka.actor.ActorSystem actorSystem,
UUID leaderSessionID) |
protected List<Set<Integer>> |
createKeyGroupPartitions(int numberKeyGroups,
int parallelism)
Groups the available set of key groups into key group partitions.
|
protected long |
getAndIncrementCheckpointId() |
protected ActorGateway |
getJobStatusListener() |
int |
getNumberOfPendingCheckpoints() |
int |
getNumberOfRetainedSuccessfulCheckpoints() |
Map<Long,PendingCheckpoint> |
getPendingCheckpoints() |
List<CompletedCheckpoint> |
getSuccessfulCheckpoints() |
boolean |
isShutdown() |
protected void |
onCancelCheckpoint(long canceledCheckpointId)
Callback on cancellation of a checkpoint.
|
protected void |
onFullyAcknowledgedCheckpoint(CompletedCheckpoint checkpoint)
Callback on full acknowledgement of a checkpoint.
|
protected void |
onShutdown()
Callback on shutdown of the coordinator.
|
boolean |
receiveAcknowledgeMessage(AcknowledgeCheckpoint message)
Receives an AcknowledgeCheckpoint message and returns whether the
message was associated with a pending checkpoint.
|
boolean |
receiveDeclineMessage(DeclineCheckpoint message)
Receives a
DeclineCheckpoint message and returns whether the
message was associated with a pending checkpoint. |
boolean |
restoreLatestCheckpointedState(Map<JobVertexID,ExecutionJobVertex> tasks,
boolean errorIfNoCheckpoint,
boolean allOrNothingState) |
protected void |
setJobStatusListener(ActorGateway jobStatusListener) |
void |
shutdown()
Shuts down the checkpoint coordinator.
|
void |
startCheckpointScheduler() |
void |
stopCheckpointScheduler() |
void |
suspend()
Suspends the checkpoint coordinator.
|
boolean |
triggerCheckpoint(long timestamp)
Triggers a new checkpoint and uses the given timestamp as the checkpoint
timestamp.
|
boolean |
triggerCheckpoint(long timestamp,
long nextCheckpointId)
Triggers a new checkpoint and uses the given timestamp as the checkpoint
timestamp.
|
protected final Object lock
protected final CheckpointIDCounter checkpointIdCounter
protected final int numberKeyGroups
public CheckpointCoordinator(org.apache.flink.api.common.JobID job,
long baseInterval,
long checkpointTimeout,
int numberKeyGroups,
ExecutionVertex[] tasksToTrigger,
ExecutionVertex[] tasksToWaitFor,
ExecutionVertex[] tasksToCommitTo,
ClassLoader userClassLoader,
CheckpointIDCounter checkpointIDCounter,
CompletedCheckpointStore completedCheckpointStore,
RecoveryMode recoveryMode)
throws Exception
Exceptionpublic CheckpointCoordinator(org.apache.flink.api.common.JobID job,
long baseInterval,
long checkpointTimeout,
long minPauseBetweenCheckpoints,
int maxConcurrentCheckpointAttempts,
int numberKeyGroups,
ExecutionVertex[] tasksToTrigger,
ExecutionVertex[] tasksToWaitFor,
ExecutionVertex[] tasksToCommitTo,
ClassLoader userClassLoader,
CheckpointIDCounter checkpointIDCounter,
CompletedCheckpointStore completedCheckpointStore,
RecoveryMode recoveryMode,
CheckpointStatsTracker statsTracker)
throws Exception
Exceptionprotected void onShutdown()
protected void onCancelCheckpoint(long canceledCheckpointId)
protected void onFullyAcknowledgedCheckpoint(CompletedCheckpoint checkpoint)
public void shutdown()
throws Exception
After this method has been called, the coordinator does not accept and further messages and cannot trigger any further checkpoints. All checkpoint state is discarded.
Exceptionpublic void suspend()
throws Exception
After this method has been called, the coordinator does not accept and further messages and cannot trigger any further checkpoints.
The difference to shutdown is that checkpoint state in the store and counter is kept around if possible to recover later.
Exceptionpublic boolean isShutdown()
public boolean triggerCheckpoint(long timestamp)
throws Exception
timestamp - The timestamp for the checkpoint.Exceptionpublic boolean triggerCheckpoint(long timestamp,
long nextCheckpointId)
throws Exception
timestamp - The timestamp for the checkpoint.nextCheckpointId - The checkpoint ID to use for this checkpoint or -1 if
the checkpoint ID counter should be queried.Exceptionpublic boolean receiveDeclineMessage(DeclineCheckpoint message) throws Exception
DeclineCheckpoint message and returns whether the
message was associated with a pending checkpoint.message - Checkpoint decline from the task managerExceptionpublic boolean receiveAcknowledgeMessage(AcknowledgeCheckpoint message) throws Exception
message - Checkpoint ack from the task managerException - If the checkpoint cannot be added to the completed checkpoint store.public boolean restoreLatestCheckpointedState(Map<JobVertexID,ExecutionJobVertex> tasks, boolean errorIfNoCheckpoint, boolean allOrNothingState) throws Exception
Exceptionprotected List<Set<Integer>> createKeyGroupPartitions(int numberKeyGroups, int parallelism)
numberKeyGroups - Number of available key groups (indexed from 0 to numberKeyGroups - 1)parallelism - Parallelism to generate the key group partitioning forpublic int getNumberOfPendingCheckpoints()
public int getNumberOfRetainedSuccessfulCheckpoints()
public Map<Long,PendingCheckpoint> getPendingCheckpoints()
public List<CompletedCheckpoint> getSuccessfulCheckpoints() throws Exception
Exceptionprotected long getAndIncrementCheckpointId()
protected ActorGateway getJobStatusListener()
protected void setJobStatusListener(ActorGateway jobStatusListener)
public ActorGateway createActivatorDeactivator(akka.actor.ActorSystem actorSystem, UUID leaderSessionID)
Copyright © 2014–2016 The Apache Software Foundation. All rights reserved.