public class WatermarkManager extends Object
Manages watermarks of PCollections and input and output watermarks of
AppliedPTransforms to provide event-time and completion tracking for
in-memory execution. WatermarkManager is designed to update and return a
consistent view of watermarks in the presence of concurrent updates.
At construction time, a WatermarkManager is provided with the collection of root
AppliedPTransforms and a map of PCollections to
all the AppliedPTransforms that consume them.
Whenever a root transform produces elements, the
WatermarkManager is provided with the produced elements and the output watermark
of the producing transform. The
watermark manager is responsible for computing the watermarks
of all transforms that consume one or more
PCollections.
Whenever a non-root AppliedPTransform finishes processing one or more in-flight
elements (referred to as the input bundle), the following occurs
atomically:
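- All of the in-flight elements are removed from the collection of pending elements for the AppliedPTransform.
- All of the elements produced by the AppliedPTransform are added to the collection of pending elements for each AppliedPTransform that consumes them.
- The input watermark for the AppliedPTransform becomes the maximum value of
  - the previous input watermark
  - the minimum of
    - the timestamps of all currently pending elements
    - all input PCollection watermarks
- The output watermark for the AppliedPTransform becomes the maximum of
  - the previous output watermark
  - the minimum of
    - the current input watermark
    - the current watermark holds
- The watermark of the output PCollection can be advanced to the output watermark of the AppliedPTransform.
- The watermarks of all downstream AppliedPTransforms can be advanced.

The watermark of a PCollection is equal to the output watermark of the
AppliedPTransform that produces it.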
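The pending-element part of this atomic step can be sketched in plain Java. This is a minimal illustration, not Beam's implementation: `PendingTracker`, its method names, the String transform names, and the epoch-millisecond `long` timestamps are all hypothetical stand-ins, and the sketch assumes the processed input bundle is dropped from the transform's own pending collection in the same synchronized step that hands its outputs to each consumer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of pending-element tracking; not a Beam class. */
final class PendingTracker {
  // Maps a transform name to the transforms that consume its output.
  private final Map<String, List<String>> consumers;
  // Maps a transform name to the timestamps of its pending elements.
  private final Map<String, List<Long>> pending = new HashMap<>();

  PendingTracker(Map<String, List<String>> consumers) {
    this.consumers = consumers;
  }

  /** Registers elements that are waiting to be processed by a transform. */
  synchronized void addPending(String transform, List<Long> timestamps) {
    pending.computeIfAbsent(transform, k -> new ArrayList<>()).addAll(timestamps);
  }

  /** Applies one completed bundle; synchronized so both changes appear together. */
  synchronized void commit(String transform, List<Long> inputBundle, List<Long> outputs) {
    List<Long> own = pending.computeIfAbsent(transform, k -> new ArrayList<>());
    for (Long timestamp : inputBundle) {
      own.remove(timestamp); // remove(Object): drops one matching element
    }
    for (String consumer : consumers.getOrDefault(transform, Collections.<String>emptyList())) {
      pending.computeIfAbsent(consumer, k -> new ArrayList<>()).addAll(outputs);
    }
  }

  /** Earliest pending timestamp, or Long.MAX_VALUE when nothing is pending. */
  synchronized long earliestPending(String transform) {
    long earliest = Long.MAX_VALUE;
    for (long timestamp : pending.getOrDefault(transform, Collections.<Long>emptyList())) {
      earliest = Math.min(earliest, timestamp);
    }
    return earliest;
  }

  public static void main(String[] args) {
    Map<String, List<String>> graph = new HashMap<>();
    graph.put("read", Arrays.asList("parse", "count"));
    PendingTracker tracker = new PendingTracker(graph);
    tracker.addPending("read", Arrays.asList(10L, 20L, 30L));
    tracker.commit("read", Arrays.asList(10L, 20L), Arrays.asList(12L, 25L));
    System.out.println(tracker.earliestPending("read"));  // 30
    System.out.println(tracker.earliestPending("parse")); // 12
  }
}
```

Holding a single lock for the whole commit is the simplest way to keep readers from observing the removal without the matching additions; a real runner may use finer-grained coordination.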
The watermarks for a PTransform are updated as follows when output is committed:

    Watermark_In' = MAX(Watermark_In, MIN(U(TS_Pending), U(Watermark_InputPCollection)))
    Watermark_Out' = MAX(Watermark_Out, MIN(Watermark_In', U(StateHold)))
    Watermark_PCollection = Watermark_Out_ProducingPTransform
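As a worked example of these update rules, the sketch below evaluates them on plain `long` timestamps. The class `WatermarkFormulas` and its method names are hypothetical, timestamps are simplified to epoch milliseconds instead of Beam's Instant type, and the sketch assumes that an empty set of pending elements or holds imposes no constraint (modeled here as `Long.MAX_VALUE`).

```java
import java.util.Arrays;
import java.util.Collection;

/** Hypothetical helper that evaluates the update rules on long timestamps. */
final class WatermarkFormulas {
  // Assumption: the minimum over an empty collection means "no constraint".
  static final long NO_CONSTRAINT = Long.MAX_VALUE;

  static long min(Collection<Long> values) {
    long result = NO_CONSTRAINT;
    for (long value : values) {
      result = Math.min(result, value);
    }
    return result;
  }

  /** Watermark_In' = MAX(Watermark_In, MIN(U(TS_Pending), U(Watermark_InputPCollection))) */
  static long nextInputWatermark(
      long currentInput,
      Collection<Long> pendingTimestamps,
      Collection<Long> inputPCollectionWatermarks) {
    long candidate = Math.min(min(pendingTimestamps), min(inputPCollectionWatermarks));
    return Math.max(currentInput, candidate); // MAX keeps the watermark from regressing
  }

  /** Watermark_Out' = MAX(Watermark_Out, MIN(Watermark_In', U(StateHold))) */
  static long nextOutputWatermark(long currentOutput, long newInput, Collection<Long> stateHolds) {
    return Math.max(currentOutput, Math.min(newInput, min(stateHolds)));
  }

  public static void main(String[] args) {
    long in = nextInputWatermark(100L, Arrays.asList(250L, 180L), Arrays.asList(300L, 220L));
    long out = nextOutputWatermark(90L, in, Arrays.asList(150L));
    System.out.println(in + " " + out); // prints "180 150"
  }
}
```

In the example, the earliest pending element (180) limits the input watermark even though both input PCollections have advanced further, and the state hold (150) limits the output watermark below the new input watermark.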

| Modifier and Type | Class | Description |
|---|---|---|
| static class | WatermarkManager.FiredTimers | A pair of TimerInternals.TimerData and key which can be delivered to the appropriate AppliedPTransform. |
| static class | WatermarkManager.TimerUpdate | A collection of newly set, deleted, and completed timers. |
| class | WatermarkManager.TransformWatermarks | A reference to the input and output watermarks of an AppliedPTransform. |

| Modifier and Type | Method | Description |
|---|---|---|
| static WatermarkManager | create(Clock clock, org.apache.beam.runners.direct.DirectGraph graph) | Creates a new WatermarkManager. |
| Collection<WatermarkManager.FiredTimers> | extractFiredTimers() | Returns a collection associating each PTransform that has pending timers with those timers. |
| Set<AppliedPTransform<?,?,?>> | getCompletedTransforms() | |
| WatermarkManager.TransformWatermarks | getWatermarks(AppliedPTransform<?,?,?> transform) | Gets the input and output watermarks for an AppliedPTransform. |
| void | initialize(Map<AppliedPTransform<?,?,?>,? extends Iterable<org.apache.beam.runners.direct.DirectRunner.CommittedBundle<?>>> initialBundles) | |
| void | updateWatermarks(org.apache.beam.runners.direct.DirectRunner.CommittedBundle<?> completed, WatermarkManager.TimerUpdate timerUpdate, org.apache.beam.runners.direct.CommittedResult result, org.joda.time.Instant earliestHold) | Updates the watermarks of a transform with one or more inputs. |

public static WatermarkManager create(Clock clock, org.apache.beam.runners.direct.DirectGraph graph)
Creates a new WatermarkManager. All watermarks within the newly created WatermarkManager start at BoundedWindow.TIMESTAMP_MIN_VALUE, the minimum watermark, with no watermark holds or pending elements.
clock - the clock to use to determine processing time
graph - the graph representing this pipeline

public WatermarkManager.TransformWatermarks getWatermarks(AppliedPTransform<?,?,?> transform)
Gets the input and output watermarks for an AppliedPTransform. If the PTransform has not processed any elements, returns a watermark of BoundedWindow.TIMESTAMP_MIN_VALUE.

public void initialize(Map<AppliedPTransform<?,?,?>,? extends Iterable<org.apache.beam.runners.direct.DirectRunner.CommittedBundle<?>>> initialBundles)

public void updateWatermarks(@Nullable org.apache.beam.runners.direct.DirectRunner.CommittedBundle<?> completed, WatermarkManager.TimerUpdate timerUpdate, org.apache.beam.runners.direct.CommittedResult result, org.joda.time.Instant earliestHold)
Updates the watermarks of a transform with one or more inputs.

Each transform has two monotonically increasing watermarks: the input watermark, which can, at any time, be updated to equal:

    MAX(CurrentInputWatermark, MIN(PendingElements, InputPCollectionWatermarks))

and the output watermark, which can, at any time, be updated to equal:

    MAX(CurrentOutputWatermark, MIN(InputWatermark, WatermarkHolds))

completed - the input that has completed
timerUpdate - the timers that were added, removed, and completed as part of producing this update
result - the result that was produced by processing the input
earliestHold - the earliest watermark hold in the transform's state. null if there is no hold

public Collection<WatermarkManager.FiredTimers> extractFiredTimers()
Returns a collection associating each PTransform that has pending timers with those timers. All of the pending timers will be removed from this WatermarkManager.

public Set<AppliedPTransform<?,?,?>> getCompletedTransforms()
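
The extract-and-remove behavior described for extractFiredTimers can be illustrated with a small sketch. `TimerDrain`, `addTimer`, and `extractAll` are hypothetical names, timers are reduced to `long` timestamps keyed by a transform name, and the real FiredTimers and TimerInternals.TimerData types are not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in showing drain semantics; not a Beam class. */
final class TimerDrain {
  // Maps a transform name to the timestamps of its pending timers.
  private final Map<String, List<Long>> pendingTimers = new HashMap<>();

  synchronized void addTimer(String transform, long timestamp) {
    pendingTimers.computeIfAbsent(transform, k -> new ArrayList<>()).add(timestamp);
  }

  /** Returns every pending timer grouped by transform, then forgets them all. */
  synchronized Map<String, List<Long>> extractAll() {
    Map<String, List<Long>> extracted = new HashMap<>(pendingTimers);
    pendingTimers.clear();
    return extracted;
  }

  public static void main(String[] args) {
    TimerDrain drain = new TimerDrain();
    drain.addTimer("parse", 100L);
    drain.addTimer("parse", 200L);
    drain.addTimer("count", 150L);
    System.out.println(drain.extractAll().size());    // 2
    System.out.println(drain.extractAll().isEmpty()); // true
  }
}
```

Because the copy and the clear happen under one lock, each timer is handed out exactly once even if several threads call the extraction method concurrently.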
Copyright © 2016–2017 The Apache Software Foundation. All rights reserved.