public abstract class AbstractMapFileWriter<T> extends Object
Abstract base class underlying MapFileRecordWriter and MapFileSequenceRecordWriter.

| Modifier and Type | Field and Description |
|---|---|
| protected WritableType | convertTextTo |
| protected AtomicLong | counter |
| static String | DEFAULT_FILENAME_PATTERN |
| static int | DEFAULT_INDEX_INTERVAL |
| static int | DEFAULT_MAP_FILE_SPLIT_SIZE |
| protected String | filenamePattern |
| protected org.apache.hadoop.conf.Configuration | hadoopConfiguration |
| protected int | indexInterval |
| protected AtomicBoolean | isClosed |
| static Class<? extends org.apache.hadoop.io.WritableComparable> | KEY_CLASS |
| static String | MAP_FILE_INDEX_INTERVAL_KEY - Configuration key for the map file interval. |
| protected int | mapFileSplitSize |
| protected org.apache.hadoop.io.SequenceFile.Writer.Option[] | opts |
| protected File | outputDir |
| protected List<File> | outputFiles |
| protected List<org.apache.hadoop.io.MapFile.Writer> | writers |

| Constructor and Description |
|---|
| AbstractMapFileWriter(File outputDir) - Constructor for all default values. |
| AbstractMapFileWriter(File outputDir, int mapFileSplitSize) - Constructor for most default values. |
| AbstractMapFileWriter(File outputDir, int mapFileSplitSize, WritableType convertTextTo) |
| AbstractMapFileWriter(File outputDir, int mapFileSplitSize, WritableType convertTextTo, int indexInterval, org.apache.hadoop.conf.Configuration hadoopConfiguration) |
| AbstractMapFileWriter(File outputDir, int mapFileSplitSize, WritableType convertTextTo, int indexInterval, String filenamePattern, org.apache.hadoop.conf.Configuration hadoopConfiguration) |
| AbstractMapFileWriter(File outputDir, WritableType convertTextTo) |

| Modifier and Type | Method and Description |
|---|---|
| void | close() |
| protected List<Writable> | convertTextWritables(List<Writable> record) |
| Configuration | getConf() |
| protected abstract org.apache.hadoop.io.Writable | getHadoopWritable(T input) |
| protected abstract Class<? extends org.apache.hadoop.io.Writable> | getValueClass() |
| void | setConf(Configuration conf) |
| void | write(T record) |

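
The convertTextTo option and the convertTextWritables(List<Writable> record) method deal with turning text entries into numerical values. The following is a minimal, self-contained sketch of that idea only. It does not use the DataVec or Hadoop classes: TargetType is a hypothetical stand-in for WritableType, plain Java objects stand in for Writable instances, and the actual conversion logic in the library may differ.

```java
import java.util.ArrayList;
import java.util.List;

public class ConvertTextSketch {
    // Hypothetical stand-in for WritableType; the real enum's constants are not listed on this page.
    enum TargetType { INT, LONG, DOUBLE }

    // Converts String entries to the requested numeric type; other entries pass through unchanged.
    // A null target leaves the record untouched, mirroring the documented "If null" behaviour.
    static List<Object> convertTextValues(List<Object> record, TargetType target) {
        if (target == null) {
            return record;
        }
        List<Object> out = new ArrayList<>(record.size());
        for (Object value : record) {
            if (value instanceof String) {
                String s = (String) value;
                switch (target) {
                    case INT:
                        out.add(Integer.parseInt(s));
                        break;
                    case LONG:
                        out.add(Long.parseLong(s));
                        break;
                    case DOUBLE:
                        out.add(Double.parseDouble(s));
                        break;
                }
            } else {
                out.add(value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Object> record = new ArrayList<>();
        record.add("3.5");
        record.add(7L);
        record.add("2");
        // Text entries become doubles; the existing long value is left as-is.
        System.out.println(convertTextValues(record, TargetType.DOUBLE));
    }
}
```

Storing parsed numbers rather than strings is the motivation given for convertTextTo: the values are converted once at write time instead of every time the map file is read.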
public static final String DEFAULT_FILENAME_PATTERN
public static final Class<? extends org.apache.hadoop.io.WritableComparable> KEY_CLASS
public static final String MAP_FILE_INDEX_INTERVAL_KEY
public static final int DEFAULT_MAP_FILE_SPLIT_SIZE
public static final int DEFAULT_INDEX_INTERVAL
protected final File outputDir
protected final int mapFileSplitSize
protected final WritableType convertTextTo
protected final int indexInterval
protected final String filenamePattern
protected org.apache.hadoop.conf.Configuration hadoopConfiguration
protected final AtomicLong counter
protected final AtomicBoolean isClosed
protected List<org.apache.hadoop.io.MapFile.Writer> writers
protected org.apache.hadoop.io.SequenceFile.Writer.Option[] opts
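
The indexInterval field and DEFAULT_INDEX_INTERVAL control how densely the map file index is populated. As a rough illustration, the sketch below models an index that keeps an entry for every indexInterval-th key, starting with the first. This is a simplified model under that assumption, not Hadoop's implementation; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexIntervalSketch {
    // 0-based positions of the sorted entries that would receive an index entry
    // if every indexInterval-th key is indexed, starting with the first key.
    static List<Integer> indexedPositions(int entryCount, int indexInterval) {
        if (indexInterval < 1) {
            throw new IllegalArgumentException("indexInterval must be >= 1");
        }
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < entryCount; i += indexInterval) {
            positions.add(i);
        }
        return positions;
    }

    // Number of entries that must be skipped sequentially after jumping
    // to the nearest indexed position at or before the target.
    static int scanDistance(int targetPosition, int indexInterval) {
        return targetPosition % indexInterval;
    }

    public static void main(String[] args) {
        System.out.println(indexedPositions(10, 1)); // every entry is indexed
        System.out.println(indexedPositions(10, 4)); // sparser index
        System.out.println(scanDistance(7, 4));      // entries skipped to reach position 7
    }
}
```

In this model an interval of 1 indexes every key, so a lookup never needs a sequential scan, at the cost of a larger index.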
public AbstractMapFileWriter(File outputDir)
outputDir - Output directory for the map file(s)

public AbstractMapFileWriter(@NonNull File outputDir, int mapFileSplitSize)
outputDir - Output directory for the map file(s)
mapFileSplitSize - Split size for the map file: if 0, use a single map file for all output. If > 0,
multiple map files will be used: each will contain at most mapFileSplitSize entries.
This can be used to avoid having a single multi-gigabyte map file, which may be
undesirable in some cases (transfer across the network, for example)

public AbstractMapFileWriter(@NonNull File outputDir, WritableType convertTextTo)
outputDir - Output directory for the map file(s)
convertTextTo - If null: make no changes to Text writable objects. If non-null, Text writable instances
will be converted to this type. This is useful when you would rather store numerical values
even though the original record reader produces strings/text.

public AbstractMapFileWriter(@NonNull File outputDir, int mapFileSplitSize, WritableType convertTextTo)
outputDir - Output directory for the map file(s)
mapFileSplitSize - Split size for the map file: if 0, use a single map file for all output. If > 0,
multiple map files will be used: each will contain at most mapFileSplitSize entries.
This can be used to avoid having a single multi-gigabyte map file, which may be
undesirable in some cases (transfer across the network, for example)
convertTextTo - If null: make no changes to Text writable objects. If non-null, Text writable instances
will be converted to this type. This is useful when you would rather store numerical values
even though the original record reader produces strings/text.

public AbstractMapFileWriter(@NonNull File outputDir, int mapFileSplitSize, WritableType convertTextTo, int indexInterval, org.apache.hadoop.conf.Configuration hadoopConfiguration)
outputDir - Output directory for the map file(s)
mapFileSplitSize - Split size for the map file: if 0, use a single map file for all output. If > 0,
multiple map files will be used: each will contain at most mapFileSplitSize entries.
This can be used to avoid having a single multi-gigabyte map file, which may be
undesirable in some cases (transfer across the network, for example)
convertTextTo - If null: make no changes to Text writable objects. If non-null, Text writable instances
will be converted to this type. This is useful when you would rather store numerical values
even though the original record reader produces strings/text.
indexInterval - Index interval for the Map file. Defaults to 1, which is suitable for most cases
hadoopConfiguration - Hadoop configuration.

public AbstractMapFileWriter(@NonNull File outputDir, int mapFileSplitSize, WritableType convertTextTo, int indexInterval, String filenamePattern, org.apache.hadoop.conf.Configuration hadoopConfiguration)
outputDir - Output directory for the map file(s)
mapFileSplitSize - Split size for the map file: if 0, use a single map file for all output. If > 0,
multiple map files will be used: each will contain at most mapFileSplitSize entries.
This can be used to avoid having a single multi-gigabyte map file, which may be
undesirable in some cases (transfer across the network, for example)
convertTextTo - If null: make no changes to Text writable objects. If non-null, Text writable instances
will be converted to this type. This is useful when you would rather store numerical values
even though the original record reader produces strings/text.
indexInterval - Index interval for the Map file. Defaults to 1, which is suitable for most cases
filenamePattern - The naming pattern for the map files. Used with String.format(pattern, int)
hadoopConfiguration - Hadoop configuration.

protected abstract Class<? extends org.apache.hadoop.io.Writable> getValueClass()
public void setConf(Configuration conf)
public Configuration getConf()
protected abstract org.apache.hadoop.io.Writable getHadoopWritable(T input)
public void write(T record) throws IOException
IOException
public void close()
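
The mapFileSplitSize and filenamePattern constructor parameters together determine which map file an entry goes to and what that file is called. The sketch below shows one plausible way the two could interact, assuming entries are assigned to files in write order. The pattern "part-%05d" is an illustrative assumption (the value of DEFAULT_FILENAME_PATTERN is not shown on this page), the helper names are hypothetical, and treating negative split sizes like 0 is also an assumption.

```java
public class MapFileSplitSketch {
    // Assumed pattern for illustration only; not necessarily DEFAULT_FILENAME_PATTERN.
    static final String EXAMPLE_PATTERN = "part-%05d";

    // Index of the map file that the entry with the given 0-based position would go to.
    // A split size of 0 means a single map file for all output.
    static int fileIndexFor(long entryIndex, int mapFileSplitSize) {
        if (mapFileSplitSize <= 0) {
            return 0; // assumption: negative values are treated like 0
        }
        return (int) (entryIndex / mapFileSplitSize);
    }

    // File name produced by applying String.format(pattern, int) to the file index.
    static String fileNameFor(long entryIndex, int mapFileSplitSize, String pattern) {
        return String.format(pattern, fileIndexFor(entryIndex, mapFileSplitSize));
    }

    public static void main(String[] args) {
        long[] positions = {0L, 999L, 1000L, 2500L};
        for (long position : positions) {
            // With a split size of 1000, a new file starts at every 1000th entry.
            System.out.println(position + " -> " + fileNameFor(position, 1000, EXAMPLE_PATTERN));
        }
    }
}
```

With a split size of 1000, positions 0 and 999 fall in the first file and position 1000 starts the second, so no file holds more than mapFileSplitSize entries.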
Copyright © 2017. All rights reserved.