com.amazonaws.internal.SdkInternalList<T> tags
The key-value pairs to use to create tags. If you specify a key without specifying a value, Amazon ML creates a tag with the specified key and a value of null.
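A key submitted without a value can be pictured as a map entry whose value is null. The sketch below is illustrative only; the `TagPairs` class is hypothetical, not part of the SDK:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TagPairs {
    /** A key supplied without a value becomes a tag whose value is null. */
    public static Map<String, String> tags() {
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("project", "holiday-mailer"); // key and value
        tags.put("experimental", null);        // key only -> value of null
        return tags;
    }
}
```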
String resourceId
The ID of the ML object to tag. For example, exampleModelId.
String resourceType
The type of the ML object to tag.
String batchPredictionId
The ID assigned to the BatchPrediction at creation. This
value should be identical to the value of the
BatchPredictionID in the request.
String mLModelId
The ID of the MLModel that generated predictions for the
BatchPrediction request.
String batchPredictionDataSourceId
The ID of the DataSource that points to the group of
observations to predict.
String inputDataLocationS3
The location of the data file or directory in Amazon Simple Storage Service (Amazon S3).
String createdByIamUser
The AWS user account that invoked the BatchPrediction. The
account type can be either an AWS root account or an AWS Identity and
Access Management (IAM) user account.
Date createdAt
The time that the BatchPrediction was created. The time is
expressed in epoch time.
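Epoch time here is standard Unix epoch time. A small illustrative helper for converting between the SDK's `Date` values and epoch seconds (the `EpochTime` class name is an assumption, not SDK code):

```java
import java.time.Instant;
import java.util.Date;

public class EpochTime {
    /** Converts an SDK-reported Date to a java.time Instant. */
    public static Instant toInstant(Date createdAt) {
        return createdAt.toInstant();
    }

    /** Builds a Date from epoch seconds (Date uses epoch milliseconds). */
    public static Date fromEpochSeconds(long seconds) {
        return new Date(seconds * 1000L);
    }
}
```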
Date lastUpdatedAt
The time of the most recent edit to the BatchPrediction. The
time is expressed in epoch time.
String name
A user-supplied name or description of the BatchPrediction.
String status
The status of the BatchPrediction. This element can have one
of the following values:
PENDING - Amazon Machine Learning (Amazon ML) submitted a request to generate predictions for a batch of observations.
INPROGRESS - The process is underway.
FAILED - The request to perform a batch prediction did not run to completion. It is not usable.
COMPLETED - The batch prediction process completed successfully.
DELETED - The BatchPrediction is marked as deleted. It is not usable.
String outputUri
The location of an Amazon S3 bucket or directory to receive the operation results. The following substrings are not allowed in the S3 key portion of the outputURI field: ':', '//', '/./', '/../'.
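The substring restriction above can be checked client-side before submitting a request. The following sketch is illustrative; the `OutputUriCheck` class and its simplified `s3://bucket/key` parsing are assumptions, not SDK behavior:

```java
import java.util.List;

public class OutputUriCheck {
    // Substrings that are not allowed in the S3 key portion of outputUri.
    private static final List<String> FORBIDDEN = List.of(":", "//", "/./", "/../");

    /**
     * Returns true if the key portion of an s3://bucket/key URI avoids the
     * forbidden substrings. The URI parsing here is deliberately simplified.
     */
    public static boolean isValidKeyPortion(String outputUri) {
        String withoutScheme = outputUri.replaceFirst("^s3://", "");
        int slash = withoutScheme.indexOf('/');
        if (slash < 0) {
            return true; // bucket only, no key portion to validate
        }
        String key = withoutScheme.substring(slash + 1);
        for (String bad : FORBIDDEN) {
            if (key.contains(bad)) {
                return false;
            }
        }
        return true;
    }
}
```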
String message
A description of the most recent details about processing the batch prediction request.
String batchPredictionId
A user-supplied ID that uniquely identifies the
BatchPrediction.
String batchPredictionName
A user-supplied name or description of the BatchPrediction.
BatchPredictionName can only use the UTF-8 character set.
String mLModelId
The ID of the MLModel that will generate predictions for the
group of observations.
String batchPredictionDataSourceId
The ID of the DataSource that points to the group of
observations to predict.
String outputUri
The location of an Amazon Simple Storage Service (Amazon S3) bucket or
directory to store the batch prediction results. The following substrings
are not allowed in the s3 key portion of the
outputURI field: ':', '//', '/./', '/../'.
Amazon ML needs permissions to store and retrieve the logs on your behalf. For information about how to set permissions, see the Amazon Machine Learning Developer Guide.
String batchPredictionId
A user-supplied ID that uniquely identifies the
BatchPrediction. This value is identical to the value of the
BatchPredictionId in the request.
String dataSourceId
A user-supplied ID that uniquely identifies the DataSource.
Typically, an Amazon Resource Name (ARN) becomes the ID for a
DataSource.
String dataSourceName
A user-supplied name or description of the DataSource.
RDSDataSpec rDSData
The data specification of an Amazon RDS DataSource:
DatabaseInformation -
DatabaseName - The name of the Amazon RDS database.
InstanceIdentifier - A unique identifier for the Amazon RDS database instance.
DatabaseCredentials - AWS Identity and Access Management (IAM) credentials that are used to connect to the Amazon RDS database.
ResourceRole - A role (DataPipelineDefaultResourceRole) assumed by an EC2 instance to carry out the copy task from Amazon RDS to Amazon Simple Storage Service (Amazon S3). For more information, see Role templates for data pipelines.
ServiceRole - A role (DataPipelineDefaultRole) assumed by the AWS Data Pipeline service to monitor the progress of the copy task from Amazon RDS to Amazon S3. For more information, see Role templates for data pipelines.
SecurityInfo - The security information to use to access an RDS DB
instance. You need to set up appropriate ingress rules for the security
entity IDs provided to allow access to the Amazon RDS instance. Specify a
[SubnetId, SecurityGroupIds] pair for a
VPC-based RDS DB instance.
SelectSqlQuery - A query that is used to retrieve the observation data
for the Datasource.
S3StagingLocation - The Amazon S3 location for staging Amazon RDS data.
The data retrieved from Amazon RDS using SelectSqlQuery is
stored in this location.
DataSchemaUri - The Amazon S3 location of the DataSchema.
DataSchema - A JSON string representing the schema. This is not required
if DataSchemaUri is specified.
DataRearrangement - A JSON string that represents the splitting and
rearrangement requirements for the Datasource.
Sample -
"{\"splitting\":{\"percentBegin\":10,\"percentEnd\":60}}"
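The sample string above can be assembled programmatically. In this sketch, `DataRearrangement` is a hypothetical helper, and only the `splitting` options shown in the sample are covered:

```java
public class DataRearrangement {
    /**
     * Builds the DataRearrangement JSON string for a percentBegin/percentEnd
     * split, matching the documented sample. The real schema supports more
     * options; this sketch covers only the splitting shown above.
     */
    public static String splitting(int percentBegin, int percentEnd) {
        if (percentBegin < 0 || percentEnd > 100 || percentBegin >= percentEnd) {
            throw new IllegalArgumentException("invalid split range");
        }
        return String.format(
            "{\"splitting\":{\"percentBegin\":%d,\"percentEnd\":%d}}",
            percentBegin, percentEnd);
    }
}
```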
String roleARN
The role that Amazon ML assumes on behalf of the user to create and
activate a data pipeline in the user's account and copy data using the
SelectSqlQuery query from Amazon RDS to Amazon S3.
Boolean computeStatistics
The compute statistics for a DataSource. The statistics are
generated from the observation data referenced by a
DataSource. Amazon ML uses the statistics internally during
MLModel training. This parameter must be set to
true if the DataSource needs to be
used for MLModel training.
String dataSourceId
A user-supplied ID that uniquely identifies the datasource. This value
should be identical to the value of the DataSourceID in the
request.
String dataSourceId
A user-supplied ID that uniquely identifies the DataSource.
String dataSourceName
A user-supplied name or description of the DataSource.
RedshiftDataSpec dataSpec
The data specification of an Amazon Redshift DataSource:
DatabaseInformation -
DatabaseName - The name of the Amazon Redshift database.
ClusterIdentifier - The unique ID for the Amazon Redshift cluster.
DatabaseCredentials - The AWS Identity and Access Management (IAM) credentials that are used to connect to the Amazon Redshift database.
SelectSqlQuery - The query that is used to retrieve the observation data
for the Datasource.
S3StagingLocation - The Amazon Simple Storage Service (Amazon S3)
location for staging Amazon Redshift data. The data retrieved from Amazon
Redshift using the SelectSqlQuery query is stored in this
location.
DataSchemaUri - The Amazon S3 location of the DataSchema.
DataSchema - A JSON string representing the schema. This is not required
if DataSchemaUri is specified.
DataRearrangement - A JSON string that represents the splitting and
rearrangement requirements for the DataSource.
Sample -
"{\"splitting\":{\"percentBegin\":10,\"percentEnd\":60}}"
String roleARN
A fully specified role Amazon Resource Name (ARN). Amazon ML assumes the role on behalf of the user to create the following:
A security group to allow Amazon ML to execute the
SelectSqlQuery query on an Amazon Redshift cluster
An Amazon S3 bucket policy to grant Amazon ML read/write permissions on
the S3StagingLocation
Boolean computeStatistics
The compute statistics for a DataSource. The statistics are
generated from the observation data referenced by a
DataSource. Amazon ML uses the statistics internally during
MLModel training. This parameter must be set to
true if the DataSource needs to be used for
MLModel training.
String dataSourceId
A user-supplied ID that uniquely identifies the datasource. This value
should be identical to the value of the DataSourceID in the
request.
String dataSourceId
A user-supplied identifier that uniquely identifies the
DataSource.
String dataSourceName
A user-supplied name or description of the DataSource.
S3DataSpec dataSpec
The data specification of a DataSource:
DataLocationS3 - The Amazon S3 location of the observation data.
DataSchemaLocationS3 - The Amazon S3 location of the
DataSchema.
DataSchema - A JSON string representing the schema. This is not required
if DataSchemaUri is specified.
DataRearrangement - A JSON string that represents the splitting and
rearrangement requirements for the Datasource.
Sample -
"{\"splitting\":{\"percentBegin\":10,\"percentEnd\":60}}"
Boolean computeStatistics
The compute statistics for a DataSource. The statistics are
generated from the observation data referenced by a
DataSource. Amazon ML uses the statistics internally during
MLModel training. This parameter must be set to
true if the DataSource needs to be
used for MLModel training.
String dataSourceId
A user-supplied ID that uniquely identifies the DataSource.
This value should be identical to the value of the
DataSourceID in the request.
String evaluationId
A user-supplied ID that uniquely identifies the Evaluation.
String evaluationName
A user-supplied name or description of the Evaluation.
String mLModelId
The ID of the MLModel to evaluate.
The schema used in creating the MLModel must match the schema of the DataSource used in the Evaluation.
String evaluationDataSourceId
The ID of the DataSource for the evaluation. The schema of
the DataSource must match the schema used to create the
MLModel.
String evaluationId
The user-supplied ID that uniquely identifies the Evaluation. This value should be identical to the value of the EvaluationId in the request.
String mLModelId
A user-supplied ID that uniquely identifies the MLModel.
String mLModelName
A user-supplied name or description of the MLModel.
String mLModelType
The category of supervised learning that this MLModel will
address. Choose from the following types:
REGRESSION if the MLModel will be used to predict a numeric value.
BINARY if the MLModel result has two possible values.
MULTICLASS if the MLModel result has a limited number of values.
For more information, see the Amazon Machine Learning Developer Guide.
com.amazonaws.internal.SdkInternalMap<K,V> parameters
A list of the training parameters in the MLModel. The list
is implemented as a map of key-value pairs.
The following is the current set of training parameters:
sgd.maxMLModelSizeInBytes - The maximum allowed size of the
model. Depending on the input data, the size of the model might affect
its performance.
The value is an integer that ranges from 100000 to
2147483648. The default value is 33554432.
sgd.maxPasses - The number of times that the training
process traverses the observations to build the MLModel. The
value is an integer that ranges from 1 to 10000
. The default value is 10.
sgd.shuffleType - Whether Amazon ML shuffles the training
data. Shuffling the data improves a model's ability to find the optimal
solution for a variety of data types. The valid values are
auto and none. The default value is
none. We strongly recommend that you shuffle your
data.
sgd.l1RegularizationAmount - The coefficient regularization
L1 norm. It controls overfitting the data by penalizing large
coefficients. This tends to drive coefficients to zero, resulting in a
sparse feature set. If you use this parameter, start by specifying a
small value, such as 1.0E-08.
The value is a double that ranges from 0 to
MAX_DOUBLE. The default is to not use L1 normalization. This
parameter can't be used when L2 is specified. Use this
parameter sparingly.
sgd.l2RegularizationAmount - The coefficient regularization
L2 norm. It controls overfitting the data by penalizing large
coefficients. This tends to drive coefficients to small, nonzero values.
If you use this parameter, start by specifying a small value, such as
1.0E-08.
The value is a double that ranges from 0 to
MAX_DOUBLE. The default is to not use L2 normalization. This
parameter can't be used when L1 is specified. Use this
parameter sparingly.
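The parameters above travel as string key-value pairs. A sketch of assembling them follows; the `TrainingParams` helper is hypothetical, and it includes only L2, since L1 and L2 cannot be combined:

```java
import java.util.HashMap;
import java.util.Map;

public class TrainingParams {
    /**
     * Assembles the documented sgd.* training parameters as string
     * key-value pairs. Only the L2 amount is set here because the L1 and
     * L2 regularization parameters are mutually exclusive.
     */
    public static Map<String, String> build(long maxModelSizeBytes,
                                            int maxPasses,
                                            String shuffleType,
                                            Double l2Amount) {
        Map<String, String> params = new HashMap<>();
        params.put("sgd.maxMLModelSizeInBytes", Long.toString(maxModelSizeBytes));
        params.put("sgd.maxPasses", Integer.toString(maxPasses));
        params.put("sgd.shuffleType", shuffleType); // "auto" or "none"
        if (l2Amount != null) {
            params.put("sgd.l2RegularizationAmount", Double.toString(l2Amount));
        }
        return params;
    }
}
```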
String trainingDataSourceId
The DataSource that points to the training data.
String recipe
The data recipe for creating the MLModel. You must specify
either the recipe or its URI. If you don't specify a recipe or its URI,
Amazon ML creates a default.
String recipeUri
The Amazon Simple Storage Service (Amazon S3) location and file name that
contains the MLModel recipe. You must specify either the
recipe or its URI. If you don't specify a recipe or its URI, Amazon ML
creates a default.
String mLModelId
A user-supplied ID that uniquely identifies the MLModel.
This value should be identical to the value of the MLModelId
in the request.
String mLModelId
The ID assigned to the MLModel during creation.
String mLModelId
A user-supplied ID that uniquely identifies the MLModel.
This value should be identical to the value of the MLModelId
in the request.
RealtimeEndpointInfo realtimeEndpointInfo
The endpoint information of the MLModel.
String dataSourceId
The ID that is assigned to the DataSource during creation.
String dataLocationS3
The location and name of the data in Amazon Simple Storage Service
(Amazon S3) that is used by a DataSource.
String dataRearrangement
A JSON string that represents the splitting and rearrangement requirement
used when this DataSource was created.
String createdByIamUser
The AWS user account from which the DataSource was created.
The account type can be either an AWS root account or an AWS Identity and
Access Management (IAM) user account.
Date createdAt
The time that the DataSource was created. The time is
expressed in epoch time.
Date lastUpdatedAt
The time of the most recent edit to the DataSource. The time is expressed in epoch time.
Long dataSizeInBytes
The total number of observations contained in the data files that the
DataSource references.
Long numberOfFiles
The number of data files referenced by the DataSource.
String name
A user-supplied name or description of the DataSource.
String status
The current status of the DataSource. This element can have
one of the following values:
PENDING - Amazon ML submitted a request to create a DataSource.
INPROGRESS - The creation process is underway.
FAILED - The request to create a DataSource did not run to completion. It is not usable.
COMPLETED - The creation process completed successfully.
DELETED - The DataSource is marked as deleted. It is not usable.
String message
A description of the most recent details about creating the
DataSource.
RedshiftMetadata redshiftMetadata
RDSMetadata rDSMetadata
String roleARN
Boolean computeStatistics
The parameter is true if statistics need to be generated
from the observation data.
String batchPredictionId
A user-supplied ID that uniquely identifies the
BatchPrediction.
String batchPredictionId
A user-supplied ID that uniquely identifies the
BatchPrediction. This value should be identical to the value
of the BatchPredictionID in the request.
String dataSourceId
A user-supplied ID that uniquely identifies the DataSource.
String dataSourceId
A user-supplied ID that uniquely identifies the DataSource.
This value should be identical to the value of the
DataSourceID in the request.
String evaluationId
A user-supplied ID that uniquely identifies the Evaluation
to delete.
String evaluationId
A user-supplied ID that uniquely identifies the Evaluation.
This value should be identical to the value of the
EvaluationId in the request.
String mLModelId
A user-supplied ID that uniquely identifies the MLModel.
String mLModelId
A user-supplied ID that uniquely identifies the MLModel.
This value should be identical to the value of the MLModelID
in the request.
String mLModelId
The ID assigned to the MLModel during creation.
String mLModelId
A user-supplied ID that uniquely identifies the MLModel.
This value should be identical to the value of the MLModelId
in the request.
RealtimeEndpointInfo realtimeEndpointInfo
The endpoint information of the MLModel.
com.amazonaws.internal.SdkInternalList<T> tagKeys
One or more tags to delete.
String resourceId
The ID of the tagged ML object. For example, exampleModelId.
String resourceType
The type of the tagged ML object.
String filterVariable
Use one of the following variables to filter a list of
BatchPrediction:
CreatedAt - Sets the search criteria to the BatchPrediction creation date.
Status - Sets the search criteria to the BatchPrediction status.
Name - Sets the search criteria to the contents of the BatchPrediction Name.
IAMUser - Sets the search criteria to the user account that invoked the BatchPrediction creation.
MLModelId - Sets the search criteria to the MLModel used in the BatchPrediction.
DataSourceId - Sets the search criteria to the DataSource used in the BatchPrediction.
DataURI - Sets the search criteria to the data file(s) used in the BatchPrediction. The URL can identify either a file or an Amazon Simple Storage Service (Amazon S3) bucket or directory.
String eQ
The equal to operator. The BatchPrediction results will have
FilterVariable values that exactly match the value specified
with EQ.
String gT
The greater than operator. The BatchPrediction results will
have FilterVariable values that are greater than the value
specified with GT.
String lT
The less than operator. The BatchPrediction results will
have FilterVariable values that are less than the value
specified with LT.
String gE
The greater than or equal to operator. The BatchPrediction
results will have FilterVariable values that are greater
than or equal to the value specified with GE.
String lE
The less than or equal to operator. The BatchPrediction
results will have FilterVariable values that are less than
or equal to the value specified with LE.
String nE
The not equal to operator. The BatchPrediction results will
have FilterVariable values not equal to the value specified
with NE.
String prefix
A string that is found at the beginning of a variable, such as
Name or Id.
For example, a Batch Prediction operation could have the
Name 2014-09-09-HolidayGiftMailer. To search
for this BatchPrediction, select Name for the
FilterVariable and any of the following strings for the
Prefix:
2014-09
2014-09-09
2014-09-09-Holiday
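Client-side, the Prefix match described above behaves like a simple starts-with test, as this illustrative sketch shows (the `PrefixFilter` class is hypothetical, not SDK code):

```java
import java.util.List;
import java.util.stream.Collectors;

public class PrefixFilter {
    /**
     * Illustrates how a Prefix filter behaves: it keeps values whose
     * FilterVariable (here, Name) starts with the given prefix.
     */
    public static List<String> byPrefix(List<String> names, String prefix) {
        return names.stream()
                    .filter(n -> n.startsWith(prefix))
                    .collect(Collectors.toList());
    }
}
```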
String sortOrder
A two-value parameter that determines the sequence of the resulting list of BatchPrediction.
asc - Arranges the list in ascending order (A-Z, 0-9).
dsc - Arranges the list in descending order (Z-A, 9-0).
Results are sorted by FilterVariable.
String nextToken
An ID of the page in the paginated results.
Integer limit
The number of pages of information to include in the result. The range of
acceptable values is 1 through 100. The default
value is 100.
com.amazonaws.internal.SdkInternalList<T> results
A list of BatchPrediction objects that meet the search
criteria.
String nextToken
The ID of the next page in the paginated results that indicates at least one more page follows.
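The nextToken/limit pair implies the usual pagination loop: request a page, append the results, and repeat while a nextToken is returned. The sketch below simulates that loop over an in-memory list; the integer token standing in for the service's opaque token is an assumption for illustration only:

```java
import java.util.ArrayList;
import java.util.List;

public class Paginator {
    /**
     * Generic sketch of the nextToken/limit pattern: keep requesting
     * pages until no nextToken comes back. Here the token is simply the
     * index of the next item, standing in for the service's opaque token.
     */
    public static List<String> fetchAll(List<String> serverItems, int limit) {
        List<String> results = new ArrayList<>();
        Integer token = 0; // null means "no more pages"
        while (token != null) {
            int end = Math.min(token + limit, serverItems.size());
            results.addAll(serverItems.subList(token, end)); // one "page"
            token = (end < serverItems.size()) ? end : null;
        }
        return results;
    }
}
```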
String filterVariable
Use one of the following variables to filter a list of
DataSource:
CreatedAt - Sets the search criteria to DataSource creation dates.
Status - Sets the search criteria to DataSource statuses.
Name - Sets the search criteria to the contents of DataSource Name.
DataUri - Sets the search criteria to the URI of data files used to create the DataSource. The URI can identify either a file or an Amazon Simple Storage Service (Amazon S3) bucket or directory.
IAMUser - Sets the search criteria to the user account that invoked the DataSource creation.
String eQ
The equal to operator. The DataSource results will have
FilterVariable values that exactly match the value specified
with EQ.
String gT
The greater than operator. The DataSource results will have
FilterVariable values that are greater than the value
specified with GT.
String lT
The less than operator. The DataSource results will have
FilterVariable values that are less than the value specified
with LT.
String gE
The greater than or equal to operator. The DataSource
results will have FilterVariable values that are greater
than or equal to the value specified with GE.
String lE
The less than or equal to operator. The DataSource results
will have FilterVariable values that are less than or equal
to the value specified with LE.
String nE
The not equal to operator. The DataSource results will have
FilterVariable values not equal to the value specified with
NE.
String prefix
A string that is found at the beginning of a variable, such as
Name or Id.
For example, a DataSource could have the Name
2014-09-09-HolidayGiftMailer. To search for this
DataSource, select Name for the
FilterVariable and any of the following strings for the
Prefix:
2014-09
2014-09-09
2014-09-09-Holiday
String sortOrder
A two-value parameter that determines the sequence of the resulting list
of DataSource.
asc - Arranges the list in ascending order (A-Z, 0-9).
dsc - Arranges the list in descending order (Z-A, 9-0).
Results are sorted by FilterVariable.
String nextToken
The ID of the page in the paginated results.
Integer limit
The maximum number of DataSource to include in the result.
com.amazonaws.internal.SdkInternalList<T> results
A list of DataSource that meet the search criteria.
String nextToken
An ID of the next page in the paginated results that indicates at least one more page follows.
String filterVariable
Use one of the following variables to filter a list of Evaluation objects:
CreatedAt - Sets the search criteria to the Evaluation creation date.
Status - Sets the search criteria to the Evaluation status.
Name - Sets the search criteria to the contents of Evaluation Name.
IAMUser - Sets the search criteria to the user account that invoked an Evaluation.
MLModelId - Sets the search criteria to the MLModel that was evaluated.
DataSourceId - Sets the search criteria to the DataSource used in Evaluation.
DataUri - Sets the search criteria to the data file(s) used in Evaluation. The URL can identify either a file or an Amazon Simple Storage Service (Amazon S3) bucket or directory.
String eQ
The equal to operator. The Evaluation results will have
FilterVariable values that exactly match the value specified
with EQ.
String gT
The greater than operator. The Evaluation results will have
FilterVariable values that are greater than the value
specified with GT.
String lT
The less than operator. The Evaluation results will have
FilterVariable values that are less than the value specified
with LT.
String gE
The greater than or equal to operator. The Evaluation
results will have FilterVariable values that are greater
than or equal to the value specified with GE.
String lE
The less than or equal to operator. The Evaluation results
will have FilterVariable values that are less than or equal
to the value specified with LE.
String nE
The not equal to operator. The Evaluation results will have
FilterVariable values not equal to the value specified with
NE.
String prefix
A string that is found at the beginning of a variable, such as
Name or Id.
For example, an Evaluation could have the Name
2014-09-09-HolidayGiftMailer. To search for this
Evaluation, select Name for the
FilterVariable and any of the following strings for the
Prefix:
2014-09
2014-09-09
2014-09-09-Holiday
String sortOrder
A two-value parameter that determines the sequence of the resulting list
of Evaluation.
asc - Arranges the list in ascending order (A-Z, 0-9).
dsc - Arranges the list in descending order (Z-A, 9-0).
Results are sorted by FilterVariable.
String nextToken
The ID of the page in the paginated results.
Integer limit
The maximum number of Evaluation to include in the result.
com.amazonaws.internal.SdkInternalList<T> results
A list of Evaluation that meet the search criteria.
String nextToken
The ID of the next page in the paginated results that indicates at least one more page follows.
String filterVariable
Use one of the following variables to filter a list of
MLModel:
CreatedAt - Sets the search criteria to MLModel creation date.
Status - Sets the search criteria to MLModel status.
Name - Sets the search criteria to the contents of MLModel Name.
IAMUser - Sets the search criteria to the user account that invoked the MLModel creation.
TrainingDataSourceId - Sets the search criteria to the DataSource used to train one or more MLModel.
RealtimeEndpointStatus - Sets the search criteria to the MLModel real-time endpoint status.
MLModelType - Sets the search criteria to MLModel type: binary, regression, or multi-class.
Algorithm - Sets the search criteria to the algorithm that the MLModel uses.
TrainingDataURI - Sets the search criteria to the data file(s) used in training an MLModel. The URL can identify either a file or an Amazon Simple Storage Service (Amazon S3) bucket or directory.
String eQ
The equal to operator. The MLModel results will have
FilterVariable values that exactly match the value specified
with EQ.
String gT
The greater than operator. The MLModel results will have
FilterVariable values that are greater than the value
specified with GT.
String lT
The less than operator. The MLModel results will have
FilterVariable values that are less than the value specified
with LT.
String gE
The greater than or equal to operator. The MLModel results
will have FilterVariable values that are greater than or
equal to the value specified with GE.
String lE
The less than or equal to operator. The MLModel results will
have FilterVariable values that are less than or equal to
the value specified with LE.
String nE
The not equal to operator. The MLModel results will have
FilterVariable values not equal to the value specified with
NE.
String prefix
A string that is found at the beginning of a variable, such as
Name or Id.
For example, an MLModel could have the Name
2014-09-09-HolidayGiftMailer. To search for this
MLModel, select Name for the
FilterVariable and any of the following strings for the
Prefix:
2014-09
2014-09-09
2014-09-09-Holiday
String sortOrder
A two-value parameter that determines the sequence of the resulting list
of MLModel.
asc - Arranges the list in ascending order (A-Z, 0-9).
dsc - Arranges the list in descending order (Z-A, 9-0).
Results are sorted by FilterVariable.
String nextToken
The ID of the page in the paginated results.
Integer limit
The number of pages of information to include in the result. The range of
acceptable values is 1 through 100. The default
value is 100.
com.amazonaws.internal.SdkInternalList<T> results
A list of MLModel that meet the search criteria.
String nextToken
The ID of the next page in the paginated results that indicates at least one more page follows.
String resourceId
The ID of the tagged ML object.
String resourceType
The type of the tagged ML object.
com.amazonaws.internal.SdkInternalList<T> tags
A list of tags associated with the ML object.
String evaluationId
The ID that is assigned to the Evaluation at creation.
String mLModelId
The ID of the MLModel that is the focus of the evaluation.
String evaluationDataSourceId
The ID of the DataSource that is used to evaluate the
MLModel.
String inputDataLocationS3
The location and name of the data in Amazon Simple Storage Service (Amazon S3) that is used in the evaluation.
String createdByIamUser
The AWS user account that invoked the evaluation. The account type can be either an AWS root account or an AWS Identity and Access Management (IAM) user account.
Date createdAt
The time that the Evaluation was created. The time is
expressed in epoch time.
Date lastUpdatedAt
The time of the most recent edit to the Evaluation. The time
is expressed in epoch time.
String name
A user-supplied name or description of the Evaluation.
String status
The status of the evaluation. This element can have one of the following values:
PENDING - Amazon Machine Learning (Amazon ML) submitted a request to evaluate an MLModel.
INPROGRESS - The evaluation is underway.
FAILED - The request to evaluate an MLModel did not run to completion. It is not usable.
COMPLETED - The evaluation process completed successfully.
DELETED - The Evaluation is marked as deleted. It is not usable.
PerformanceMetrics performanceMetrics
Measurements of how well the MLModel performed, using
observations referenced by the DataSource. One of the
following metrics is returned, based on the type of the
MLModel:
BinaryAUC: A binary MLModel uses the Area Under the Curve
(AUC) technique to measure performance.
RegressionRMSE: A regression MLModel uses the Root Mean
Square Error (RMSE) technique to measure performance. RMSE measures the
difference between predicted and actual values for a single variable.
MulticlassAvgFScore: A multiclass MLModel uses the F1 score
technique to measure performance.
For more information about performance metrics, please see the Amazon Machine Learning Developer Guide.
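RMSE as described above can be computed directly. This is a generic sketch of the metric, not the service's internal implementation:

```java
public class Rmse {
    /**
     * Root Mean Square Error: the square root of the mean squared
     * difference between predicted and actual values.
     */
    public static double compute(double[] predicted, double[] actual) {
        if (predicted.length != actual.length || predicted.length == 0) {
            throw new IllegalArgumentException("arrays must have the same non-zero length");
        }
        double sumSq = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            double diff = predicted[i] - actual[i];
            sumSq += diff * diff;
        }
        return Math.sqrt(sumSq / predicted.length);
    }
}
```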
String message
A description of the most recent details about evaluating the
MLModel.
String batchPredictionId
An ID assigned to the BatchPrediction at creation.
String batchPredictionId
An ID assigned to the BatchPrediction at creation. This
value should be identical to the value of the
BatchPredictionID in the request.
String mLModelId
The ID of the MLModel that generated predictions for the
BatchPrediction request.
String batchPredictionDataSourceId
The ID of the DataSource that was used to create the
BatchPrediction.
String inputDataLocationS3
The location of the data file or directory in Amazon Simple Storage Service (Amazon S3).
String createdByIamUser
The AWS user account that invoked the BatchPrediction. The
account type can be either an AWS root account or an AWS Identity and
Access Management (IAM) user account.
Date createdAt
The time when the BatchPrediction was created. The time is
expressed in epoch time.
Date lastUpdatedAt
The time of the most recent edit to BatchPrediction. The
time is expressed in epoch time.
String name
A user-supplied name or description of the BatchPrediction.
String status
The status of the BatchPrediction, which can be one of the
following values:
PENDING - Amazon Machine Learning (Amazon ML) submitted a request to generate batch predictions.
INPROGRESS - The batch predictions are in progress.
FAILED - The request to perform a batch prediction did not run to completion. It is not usable.
COMPLETED - The batch prediction process completed successfully.
DELETED - The BatchPrediction is marked as deleted. It is not usable.
String outputUri
The location of an Amazon S3 bucket or directory to receive the operation results.
String logUri
A link to the file that contains logs of the
CreateBatchPrediction operation.
String message
A description of the most recent details about processing the batch prediction request.
String dataSourceId
The ID assigned to the DataSource at creation. This value
should be identical to the value of the DataSourceId in the
request.
String dataLocationS3
The location of the data file or directory in Amazon Simple Storage Service (Amazon S3).
String dataRearrangement
A JSON string that represents the splitting and rearrangement requirement
used when this DataSource was created.
String createdByIamUser
The AWS user account from which the DataSource was created.
The account type can be either an AWS root account or an AWS Identity and
Access Management (IAM) user account.
Date createdAt
The time that the DataSource was created. The time is
expressed in epoch time.
Date lastUpdatedAt
The time of the most recent edit to the DataSource. The time
is expressed in epoch time.
Long dataSizeInBytes
The total size of observations in the data files.
Long numberOfFiles
The number of data files referenced by the DataSource.
String name
A user-supplied name or description of the DataSource.
String status
The current status of the DataSource. This element can have
one of the following values:
PENDING - Amazon ML submitted a request to create a DataSource.
INPROGRESS - The creation process is underway.
FAILED - The request to create a DataSource did not run to completion. It is not usable.
COMPLETED - The creation process completed successfully.
DELETED - The DataSource is marked as deleted. It is not usable.
String logUri
A link to the file containing logs of CreateDataSourceFrom*
operations.
String message
The user-supplied description of the most recent details about creating
the DataSource.
RedshiftMetadata redshiftMetadata
RDSMetadata rDSMetadata
String roleARN
Boolean computeStatistics
The parameter is true if statistics need to be generated
from the observation data.
String dataSourceSchema
The schema used by all of the data files of this DataSource.
This parameter is provided as part of the verbose format.
String evaluationId
The ID of the Evaluation to retrieve. The evaluation of each
MLModel is recorded and cataloged. The ID provides the means
to access the information.
String evaluationId
The evaluation ID, which is the same as the EvaluationId in the
request.
String mLModelId
The ID of the MLModel that was the focus of the evaluation.
String evaluationDataSourceId
The DataSource used for this evaluation.
String inputDataLocationS3
The location of the data file or directory in Amazon Simple Storage Service (Amazon S3).
String createdByIamUser
The AWS user account that invoked the evaluation. The account type can be either an AWS root account or an AWS Identity and Access Management (IAM) user account.
Date createdAt
The time that the Evaluation was created. The time is
expressed in epoch time.
Date lastUpdatedAt
The time of the most recent edit to the Evaluation. The
time is expressed in epoch time.
String name
A user-supplied name or description of the Evaluation.
String status
The status of the evaluation. This element can have one of the following values:
PENDING - Amazon Machine Learning (Amazon ML) submitted a request to evaluate an MLModel.
INPROGRESS - The evaluation is underway.
FAILED - The request to evaluate an MLModel did not run to completion. It is not usable.
COMPLETED - The evaluation process completed successfully.
DELETED - The Evaluation is marked as deleted. It is not usable.
PerformanceMetrics performanceMetrics
Measurements of how well the MLModel performed using
observations referenced by the DataSource. One of the
following metrics is returned, based on the type of the
MLModel:
BinaryAUC: A binary MLModel uses the Area Under the Curve
(AUC) technique to measure performance.
RegressionRMSE: A regression MLModel uses the Root Mean
Square Error (RMSE) technique to measure performance. RMSE measures the
difference between predicted and actual values for a single variable.
MulticlassAvgFScore: A multiclass MLModel uses the F1 score
technique to measure performance.
For more information about performance metrics, please see the Amazon Machine Learning Developer Guide.
String logUri
A link to the file that contains logs of the
CreateEvaluation operation.
String message
A description of the most recent details about evaluating the
MLModel.
String mLModelId
The MLModel ID, which is same as the
MLModelId in the request.
String trainingDataSourceId
The ID of the training DataSource.
String createdByIamUser
The AWS user account from which the MLModel was created. The
account type can be either an AWS root account or an AWS Identity and
Access Management (IAM) user account.
Date createdAt
The time that the MLModel was created. The time is expressed
in epoch time.
Date lastUpdatedAt
The time of the most recent edit to the MLModel. The time is
expressed in epoch time.
String name
A user-supplied name or description of the MLModel.
String status
The current status of the MLModel. This element can have one
of the following values:
PENDING - Amazon Machine Learning (Amazon ML) submitted a request to describe an MLModel.
INPROGRESS - The request is processing.
FAILED - The request did not run to completion. The ML model isn't usable.
COMPLETED - The request completed successfully.
DELETED - The MLModel is marked as deleted. It isn't usable.
Long sizeInBytes
RealtimeEndpointInfo endpointInfo
The current endpoint of the MLModel.
com.amazonaws.internal.SdkInternalMap<K,V> trainingParameters
A list of the training parameters in the MLModel. The list
is implemented as a map of key-value pairs.
The following is the current set of training parameters:
sgd.maxMLModelSizeInBytes - The maximum allowed size of the
model. Depending on the input data, the size of the model might affect
its performance.
The value is an integer that ranges from 100000 to
2147483648. The default value is 33554432.
sgd.maxPasses - The number of times that the training
process traverses the observations to build the MLModel. The
value is an integer that ranges from 1 to 10000
. The default value is 10.
sgd.shuffleType - Whether Amazon ML shuffles the training
data. Shuffling data improves a model's ability to find the optimal
solution for a variety of data types. The valid values are
auto and none. The default value is
none. We strongly recommend that you shuffle your data.
sgd.l1RegularizationAmount - The coefficient regularization
L1 norm. It controls overfitting the data by penalizing large
coefficients. This tends to drive coefficients to zero, resulting in a
sparse feature set. If you use this parameter, start by specifying a
small value, such as 1.0E-08.
The value is a double that ranges from 0 to
MAX_DOUBLE. The default is to not use L1 normalization. This
parameter can't be used when L2 is specified. Use this
parameter sparingly.
sgd.l2RegularizationAmount - The coefficient regularization
L2 norm. It controls overfitting the data by penalizing large
coefficients. This tends to drive coefficients to small, nonzero values.
If you use this parameter, start by specifying a small value, such as
1.0E-08.
The value is a double that ranges from 0 to
MAX_DOUBLE. The default is to not use L2 normalization. This
parameter can't be used when L1 is specified. Use this
parameter sparingly.
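The sgd.* parameters above are passed as a plain key-value map. As a minimal sketch, the map could be assembled like this; the values below are the documented defaults or illustrative starting points, not recommendations:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TrainingParams {
    // Builds the trainingParameters map described above. All values are strings.
    public static Map<String, String> build() {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("sgd.maxMLModelSizeInBytes", "33554432"); // documented default size cap
        params.put("sgd.maxPasses", "10");                    // documented default pass count
        params.put("sgd.shuffleType", "auto");                // shuffle the training data
        // L1 and L2 regularization are mutually exclusive; set at most one.
        params.put("sgd.l2RegularizationAmount", "1.0E-08");  // small starting value
        return params;
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```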
String inputDataLocationS3
The location of the data file or directory in Amazon Simple Storage Service (Amazon S3).
String mLModelType
Identifies the MLModel category. The following are the
available types:
REGRESSION - Produces a numeric result. For example, "What price should a house be listed at?"
BINARY - Produces one of two possible results. For example, "Is this a child-friendly web site?"
MULTICLASS - Produces one of several possible results. For example, "Is this a HIGH-, LOW-, or MEDIUM-risk trade?"
Float scoreThreshold
The scoring threshold is used in binary classification
MLModel models. It marks the
boundary between a positive prediction and a negative prediction.
Output values greater than or equal to the threshold receive a positive
result from the MLModel, such as true. Output values less
than the threshold receive a negative response from the MLModel, such as
false.
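The threshold rule above amounts to a single comparison; a minimal sketch:

```java
public class ScoreThreshold {
    // Raw model output at or above the threshold maps to the positive label,
    // as described above; output below it maps to the negative label.
    public static boolean positive(float rawScore, float threshold) {
        return rawScore >= threshold;
    }

    public static void main(String[] args) {
        System.out.println(positive(0.7f, 0.5f)); // positive result
        System.out.println(positive(0.3f, 0.5f)); // negative result
    }
}
```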
Date scoreThresholdLastUpdatedAt
The time of the most recent edit to the ScoreThreshold. The
time is expressed in epoch time.
String logUri
A link to the file that contains logs of the CreateMLModel
operation.
String message
A description of the most recent details about accessing the
MLModel.
String recipe
The recipe to use when training the MLModel. The
Recipe provides detailed information about the observation
data to use during training, and manipulations to perform on the
observation data during training.
This parameter is provided as part of the verbose format.
String schema
The schema used by all of the data files referenced by the
DataSource.
This parameter is provided as part of the verbose format.
Integer code
Integer code
Integer code
Integer code
String mLModelId
The ID assigned to the MLModel at creation.
String trainingDataSourceId
The ID of the training DataSource. The
CreateMLModel operation uses the
TrainingDataSourceId.
String createdByIamUser
The AWS user account from which the MLModel was created. The
account type can be either an AWS root account or an AWS Identity and
Access Management (IAM) user account.
Date createdAt
The time that the MLModel was created. The time is expressed
in epoch time.
Date lastUpdatedAt
The time of the most recent edit to the MLModel. The time is
expressed in epoch time.
String name
A user-supplied name or description of the MLModel.
String status
The current status of an MLModel. This element can have one
of the following values:
PENDING - Amazon Machine Learning (Amazon ML) submitted a request to create an MLModel.
INPROGRESS - The creation process is underway.
FAILED - The request to create an MLModel didn't run to completion. The model isn't usable.
COMPLETED - The creation process completed successfully.
DELETED - The MLModel is marked as deleted. It isn't usable.
Long sizeInBytes
RealtimeEndpointInfo endpointInfo
The current endpoint of the MLModel.
com.amazonaws.internal.SdkInternalMap<K,V> trainingParameters
A list of the training parameters in the MLModel. The list
is implemented as a map of key-value pairs.
The following is the current set of training parameters:
sgd.maxMLModelSizeInBytes - The maximum allowed size of the
model. Depending on the input data, the size of the model might affect
its performance.
The value is an integer that ranges from 100000 to
2147483648. The default value is 33554432.
sgd.maxPasses - The number of times that the training
process traverses the observations to build the MLModel. The
value is an integer that ranges from 1 to 10000
. The default value is 10.
sgd.shuffleType - Whether Amazon ML shuffles the training
data. Shuffling the data improves a model's ability to find the optimal
solution for a variety of data types. The valid values are
auto and none. The default value is
none.
sgd.l1RegularizationAmount - The coefficient regularization
L1 norm, which controls overfitting the data by penalizing large
coefficients. This parameter tends to drive coefficients to zero,
resulting in a sparse feature set. If you use this parameter, start by
specifying a small value, such as 1.0E-08.
The value is a double that ranges from 0 to
MAX_DOUBLE. The default is to not use L1 normalization. This
parameter can't be used when L2 is specified. Use this
parameter sparingly.
sgd.l2RegularizationAmount - The coefficient regularization
L2 norm, which controls overfitting the data by penalizing large
coefficients. This tends to drive coefficients to small, nonzero values.
If you use this parameter, start by specifying a small value, such as
1.0E-08.
The value is a double that ranges from 0 to
MAX_DOUBLE. The default is to not use L2 normalization. This
parameter can't be used when L1 is specified. Use this
parameter sparingly.
String inputDataLocationS3
The location of the data file or directory in Amazon Simple Storage Service (Amazon S3).
String algorithm
The algorithm used to train the MLModel. The following
algorithm is supported:
SGD -- Stochastic gradient descent. The goal of SGD is to minimize the gradient of the loss function.
String mLModelType
Identifies the MLModel category. The following are the
available types:
REGRESSION - Produces a numeric result. For example, "What price should a house be listed at?"
BINARY - Produces one of two possible results. For example, "Is this a child-friendly web site?"
MULTICLASS - Produces one of several possible results. For example, "Is this a HIGH-, LOW-, or MEDIUM-risk trade?"
Float scoreThreshold
Date scoreThresholdLastUpdatedAt
The time of the most recent edit to the ScoreThreshold. The
time is expressed in epoch time.
String message
A description of the most recent details about accessing the
MLModel.
com.amazonaws.internal.SdkInternalMap<K,V> properties
String predictedLabel
The prediction label for either a BINARY or
MULTICLASS MLModel.
Float predictedValue
The prediction value for a REGRESSION MLModel.
com.amazonaws.internal.SdkInternalMap<K,V> predictedScores
com.amazonaws.internal.SdkInternalMap<K,V> details
String mLModelId
A unique identifier of the MLModel.
com.amazonaws.internal.SdkInternalMap<K,V> record
String predictEndpoint
Prediction prediction
RDSDatabase databaseInformation
Describes the DatabaseName and
InstanceIdentifier of an Amazon RDS database.
String selectSqlQuery
The query that is used to retrieve the observation data for the
DataSource.
RDSDatabaseCredentials databaseCredentials
The AWS Identity and Access Management (IAM) credentials that are used to connect to the Amazon RDS database.
String s3StagingLocation
The Amazon S3 location for staging Amazon RDS data. The data retrieved
from Amazon RDS using SelectSqlQuery is stored in this
location.
String dataRearrangement
A JSON string that represents the splitting and rearrangement processing
to be applied to a DataSource. If the
DataRearrangement parameter is not provided, all of the
input data is used to create the Datasource.
There are multiple parameters that control what data is used to create a datasource:
percentBegin
Use percentBegin to indicate the beginning of the range of
the data used to create the Datasource. If you do not include
percentBegin and percentEnd, Amazon ML includes
all of the data when creating the datasource.
percentEnd
Use percentEnd to indicate the end of the range of the data
used to create the Datasource. If you do not include
percentBegin and percentEnd, Amazon ML includes
all of the data when creating the datasource.
complement
The complement parameter instructs Amazon ML to use the data
that is not included in the range of percentBegin to
percentEnd to create a datasource. The
complement parameter is useful if you need to create
complementary datasources for training and evaluation. To create a
complementary datasource, use the same values for
percentBegin and percentEnd, along with the
complement parameter.
For example, the following two datasources do not share any data, and can be used to train and evaluate a model. The first datasource has 25 percent of the data, and the second one has 75 percent of the data.
Datasource for evaluation:
{"splitting":{"percentBegin":0, "percentEnd":25}}
Datasource for training:
{"splitting":{"percentBegin":0, "percentEnd":25, "complement":"true"}}
strategy
To change how Amazon ML splits the data for a datasource, use the
strategy parameter.
The default value for the strategy parameter is
sequential, meaning that Amazon ML takes all of the data
records between the percentBegin and percentEnd
parameters for the datasource, in the order that the records appear in
the input data.
The following two DataRearrangement lines are examples of
sequentially ordered training and evaluation datasources:
Datasource for evaluation:
{"splitting":{"percentBegin":70, "percentEnd":100, "strategy":"sequential"}}
Datasource for training:
{"splitting":{"percentBegin":70, "percentEnd":100, "strategy":"sequential", "complement":"true"}}
To randomly split the input data into the proportions indicated by the
percentBegin and percentEnd parameters, set the strategy
parameter to random and provide a string that is used as the
seed value for the random data splitting (for example, you can use the S3
path to your data as the random seed string). If you choose the random
split strategy, Amazon ML assigns each row of data a pseudo-random number
between 0 and 100, and then selects the rows that have an assigned number
between percentBegin and percentEnd.
Pseudo-random numbers are assigned using both the input seed string value
and the byte offset as a seed, so changing the data results in a
different split. Any existing ordering is preserved. The random splitting
strategy ensures that variables in the training and evaluation data are
distributed similarly. It is useful in the cases where the input data may
have an implicit sort order, which would otherwise result in training and
evaluation datasources containing non-similar data records.
The following two DataRearrangement lines are examples of
non-sequentially ordered training and evaluation datasources:
Datasource for evaluation:
{"splitting":{"percentBegin":70, "percentEnd":100, "strategy":"random", "randomSeed":"s3://my_s3_path/bucket/file.csv"}}
Datasource for training:
{"splitting":{"percentBegin":70, "percentEnd":100, "strategy":"random", "randomSeed":"s3://my_s3_path/bucket/file.csv", "complement":"true"}}
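The random-split selection described above can be sketched as follows. The hash used here is a stand-in (Amazon ML's actual hashing function is not documented); the point is only that the assignment is deterministic in the seed string and the record's byte offset, so the same record always lands in the same bucket:

```java
public class RandomSplit {
    // Assigns a record a deterministic pseudo-random number from 0 to 100,
    // derived from the seed string and the record's byte offset.
    // This hash is illustrative, not Amazon ML's actual function.
    static int pseudoRandom(String seed, long byteOffset) {
        int h = (seed + ":" + byteOffset).hashCode();
        return Math.floorMod(h, 101); // 0..100
    }

    // A record is selected when its assigned number falls in the
    // [percentBegin, percentEnd) range.
    static boolean selected(String seed, long byteOffset, int percentBegin, int percentEnd) {
        int n = pseudoRandom(seed, byteOffset);
        return n >= percentBegin && n < percentEnd;
    }

    public static void main(String[] args) {
        String seed = "s3://my_s3_path/bucket/file.csv"; // e.g. the S3 path as seed
        boolean inTraining = selected(seed, 1024, 70, 100);
        boolean inEvaluation = !inTraining; // the "complement":"true" datasource
        System.out.println(inTraining + " " + inEvaluation);
    }
}
```

Because the split depends only on the seed and offset, complementary datasources built with the same seed partition the rows with no overlap.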
String dataSchema
A JSON string that represents the schema for an Amazon RDS
DataSource. The DataSchema defines the
structure of the observation data in the data file(s) referenced in the
DataSource.
A DataSchema is not required if you specify a
DataSchemaUri.
Define your DataSchema as a series of key-value pairs.
attributes and excludedVariableNames have an
array of key-value pairs for their value. Use the following format to
define your DataSchema.
{ "version": "1.0",
"recordAnnotationFieldName": "F1",
"recordWeightFieldName": "F2",
"targetFieldName": "F3",
"dataFormat": "CSV",
"dataFileContainsHeader": true,
"attributes": [
{ "fieldName": "F1", "fieldType": "TEXT" },
{ "fieldName": "F2", "fieldType": "NUMERIC" },
{ "fieldName": "F3", "fieldType": "CATEGORICAL" },
{ "fieldName": "F4", "fieldType": "NUMERIC" },
{ "fieldName": "F5", "fieldType": "CATEGORICAL" },
{ "fieldName": "F6", "fieldType": "TEXT" },
{ "fieldName": "F7", "fieldType": "WEIGHTED_INT_SEQUENCE" },
{ "fieldName": "F8", "fieldType": "WEIGHTED_STRING_SEQUENCE" } ],
"excludedVariableNames": [ "F6" ] }
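Because the DataSchema is passed as a JSON string, it can be assembled by hand. A minimal sketch using the field names from the format above, trimmed to three attributes for brevity:

```java
public class DataSchemaBuilder {
    // Builds one entry of the "attributes" array described above.
    static String attribute(String name, String type) {
        return "{ \"fieldName\": \"" + name + "\", \"fieldType\": \"" + type + "\" }";
    }

    // Assembles a DataSchema JSON string with F3 as the target field,
    // following the key-value format shown above.
    public static String build() {
        return "{ \"version\": \"1.0\", "
             + "\"targetFieldName\": \"F3\", "
             + "\"dataFormat\": \"CSV\", "
             + "\"dataFileContainsHeader\": true, "
             + "\"attributes\": [ "
             + attribute("F1", "TEXT") + ", "
             + attribute("F2", "NUMERIC") + ", "
             + attribute("F3", "CATEGORICAL") + " ], "
             + "\"excludedVariableNames\": [ ] }";
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```

In practice a JSON library would be preferable to string concatenation; the sketch only shows the shape of the document.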
String dataSchemaUri
The Amazon S3 location of the DataSchema.
String resourceRole
The role (DataPipelineDefaultResourceRole) assumed by an Amazon Elastic Compute Cloud (Amazon EC2) instance to carry out the copy operation from Amazon RDS to an Amazon S3 task. For more information, see Role templates for data pipelines.
String serviceRole
The role (DataPipelineDefaultRole) assumed by AWS Data Pipeline service to monitor the progress of the copy task from Amazon RDS to Amazon S3. For more information, see Role templates for data pipelines.
String subnetId
The subnet ID to be used to access a VPC-based RDS DB instance. This attribute is used by Data Pipeline to carry out the copy task from Amazon RDS to Amazon S3.
com.amazonaws.internal.SdkInternalList<T> securityGroupIds
The security group IDs to be used to access a VPC-based RDS DB instance. Ensure that there are appropriate ingress rules set up to allow access to the RDS DB instance. This attribute is used by Data Pipeline to carry out the copy operation from Amazon RDS to an Amazon S3 task.
RDSDatabase database
The database details required to connect to an Amazon RDS database.
String databaseUserName
String selectSqlQuery
The SQL query that is supplied during CreateDataSourceFromRDS.
Returns only if Verbose is true in
GetDataSourceInput.
String resourceRole
The role (DataPipelineDefaultResourceRole) assumed by an Amazon EC2 instance to carry out the copy task from Amazon RDS to Amazon S3. For more information, see Role templates for data pipelines.
String serviceRole
The role (DataPipelineDefaultRole) assumed by the Data Pipeline service to monitor the progress of the copy task from Amazon RDS to Amazon S3. For more information, see Role templates for data pipelines.
String dataPipelineId
The ID of the Data Pipeline instance that is used to copy data from Amazon RDS to Amazon S3. You can use the ID to find details about the instance in the Data Pipeline console.
Integer peakRequestsPerSecond
The maximum processing rate for the real-time endpoint for
MLModel, measured in incoming requests per second.
Date createdAt
The time that the request to create the real-time endpoint for the
MLModel was received. The time is expressed in epoch time.
String endpointUrl
The URI that specifies where to send real-time prediction requests for
the MLModel.
The application must wait until the real-time endpoint is ready before using this URI.
String endpointStatus
The current status of the real-time endpoint for the MLModel
. This element can have one of the following values:
NONE - Endpoint does not exist or was previously deleted.
READY - Endpoint is ready to be used for real-time predictions.
UPDATING - Updating/creating the endpoint.
RedshiftDatabase databaseInformation
Describes the DatabaseName and
ClusterIdentifier for an Amazon Redshift
DataSource.
String selectSqlQuery
Describes the SQL Query to execute on an Amazon Redshift database for an
Amazon Redshift DataSource.
RedshiftDatabaseCredentials databaseCredentials
Describes the AWS Identity and Access Management (IAM) credentials that are used to connect to the Amazon Redshift database.
String s3StagingLocation
Describes an Amazon S3 location to store the result set of the
SelectSqlQuery query.
String dataRearrangement
A JSON string that represents the splitting and rearrangement processing
to be applied to a DataSource. If the
DataRearrangement parameter is not provided, all of the
input data is used to create the Datasource.
There are multiple parameters that control what data is used to create a datasource:
percentBegin
Use percentBegin to indicate the beginning of the range of
the data used to create the Datasource. If you do not include
percentBegin and percentEnd, Amazon ML includes
all of the data when creating the datasource.
percentEnd
Use percentEnd to indicate the end of the range of the data
used to create the Datasource. If you do not include
percentBegin and percentEnd, Amazon ML includes
all of the data when creating the datasource.
complement
The complement parameter instructs Amazon ML to use the data
that is not included in the range of percentBegin to
percentEnd to create a datasource. The
complement parameter is useful if you need to create
complementary datasources for training and evaluation. To create a
complementary datasource, use the same values for
percentBegin and percentEnd, along with the
complement parameter.
For example, the following two datasources do not share any data, and can be used to train and evaluate a model. The first datasource has 25 percent of the data, and the second one has 75 percent of the data.
Datasource for evaluation:
{"splitting":{"percentBegin":0, "percentEnd":25}}
Datasource for training:
{"splitting":{"percentBegin":0, "percentEnd":25, "complement":"true"}}
strategy
To change how Amazon ML splits the data for a datasource, use the
strategy parameter.
The default value for the strategy parameter is
sequential, meaning that Amazon ML takes all of the data
records between the percentBegin and percentEnd
parameters for the datasource, in the order that the records appear in
the input data.
The following two DataRearrangement lines are examples of
sequentially ordered training and evaluation datasources:
Datasource for evaluation:
{"splitting":{"percentBegin":70, "percentEnd":100, "strategy":"sequential"}}
Datasource for training:
{"splitting":{"percentBegin":70, "percentEnd":100, "strategy":"sequential", "complement":"true"}}
To randomly split the input data into the proportions indicated by the
percentBegin and percentEnd parameters, set the strategy
parameter to random and provide a string that is used as the
seed value for the random data splitting (for example, you can use the S3
path to your data as the random seed string). If you choose the random
split strategy, Amazon ML assigns each row of data a pseudo-random number
between 0 and 100, and then selects the rows that have an assigned number
between percentBegin and percentEnd.
Pseudo-random numbers are assigned using both the input seed string value
and the byte offset as a seed, so changing the data results in a
different split. Any existing ordering is preserved. The random splitting
strategy ensures that variables in the training and evaluation data are
distributed similarly. It is useful in the cases where the input data may
have an implicit sort order, which would otherwise result in training and
evaluation datasources containing non-similar data records.
The following two DataRearrangement lines are examples of
non-sequentially ordered training and evaluation datasources:
Datasource for evaluation:
{"splitting":{"percentBegin":70, "percentEnd":100, "strategy":"random", "randomSeed":"s3://my_s3_path/bucket/file.csv"}}
Datasource for training:
{"splitting":{"percentBegin":70, "percentEnd":100, "strategy":"random", "randomSeed":"s3://my_s3_path/bucket/file.csv", "complement":"true"}}
String dataSchema
A JSON string that represents the schema for an Amazon Redshift
DataSource. The DataSchema defines the
structure of the observation data in the data file(s) referenced in the
DataSource.
A DataSchema is not required if you specify a
DataSchemaUri.
Define your DataSchema as a series of key-value pairs.
attributes and excludedVariableNames have an
array of key-value pairs for their value. Use the following format to
define your DataSchema.
{ "version": "1.0",
"recordAnnotationFieldName": "F1",
"recordWeightFieldName": "F2",
"targetFieldName": "F3",
"dataFormat": "CSV",
"dataFileContainsHeader": true,
"attributes": [
{ "fieldName": "F1", "fieldType": "TEXT" },
{ "fieldName": "F2", "fieldType": "NUMERIC" },
{ "fieldName": "F3", "fieldType": "CATEGORICAL" },
{ "fieldName": "F4", "fieldType": "NUMERIC" },
{ "fieldName": "F5", "fieldType": "CATEGORICAL" },
{ "fieldName": "F6", "fieldType": "TEXT" },
{ "fieldName": "F7", "fieldType": "WEIGHTED_INT_SEQUENCE" },
{ "fieldName": "F8", "fieldType": "WEIGHTED_STRING_SEQUENCE" } ],
"excludedVariableNames": [ "F6" ] }
String dataSchemaUri
Describes the schema location for an Amazon Redshift
DataSource.
RedshiftDatabase redshiftDatabase
String databaseUserName
String selectSqlQuery
The SQL query that is specified during
CreateDataSourceFromRedshift. Returns only if Verbose
is true in GetDataSourceInput.
Integer code
String dataLocationS3
The location of the data file(s) used by a DataSource. The
URI specifies a data file or an Amazon Simple Storage Service (Amazon S3)
directory or bucket containing data files.
String dataRearrangement
A JSON string that represents the splitting and rearrangement processing
to be applied to a DataSource. If the
DataRearrangement parameter is not provided, all of the
input data is used to create the Datasource.
There are multiple parameters that control what data is used to create a datasource:
percentBegin
Use percentBegin to indicate the beginning of the range of
the data used to create the Datasource. If you do not include
percentBegin and percentEnd, Amazon ML includes
all of the data when creating the datasource.
percentEnd
Use percentEnd to indicate the end of the range of the data
used to create the Datasource. If you do not include
percentBegin and percentEnd, Amazon ML includes
all of the data when creating the datasource.
complement
The complement parameter instructs Amazon ML to use the data
that is not included in the range of percentBegin to
percentEnd to create a datasource. The
complement parameter is useful if you need to create
complementary datasources for training and evaluation. To create a
complementary datasource, use the same values for
percentBegin and percentEnd, along with the
complement parameter.
For example, the following two datasources do not share any data, and can be used to train and evaluate a model. The first datasource has 25 percent of the data, and the second one has 75 percent of the data.
Datasource for evaluation:
{"splitting":{"percentBegin":0, "percentEnd":25}}
Datasource for training:
{"splitting":{"percentBegin":0, "percentEnd":25, "complement":"true"}}
strategy
To change how Amazon ML splits the data for a datasource, use the
strategy parameter.
The default value for the strategy parameter is
sequential, meaning that Amazon ML takes all of the data
records between the percentBegin and percentEnd
parameters for the datasource, in the order that the records appear in
the input data.
The following two DataRearrangement lines are examples of
sequentially ordered training and evaluation datasources:
Datasource for evaluation:
{"splitting":{"percentBegin":70, "percentEnd":100, "strategy":"sequential"}}
Datasource for training:
{"splitting":{"percentBegin":70, "percentEnd":100, "strategy":"sequential", "complement":"true"}}
To randomly split the input data into the proportions indicated by the
percentBegin and percentEnd parameters, set the strategy
parameter to random and provide a string that is used as the
seed value for the random data splitting (for example, you can use the S3
path to your data as the random seed string). If you choose the random
split strategy, Amazon ML assigns each row of data a pseudo-random number
between 0 and 100, and then selects the rows that have an assigned number
between percentBegin and percentEnd.
Pseudo-random numbers are assigned using both the input seed string value
and the byte offset as a seed, so changing the data results in a
different split. Any existing ordering is preserved. The random splitting
strategy ensures that variables in the training and evaluation data are
distributed similarly. It is useful in the cases where the input data may
have an implicit sort order, which would otherwise result in training and
evaluation datasources containing non-similar data records.
The following two DataRearrangement lines are examples of
non-sequentially ordered training and evaluation datasources:
Datasource for evaluation:
{"splitting":{"percentBegin":70, "percentEnd":100, "strategy":"random", "randomSeed":"s3://my_s3_path/bucket/file.csv"}}
Datasource for training:
{"splitting":{"percentBegin":70, "percentEnd":100, "strategy":"random", "randomSeed":"s3://my_s3_path/bucket/file.csv", "complement":"true"}}
String dataSchema
A JSON string that represents the schema for an Amazon S3
DataSource. The DataSchema defines the
structure of the observation data in the data file(s) referenced in the
DataSource.
You must provide either the DataSchema or the
DataSchemaLocationS3.
Define your DataSchema as a series of key-value pairs.
attributes and excludedVariableNames have an
array of key-value pairs for their value. Use the following format to
define your DataSchema.
{ "version": "1.0",
"recordAnnotationFieldName": "F1",
"recordWeightFieldName": "F2",
"targetFieldName": "F3",
"dataFormat": "CSV",
"dataFileContainsHeader": true,
"attributes": [
{ "fieldName": "F1", "fieldType": "TEXT" },
{ "fieldName": "F2", "fieldType": "NUMERIC" },
{ "fieldName": "F3", "fieldType": "CATEGORICAL" },
{ "fieldName": "F4", "fieldType": "NUMERIC" },
{ "fieldName": "F5", "fieldType": "CATEGORICAL" },
{ "fieldName": "F6", "fieldType": "TEXT" },
{ "fieldName": "F7", "fieldType": "WEIGHTED_INT_SEQUENCE" },
{ "fieldName": "F8", "fieldType": "WEIGHTED_STRING_SEQUENCE" } ],
"excludedVariableNames": [ "F6" ] }
String dataSchemaLocationS3
Describes the schema location in Amazon S3. You must provide either the
DataSchema or the DataSchemaLocationS3.
String key
A unique identifier for the tag. Valid characters include Unicode letters, digits, white space, _, ., /, =, +, -, %, and @.
String value
An optional string, typically used to describe or define the tag. Valid characters include Unicode letters, digits, white space, _, ., /, =, +, -, %, and @.
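A client-side check for the tag character rules above might look like the following sketch; the exact server-side validation pattern is an assumption derived from the listed characters:

```java
import java.util.regex.Pattern;

public class TagValidator {
    // Allows Unicode letters, digits, white space, and _ . / = + - % @,
    // per the valid-character list above. Hypothetical helper, not SDK API.
    private static final Pattern VALID =
        Pattern.compile("^[\\p{L}\\p{N}\\s_./=+\\-%@]*$");

    public static boolean isValid(String s) {
        return s != null && VALID.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("cost-center/team=ml")); // allowed characters
        System.out.println(isValid("bad|char"));            // '|' is not allowed
    }
}
```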
String batchPredictionId
The ID assigned to the BatchPrediction during creation. This
value should be identical to the value of the
BatchPredictionId in the request.
String dataSourceId
The ID assigned to the DataSource during creation. This
value should be identical to the value of the DataSourceId
in the request.
String evaluationId
The ID assigned to the Evaluation during creation. This
value should be identical to the value of the EvaluationId in
the request.
String mLModelId
The ID assigned to the MLModel during creation.
String mLModelName
A user-supplied name or description of the MLModel.
Float scoreThreshold
The ScoreThreshold used in binary classification
MLModel that marks the boundary between a positive
prediction and a negative prediction.
Output values greater than or equal to the ScoreThreshold
receive a positive result from the MLModel, such as
true. Output values less than the
ScoreThreshold receive a negative response from the
MLModel, such as false.
String mLModelId
The ID assigned to the MLModel during creation. This value
should be identical to the value of the MLModelId in the
request.
Copyright © 2016. All rights reserved.