In this video, you will see how to build a binary classification model that assesses the likelihood that a customer of an outdoor equipment company will buy a tent. This video uses a data set called GoSales, which you’ll find in the IBM Watson Gallery. Add this data set to the Machine Learning project, and then go to the project. You’ll find the GoSales.csv file listed with your other Data Assets. View the data set. The feature columns are gender, age, marital_status, and profession; they contain the attributes on which the machine learning model will base predictions. The label columns are IS_TENT, product_line, and purchase_amounts; they contain historical outcomes that the models could be trained to predict. Go back to the Assets tab, and add an AutoAI experiment to the project. This project already has the Watson Machine Learning service associated. If you haven’t done that yet, first watch the video showing how to run an AutoAI experiment based on a sample. Just provide a name for the experiment, and then click Create. The AutoAI Experiment Builder displays. You first need to load the training data. In this case, the data set will be from the project. Select the GoSales CSV file from the list. AutoAI reads the data set and lists the columns it finds. Since you want the model to predict the likelihood that a given customer will purchase a tent, select IS_TENT as the column to predict. Now edit the experiment settings. First look at the settings for the Data Source. If you have a large data set, you can run the experiment on a subsample of rows. You can also configure how much of the data will be used for training and how much will be used for evaluation. The default is a 90%/10% split, where 10% of the data is reserved for evaluation. You can also select which columns from the data set to include when running the experiment. On the Prediction panel, you can select a prediction type.
In this case, AutoAI analyzed your data and determined that the IS_TENT column contains True/False information, making this data suitable for a binary classification model. The positive class is True, and the default metric for binary classification is ROC AUC, the area under the ROC curve, which measures how well the model ranks positive cases above negative ones across all classification thresholds. If you’d like, you can choose specific algorithms to consider for this experiment. On the General panel, you can review other details about the experiment. In this case, accepting the default settings makes the most sense. Now run the experiment, and wait as the Pipeline leaderboard fills in to show the generated pipelines using different estimators, such as the XGBoost classifier, or enhancements, such as hyperparameter optimization and feature engineering, with the pipelines ranked based on the ROC AUC metric. Hyperparameter optimization is a mechanism for automatically exploring a search space of potential hyperparameters, building a series of models, and comparing the models using metrics of interest. Feature engineering attempts to transform the raw data into the combination of features that best represents the problem to achieve the most accurate prediction. Okay! The run has completed. View the progress map to see details of the run.
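To make the ROC AUC ranking metric more concrete, here is a minimal sketch of how it can be computed from labels and model scores. It uses the pairwise formulation (the fraction of positive/negative pairs in which the positive example gets the higher score) and is not AutoAI's own implementation. The function name `roc_auc` and the sample labels and scores are made up for illustration.

```python
def roc_auc(labels, scores):
    """Return the fraction of (positive, negative) pairs in which the
    positive example has the higher score; ties count as half a win."""
    positives = [s for y, s in zip(labels, scores) if y]
    negatives = [s for y, s in zip(labels, scores) if not y]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical IS_TENT labels and predicted probabilities of True.
is_tent = [True, False, True, False, True]
scores = [0.9, 0.2, 0.65, 0.7, 0.8]

auc = roc_auc(is_tent, scores)
print(f"ROC AUC: {auc:.3f}")
```

A score of 1.0 means every buyer is ranked above every non-buyer, and 0.5 is what random scores would give on average. The nested loop is quadratic in the number of rows, so it is only meant for small illustrative inputs.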
Scroll down to view the pipeline leaderboard. You may want to start by comparing the pipelines. This chart provides metrics for the four pipelines, viewed by cross-validation score. You can also see the pipelines ranked based on other metrics, such as accuracy or average precision. You can select an individual pipeline to review the model evaluation, which includes the ROC curve. During AutoAI training, your data set is split into two parts: training data and hold-out data. The training data is used by the AutoAI training stages to generate the model pipelines, and cross-validation scores are used to rank them. After training, the hold-out data is used to evaluate the resulting pipeline model and to compute performance information such as ROC curves and confusion matrices. You can also view the confusion matrix, precision-recall curve, model information, and feature importance. This pipeline had the highest ranking, so you can save it as a machine learning model. Just accept the defaults, and save the model.
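The hold-out evaluation described above can be sketched in a few lines. This is an illustrative example only: it assumes a simple random 90%/10% split and counts confusion-matrix cells by hand, which may differ from how AutoAI actually partitions and scores the data. The function names, the seed, and the sample labels and predictions are all hypothetical.

```python
import random


def holdout_split(rows, holdout_fraction=0.1, seed=42):
    """Shuffle the rows and reserve a fraction of them as hold-out data.
    Returns (training_rows, holdout_rows)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_holdout = max(1, round(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]


def confusion_matrix(actual, predicted):
    """Count true/false positives and negatives for boolean labels,
    treating True as the positive class."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if a and p:
            counts["TP"] += 1
        elif a and not p:
            counts["FN"] += 1
        elif not a and p:
            counts["FP"] += 1
        else:
            counts["TN"] += 1
    return counts


# Stand-in for the rows of the data set: 100 row identifiers.
rows = list(range(100))
train_rows, holdout_rows = holdout_split(rows)
print(len(train_rows), "training rows,", len(holdout_rows), "hold-out rows")

# Hypothetical IS_TENT values and model predictions for five hold-out rows.
actual = [True, True, False, False, True]
predicted = [True, False, False, True, True]
matrix = confusion_matrix(actual, predicted)
print(matrix)
```

Because the hold-out rows never take part in training or cross-validation, metrics computed on them give a less optimistic picture of how a pipeline behaves on unseen customers.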
Now that you’ve trained the model, you’re ready to view the model and deploy it. The Overview tab shows a model summary and the input schema. On the Deployments tab, add a deployment. This will be a Web Service deployment with the specified name. When you’re ready, save the new deployment. When the model deployment is complete, view the deployment. The Overview tab shows the basic deployment information. On the Test tab, you can test the model prediction. You can either enter test input data or paste JSON input data, and then click Predict. This shows that there’s a very high probability that the first customer will buy a tent, and a very high probability that the second customer will not buy a tent. On the Implementation tab, you’ll find the scoring endpoint for future reference. You’ll also find code snippets for various programming languages that show how to use this deployment from your application. You can also view the API specification from here. Back in the project, you’ll find the AutoAI experiment and the model on the Assets tab, and the deployment on the Deployments tab. Find more videos in the IBM Watson Data and AI Learning Center. http://ibm.biz/ibm-machine-learning.
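As a rough illustration of the JSON input you might paste on the Test tab, here is a sketch that builds a scoring payload for two customers. It assumes the deployment accepts the `input_data` / `fields` / `values` layout used by Watson Machine Learning scoring requests; the code snippets on the Implementation tab are the authoritative source for the exact shape. The helper name `build_scoring_payload` and the customer values are hypothetical, and the field names must match the casing in the model's input schema.

```python
import json


def build_scoring_payload(fields, rows):
    """Build a scoring request body with one value list per customer.
    The input_data/fields/values layout is an assumption; confirm it
    against the snippets on the deployment's Implementation tab."""
    for row in rows:
        if len(row) != len(fields):
            raise ValueError("each row needs exactly one value per field")
    return {
        "input_data": [
            {"fields": list(fields), "values": [list(row) for row in rows]}
        ]
    }


# Feature columns as named in the data set walkthrough.
fields = ["gender", "age", "marital_status", "profession"]

# Two made-up customers, for illustration only.
customers = [
    ["M", 27, "Single", "Professional"],
    ["F", 54, "Married", "Executive"],
]

payload = build_scoring_payload(fields, customers)
payload_json = json.dumps(payload, indent=2)
print(payload_json)
```

The sketch deliberately stops at building the JSON. Sending it to the scoring endpoint also requires an authentication token and the endpoint URL from the Implementation tab, which are specific to each deployment.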

