Import the Data
Clean the Data
Split the Data into Training/Test Sets (80% training/20% testing)
Create a Model - select an algorithm
Train the Model
Make Predictions
Evaluate and Improve

Libraries and Tools

Library	Purpose
NumPy	NumPy offers comprehensive mathematical functions, random number generators, linear algebra routines, Fourier transforms, and more.
Pandas	Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool.
MatPlotLib	Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.
SciKit-Learn	Simple and efficient tools for predictive data analysis · Accessible to everybody, and reusable in various contexts
Jupyter	The Jupyter Notebook App is a server-client application that allows editing and running notebook documents via a web browser.
Anaconda	Anaconda is a distribution of the Python and R programming languages for scientific computing, that aims to simplify package management and deployment.

Getting Started

Install Anaconda

...

Create a new Python3 notebook

Image Modified

Import a Dataset

...

Interesting DataFrame functions:

Method

Description

Example

shape

returns dimensions of dataset

df.shape

(16598, 11)

describe

returns useful statistics about our data

df.describe()

(see above image)

values

returns your data

Jupyter Shortcuts

Shortcut	Mode	Key	Description
Add Cell Above	Command	a
Add Cell Below	Command	b
Delete Current Cell	Command	dd
Run current Cell and Stay in Cell	Command/Edit	<CTRL><ENTER>	Run Commands in cell without adding a cell below.
Autocompletion	Edit	<TAB>	Get methods for object
Method Documentation	Edit	<SHIFT> <TAB>	Get information on method
Make Comment	Edit	<CMD> /	Comment/UnComment

Real Example

Import the data

Code Block
import pandas as pd df = pd.read_csv('music.csv') df

Spit the Data

Create input and output data sets. X = input, y = output.

...

Code Block
import pandas as pd df = pd.read_csv('music.csv') X = df.drop(columns="genre") y = df["genre"] y

Train and Do a Prediction

Code Block

import pandas as pd
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv('music.csv')
X = df.drop(columns="genre")
y = df["genre"]

model = DecisionTreeClassifier()

# train model
model.fit(X,y)

# predict
# 21 year old male and 22 year old female
predictions = model.predict([[21,1],[22,0]])
predictions

In the above example, we used 100% of the data for training and 0 for testing our model.

Testing our Model

Code Block

import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

df = pd.read_csv('music.csv')
X = df.drop(columns="genre")
y = df["genre"]

#split our data into train and test DataFrames (20% for testing)
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2)

model = DecisionTreeClassifier()

# train model
model.fit(X_train,y_train)

# run predict using test data
predictions = model.predict(X_test)
score = accuracy_score(y_test, predictions)
score

Model Persistence

Saving a Trained Model

Code Block

import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import joblib

df = pd.read_csv('music.csv')
X = df.drop(columns="genre")
y = df["genre"]

#split our data into train and test DataFrames (20% for testing)
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2)

model = DecisionTreeClassifier()

# train model
model.fit(X_train,y_train)

# run predict using test data
# predictions = model.predict(X_test)
# score = accuracy_score(y_test, predictions)

#save our model
joblib.dump(model,"music-recomender.joblib")

Predictions from a Saved Model

Code Block
import joblib #load our model model = joblib.load("music-recomender.joblib") # run predict using test data predictions = model.predict([[20,1],[21,0]]) predictions

Visualizing Decision Trees

Code Block

import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree

df = pd.read_csv('music.csv')
X = df.drop(columns="genre")
y = df["genre"]

model = DecisionTreeClassifier()

# train model
model.fit(X,y)

# export graph of data in dot format
tree.export_graphviz(model,out_file='music_recomender.dot', 
                    feature_names=['age','gender'], 
                    class_names=sorted(y.unique()),
                    label='all',
                    rounded=True,
                    filled=True)

This will output our .dot file. We just need to pull it into VSCode with dot plugin to visualize it.

Image Added

References

Reference	URL
Python Machine Learning Tutorial (Data Science)	https://www.youtube.com/watch?v=7eh4d6sabA0

...

Space shortcuts

Page tree

Versions Compared

Old Version 3

New Version Current

Key

Libraries and Tools

Getting Started

Import a Dataset

Jupyter Shortcuts

Real Example

Import the data

Spit the Data

Train and Do a Prediction

Testing our Model

Model Persistence

Saving a Trained Model

Predictions from a Saved Model

Visualizing Decision Trees

References

Space shortcuts

Page tree

Page History

Versions Compared

Old Version 3

New Version Current

Key

Libraries and Tools

Getting Started

Import a Dataset

Jupyter Shortcuts

Real Example

Import the data

Spit the Data

Train and Do a Prediction

Testing our Model

Model Persistence

Saving a Trained Model

Predictions from a Saved Model

Visualizing Decision Trees

References