Libraries and Tools

Pandas

Library	Purpose
NumPy	Multi-dimensional arrays, plus comprehensive mathematical functions, random number generators, linear algebra routines, Fourier transforms, and more.
pandas	A fast, powerful, flexible, and easy-to-use open source data analysis and manipulation tool.
Matplotlib	A comprehensive library for creating static, animated, and interactive visualizations in Python.
scikit-learn	Simple and efficient tools for predictive data analysis; accessible to everybody and reusable in various contexts.
Jupyter	The Jupyter Notebook App is a server-client application for editing and running notebook documents in a web browser.
Anaconda	A distribution of the Python and R programming languages for scientific computing that aims to simplify package management and deployment.
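
To make the NumPy row concrete, here is a minimal sketch of a multi-dimensional array and a few of the operations listed above. The array contents and the variable name data are made up for illustration.

```python
import numpy as np

# A 2 x 3 array: two rows (samples) and three columns (features).
# The values are arbitrary and only serve as an illustration.
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

print(data.shape)          # dimensions as (rows, columns)
print(data.mean(axis=0))   # mean of each column
print(data.T @ data)       # linear algebra: a 3 x 3 matrix product
```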

Getting Started

Install Anaconda

https://www.anaconda.com/products/individual

Start a Jupyter notebook

Code Block
$ jupyter notebook

Create a new Python3 notebook


Import a Dataset

Sample datasets are available from Kaggle: https://www.kaggle.com/

From the Jupyter notebook, we import a downloaded CSV file (here, vgsales.csv).

Code Block
import pandas as pd

# load the downloaded CSV into a DataFrame
df = pd.read_csv('vgsales.csv')
df

The pd.read_csv function returns a DataFrame object.
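
If the CSV file is not at hand, the same behavior can be tried with an in-memory stand-in. This is only a sketch: the column names and values below are hypothetical and are not taken from vgsales.csv.

```python
import io

import pandas as pd

# Hypothetical stand-in for a downloaded CSV file. The columns and
# values are invented for illustration, not copied from vgsales.csv.
csv_text = """name,year,sales
Game A,2001,10.5
Game B,2005,7.25
Game C,2010,3.0
"""

# read_csv accepts a file path or any file-like object.
df_sample = pd.read_csv(io.StringIO(csv_text))

print(type(df_sample))     # the returned object is a pandas DataFrame
print(df_sample.head(2))   # first two rows
print(df_sample.dtypes)    # column types inferred while parsing
```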


Interesting DataFrame functions:

Method	Description	Example
shape	Returns the dimensions of the dataset as (rows, columns).	df.shape returns (16598, 11)
describe	Returns summary statistics (count, mean, std, min, quartiles, max) for the numeric columns.	df.describe()
values	Returns the data as a NumPy array.	df.values
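
The three members in the table can be tried on a small DataFrame. This is a minimal sketch; the data and the name df_small are hypothetical.

```python
import pandas as pd

# Hypothetical data, used only to illustrate the three members above.
df_small = pd.DataFrame({
    "age": [20, 25, 30, 35],
    "score": [1.0, 2.0, 3.0, 4.0],
})

print(df_small.shape)        # attribute, not a method: (rows, columns)

stats = df_small.describe()  # one column of statistics per numeric column
print(stats)

array = df_small.values      # the underlying data as a NumPy array
print(array)
```

Note that shape and values are attributes, so they take no parentheses, while describe() is a method call.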

Jupyter Shortcuts

Shortcut	Mode	Key	Description
Add Cell Above	Command	a	Insert a new cell above the current cell.
Add Cell Below	Command	b	Insert a new cell below the current cell.
Delete Current Cell	Command	dd	Delete the current cell.
Run Current Cell and Stay in Cell	Command/Edit	<CTRL><ENTER>	Run the commands in the cell without adding a cell below.
Autocompletion	Edit	<TAB>	List the methods available on an object.
Method Documentation	Edit	<SHIFT><TAB>	Show information about a method.
Make Comment	Edit	<CMD> /	Comment or uncomment the selected lines.

Real Example

Import the data

Code Block
import pandas as pd

df = pd.read_csv('music.csv')
df

Split the Data

Create the input and output data sets: X is the input and y is the output.

Since we want to predict the genre of music from age and sex, X holds every column except genre, and y holds the genre column.

Code Block
import pandas as pd

df = pd.read_csv('music.csv')

# input: every column except the one we want to predict
X = df.drop(columns="genre")
# output: the column we want to predict
y = df["genre"]
y
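
The same split can be checked on a few in-memory rows. This is a sketch under assumptions: the column names age and gender, the encoding 1 = male and 0 = female, and the rows themselves are hypothetical stand-ins for music.csv.

```python
import pandas as pd

# Hypothetical rows standing in for music.csv. Only the "genre" column
# name is confirmed by the text; "age" and "gender" are assumed names.
df_music = pd.DataFrame({
    "age": [20, 23, 25, 26],
    "gender": [1, 1, 0, 0],
    "genre": ["HipHop", "HipHop", "Dance", "Dance"],
})

X_demo = df_music.drop(columns="genre")  # new DataFrame without the label
y_demo = df_music["genre"]               # Series holding the labels

print(list(X_demo.columns))
print("genre" in df_music.columns)       # drop() left the original intact
```

drop returns a new DataFrame rather than modifying df_music, which is why the original still contains the genre column afterwards.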

Train and Do a Prediction

Code Block

import pandas as pd
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv('music.csv')
X = df.drop(columns="genre")
y = df["genre"]

model = DecisionTreeClassifier()

# train model
model.fit(X,y)

# predict
# 21 year old male and 22 year old female
predictions = model.predict([[21,1],[22,0]])
predictions

In the above example, we used 100% of the data for training and 0% for testing, so we have no measure of how well the model predicts data it has not seen.
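
To see why scoring on the training data says little, the following sketch trains a decision tree and then scores it on the very rows it was trained on. The data, the column names age and gender, and the variable names are hypothetical.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data; every (age, gender) pair is unique.
df_toy = pd.DataFrame({
    "age": [20, 23, 25, 26, 29, 30, 31, 33],
    "gender": [1, 1, 1, 0, 0, 0, 1, 0],
    "genre": ["HipHop", "HipHop", "HipHop", "Dance",
              "Dance", "Dance", "Jazz", "Acoustic"],
})
X_toy = df_toy.drop(columns="genre")
y_toy = df_toy["genre"]

model_toy = DecisionTreeClassifier(random_state=0)
model_toy.fit(X_toy, y_toy)

# Score the model on the same rows it was trained on.
train_accuracy = accuracy_score(y_toy, model_toy.predict(X_toy))
print(train_accuracy)
```

An unconstrained decision tree can memorize training rows whose feature values are all distinct, so this score is perfect by construction and does not show how the model behaves on new data.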

Testing our Model

Code Block

import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

df = pd.read_csv('music.csv')
X = df.drop(columns="genre")
y = df["genre"]

# split our data into training and test sets (20% for testing)
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2)

model = DecisionTreeClassifier()

# train model
model.fit(X_train,y_train)

# run predict using test data
predictions = model.predict(X_test)
score = accuracy_score(y_test, predictions)
score
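
Because train_test_split shuffles the rows randomly, the accuracy score can change from one run to the next. One way to get repeatable results is to fix the random_state parameter. The sketch below uses hypothetical data, and the value 42 is an arbitrary choice.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data: ten rows with assumed column names.
df_rs = pd.DataFrame({
    "age": list(range(20, 30)),
    "gender": [0, 1] * 5,
    "genre": ["Dance", "HipHop"] * 5,
})
X_rs = df_rs.drop(columns="genre")
y_rs = df_rs["genre"]

# Two splits with the same seed select the same rows.
split_a = train_test_split(X_rs, y_rs, test_size=0.2, random_state=42)
split_b = train_test_split(X_rs, y_rs, test_size=0.2, random_state=42)

# Each result is [X_train, X_test, y_train, y_test].
X_test_a = split_a[1]
X_test_b = split_b[1]

print(len(X_test_a))                           # 20% of 10 rows
print(X_test_a.index.equals(X_test_b.index))   # same test rows both times
```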


Model Persistence
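
One common way to persist a trained scikit-learn model is joblib, which is installed alongside scikit-learn. The following is a sketch under assumptions, not necessarily the approach this page intended: the training rows, the file name music-recommender.joblib, and the variable names are all hypothetical.

```python
import os
import tempfile

import joblib
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data: each row is [age, gender].
X_p = [[20, 1], [25, 1], [26, 0], [30, 0]]
y_p = ["HipHop", "HipHop", "Dance", "Dance"]

model_p = DecisionTreeClassifier(random_state=0).fit(X_p, y_p)

# Save the trained model to disk, then load it back.
# A temporary directory keeps this example from leaving files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "music-recommender.joblib")
    joblib.dump(model_p, model_path)
    restored = joblib.load(model_path)

# The restored model makes the same prediction as the original.
same = list(restored.predict([[21, 1]])) == list(model_p.predict([[21, 1]]))
print(same)
```

Loading the saved file avoids retraining the model each time a prediction is needed. joblib files are pickle-based, so only load files from a source you trust.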

References

Reference	URL
Python Machine Learning Tutorial (Data Science)	https://www.youtube.com/watch?v=7eh4d6sabA0
