1. Introduction to SVM
Here we use SVM (Support Vector Machine) to build and train a model on human cell records and to classify each sample as benign (mild state) or malignant (harmful, cancerous state).
SVM works by mapping data into a high-dimensional feature space so that data points can be categorized even when they are not linearly separable in the original space. This mapping is performed by the kernel function of the SVM classifier. The data are transformed so that a separator between the categories can be drawn as a hyperplane in the new space, and SVM chooses the hyperplane that leaves the widest margin between the categories.
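To make the role of the kernel concrete, here is a minimal sketch of the "kernel trick" using a degree-2 polynomial kernel. This kernel is chosen only for illustration and is not necessarily the one used later in this notebook. The helpers `poly_kernel` and `feature_map` are hypothetical. The kernel returns the same value as a dot product in an explicit 3-D feature space, but it never has to construct that space.

```python
import numpy as np


def poly_kernel(x, z):
    """Degree-2 homogeneous polynomial kernel: k(x, z) = (x . z) ** 2."""
    return float(np.dot(x, z)) ** 2


def feature_map(x):
    """Explicit map from 2-D to 3-D whose dot product reproduces poly_kernel."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])


x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

# Kernel value computed directly from the original 2-D vectors
kernel_value = poly_kernel(x, z)

# The same quantity computed as a dot product in the 3-D feature space
explicit_value = float(np.dot(feature_map(x), feature_map(z)))

print(kernel_value, explicit_value)  # equal up to floating-point rounding
```

An SVM only needs dot products between samples, so it can work with the kernel value directly. For kernels such as RBF, the implicit feature space is infinite-dimensional and an explicit map is never built.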
2. Necessary imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
3. About the Cancer data
Original author: UCI Machine Learning Repository (Asuncion and Newman, 2007) [http://mlearn.ics.uci.edu/MLRepositor...]
Public source: https://s3-api.us-geo.objectstorage.s...
4. Load Data From CSV File
The characteristics of the cell samples from each patient are contained in the fields Clump through Mit. The values are graded from 1 to 10, with 1 being the closest to benign.
The Class field contains the diagnosis, as confirmed by separate medical procedures, as to whether the samples are benign (value = 2) or malignant (value = 4).
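The encoding described above can be checked and made more readable with a few lines of pandas. This is a minimal sketch on a hypothetical toy frame: the values are invented, and only a few of the feature columns are included. The `Diagnosis` column is an illustrative addition, not part of the original file.

```python
import pandas as pd

# Hypothetical toy rows; the real file has more feature columns between Clump and Mit
toy_df = pd.DataFrame({
    "Clump":    [5, 3, 8, 1],
    "UnifSize": [1, 1, 10, 1],
    "Mit":      [1, 1, 7, 1],
    "Class":    [2, 2, 4, 2],
})

feature_cols = ["Clump", "UnifSize", "Mit"]

# Every graded feature should lie in the documented 1 to 10 range
features = toy_df[feature_cols]
in_range = ((features >= 1) & (features <= 10)).all().all()

# Class uses 2 for benign and 4 for malignant; map the codes to readable labels
toy_df["Diagnosis"] = toy_df["Class"].map({2: "benign", 4: "malignant"})

print(in_range)
print(toy_df[["Class", "Diagnosis"]])
```

On the real data, the same check can be run on `cell_df` after loading. A numeric range check on `BareNuc` only works once that column has been cleaned.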
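cell_df = pd.read_csv('cell_samples.csv')
cell_df.head()                   # first five rows
cell_df.tail()                   # last five rows
cell_df.shape                    # (number of rows, number of columns)
cell_df.size                     # total number of values (rows x columns)
cell_df.count()                  # number of non-null values in each column
cell_df['Class'].value_counts()  # number of samples in each class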
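A classifier's accuracy is easier to judge against the class balance, so it can help to turn the counts from `value_counts()` into proportions. The following is a minimal sketch on hypothetical toy labels standing in for `cell_df`. It also shows how `shape` and `size` relate. `majority_baseline` is an illustrative name for the accuracy a model would get by always predicting the most frequent class.

```python
import pandas as pd

# Hypothetical toy data standing in for cell_df (values invented for illustration)
toy_df = pd.DataFrame({
    "Clump": [5, 3, 8, 1, 4, 10],
    "Class": [2, 2, 4, 2, 2, 4],
})

n_rows, n_cols = toy_df.shape
# .size counts every value, so it equals rows * columns
assert toy_df.size == n_rows * n_cols

# Absolute counts and relative frequencies of each class code
counts = toy_df["Class"].value_counts()
proportions = toy_df["Class"].value_counts(normalize=True)

# Accuracy of a trivial model that always predicts the majority class
majority_baseline = proportions.max()

print(counts)
print(proportions)
print(majority_baseline)
```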
5. Distribution of the classes
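malignant_df = cell_df[cell_df['Class'] == 4]  # malignant samples
benign_df = cell_df[cell_df['Class'] == 2]     # benign samples
# Plot the benign samples first, then draw the malignant samples on the same axes
axes = benign_df.plot(kind='scatter', x='Clump', y='UnifSize', color='blue', label='benign')
malignant_df.plot(kind='scatter', x='Clump', y='UnifSize', color='red', label='malignant', ax=axes)
plt.show()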
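Clump and UnifSize take only integer values from 1 to 10, so many samples land on exactly the same point and the scatter plot hides how many there are. One option, sketched below on hypothetical toy rows, is to count the samples at each coordinate and scale the marker area by that count. The column `n` and the scale factor 20 are arbitrary choices made for illustration. The `Agg` backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy rows; on the real data, use cell_df instead
toy_df = pd.DataFrame({
    "Clump":    [5, 5, 5, 3, 8, 8, 1],
    "UnifSize": [1, 1, 1, 1, 10, 10, 1],
    "Class":    [2, 2, 2, 2, 4, 4, 2],
})

# Number of samples sharing each (Clump, UnifSize, Class) combination
counts = (
    toy_df.groupby(["Clump", "UnifSize", "Class"])
    .size()
    .reset_index(name="n")
)

fig, ax = plt.subplots()
for class_code, color, label in [(2, "blue", "benign"), (4, "red", "malignant")]:
    subset = counts[counts["Class"] == class_code]
    # Marker area grows with the number of overlapping samples
    ax.scatter(subset["Clump"], subset["UnifSize"], s=20 * subset["n"],
               color=color, alpha=0.5, label=label)
ax.set_xlabel("Clump")
ax.set_ylabel("UnifSize")
ax.legend()
```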
6. Removal of unwanted rows (non-numeric BareNuc values)
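cell_df.dtypes  # BareNuc is read as object because some of its entries are not numbers
# Keep only the rows whose BareNuc value can be parsed as a number
cell_df = cell_df[pd.to_numeric(cell_df['BareNuc'], errors='coerce').notnull()].copy()
cell_df['BareNuc'] = cell_df['BareNuc'].astype('int')  # convert the remaining values to integers
cell_df.dtypes  # BareNuc is now an integer column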
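The cleaning step above can be seen on a small scale in the following minimal sketch. The toy column is hypothetical and mimics a CSV column that was read as strings, with `'?'` assumed as the placeholder for a missing value. `pd.to_numeric(..., errors='coerce')` turns anything unparsable into NaN, and `notnull()` then selects the rows to keep.

```python
import pandas as pd

# Hypothetical toy column mimicking BareNuc: strings, with '?' marking a missing value
toy_df = pd.DataFrame({"BareNuc": ["1", "10", "?", "2"], "Class": [2, 4, 2, 2]})

# errors='coerce' converts unparsable entries such as '?' to NaN
parsed = pd.to_numeric(toy_df["BareNuc"], errors="coerce")

# Keep rows that parsed successfully; .copy() avoids a SettingWithCopyWarning below
clean_df = toy_df[parsed.notnull()].copy()
clean_df["BareNuc"] = clean_df["BareNuc"].astype("int")

print(clean_df.dtypes)
print(len(toy_df) - len(clean_df), "row(s) dropped")
```

Dropping rows is the simplest option, but it discards whole samples, so it is worth checking how many rows were removed before training.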