1. Introduction to SVM

This walkthrough uses an SVM (Support Vector Machine) to build and train a model on human cell records, then classifies each sample as benign (non-cancerous) or malignant (cancerous).

SVM works by mapping data to a high-dimensional feature space so that data points can be categorized even when the data are not linearly separable in their original space. The mapping is performed by the kernel function of the SVM classifier. In the transformed space, the separator between the categories can be drawn as a hyperplane.
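The mapping idea can be illustrated with a minimal sketch on invented 1-D data. The feature_map function and the threshold 2.5 are assumptions made for this illustration only; a real SVM kernel applies such a mapping implicitly and learns the separator from the data.

```python
import numpy as np

# Toy 1-D data (invented): class 1 sits in the middle, class 0 on both sides,
# so no single threshold on x separates the two classes.
x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([0, 0, 1, 1, 1, 0, 0])

def feature_map(x):
    """Hypothetical explicit mapping x -> (x, x**2) into a 2-D feature space."""
    return np.column_stack([x, x ** 2])

phi = feature_map(x)

# In the mapped space the horizontal line x**2 = 2.5 (a hyperplane in 2-D)
# splits the classes: points below it are class 1, points above are class 0.
pred = (phi[:, 1] < 2.5).astype(int)

print(pred)
print((pred == y).all())
```

The data could not be split by a straight cut in one dimension, but after adding the squared feature a straight line separates it, which is the effect the kernel function provides.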


2. Necessary imports

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

3. About the Cancer data


Original Author - UCI Machine Learning Repository (Asuncion and Newman, 2007) [http://mlearn.ics.uci.edu/MLRepositor...]

Public Source - https://s3-api.us-geo.objectstorage.s...


4. Load Data From CSV File

The characteristics of the cell samples from each patient are contained in the fields Clump through Mit. Each value is graded from 1 to 10, with 1 being the closest to benign.

The Class field contains the diagnosis, as confirmed by separate medical procedures, as to whether the samples are benign (value = 2) or malignant (value = 4).


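cell_df = pd.read_csv('cell_samples.csv')
cell_df.head()   # first five rows
cell_df.tail()   # last five rows
cell_df.shape    # (number of rows, number of columns)
cell_df.size     # total number of cells (rows * columns)
cell_df.count()  # non-null values per column
cell_df['Class'].value_counts()  # number of samples per diagnosis (2 or 4)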
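As a quick sanity check of the field description above, the sketch below maps the Class codes to readable labels and confirms that the feature grades stay within 1 to 10. The toy_df frame is a hypothetical stand-in with invented values, not rows from cell_samples.csv.

```python
import pandas as pd

# Hypothetical miniature stand-in for cell_samples.csv (values invented).
toy_df = pd.DataFrame({
    'Clump':    [5, 3, 8, 1],
    'UnifSize': [1, 1, 10, 1],
    'Class':    [2, 2, 4, 2],
})

# Map the numeric diagnosis codes to readable labels.
labels = toy_df['Class'].map({2: 'benign', 4: 'malignant'})
print(labels.value_counts())

# Check that every feature grade lies in the documented 1..10 range.
features = toy_df.drop(columns='Class')
in_range = bool(features.apply(lambda col: col.between(1, 10)).all().all())
print(in_range)
```

The same two checks could be run on cell_df once the non-numeric entries in its columns have been handled.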

5. Distribution of the classes

# Split the samples by diagnosis
malignant_df = cell_df[cell_df['Class'] == 4]
benign_df = cell_df[cell_df['Class'] == 2]

# Plot both classes on the same axes: benign in blue, malignant in red
axes = benign_df.plot(kind='scatter',x='Clump',y='UnifSize',color='blue',label='benign')
malignant_df.plot(kind='scatter',x='Clump',y='UnifSize',color='red',label='malignant',ax=axes)
plt.show()

6. Selection of unwanted columns


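cell_df.dtypes  # inspect column types before cleaning

# Keep only rows whose BareNuc value parses as a number; .copy() avoids
# pandas' SettingWithCopyWarning on the assignment that follows
cell_df = cell_df[pd.to_numeric(cell_df['BareNuc'],errors='coerce').notnull()].copy()
cell_df['BareNuc'] = cell_df['BareNuc'].astype('int')
cell_df.dtypes  # BareNuc should now be an integer column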
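To see what the BareNuc cleaning step does, here is a minimal sketch on an invented column. The '?' placeholder is an assumption used to stand in for a non-numeric entry; the source does not say which non-numeric values the real file contains.

```python
import pandas as pd

# Hypothetical toy column: '?' stands in for a non-numeric entry.
toy_df = pd.DataFrame({'BareNuc': ['1', '10', '?', '4']})

# errors='coerce' turns anything that cannot be parsed into NaN.
numeric = pd.to_numeric(toy_df['BareNuc'], errors='coerce')

# Keep only the rows that parsed, then convert the column to integers.
clean_df = toy_df[numeric.notnull()].copy()
clean_df['BareNuc'] = clean_df['BareNuc'].astype('int')

print(clean_df['BareNuc'].tolist())
print(len(toy_df) - len(clean_df), 'row(s) dropped')
```

Note that this approach drops the affected rows entirely rather than imputing a value, so it is worth checking how many rows are lost.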