Cyber Crime & Confusion Matrix
Confusion Matrix
A confusion matrix is a table that is often used to describe the performance of a classification model (or “classifier”) on a set of test data for which the true values are known. The confusion matrix itself is relatively simple to understand, but the related terminology can be confusing.
Once a data set has been cleaned, pre-processed and wrangled, the next step is usually to feed it to a model and get predictions, often as probabilities. But hold on! How can we measure the effectiveness of our model? The better the effectiveness, the better the performance, and that's exactly what we want. This is where the confusion matrix comes into the limelight: it is a performance measurement for machine learning classification.
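It is extremely useful for measuring Recall, Precision, Specificity and Accuracy. It also underlies the AUC-ROC curve, which is drawn from the true positive and false positive rates obtained at different classification thresholds.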
True Positive: You predicted positive and the actual value is positive.
True Negative: You predicted negative and the actual value is negative.
False Positive: You predicted positive but the actual value is negative.
False Negative: You predicted negative but the actual value is positive.
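To make the four terms concrete, here is a minimal sketch in plain Python that counts them and derives the metrics mentioned earlier. The labels are made up for illustration (1 = attack, 0 = normal traffic), and the function names `confusion_counts` and `basic_metrics` are hypothetical helpers, not part of any library.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, TN, FP and FN for a binary classifier."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p != positive and a != positive:
            tn += 1  # predicted negative, actually negative
        elif p == positive and a != positive:
            fp += 1  # predicted positive, actually negative
        else:
            fn += 1  # predicted negative, actually positive
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


def basic_metrics(c):
    """Derive common metrics from the counts (no guard for zero denominators)."""
    total = c["TP"] + c["TN"] + c["FP"] + c["FN"]
    return {
        "accuracy": (c["TP"] + c["TN"]) / total,
        "precision": c["TP"] / (c["TP"] + c["FP"]),
        "recall": c["TP"] / (c["TP"] + c["FN"]),
        "specificity": c["TN"] / (c["TN"] + c["FP"]),
    }


# Made-up example data: 1 = attack, 0 = normal traffic
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

counts = confusion_counts(actual, predicted)
metrics = basic_metrics(counts)
print(counts)
print(metrics)
```

With this toy data the model catches three of the four attacks and raises one false alarm, so precision and recall both come out at 0.75.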
Type I & Type II Error
Confusion matrices have two types of errors: Type I and Type II. I was taught two ways to keep Type I and Type II straight.
With rows as actual classes, columns as predicted classes, and the negative class listed first, the value in the upper right corner (i.e., False Positives) is known as the Type I error. These are the cases predicted to be positive whose outcome should have been negative. For example, taking "victim" as the positive class, if a person is innocent but our model concludes they are a victim, that case is a Type I error. The value in the lower left corner (i.e., False Negatives) is known as the Type II error. These are the cases predicted to be negative whose outcome should have been positive. For example, if a person is a victim but our model declares them innocent, that case is a Type II error.
The first way is to rewrite False Positive and False Negative. False Positive is a Type I error because False Positive = False True, which has only one F. False Negative is a Type II error because False Negative = False False, which has two F's. (Kudos to Riley Dallas for this method!)
The second way is to consider the meanings of these words. False Positive contains one negative word (False) so it’s a Type I error. False Negative has two negative words (False + Negative) so it’s a Type II error.
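The two error types are often reported as rates. The sketch below assumes the usual definitions: the Type I rate is the false positive rate, FP / (FP + TN), and the Type II rate is the false negative rate, FN / (FN + TP). The counts are made up, and `error_rates` is a hypothetical helper name.

```python
def error_rates(tp, tn, fp, fn):
    """Return (Type I rate, Type II rate) from confusion matrix counts."""
    type_1_rate = fp / (fp + tn)  # false positives among all actual negatives
    type_2_rate = fn / (fn + tp)  # false negatives among all actual positives
    return type_1_rate, type_2_rate


# Made-up counts for illustration only
type_1, type_2 = error_rates(tp=40, tn=900, fp=100, fn=10)
print(f"Type I rate (false positive rate): {type_1:.2f}")
print(f"Type II rate (false negative rate): {type_2:.2f}")
```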
Cyber Crimes
Cybercrime, or computer crime, is a crime that involves a computer and a network. The computer may have been used in the commission of a crime, or it may be the target. Cybercrime may harm someone’s security and financial health.
While most cybercrimes are carried out in order to generate profit for the cybercriminals, some cybercrimes are carried out against computers or devices directly to damage or disable them, while others use computers or networks to spread malware, illegal information, images or other materials. Some cybercrimes do both — i.e., target computers to infect them with a computer virus, which is then spread to other machines and, sometimes, entire networks.
Role of Confusion Matrix in the field of Cyber Security
> It shows how a classification model gets confused when it makes predictions.
> It gives you insight not only into the errors being made by your classifier but also into the types of errors being made.
> This breakdown helps you overcome the limitation of using classification accuracy alone.
> Each column of the confusion matrix represents the instances of a predicted class.
> Each row of the confusion matrix represents the instances of an actual class.
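The points above about layout and about the limits of accuracy can be sketched in a few lines. This example assumes made-up, heavily imbalanced traffic data and a hypothetical model that always predicts "normal"; `confusion_matrix` here is a small hand-written helper, not a library function.

```python
def confusion_matrix(actual, predicted, labels):
    """Build a matrix with rows = actual class and columns = predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


# Made-up imbalanced data: 95 normal connections and 5 attacks
actual = ["normal"] * 95 + ["attack"] * 5
predicted = ["normal"] * 100  # hypothetical model that never flags anything

labels = ["normal", "attack"]
matrix = confusion_matrix(actual, predicted, labels)
accuracy = sum(matrix[i][i] for i in range(len(labels))) / len(actual)
attacks_caught = matrix[1][1]  # row "attack", column "attack"

print(matrix)          # [[95, 0], [5, 0]]
print(accuracy)        # 0.95
print(attacks_caught)  # 0
```

Accuracy alone looks strong at 0.95, yet the second row shows that every attack was missed, which is the kind of detail only the matrix reveals.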
How Does a Confusion Matrix Help Solve Cyber Crime Cases?
Data sets can be built from past cyber security incidents and the response actions taken. We can analyze this data and train various types of machine learning models to predict response actions to future incidents from the features of past incidents. The problem that then arises is which model works best, not only at predicting correctly but also at producing fewer false positives. This is where the confusion matrix comes into play: among several candidate machine learning models, it helps to find the best one.
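As a rough illustration of this kind of model selection, the sketch below ranks three hypothetical models on the same made-up test labels. The ranking rule (highest accuracy first, then fewest false positives) is an assumption chosen to match the paragraph above; a real project may weigh false negatives more heavily.

```python
def false_positives_and_accuracy(actual, predicted, positive=1):
    """Return (false positive count, accuracy) for one model's predictions."""
    fp = sum(1 for a, p in zip(actual, predicted) if p == positive and a != positive)
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return fp, correct / len(actual)


# Made-up test labels and predictions; the model names are placeholders
actual = [1, 1, 1, 0, 0, 0, 0, 0]
candidates = {
    "model_a": [1, 1, 1, 1, 0, 0, 0, 0],  # one false positive
    "model_b": [1, 1, 0, 0, 0, 0, 0, 0],  # one false negative
    "model_c": [1, 1, 1, 1, 1, 0, 0, 0],  # two false positives
}

scores = {
    name: false_positives_and_accuracy(actual, preds)
    for name, preds in candidates.items()
}

# Assumed rule: prefer higher accuracy, break ties by fewer false positives
best = min(scores, key=lambda name: (-scores[name][1], scores[name][0]))

for name, (fp, acc) in scores.items():
    print(f"{name}: false positives={fp}, accuracy={acc:.3f}")
print("selected:", best)
```

Here model_a and model_b have the same accuracy, so accuracy alone cannot separate them; the false positive count from the confusion matrix breaks the tie.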
An Intrusion Detection System (IDS) provides a defense against many fast-growing network attacks (e.g., DDoS, ransomware and botnet attacks), as it flags harmful activities occurring in the network so that they can be blocked. In this work, three different classification machine learning algorithms, Naïve Bayes (NB), Support Vector Machine (SVM) and K-nearest neighbor (KNN), were used with the aim of improving detection accuracy and reducing processing time. Confusion matrices were then generated for each model on a test data set and compared to select the most suitable one.