Deep Into: Logistic Classification
Will it rain tomorrow?
Supervised learning can be divided into two parts: regression and classification. We use regression to predict numerical values from historical data, while classification techniques are used to group data by their shared qualities or characteristics. In machine learning we can do this classification using several algorithms; in this post I hope to build a logistic regression model to use in binary classification. Binary means there are only two possible outcomes, like head or tail, right or left, or yes/no.
For example, if we consider the wine problem, predicting next month's wine price is a regression question, while deciding wine quality (good or bad) is a classification problem. Now we have a clear idea about classification, so let's dive deeper: how do we actually classify? We have several strategies for this task. The simplest approach is the regression line: we fit a line using the data we have and use that line as a boundary to group our data. But this is not a good idea, because only very low accuracy can be achieved by this method. So I decided to use another strategy, logistic classification, which uses conditional probability. With this, I can get the probability of an event happening or not.
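That conditional probability comes from the sigmoid (logistic) function, which squashes any real-valued score into the (0, 1) range. A minimal sketch:

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# A score of 0 sits exactly on the fence; large positive scores map
# close to 1 ("yes"), large negative scores close to 0 ("no").
print(sigmoid(0.0))   # 0.5
print(sigmoid(4.0))   # ~0.982
print(sigmoid(-4.0))  # ~0.018
```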
As I said before, the linear regression method is not very accurate; it misses most of the data points. The logistic method, on the other hand, gives a probability, so to get a prediction we need to set a threshold (normally 0.5) and group the data accordingly: values greater than the threshold go to one class and lower values go to the other.
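This probability-plus-threshold idea can be sketched with scikit-learn. The single feature and labels below are made up for illustration; the real post uses the Australian rain dataset:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy binary data: one feature, labels 0 ("no rain") / 1 ("rain").
X = np.array([[0.1], [0.4], [0.5], [0.9], [1.2], [1.5]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression().fit(X, y)

proba = model.predict_proba(X)[:, 1]   # probability of class 1
pred = (proba >= 0.5).astype(int)      # apply the 0.5 threshold
print(pred)
```

For binary problems, scikit-learn's own `model.predict(X)` gives the same grouping as thresholding `predict_proba` at 0.5.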
Probability graph of the logistic classification model
The same features used in the linear model are used here too, and no changes are made to the data. As you can see in the graph, the x- and y-axes show the features used in the model and the z-axis shows the probability. If we take the threshold as 0.5, values greater than it are treated as "yes" and the others as "no". So let's compare the predictions of each model through graphs.
Predictions of the linear regression-based model
Predictions of the logistic classification model
Logistic classification
Model building
- NumPy
- pandas
- feature-engine
- SciPy
- seaborn
- plotly
- scikit-learn
- Are there any missing values?
- Distribution of features
- Possible outliers
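The three checks above can be sketched with pandas. The column names and values here are hypothetical stand-ins for the rain dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the rain dataset.
df = pd.DataFrame({
    "MinTemp": [13.4, 7.4, np.nan, 9.2, 30.0],
    "Rainfall": [0.6, 0.0, 0.0, 120.0, 1.0],
})

# 1. Are there any missing values?
print(df.isnull().sum())

# 2. Distribution of each feature (count, mean, quartiles, ...).
print(df.describe())

# 3. Possible outliers, flagged with the 1.5 * IQR rule.
q1, q3 = df["Rainfall"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["Rainfall"] < q1 - 1.5 * iqr) |
              (df["Rainfall"] > q3 + 1.5 * iqr)]
print(outliers)
```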
note:
Comparing the probability of "No" in both models
We can see how the probabilities of the two models relate to each other. It's clear that both models work fairly well, because the data points lie in a roughly symmetric shape. Blue points mark places where the two models' predictions differ, and orange points mark places where the predictions are the same. With these results you can get a sense of which model is better.
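One way to quantify that comparison, rather than eyeballing the scatter, is to measure how often the two models agree after thresholding. The probability arrays below are made-up placeholders for the two models' outputs:

```python
import numpy as np

# Hypothetical predicted probabilities of "No" from each model.
linear_proba = np.array([0.2, 0.7, 0.9, 0.4, 0.55])
logistic_proba = np.array([0.1, 0.8, 0.85, 0.6, 0.45])

# Threshold both at 0.5, then count matching predictions.
linear_pred = linear_proba >= 0.5
logistic_pred = logistic_proba >= 0.5

agreement = np.mean(linear_pred == logistic_pred)
print(f"models agree on {agreement:.0%} of points")  # 60% here
```

The points where the boolean arrays differ correspond to the blue points in the scatter plot above.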
Learn more
- The full notebook is in this repository: australia-rain-prediction
- Read this page on Wikipedia: logistic regression