## Sunday, August 25, 2019

### Easy Binary Logistic Regression Interpretation in SPSS

What is binary logistic regression?
Binary logistic regression belongs to the family of logistic regression analyses in which the dependent (outcome) variable is categorical; in the binary case it has exactly two categories. The independent variables can be nominal, ordinal, interval or ratio-level. Like other regression analyses, logistic regression is a predictive analysis.
Binary logistic regression is used to describe data and to explain the relationship between one binary dependent variable and one or more independent variables. In binary logistic regression, the log of the odds of the dependent variable is modeled as a linear combination of the independent variables. Log odds are an alternative way of expressing probabilities, which simplifies the process of updating them with new evidence.
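To make the link between probabilities, odds and log odds concrete, here is a minimal Python sketch (the function names are illustrative only, not from any library). It converts a probability to odds and log odds and back, and shows the updating property mentioned above: multiplying the odds by an odds ratio is the same as adding the ratio's logarithm to the log odds.

```python
import math


def prob_to_odds(p: float) -> float:
    """Odds = p / (1 - p), defined for 0 < p < 1."""
    return p / (1.0 - p)


def odds_to_log_odds(odds: float) -> float:
    """Log odds (the logit) = natural log of the odds."""
    return math.log(odds)


def log_odds_to_prob(log_odds: float) -> float:
    """Inverse logit, arranged so math.exp never overflows."""
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1.0 + z)


p = 0.75
odds = prob_to_odds(p)             # 3.0
log_odds = odds_to_log_odds(odds)  # ln(3)
# Doubling the odds (odds ratio 2) is the same as adding ln(2) to the log odds.
updated = log_odds_to_prob(log_odds + math.log(2))
print(odds, round(log_odds, 4), round(updated, 4))
```

Here the odds go from 3 to 6, so the updated probability is 6/7.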
What is the primary assumption when using binary logistic regression?
The dependent variable in binary logistic regression should be binary (dichotomous). Examples include pass/fail, alive/dead, male/female, yes/no, approved/disapproved, and with/without.

What kinds of studies use binary logistic regression?
* Predicting the likelihood that law graduates will pass the bar exam.
* Predicting whether teachers will use the new instructional media.
* Predicting the likelihood that the divorce bill in the Philippines will be approved.

Are there assumptions in logistic regression?
Logistic regression does not make many of the key assumptions of linear regression and other general linear models, such as normally distributed residuals. It is still a parametric method, however, so binary logistic regression has assumptions that must be met before it is used. They are the following:
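1. It does not require a linear relationship between the dependent and independent variables. (It does, however, assume that continuous independent variables are linearly related to the log odds.)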
2. The error terms (residuals) do not need to be normally distributed.
3. Binary logistic regression requires a dependent variable to be binary and ordinal logistic regression requires the dependent variable to be ordinal.
4. Logistic regression requires observations to be independent of each other.
5. Logistic regression requires there to be little or no multicollinearity among the independent variables.
6. Finally, logistic regression typically requires a large sample size.  A general guideline is that you need a minimum of 10 cases with the least frequent outcome for each independent variable in your model. For example, if you have 5 independent variables and the expected probability of your least frequent outcome is .10, then you would need a minimum sample size of 500 (10*5 / .10).
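The sample-size guideline in item 6 is simple arithmetic; a small Python helper (the name `min_sample_size` is hypothetical) makes it reusable. It applies the 10-cases-per-predictor rule exactly as stated above and rounds up to a whole case.

```python
import math


def min_sample_size(n_predictors: int, p_least_frequent: float,
                    cases_per_predictor: int = 10) -> int:
    """Rule of thumb: cases_per_predictor * n_predictors / p_least_frequent."""
    if n_predictors < 1:
        raise ValueError("need at least one predictor")
    if not 0 < p_least_frequent <= 0.5:
        raise ValueError("the least frequent outcome has probability in (0, 0.5]")
    raw = cases_per_predictor * n_predictors / p_least_frequent
    # Round away floating-point noise (e.g. 500.0000000001) before rounding up.
    return math.ceil(round(raw, 6))


print(min_sample_size(5, 0.10))  # 500, the example above
```

For instance, with the 3 predictors used in the SPSS example below and 46 adopters out of 100 (a least frequent outcome of .46), `min_sample_size(3, 0.46)` gives 66, so 100 cases satisfy this rule of thumb.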

Can I use Linear Regression when the dependent variable is binary?
No. Using simple or multiple linear regression with a binary dependent variable violates the assumptions of linear regression: the residuals cannot be normally distributed or have constant variance, and the predicted values are not constrained to the 0 to 1 range of a probability.
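A small illustration of why linear regression is a poor fit for a binary outcome: with made-up 0/1 data (hypothetical, not from this post), an ordinary least-squares line produces predictions below 0 and above 1, which cannot be probabilities. This is a plain-Python sketch of one-predictor least squares, not a full regression routine.

```python
def ols_line(xs, ys):
    """Closed-form least-squares intercept and slope for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data: a 0/1 outcome against one predictor.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]
intercept, slope = ols_line(xs, ys)
for x in (0, 8):
    # Predictions outside the 0-1 range cannot be probabilities.
    print(x, round(intercept + slope * x, 3))
```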

What is the formula for binary logistic regression?
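Binary logistic regression uses a linear combination of the predictors, as ordinary least squares (OLS) regression does, but it models the log odds of the outcome and estimates the coefficients by maximum likelihood rather than by least squares. With four predictors, the prediction equation is:

$$\log\left(\frac{p}{1-p}\right) = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_4 x_4$$

where $p$ is the probability that the dependent variable equals 1, $b_0$ is the constant, and $b_1$ to $b_4$ are the coefficients of the independent variables $x_1$ to $x_4$.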
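Solving the equation for $p$ gives the predicted probability, $p = 1 / \left(1 + e^{-(b_0 + b_1 x_1 + \dots + b_4 x_4)}\right)$. The sketch below does this in Python; the coefficient and predictor values are hypothetical, chosen only to show the mechanics. The inverse-logit step is written in two branches so that `math.exp` is never called with a large positive argument, which would raise `OverflowError`.

```python
import math
from typing import Sequence


def predict_probability(coefs: Sequence[float], xs: Sequence[float]) -> float:
    """Solve log(p/(1-p)) = b0 + b1*x1 + ... + bk*xk for p.

    coefs = [b0, b1, ..., bk]; xs = [x1, ..., xk].
    """
    if len(coefs) != len(xs) + 1:
        raise ValueError("expected one coefficient per predictor plus b0")
    log_odds = coefs[0] + sum(b * x for b, x in zip(coefs[1:], xs))
    # Inverse logit, arranged so math.exp never overflows.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1.0 + z)


# Hypothetical coefficients b0..b4 and predictor values x1..x4.
b = [-1.0, 0.5, 0.2, -0.3, 0.8]
x = [1.0, 2.0, 0.0, 1.0]
print(round(predict_probability(b, x), 3))
```

Whatever values the linear part takes, the result always lies between 0 and 1, which is what makes the log-odds form suitable for a binary outcome.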

What are the steps for interpreting binary logistic regression results in an easy way?

Here are the steps in interpreting the results of binary logistic regression using SPSS.

Case Processing Summary

| Unweighted Cases (a) | | N | Percent |
|---|---|---|---|
| Selected Cases | Included in Analysis | 100 | 100.0 |
| | Missing Cases | 0 | .0 |
| | Total | 100 | 100.0 |
| Unselected Cases | | 0 | .0 |
| Total | | 100 | 100.0 |

a. If weight is in effect, see classification table for the total number of cases.

The first table above shows a breakdown of the number of cases used and not used in the analysis. All 100 cases were included, so there are no missing cases in the dataset.

Dependent Variable Encoding

| Original Value | Internal Value |
|---|---|
| Will Not Adopt | 0 |
| Adopt | 1 |

The second table above gives the coding for the outcome variable, adoption.

Categorical Variables Codings

| Variable | Category | Frequency | Parameter coding (1) |
|---|---|---|---|
| Farm_Size | 3 Hectares or less | 52 | 1.000 |
| | 4 Hectares or more | 48 | .000 |
| Farm_Locations | Rural | 48 | 1.000 |
| | Urban | 52 | .000 |

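The table above shows how the values of the categorical variables farm size and farm location were coded. Each variable enters the model as one indicator (dummy) term: 3 hectares or less and rural are coded 1, while 4 hectares or more and urban are the reference categories, coded 0.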
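Indicator (dummy) coding like the "Parameter coding (1)" column can be sketched in Python. The helper name `indicator_coding` is hypothetical. It gives a variable with k categories k - 1 indicator columns and codes the reference category as all zeros; here the reference is the last category listed (4 Hectares or more, Urban), which appears to match SPSS's default indicator contrast.

```python
from typing import Dict, List, Tuple


def indicator_coding(categories: List[str], reference: str) -> Dict[str, Tuple[int, ...]]:
    """Give each category k-1 indicator (0/1) values; the reference is all zeros."""
    if reference not in categories:
        raise ValueError("reference must be one of the categories")
    others = [c for c in categories if c != reference]
    return {c: tuple(int(c == o) for o in others) for c in categories}


farm_size = indicator_coding(["3 Hectares or less", "4 Hectares or more"], "4 Hectares or more")
farm_location = indicator_coding(["Rural", "Urban"], "Urban")
print(farm_size)      # {'3 Hectares or less': (1,), '4 Hectares or more': (0,)}
print(farm_location)  # {'Rural': (1,), 'Urban': (0,)}
```

A hypothetical three-level variable, for example `indicator_coding(["Low", "Medium", "High"], "High")`, would get two indicators: Low is coded (1, 0), Medium (0, 1) and High (0, 0).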

The block 0 output is for a model that includes only the intercept (constant). It is a null model, a model with no predictors.

Variables in the Equation

| | | B | S.E. | Wald | df | Sig. | Exp(B) |
|---|---|---|---|---|---|---|---|
| Step 0 | Constant | -.160 | .201 | .639 | 1 | .424 | .852 |

The Exp(B) value of .852 is the predicted odds of adopting the farming technology under the null model. Since 46 respondents decided to adopt the technology and 54 decided not to adopt, the observed odds are 46/54 = .852. The constant B of -.160 is the natural log of these odds.
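The whole Step 0 row can be reproduced from the two counts alone. In an intercept-only model the constant B is the log of the observed odds, its standard error is the square root of 1/46 + 1/54, the Wald statistic is (B/S.E.)², and its p-value comes from a chi-square distribution with 1 df, which Python's `math.erfc` gives directly. A minimal sketch:

```python
import math

adopt, not_adopt = 46, 54

odds = adopt / not_adopt                   # Exp(B) of the constant
b0 = math.log(odds)                        # B: the constant on the log-odds scale
se = math.sqrt(1 / adopt + 1 / not_adopt)  # S.E. of the constant
wald = (b0 / se) ** 2                      # Wald chi-square with df = 1
p_value = math.erfc(math.sqrt(wald / 2))   # upper-tail p of chi-square(1)

print(round(b0, 3), round(se, 3), round(wald, 3), round(p_value, 3), round(odds, 3))
```

Rounded to three decimals this prints -0.16, 0.201, 0.639, 0.424 and 0.852, matching B, S.E., Wald, Sig. and Exp(B) in the Step 0 table.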

Variables not in the Equation

| | | | Score | df | Sig. |
|---|---|---|---|---|---|
| Step 0 | Variables | Awareness | 58.237 | 1 | .000 |
| | | Farm_Locations(1) | 31.975 | 1 | .000 |
| | | Farm_Size(1) | 51.794 | 1 | .000 |
| | Overall Statistics | | 70.867 | 3 | .000 |

The table above gives the results of a score test, also known as a Lagrange multiplier (LM) test. The score test evaluates a hypothesis about the parameters in a likelihood framework. It requires only the parameter estimates of the restricted model (here, the model without the predictors), whereas the Wald test is based on the unrestricted estimates.
The column labeled Score gives the estimated change in model fit if the term is added to the model; the other two columns give the degrees of freedom and the p-value (labeled Sig.) for that estimated change. Based on the table above, all three predictors (awareness, location, and size) are expected to improve the fit of the model.

Block 1: Method = Enter

Omnibus Tests of Model Coefficients

| | | Chi-square | df | Sig. |
|---|---|---|---|---|
| Step 1 | Step | 94.698 | 3 | .000 |
| | Block | 94.698 | 3 | .000 |
| | Model | 94.698 | 3 | .000 |

The table above gives the overall test for the model that includes the predictors. The chi-square value of 94.698 with p < .001 (displayed by SPSS as .000) tells us that the model as a whole fits significantly better than an empty model (i.e., a model with no predictors).

The Omnibus Tests of Model Coefficients check whether the new model (with the explanatory variables included) is an improvement over the baseline model. They use a chi-square (likelihood-ratio) test of the difference between the -2 log likelihoods of the two models.

With the 3 predictors added, 95.7 percent of the respondents who adopted and 92.6 percent of those who did not adopt the new farming technology were correctly predicted, giving an overall percentage of 94.0. This is substantially higher than the null model, which predicts every case as "Will Not Adopt" and is therefore correct for only 54 percent of cases.
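The classification percentages are simple ratios. The post does not print the underlying counts, so the sketch below assumes 44 of the 46 adopters and 50 of the 54 non-adopters were classified correctly, the only whole-number counts consistent with 95.7 and 92.6 percent. It also computes the null model's accuracy, assuming (as SPSS does at Block 0 with the default .5 cut value) that it predicts every case as the larger group.

```python
# Counts inferred from the reported percentages (assumption, not shown in the post).
adopt_total, adopt_correct = 46, 44
not_adopt_total, not_adopt_correct = 54, 50
n = adopt_total + not_adopt_total


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal, as in the SPSS classification table."""
    return round(100 * part / whole, 1)


adopt_pct = pct(adopt_correct, adopt_total)              # correctly predicted adopters
not_adopt_pct = pct(not_adopt_correct, not_adopt_total)  # correctly predicted non-adopters
overall_pct = pct(adopt_correct + not_adopt_correct, n)
# Null model: every case is predicted as the larger group ("Will Not Adopt").
null_pct = pct(max(adopt_total, not_adopt_total), n)
print(adopt_pct, not_adopt_pct, overall_pct, null_pct)  # 95.7 92.6 94.0 54.0
```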

Model Summary

| Step | -2 Log likelihood | Cox & Snell R Square | Nagelkerke R Square |
|---|---|---|---|
| 1 | 43.291 (a) | .612 | .818 |

a. Estimation terminated at iteration number 6 because parameter estimates changed by less than .001.

The -2 Log Likelihood of the model is 43.291. This statistic measures how poorly the model predicts the decisions: the smaller the value, the better the fit. Adding the omnibus chi-square to it gives 94.698 + 43.291 = 137.989, which is the -2 Log Likelihood of the model without the predictors (the null model). The null model's much higher value shows that it predicts the decisions far more poorly than the model with the three predictors.
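The -2 log likelihood of the null model can also be computed directly from the 46/54 split, which confirms the arithmetic above: -2LL = -2[46 ln(.46) + 54 ln(.54)]. Subtracting the model's 43.291 gives the omnibus chi-square, and for 3 df the chi-square upper-tail probability has a closed form, so no statistics library is needed. A minimal Python sketch:

```python
import math

adopt, not_adopt = 46, 54
p = adopt / (adopt + not_adopt)

# -2 log likelihood of the intercept-only (null) model
neg2ll_null = -2 * (adopt * math.log(p) + not_adopt * math.log(1 - p))
neg2ll_model = 43.291  # from the Model Summary table
chi_square = neg2ll_null - neg2ll_model  # omnibus (likelihood-ratio) chi-square


def chi2_sf_df3(x: float) -> float:
    """P(chi-square with 3 df > x), using the closed form for 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


print(round(neg2ll_null, 3), round(chi_square, 3), chi2_sf_df3(chi_square))
```

It reproduces 137.989 and 94.698; the p-value is far below .001, which SPSS displays as .000.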
The Cox & Snell R Square and Nagelkerke R Square estimate how much of the variation in the likelihood that a farmer will adopt the new farming technology is explained by the model. Given the set of independent variables, the full model explains roughly 61 to 82 percent of this variation.

Both statistics are pseudo R-squares: they are analogues of the R-square in linear regression, not literal proportions of explained variance, so they should be read as approximate. Because the Cox & Snell value cannot reach 1, Nagelkerke's version rescales it so that its maximum is 1, which is why it is the larger of the two.
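Both pseudo R-squares follow from the two -2 log likelihoods and the sample size $n = 100$. With $D_0 = 137.989$ (null model) and $D_1 = 43.291$ (fitted model), Cox & Snell's $R^2 = 1 - e^{(D_1 - D_0)/n}$, and Nagelkerke's $R^2$ divides it by its maximum possible value, $1 - e^{-D_0/n}$. A minimal Python check:

```python
import math

n = 100
d_null = 137.989   # -2 log likelihood without predictors (94.698 + 43.291)
d_model = 43.291   # -2 log likelihood of the fitted model

cox_snell = 1 - math.exp((d_model - d_null) / n)
max_cox_snell = 1 - math.exp(-d_null / n)  # Cox & Snell's upper bound for this outcome
nagelkerke = cox_snell / max_cox_snell

print(round(cox_snell, 3), round(nagelkerke, 3))  # 0.612 0.818
```

Both values match the Model Summary table.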

Variables in the Equation

| | | B | S.E. | Wald | df | Sig. | Exp(B) |
|---|---|---|---|---|---|---|---|
| Step 1 (a) | Awareness | 1.110 | .265 | 17.575 | 1 | .000 | 3.035 |
| | Farm_Locations(1) | .163 | .893 | .033 | 1 | .855 | 1.177 |
| | Farm_Size(1) | -3.513 | .891 | 15.551 | 1 | .000 | .030 |
| | Constant | -2.944 | 1.192 | 6.100 | 1 | .014 | .053 |

a. Variable(s) entered on step 1: Awareness, Farm_Locations, Farm_Size.

Based on the table, awareness (Wald = 17.575, p < .01) and farm size (Wald = 15.551, p < .01) significantly predict the likelihood that farmers will adopt the new farming technology, while farm location does not (Wald = .033, p = .855).
Farmers whose farm size is 3 hectares or less have odds of adopting the new farming technology that are .030 times the odds of farmers with 4 hectares or more, that is, about 97 percent lower, holding the other predictors constant. For awareness, each one-level increase (for example, from level 1 to level 2) multiplies the odds of adoption by 3.035, holding the other predictors constant.
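To turn the Step 1 coefficients into the percent changes in odds and predicted probabilities discussed above, a short Python sketch helps. It assumes awareness is entered as a numeric score (the post mentions levels 1 and 2) and uses a hypothetical farmer profile (awareness level 2, rural location), chosen only for illustration.

```python
import math

# B values from the Step 1 "Variables in the Equation" table.
B = {"Constant": -2.944, "Awareness": 1.110,
     "Farm_Locations(1)": 0.163, "Farm_Size(1)": -3.513}


def pct_change_in_odds(b: float) -> float:
    """Percent change in the odds for a one-unit increase in a predictor."""
    return (math.exp(b) - 1.0) * 100.0


def prob_adopt(awareness: float, rural: int, small_farm: int) -> float:
    """Predicted probability of adoption for a (hypothetical) farmer profile."""
    log_odds = (B["Constant"] + B["Awareness"] * awareness
                + B["Farm_Locations(1)"] * rural + B["Farm_Size(1)"] * small_farm)
    return 1.0 / (1.0 + math.exp(-log_odds))


print(round(pct_change_in_odds(B["Farm_Size(1)"]), 1))  # about -97 percent
print(round(pct_change_in_odds(B["Awareness"]), 1))     # about +203 percent
large = prob_adopt(awareness=2, rural=1, small_farm=0)
small = prob_adopt(awareness=2, rural=1, small_farm=1)
print(round(large, 3), round(small, 3))
```

The ratio of the two farmers' predicted odds equals Exp(B) for Farm_Size(1), e^-3.513 ≈ .030, which is exactly what the odds ratio means; the gap in predicted probabilities, by contrast, depends on the values chosen for the other predictors.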

B gives the coefficients of the logistic regression equation for predicting the dependent variable from the independent variables. They are on the log-odds scale.

S.E. gives the standard errors associated with the coefficients.

The Wald and Sig. columns give the Wald chi-square value, (B/S.E.)², and the two-tailed p-value used in testing the null hypothesis that the coefficient (parameter) is 0. Coefficients with p-values less than alpha (for example, .05) are statistically significant.

The df column lists the degrees of freedom for each of the tests of the coefficients.

Exp(B) gives the odds ratios for the predictors, obtained by exponentiating the coefficients (e raised to the power B).
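The Wald statistic is (B/S.E.)² with 1 df, Sig. is its chi-square p-value, and Exp(B) is e raised to B, so these three columns can be recomputed from B and S.E. alone. A minimal Python sketch using only the standard library (the helper name `wald_row` is hypothetical):

```python
import math

# (B, S.E.) pairs copied from the Step 1 table.
ROWS = {
    "Awareness": (1.110, 0.265),
    "Farm_Locations(1)": (0.163, 0.893),
    "Farm_Size(1)": (-3.513, 0.891),
    "Constant": (-2.944, 1.192),
}


def wald_row(b: float, se: float):
    """Return (Wald chi-square, two-tailed p-value, Exp(B)) for one coefficient."""
    wald = (b / se) ** 2                      # df = 1
    p_value = math.erfc(math.sqrt(wald / 2))  # P(chi-square(1) > wald)
    return wald, p_value, math.exp(b)


for name, (b, se) in ROWS.items():
    wald, p_value, odds_ratio = wald_row(b, se)
    print(f"{name:<18} Wald={wald:7.3f}  Sig.={p_value:.3f}  Exp(B)={odds_ratio:.3f}")
```

Because B and S.E. are rounded to three decimals in the table, the recomputed values can differ slightly from SPSS's (for example, about 17.545 instead of 17.575 for Awareness); the conclusions do not change.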
