In the previous post, you saw some common interview questions on linear regression. Those questions mostly dealt with the essence of linear regression and its general concepts. This section covers common interview questions on the concepts learnt in multiple linear regression.
Q1. What is Multicollinearity? How does it affect the linear regression? How can you deal with it?
Multicollinearity occurs when some of the independent variables are
highly correlated (positively or negatively) with each other. This
causes a problem because it violates one of the basic assumptions of
linear regression. The presence of multicollinearity does not affect the
predictive capability of the model, so if you only want predictions, it
does not affect your output. However, if you want to draw insights from
the model’s coefficients and apply them in, let’s say, a business
model, it may cause problems.
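One common way to detect multicollinearity is the variance inflation factor (VIF), which is not named in the answer above and is offered here only as an illustration. For each independent variable, VIF = 1 / (1 − R²), where R² comes from regressing that variable on all the others. The sketch below uses NumPy and synthetic data; the function name `variance_inflation_factors` and the variables `x1`, `x2`, `x3` are hypothetical.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return the VIF of each column of X (rows are observations)."""
    X = np.asarray(X, dtype=float)
    n_rows, n_cols = X.shape
    vifs = []
    for j in range(n_cols):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on the remaining columns plus an intercept.
        A = np.column_stack([np.ones(n_rows), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        residuals = y - A @ coef
        ss_res = np.sum(residuals ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        r_squared = 1.0 - ss_res / ss_tot
        vifs.append(1.0 / (1.0 - r_squared))
    return np.array(vifs)

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)  # nearly a copy of x1
x3 = rng.normal(size=200)                  # unrelated to x1 and x2

vifs = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print(vifs)  # x1 and x2 get large VIFs; x3 stays close to 1
```

A frequently quoted rule of thumb treats a VIF above roughly 5 to 10 as a sign of trouble. Typical remedies are dropping one of the correlated variables or combining them into a single feature.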
It is common practice to test data science aspirants on linear
regression, as it is the first algorithm that almost everyone studies in
Data Science/Machine Learning. Aspirants are expected to possess an
in-depth knowledge of these algorithms. We consulted hiring managers and
data scientists from various organisations to find out which Linear
Regression questions they typically ask in an interview. Based on their
extensive feedback, a set of questions and answers was prepared to help
students in their interview conversations.
Q1. What is linear regression?
In simple terms, linear regression is a method of finding the best
straight line fitting the given data, i.e. finding the best linear
relationship between the independent and dependent variables.
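As a minimal sketch of what "finding the best straight line" can look like in practice, the example below fits a line by ordinary least squares using NumPy's `polyfit`. The data points are made up for illustration, and "best" is assumed here to mean minimising the sum of squared errors.

```python
import numpy as np

# Hypothetical data: one independent variable x, one dependent variable y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

# A degree-1 polynomial fit is a straight line; polyfit returns the
# coefficients from the highest degree down, i.e. [slope, intercept].
slope, intercept = np.polyfit(x, y, deg=1)

# Predictions of the fitted line at the observed x values.
predicted = slope * x + intercept
print(slope, intercept)
```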
Q1. What is accuracy?
Accuracy is the proportion of correct predictions out of all predictions made.
Accuracy = (True Positives + True Negatives) / (Total Number of Predictions)
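The definition can be sketched in a few lines of plain Python. The function name `accuracy` and the label lists are hypothetical, with 1 standing for the positive class and 0 for the negative class.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # A correct prediction is either a true positive or a true negative.
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
acc = accuracy(y_true, y_pred)
print(acc)  # 6 of the 8 predictions are correct
```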
Q2. Why is accuracy not a good measure for classification problems?
Accuracy is not a good measure for classification problems because it
gives equal importance to both false positives and false negatives.
However, this may not be the case in most business problems. For
example, in the case of cancer prediction, declaring a cancerous tumour
benign is more serious than wrongly informing a patient that they have
cancer. Accuracy gives equal importance to both cases and cannot
differentiate between them.
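To make the point concrete, here is a small sketch with two hypothetical models that reach the same accuracy but make opposite kinds of mistakes. The data, the model names and the use of recall (true positives divided by all actual positives) as a contrasting metric are assumptions added for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn), with 1 = cancer and 0 = benign."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

# 10 patients with cancer followed by 90 without.
y_true = [1] * 10 + [0] * 90
model_a = [0] * 5 + [1] * 5 + [0] * 90   # misses 5 cancers, no false alarms
model_b = [1] * 10 + [1] * 5 + [0] * 85  # catches every cancer, 5 false alarms

results = {}
for name, y_pred in (("model_a", model_a), ("model_b", model_b)):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    results[name] = {
        "accuracy": (tp + tn) / len(y_true),
        "recall": tp / (tp + fn),
        "false_negatives": fn,
        "false_positives": fp,
    }
    print(name, results[name])
```

Both models score the same accuracy, yet only the first one sends cancer patients home undiagnosed, which is why metrics that separate the two error types are usually reported alongside accuracy.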
Q1. What is the Maximum Likelihood Estimator (MLE)?
MLE chooses the values of the unknown parameters (the estimates) that
maximise the likelihood function. The method to find the MLE is to use
calculus: set the derivative of the likelihood function (in practice,
usually the log-likelihood) with respect to each unknown parameter to
zero and solve. For a binomial model, this is easy, but for a logistic
model, the calculations are complex, so computer programs are used to
derive the MLE for logistic models.
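For the binomial case, setting the derivative of the log-likelihood to zero gives the closed-form estimate p = successes / trials. The sketch below checks that result against a brute-force search over a grid of candidate values. The counts (7 successes in 20 trials) and all names are hypothetical, and the grid search merely stands in for the numerical optimisers that real software uses.

```python
import math

def binomial_log_likelihood(p, successes, trials):
    """Log-likelihood of p, dropping the binomial coefficient.

    The coefficient does not depend on p, so omitting it does not
    change where the maximum lies.
    """
    return successes * math.log(p) + (trials - successes) * math.log(1 - p)

successes, trials = 7, 20  # made-up observations

# Closed-form MLE from setting the derivative to zero.
closed_form = successes / trials

# Numerical check: pick the best candidate on a grid over (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
numeric = max(grid, key=lambda p: binomial_log_likelihood(p, successes, trials))

print(closed_form, numeric)
```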
(Here’s another approach to answering the question.)
Q1. What is a logistic function? What is the range of values of a logistic function?
The logistic function is defined as:
f(Z) = 1 / (1 + e^(−Z))
The values of a logistic function range from 0 to 1, while the values of Z vary from −∞ to +∞.
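A minimal sketch of the logistic function in Python, assuming the standard form f(Z) = 1 / (1 + e^(−Z)). The function name `logistic` is hypothetical, and the two branches exist only to avoid floating-point overflow for large negative Z.

```python
import math

def logistic(z):
    """Map any real z to a value between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^z / (1 + e^z) so that
    # math.exp never receives a large positive argument.
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)

for z in (-1000.0, -2.0, 0.0, 2.0, 1000.0):
    print(z, logistic(z))
```

Mathematically the output never reaches exactly 0 or 1, but for extreme inputs floating-point rounding can return those boundary values.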