involves choosing and applying the appropriate machine learning approach that works well with the data we have and solves the problem that we intend to solve
in supervised ML: objective is to build a model that maps a given input (which we call the independent variables) to the given output (which we call the dependent variable)
depending on the nature of the dependent variable, the problem is called either Classification or Regression
Classification: if dependent variable is a categorical value (e.g.: color, yes or no, the weather)
Regression: if we intend to predict a continuous value (e.g.: age, income, temperature)
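a minimal sketch of the distinction, assuming made-up toy records (hours of sunshine, a weather label, a temperature): the same input can be mapped to a categorical output (Classification) or to a continuous output (Regression). the nearest-neighbour lookup is only a stand-in model chosen for brevity, and every name and value here is hypothetical.

```python
# Hypothetical toy records: (hours_of_sunshine, weather_label, temperature_celsius)
records = [
    (9.0, "sunny", 27.5),
    (1.5, "rainy", 14.0),
    (7.0, "sunny", 24.0),
    (0.5, "rainy", 12.5),
    (8.0, "sunny", 26.0),
]

inputs = [r[0] for r in records]        # independent variable
labels = [r[1] for r in records]        # categorical dependent variable
temperatures = [r[2] for r in records]  # continuous dependent variable


def nearest_neighbor_predict(xs, ys, query):
    """Return the output recorded for the training input closest to `query`."""
    closest = min(range(len(xs)), key=lambda i: abs(xs[i] - query))
    return ys[closest]


# Classification: the prediction is a category.
predicted_label = nearest_neighbor_predict(inputs, labels, 6.0)

# Regression: the prediction is a number on a continuous scale.
predicted_temperature = nearest_neighbor_predict(inputs, temperatures, 6.0)

print(predicted_label, predicted_temperature)
```

only the type of the dependent variable changes between the two calls; the input is the same in both.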
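ML algos that solve only regression
simple linear regression, multiple linear regression, poisson regression, polynomial regression
note: logistic regression, despite its name, predicts a categorical outcome, so it is a classification algo rather than a regression one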
ML algos that can solve both Classification and Regression problems
after training an ML model, it is important to see how well suited it is to the problem at hand
in order to get an unbiased evaluation of the performance of our model, we must evaluate it on a different dataset (test data) from the one we used to train it (training data)
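a rough sketch of holding out test data, using only the Python standard library. the synthetic data (y = 2x + 1 plus noise), the 80/20 split, and all function names are assumptions made for illustration; in practice a library helper would usually do the splitting.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of `rows` and return (train, test) lists."""
    rng = random.Random(seed)
    shuffled = rows[:]  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_simple_linear_regression(rows):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(rows, slope, intercept):
    """Average squared difference between actual and predicted y."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in rows) / len(rows)


# Synthetic data (assumption): y = 2x + 1 with Gaussian noise.
data_rng = random.Random(42)
data = [(x, 2 * x + 1 + data_rng.gauss(0, 1)) for x in range(50)]

# Fit on the training rows only, then score on rows the model never saw.
train, test = train_test_split(data, test_fraction=0.2, seed=0)
slope, intercept = fit_simple_linear_regression(train)

print("train MSE:", mean_squared_error(train, slope, intercept))
print("test MSE:", mean_squared_error(test, slope, intercept))
```

the test MSE is the figure to report as the estimate of performance on unseen data, since none of the test rows influenced the fitted slope and intercept.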