While learning how Bayesian methods, regression analysis, and instance-based learning techniques work within probabilistic machine learning, I realized how deep the underlying statistical techniques are and how they form the basis for these supervised learning techniques. Shown below is a mind map of the statistical and algebraic concepts that regression-analysis-based algorithms employ.
I wanted to explain to myself, in simple terms, what a statistician’s thinking process is, and here is my take on it.
The objective of statisticians is to answer questions asked by people from various domains, using data. Typical engineering approaches may answer such questions with subjective or objective methods that do not require data. Statisticians, by contrast, always look at the data to answer questions. They also incorporate variability (the tendency of measurements taken on the same quantity at two different times to differ slightly) in all their models.
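To make the idea of variability concrete, here is a minimal sketch that simulates repeated measurements of the same quantity. The true value, the noise level, and the helper name `simulate_measurements` are all assumptions made up for illustration; they do not come from any real data set.

```python
import random
import statistics


def simulate_measurements(true_value, noise_sd, n, seed=0):
    """Return n noisy readings of the same underlying quantity.

    Hypothetical helper: each reading is the true value plus Gaussian
    noise, which stands in for measurement variability.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [rng.gauss(true_value, noise_sd) for _ in range(n)]


# Assumed values, chosen only for illustration.
measurements = simulate_measurements(true_value=10.0, noise_sd=0.2, n=5)

mean_value = statistics.mean(measurements)  # best single estimate
spread = statistics.stdev(measurements)     # how much the readings vary

print("readings:", [round(m, 3) for m in measurements])
print("mean:", round(mean_value, 3), "sample std dev:", round(spread, 3))
```

The readings differ from one another even though the underlying quantity never changes, and the sample standard deviation is one common way of summarizing that spread.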
Let’s take an example: was M.F. Hussain a good painter? One method of answering this question judges the paintings against some accepted norms (held by a person or a community) for the quality of paintings. The answer in such a case may be based on creative expression, color usage, form, and shape, and may sound like: “I believe M.F. Hussain is a good painter.” Such a response can be fairly subjective, which means that the response you get from one person can be very different from the response you get from another. The statistician’s method of answering this is very different. They first collect data from a sample of people who are considered experts in assessing the quality of paintings (university professors of art, other artists, art collectors, and more). Then, after analyzing the data, they come up with a conclusion such as: “75% of the university professors of art, 83% of the professional artists, and 96% of the art collectors among the 3000 participants of the survey (with an equal number of participants from each category) opined that Mr. M.F. Hussain is a good painter”. Hence, it can be stated that he is considered a good painter by most of those surveyed. Each individual opinion is still subjective, but the conclusion is a far more objective measure because it rests on collected data rather than on one person’s judgment.
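The arithmetic behind the survey conclusion above can be sketched in a few lines. The counts below are derived from the illustrative figures in the example (3000 participants split equally, so 1000 per category); the `GroupResult` class is a hypothetical helper, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class GroupResult:
    """Survey tally for one category of experts (hypothetical helper)."""
    name: str
    respondents: int
    positive: int  # respondents who said "good painter"

    @property
    def proportion(self):
        return self.positive / self.respondents


# Counts implied by the example: 75%, 83%, and 96% of 1000 each.
groups = [
    GroupResult("university professors of art", 1000, 750),
    GroupResult("professional artists", 1000, 830),
    GroupResult("art collectors", 1000, 960),
]

total_respondents = sum(g.respondents for g in groups)
total_positive = sum(g.positive for g in groups)
overall = total_positive / total_respondents  # pooled share across groups

for g in groups:
    print(f"{g.name}: {g.proportion:.0%}")
print(f"overall: {overall:.1%} of {total_respondents} participants")
```

Pooling the three groups gives a single overall share, which is what justifies the statement that most of the surveyed experts consider him a good painter.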
Overall, in statistical learning, the predictive functions are derived primarily from samples of data. Great importance is given to how the data is collected, cleansed, and managed in this process. Statistics is closely related to mathematics; it is about quantifying data and operating on numbers. Here is a simple table that compares statistical learning with machine learning.