How Machine Learning Works in Simple Terms

Most people understand that Machine Learning (ML) is a common form of Artificial Intelligence (AI) used for predicting things. It is also widely known that it requires a vast amount of data for tasks such as face recognition, identifying emotions in text, or recommending what people might want. But how does Machine Learning predict outcomes?

My next article was to be on the math required to aggregate risk. That math is based on a form of machine learning known as Bayesian Networks. But before diving into that, I thought I should first address the hesitancy people have with machine learning. That hesitancy is, in large part, because people don’t understand how it works and therefore don’t trust its results. Like most technical things, the mystique is industry generated, without consideration for users.

Machine Learning uses mathematics to find the best solution for a targeted outcome. To do that, it needs to convert the known characteristics of the problem into numbers, work out the best equation to use, and then find the values for that equation that give the best result. Most of this math you learned in high school: algebra, simultaneous equations, and calculus. Don’t worry, you don’t need to relearn this dreary stuff. Like using a calculator to do long division, there are tools now available to do it for you. Even Excel includes a lot of it.

Feature Engineering

So how does it do all this, without the “math-speak”? The first step is converting data to numbers. For images it is fairly simple. Images are made up of pixels in a grid, so each point in that grid is given a number signifying its colour and intensity. That is why image files are so big: each pixel is its own number. Text, on the other hand, can use a “bag” of the most used words, with each word numbered.
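
To make the “bag” of words idea concrete, here is a minimal sketch in Python. The three sentences, the vocabulary size of four, and the function names build_vocabulary and to_counts are all made up for illustration; real tools do more (punctuation handling, rare-word filtering and so on), so treat this as one possible shape of the idea rather than how any particular product does it.

```python
from collections import Counter

def build_vocabulary(documents, size):
    """Give a number to each of the `size` most used words across all documents."""
    counts = Counter(word for doc in documents for word in doc.lower().split())
    # Most frequent first; ties broken alphabetically so the numbering is stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {word: index for index, (word, _) in enumerate(ranked[:size])}

def to_counts(document, vocabulary):
    """Turn one document into a list of counts, one slot per numbered word."""
    vector = [0] * len(vocabulary)
    for word in document.lower().split():
        if word in vocabulary:  # words outside the "bag" are simply ignored
            vector[vocabulary[word]] += 1
    return vector

# Hypothetical example text, not real data.
documents = ["the film was great", "the film was dull", "great great film"]
vocabulary = build_vocabulary(documents, size=4)
vectors = [to_counts(doc, vocabulary) for doc in documents]

print(vocabulary)
print(vectors)
```

Each sentence ends up as a short list of numbers, which is all the later steps need.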

For other numeric data, it converts individual values to a fraction of their range. For example, adult heights range between, say, 3ft and 7ft. So an individual height of 5ft is converted to 50% (halfway between 3ft and 7ft), or 0.5. This puts all features on the same scale, so that none dominates the outcome simply because its numbers are bigger. It is referred to as normalising (standardisation is a closely related rescaling). It is a prerequisite to modelling, and is part of what is called Data Engineering. Microsoft Azure, Amazon AWS, and a load of 3rd party suppliers have tools to do it for you.
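
The height example works out like this in Python. It is a small sketch of the “fraction of the range” calculation only; the function name min_max_scale and the list of heights are my own, and the 3ft to 7ft range is the rough assumption used in the text.

```python
def min_max_scale(value, low, high):
    """Convert a value to its position within [low, high], as a number from 0 to 1."""
    if high == low:
        raise ValueError("low and high must differ, otherwise there is no range")
    return (value - low) / (high - low)

# The example from the text: 5ft within an assumed range of 3ft to 7ft.
height_scaled = min_max_scale(5.0, low=3.0, high=7.0)

# The same conversion applied to a whole (made-up) column of heights.
heights = [3.0, 5.0, 6.0, 7.0]
heights_scaled = [min_max_scale(h, low=3.0, high=7.0) for h in heights]

print(height_scaled)
print(heights_scaled)
```

In practice the low and high values are usually taken from the data itself (its smallest and largest value) rather than typed in by hand.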

Algorithm Selection

Next, how to find the best equation. These equations are what ML nerds refer to as the “algorithms”. You will probably need to seek recommendations on which is best for your application, but there should be plenty of blog articles in your field to cover those, just as my article on Risk Aggregation recommends Bayesian Networks. Again, you don’t need to know how they work, just which one to select when using your Azure or AWS ML service. It’s drag-and-drop.

In fact, there is an idea in AI known as “no free lunch”, which says that no single algorithm is best for every problem. In practice, the view is that nearly any algorithm you choose can give you roughly an 80% solution. But would you trust an airline with an 80% safety record? So take the time to do the research on your problem, then use a number of algorithms in conjunction, known as an ensemble model. That can often get you up to a 95%+ solution.

How does the “algorithm” work? Simply put, it plots historical data onto a graph and draws a line through it that best fits those data points. The line can be as simple as a straight line (called linear regression), a curved line (logistic regression), a circle around a central point (clustering), or a complex “wavy” line. It is the line that is used to make predictions, i.e. given new input factors, it plots them onto that graph and reads off the line value at that point. So the trick is to get the line to be as representative of the real world as possible.
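
For the simplest case, the straight line, the whole “draw a line and read off the value” idea fits in a few lines of Python. This sketch uses the standard least-squares formulas for a straight line; the five data points and the names fit_line and predict are invented for illustration.

```python
def fit_line(xs, ys):
    """Find the slope and intercept of the straight line that best fits the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y move together, divided by how much x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x, slope, intercept):
    """Read the line's value at a new input x."""
    return slope * x + intercept

# Made-up historical data: one input factor and the outcome that went with it.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

slope, intercept = fit_line(xs, ys)
prediction = predict(6.0, slope, intercept)  # a new input the line has not seen

print(slope, intercept, prediction)
```

The curved and “wavy” lines follow the same pattern, just with a more complicated equation in place of slope times x plus intercept.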

The algorithm selection is just the starting point. Machine learning “training” involves a lot of trial-and-error runs that bend the line to cover as many possibilities as practical. By a lot, I’m talking hundreds of thousands. Hence the need for powerful computers. Training bends the line by adding different weights at different points along it, and moves it up or down by adding “bias”. Bias can be good: your desired outcome is a biased option out of all possible outcomes. Like in life, used wrongly, it can be bad.
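
Here is a rough sketch of what those repeated runs can look like, using gradient descent, which is one common way of doing it (I am picking it for illustration, not claiming it is what every tool uses). It starts with a weight and bias of zero and nudges both a little on each pass, in the direction that shrinks the error. The data, the 2,000 passes, and the step size of 0.05 are assumptions chosen so the toy example settles quickly.

```python
def train(xs, ys, steps=2000, learning_rate=0.05):
    """Repeatedly adjust one weight and one bias so the line fits the data better."""
    weight, bias = 0.0, 0.0  # start with a flat line at zero
    n = len(xs)
    for _ in range(steps):
        # How far off the current line is at each historical data point.
        errors = [(weight * x + bias) - y for x, y in zip(xs, ys)]
        # Direction in which the average squared error grows fastest.
        grad_weight = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_bias = 2 * sum(errors) / n
        # Step a little in the opposite direction.
        weight -= learning_rate * grad_weight
        bias -= learning_rate * grad_bias
    return weight, bias

# Made-up data that lies exactly on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

weight, bias = train(xs, ys)
print(weight, bias)
```

The weight tilts the line and the bias slides it up or down; real models simply have many more of each.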

How Machine Learning Predicts an Outcome

The final step is to optimise the solution. This is done by minimising the difference between the algorithm’s predictions and the actual results in the historical data. Yes, the historical data must include the actual outcome as well as the contributing factors, or input data. The differences between the actuals and the predictions are called “errors”, and to emphasise the larger ones, we square each error. The average of these squared errors is the “mean squared error” (MSE), which is a common choice of “loss function”, and finding the line that minimises it is known as “least squares”. So you want the line with the smallest mean squared error. This is where calculus comes in, to find the minimum of the loss function, but again there are tools to calculate it for you.
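
As a small illustration of the loss function, the sketch below scores two candidate lines against the same historical outcomes. The numbers and the two candidate lines are made up; the point is only that the line with the smaller mean squared error is the better fit.

```python
def mean_squared_error(actuals, predictions):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actuals, predictions)) / len(actuals)

# Made-up historical data: inputs and the outcomes that actually happened.
inputs = [1.0, 2.0, 3.0, 4.0]
actuals = [3.0, 5.0, 7.0, 9.0]

# Two hypothetical candidate lines to compare.
line_a = [2.0 * x + 1.0 for x in inputs]  # passes through every point
line_b = [1.5 * x + 2.0 for x in inputs]  # close, but tilted the wrong way

loss_a = mean_squared_error(actuals, line_a)
loss_b = mean_squared_error(actuals, line_b)

print(loss_a, loss_b)  # the smaller loss marks the better line
```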

None of this is new mathematics. In fact, Newton (and Leibniz) developed calculus in the 1600s, and Bayesian methods have been around since the 1700s. The need for vast computing power is to handle the high number of repetitive refinement steps. Turing’s machine to break the “Enigma” code in WW2 often couldn’t work it out before the Germans changed their code again. Today it can be done in the blink of an eye.

For more on understanding AI, see previous article: A car analogy.