5 Dangers of AutoML
The shortage of skilled Data Scientists and Machine Learning Engineers has led to a proliferation of automated machine learning (AutoML) model-development tools. But before diving into AutoML, be aware of these 5 dangers you will confront:
- Garbage in – Garbage out
- Entrenched Bias
- Lack of Trust and therefore usability
- Bottom-line Risk
- Single models DON’T work
AutoML aims to let subject matter experts and operational management develop AI models using simple user interfaces, which is a noble pursuit. Unfortunately, like most simple solutions, it tends not to work in the real world and has a real possibility of causing major damage.
1. Garbage in – Garbage out
This old adage is paramount when developing AI models. All datasets are flawed! Missing or incorrect data, mislabelled records, and gaps in coverage are guaranteed. Every dataset is unique, as is your problem, and each requires detailed analysis of data quality to ensure it is fit for purpose.
The trouble is that machine learning requires hundreds of thousands, if not millions, of records. This puts human review of the data out of the question. The role of the Data Scientist is to use their expertise to collect, dissect and mould a meaningful dataset. Beyond just “cleaning” bad data, they identify the features most relevant to your desired outcome (an expertise in itself) and source quality, balanced data that can lead to that outcome.
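At that scale the review itself has to be automated. A minimal sketch of the kind of data-quality audit this implies is below; the records, field names and the 20% missing-rate threshold are hypothetical, chosen purely for illustration.

```python
# Hypothetical toy dataset standing in for millions of real records.
records = [
    {"age": 34, "income": 72000, "label": "approve"},
    {"age": None, "income": 55000, "label": "approve"},
    {"age": 29, "income": None, "label": "decline"},
    {"age": 51, "income": 91000, "label": None},
]

def missing_rate(rows, field):
    """Fraction of rows where `field` is absent or None."""
    missing = sum(1 for r in rows if r.get(field) is None)
    return missing / len(rows)

# Flag any field whose missing rate exceeds a (hypothetical) 20% threshold,
# so a human only reviews the fields the audit flags, not every record.
fields = ["age", "income", "label"]
flagged = [f for f in fields if missing_rate(records, f) > 0.20]
print(flagged)
```

In practice the audit would cover far more than missingness (ranges, label consistency, coverage per segment), but the shape is the same: automated checks that surface the few problems worth a human's time.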
2. Entrenched Bias
By definition, machine learning is learning from history, which means you will entrench historic bias. Bias is not just discriminatory bias but also bias towards historically bad business decisions.
A general misunderstanding of bias leads people to “drop” biased features such as race or sex, but this does not work. See the ABC recruitment case study. Bias is built into the relationships between data, such as where a person lives or their employment history. Moreover, historic trends are not relevant in a disrupted market, and your introduction of AI will itself disrupt the marketplace.
Professional Data Scientists use pre-modelling to identify ingrained bias in existing datasets, then identify methods of “augmenting” the data to compensate, ensuring a balanced learning environment for the desired outcome, i.e. not repeating past mistakes.
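A minimal sketch of that pre-modelling step, under wholly hypothetical numbers: measure the historic outcome rate per group, then oversample the under-represented group so the model does not simply learn the imbalance. Real augmentation is far more nuanced than this; the point is that the check and the correction are explicit, deliberate steps.

```python
import random

random.seed(0)

# Hypothetical hiring history: group A dominates and is hired far more often.
rows = ([{"group": "A", "hired": 1}] * 40 + [{"group": "A", "hired": 0}] * 10
        + [{"group": "B", "hired": 1}] * 5 + [{"group": "B", "hired": 0}] * 15)

def hire_rate(data, group):
    g = [r for r in data if r["group"] == group]
    return sum(r["hired"] for r in g) / len(g)

# Pre-modelling check exposes the ingrained bias: A at 80%, B at 25%.
print(hire_rate(rows, "A"), hire_rate(rows, "B"))

# "Augment" by oversampling group B until both groups are equally represented,
# so the learner sees a balanced environment rather than the historic skew.
b_rows = [r for r in rows if r["group"] == "B"]
a_count = sum(1 for r in rows if r["group"] == "A")
balanced = rows + [random.choice(b_rows) for _ in range(a_count - len(b_rows))]
```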
3. Lack of Trust and therefore Usability
Your largest problem when introducing AI will be its “take-up”, regardless of how good it might be. In fact, currently only 15% of AI projects are successful. AI’s biggest obstacle is a lack of trust by operational staff and management in its recommendations and decisions; there is a history of people actively working to circumvent its application. In essence, people don’t trust what they don’t understand, and black-box AutoML is exactly that.
Not only are there now legislative regulations requiring the explainability of automated decisions, but to get operational staff behind new AI they must understand how a model arrives at its decisions. ML Engineers achieve this by developing parallel models that analyse the predictor, from what data it uses to how it “weighs” different issues. There is the classic case of a dog/wolf identifier whose decision making turned out to be based on whether there was snow in the background. As with AutoML, it looked right in development but failed in the real world.
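A minimal sketch of the parallel-model idea, using permutation importance: shuffle one input at a time and watch how much the black box's accuracy drops. The predictor below is a hypothetical stand-in that, like the dog/wolf classifier, secretly keys on the spurious background feature rather than the subject itself; probing exposes that.

```python
import random

random.seed(1)

def black_box(sample):
    # Hypothetical opaque model: it secretly decides on "snow" alone
    # and ignores "ear_shape" entirely.
    return "wolf" if sample["snow"] else "dog"

# Synthetic probe data with both a spurious and a genuine-looking feature.
data = [{"snow": random.random() < 0.5,
         "ear_shape": random.choice(["pointed", "floppy"])}
        for _ in range(200)]
truth = [black_box(s) for s in data]  # the labels the box reproduces

def accuracy_with_shuffled(feature):
    """Accuracy of the black box after shuffling one feature's values."""
    shuffled = [s[feature] for s in data]
    random.shuffle(shuffled)
    probes = [dict(s, **{feature: v}) for s, v in zip(data, shuffled)]
    hits = sum(black_box(p) == t for p, t in zip(probes, truth))
    return hits / len(data)

# Shuffling "ear_shape" changes nothing; shuffling "snow" wrecks accuracy,
# revealing what the model actually relies on.
print(accuracy_with_shuffled("ear_shape"), accuracy_with_shuffled("snow"))
```

Production explainability tooling is richer than this, but the principle is the same: a second model, or probe, that tells you what the first one is really looking at, before it fails in the real world.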
4. Bottom-line Risk
This leads us to the bigger danger with AutoML: it can be a real threat to your bottom line. With average EBITs of 10-12%, any business shift of more than 5% can be disastrous. This means you need at least a 95% success rate with any disruptive change, and be assured, AI will be disruptive. Unfortunately, generalised modelling works on the old 80/20 rule, starting in the high 60s and realistically ending around the low 80s. If you wouldn’t get on a plane with an 80% safety record, why trust your business future to 80% accuracy?
To quote Lee Iacocca, “the difference between the ordinary and extraordinary is that little bit extra!” Today’s business has moved from the 20th century “mass-production” (80%) approach to individual client personalization. That extra 20% of the market is where the money and loyalty lie. To be efficient you need to move to AI, but to stay relevant your modelling needs to include client personalization. This you won’t get from AutoML.
5. Single models DON’T work
To move from 80% to 95% outcome accuracy in real world environments, AI modelling requires multiple AI models working together to cover different aspects of the outcome requirements. Outside the classroom, real world outcomes involve competing requirements plus the need to manage variation and personalization. Different model algorithms concentrate on different aspects, so you need an “ensemble” of models in the one network to increase accuracy (covering more exceptions) and reduce variance (filling in the holes).
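The effect is easy to see in a minimal sketch: three weak, rule-based models, each covering a different aspect of the data, combined by majority vote. The rules and samples below are hypothetical illustrations, not a real business model.

```python
# Each sample: (feature_a, feature_b, feature_c, true_label)
samples = [(1, 0, 1, 1), (1, 1, 0, 1), (0, 1, 1, 1),
           (0, 0, 1, 0), (0, 1, 0, 0), (1, 0, 0, 0)]

# Three single-aspect models: each looks at only one feature.
models = [lambda s: s[0], lambda s: s[1], lambda s: s[2]]

def accuracy(predict):
    return sum(predict(s) == s[3] for s in samples) / len(samples)

def ensemble(s):
    # Majority vote across the three single-aspect models.
    votes = sum(m(s) for m in models)
    return 1 if votes >= 2 else 0

# Each single model covers only part of the cases; the vote covers them all.
print([accuracy(m) for m in models], accuracy(ensemble))
```

Each individual rule gets a third of the cases wrong, yet the ensemble is right every time, because the models' mistakes fall in different places. That is precisely the "covering more exceptions, filling in the holes" behaviour the paragraph above describes.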
ML Engineers bring an understanding of the nuances of different algorithms, from deep learning and Random Forests to GANs. Depending on the business application and desired outcome, a professional ML Engineer can ensemble applicable algorithms that not only produce greater accuracy and coverage but also explain outcome decisions, driving trust and acceptance.