Riding the learning curve

Artificial intelligence (AI) applies machine learning, deep learning and other techniques to solve real problems. But there are security downsides, too, that can harm the unwary, as editor Brian Wall reports

Artificial intelligence (AI) brings with it a promise of genuine human-to-machine interaction. "When machines become intelligent, they can understand requests, connect data points and draw conclusions. They can reason, observe and plan," points out analytics powerhouse SAS. For many, AI and machine learning are seen as relatively modern terms for technologies that have emerged to lead the way forward. And yet they are far from new.

So, where did AI come from? "Well, it didn't leap from single-player chess games straight into self-driving cars," states SAS. "The field has a long history rooted in military science and statistics, with contributions from philosophy, psychology, math and cognitive science. Artificial intelligence originally set out to make computers more useful and more capable of independent reasoning."

Most historians trace the birth of AI back to a Dartmouth research project in 1956 that explored topics like problem-solving and symbolic methods. "In the 1960s, the US Department of Defense took interest in this type of work and increased the focus on training computers to mimic human reasoning," explains SAS. "For example, the Defense Advanced Research Projects Agency (DARPA) completed street mapping projects in the 1970s. And DARPA produced intelligent personal assistants in 2003, long before Google, Amazon or Microsoft tackled similar projects."

This is the kind of work that paved the way for the automation and formal reasoning that we see in computers today. "As a whole, artificial intelligence contains many subfields", states SAS, including:

• Machine learning automates analytical model building. It uses methods from neural networks, statistics, operations research and physics to find hidden insights in data without being explicitly programmed where to look or what to conclude
• A neural network is a kind of machine learning inspired by the workings of the human brain. It's a computing system made up of interconnected units (like neurons) that processes information by responding to external inputs, relaying information between each unit. The process requires multiple passes at the data to find connections and derive meaning from undefined data
• Deep learning uses huge neural networks with many layers of processing units, taking advantage of advances in computing power and improved training techniques to learn complex patterns in large amounts of data. Common applications include image and speech recognition
• Computer vision relies on pattern recognition and deep learning to recognise what's in a picture or video. When machines can process, analyse and understand images, they can capture images or videos in real time and interpret their surroundings
• Natural language processing is the ability of computers to analyse, understand and generate human language, including speech. The next stage of NLP is natural language interaction, which allows humans to communicate with computers using normal, everyday language to perform tasks.
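A minimal, invented sketch can ground the first two bullets: a single artificial 'neuron' that makes multiple passes over example data, nudging its connection weights until it reproduces a simple pattern (logical OR). Real neural networks chain many thousands of such units across layers, but the learning loop is the same in spirit.

```python
# A single artificial neuron (perceptron) learning the logical OR
# function from examples -- an illustrative sketch, not a real system.

def train_perceptron(samples, epochs=20, lr=0.1):
    """samples: list of ((x1, x2), target) pairs with 0/1 targets."""
    w1 = w2 = bias = 0.0
    for _ in range(epochs):               # multiple passes over the data
        for (x1, x2), target in samples:
            out = 1 if (w1 * x1 + w2 * x2 + bias) > 0 else 0
            err = target - out            # zero when the guess is right
            w1 += lr * err * x1           # nudge each weight toward the
            w2 += lr * err * x2           # correct answer, a little like
            bias += lr * err              # strengthening a synapse
    return w1, w2, bias

def predict(weights, x1, x2):
    w1, w2, bias = weights
    return 1 if (w1 * x1 + w2 * x2 + bias) > 0 else 0

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]  # OR table
model = train_perceptron(data)
```

Nobody hand-coded a rule for OR here: the behaviour was derived from the data, which is the distinction the SAS definitions draw.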

"While machine learning is based on the idea that machines should be able to learn and adapt through experience," SAS adds, "AI refers to a broader idea where machines can execute tasks 'smartly'. Artificial intelligence applies machine learning, deep learning and other techniques to solve actual problems."

Data processing and automation have become a must in cyber security, due to the sheer volume and number of variants of threats. Hence, more advanced methods that utilise data at speeds beyond human processing capability are a natural extension. "In many ways, you could say that the use of machine learning and AI in cyber security - to better protect end users - is more of a natural evolution of the field, rather than the revolution it is in some industries," says Matti Aksela, vice president, Artificial Intelligence, F-Secure.

"However, this makes it no less meaningful - the opposite actually. One could well argue that, without the use of these technologies, cyber security is not possible even now and much less so in the future. For example, when dealing with intrusion detection, the data volumes are simply too huge to be processed by human operators and, even though rule-based systems are very much usable in settings like anomaly detection, machine learning is simply a tool one would be foolish to ignore. Machine learning also gives us effective tools for finding something that is very similar to known malicious actions or files, but is slightly different. This is something possible, if time consuming, for humans to find manually, but much harder to encode into fixed rules."
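Aksela's near-miss point can be sketched in a few lines (the 'malicious' byte string and threshold below are invented for illustration): an exact-hash rule only ever catches an identical file, while even a crude similarity score still flags a variant that differs by a handful of bytes. Production systems use far richer features and learned models, but the contrast is the same.

```python
import hashlib
from difflib import SequenceMatcher

# Invented stand-in for a known-bad file's contents.
KNOWN_BAD = b"connect evil.example.com; download payload.bin; run"
KNOWN_BAD_HASH = hashlib.sha256(KNOWN_BAD).hexdigest()

def exact_rule_match(sample: bytes) -> bool:
    # Fixed-rule detection: only a byte-identical file triggers it.
    return hashlib.sha256(sample).hexdigest() == KNOWN_BAD_HASH

def similarity_match(sample: bytes, threshold=0.8) -> bool:
    # Similarity-based detection: score closeness to the known sample
    # instead of demanding an exact match.
    return SequenceMatcher(None, KNOWN_BAD, sample).ratio() >= threshold

# A variant that an attacker has tweaked slightly.
variant = b"connect evil.example.net; download payload2.bin; run"
```

The exact rule misses `variant` entirely; the similarity check catches it, which is the "very similar to known malicious, but slightly different" case that is so hard to encode as fixed rules.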

While machine learning is obviously useful in the detection of threats, it should not be forgotten that - like any computer system - it also has weaknesses, he cautions. "Machine learning models have an attack surface, too, and there exists a wide range of methods for deceiving AI solutions. This can range from compromising the intellectual property of a solution via model stealing, to poisoning attacks that direct a classifier to adapt in a direction that results in behaviour not originally intended. Sadly, people designing AI solutions don't always consider all the potential implications. Current 'narrow' AI really isn't that smart and machine learning solutions learn to do exactly what they are trained to do, which includes replicating bias in the data they are trained on. Sometimes people talk about 'bad' AI, but most often it is just machine learning that was trained on bad data. AI won't automatically fix your bias. And you do need to secure your AI."

Regardless, it shouldn't be forgotten that machine learning is a very powerful technology for building effective and robust cyber security solutions, he states, "and is an invaluable aid to security experts when dealing with the volumes of data they have at their disposal. We are in an era of augmented intelligence and should make the most out of it to protect our society - but we must also take care to ensure we build secure and robust AI solutions".
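The poisoning attacks Aksela describes can be shown at toy scale. The sketch below (invented data, a deliberately naive one-feature, nearest-centroid 'classifier') shows mislabeled points slipped into the benign training set dragging the model until a clearly suspicious sample is waved through.

```python
# Toy illustration of a data-poisoning attack on a nearest-centroid
# classifier over a single numeric feature. All data is invented.

def centroid(values):
    return sum(values) / len(values)

def classify(x, benign, malicious):
    # Assign x to whichever class centroid it sits closer to.
    if abs(x - centroid(malicious)) < abs(x - centroid(benign)):
        return "malicious"
    return "benign"

benign_train = [1.0, 2.0, 1.5, 2.5]       # low feature values
malicious_train = [8.0, 9.0, 8.5, 9.5]    # high feature values

sample = 7.0  # clearly nearer the malicious cluster

# The clean model flags the sample...
clean_verdict = classify(sample, benign_train, malicious_train)

# ...but an attacker who can slip mislabeled points into the "benign"
# training data drags that centroid toward malicious territory.
poisoned_benign = benign_train + [9.0, 9.5, 10.0, 10.0, 10.0]
poisoned_verdict = classify(sample, poisoned_benign, malicious_train)
```

The classifier "adapts in a direction that results in behaviour not originally intended", exactly as described: the only thing the attacker touched was the training data.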

AI has numerous potential use cases in modern enterprises and one of these is in significantly improving security and compliance - particularly in ensuring General Data Protection Regulation (GDPR) compliance, says Uri Kogan, Nuxeo vice president of product marketing. "GDPR is designed to protect the personal data of EU citizens as more aspects of people's lives go digital, yet, since the regulation came into effect last year, many organisations have already fallen foul of these measures - in the worst cases, incurring substantial fines for non-compliance. As businesses strive to ensure their customers' data is strictly controlled and secure, AI can help them cut to the chase."

One area that can easily trip up an organisation is failure to understand where Personally Identifiable Information (PII) might exist across its various internal systems and repositories. "A customer's credit card details might still be lurking in an email, in a CRM system, in a network folder, or in a myriad of other places. Identifying and then securing this unstructured data is a big problem for companies - and it's just not that easy for humans to quickly and efficiently locate every point of risk."

Even if an organisation has a complete 360-degree view of its customers - which few do - the documents associated with those customers, and the potential PII data stored within them, are often blind spots. "This is where AI comes in," says Kogan. "A custom-trained AI tool can quickly scan a document and identify what it is and, crucially, what's in it. The AI model can even be trained to identify specific PII data, such as names, date of birth, addresses, national insurance numbers or credit card details."
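Kogan's document-scanning idea can be sketched, in heavily simplified form, with pattern matching. The patterns and sample text below are illustrative only; a real custom-trained model would use learned entity recognition plus validation (such as Luhn checks on card numbers) rather than bare regexes.

```python
import re

# Illustrative PII patterns only -- not production-grade detection.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "uk_ni_number": re.compile(r"\b[A-CEGHJ-PR-TW-Z]{2}\d{6}[A-D]\b"),
}

def scan_for_pii(text):
    """Return {pii_type: [matches]} for every pattern that fires."""
    hits = {}
    for label, pattern in PII_PATTERNS.items():
        found = pattern.findall(text)
        if found:
            hits[label] = found
    return hits

# An invented fragment of the kind of unstructured content that might
# lurk in an email, CRM note or network folder.
document = ("Customer Jane paid with card 4111 1111 1111 1111; "
            "contact her at jane@example.com")
```

Run over an archive of emails, CRM exports and shared folders, even a sketch like this surfaces the "credit card details still lurking in an email" case; a trained model extends the same idea to names, addresses and PII buried in images or voice notes.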

All PII, no matter how old, is 'fair game' under GDPR, so it is essential to pinpoint every source of potential risk so that each can be addressed appropriately. But this could add up to an enormous task for organisations with legacy archives going back years or decades. An AI system based on a company's own data sets can automate and streamline this process, removing the time, cost and risk associated with locating and addressing the faintest traces of PII.

"This requires more than generic AI, however," adds Kogan, "which generally relies on simple, automated metadata tagging. For reliable accuracy and risk mitigation [leaving no PII trace untreated], companies need something more sophisticated, and context-aware. This is about businesses being able to train their own custom AI models, using business-specific data sets. And not just structured data, as held in core business systems, but unstructured content, too. This might appear in PDFs or images attached to emails, or even in voice notes or video clips. And given that these are the less obvious contexts via which companies could come unstuck, it's vital that any AI-enabled solutions cover these bases."

It is by being thorough that organisations will stay on the right side of regulators and maintain public trust. "Fine or not, no company wants to go cap in hand to its customers, with apologies for inadvertently misusing or leaking their data," he concludes. "Custom-trained AI can help ensure that never happens."

Neural networks are being increasingly deployed across society in everything from predicting crime to policing financial fraud. They are helping to deliver smarter insights and decisions at unprecedented speed and scale. "Yet the increasing use of machine-learning systems also poses little-known risks to privacy, legal accountability and security," reiterates Dr Ben Taylor, CTO, Rainbird Technologies, and member of the All Party Parliamentary Group on Artificial Intelligence (APPG AI). "This is because they are often trained on poor data and are using methodologies that are inexplicable to most humans."

He points to how neural networks can be used to analyse financial transactions to predict who might be likely to commit fraud. Yet this may fall foul of privacy laws by making customer data 're-identifiable' in the process of analysing it. "It might also fall foul of anti-discrimination laws, if it is trained on bad data and makes spurious correlations between 'protected characteristics' and likelihood to commit fraud. For example, if women are over-represented in a sample of fraud cases, it might erroneously conclude that women are more likely to commit fraud and put a bank at risk of regularly blocking female customers' cards. Neural networks can create a legal 'grey area'; if a financial institution gives incorrect advice that led to a failed merger, it might be difficult to tell how the neural network got the decision wrong."

This is compounded, Taylor states, by the fact that industries at the forefront of AI deployment, from pharmaceuticals to insurance, often have large quantities of data but poor data hygiene. "Neural networks also cannot think outside the context of their 'learning environment' and thus a neural network is only as good as the data it was trained on. To tackle this problem, organisations must look to put humans back in the loop by transforming AI solutions from a black box into a transparent glass house that operates according to human logic. Human-centric 'symbolic' AIs are more easily explainable, because they make and explain their decisions in human terms.

"Using rules-based AI systems will help organisations like banks finally move away from the notion that outsourcing risk decisions to machines means trading transparency and accountability for speed and efficiency," states Taylor.

"These systems can now be trained by the relevant human subject matter experts, such as anti-fraud teams, and can even 're-run' their own past decisions to see if they were fair and legal.

"Since AIs will increasingly be helping human professionals, from lawyers to fraud prevention teams, it makes more sense for their human 'colleagues' to be involved in customising and auditing them," he concludes.
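Taylor's fraud example is easy to reproduce in miniature. With invented numbers in which women's transactions were sampled for review far more often than men's, a naive model 'learns' exactly the spurious rule he warns about:

```python
# Invented data illustrating bias replication: one group is heavily
# over-represented among *reviewed* cases, so its observed fraud rate
# is inflated, and a naive model treats group membership as a signal.

# (gender, is_fraud) pairs from a skewed investigation sample.
biased_sample = ([("F", True)] * 40 + [("F", False)] * 60
                 + [("M", True)] * 5 + [("M", False)] * 95)

def learned_fraud_rate(sample, group):
    rows = [fraud for g, fraud in sample if g == group]
    return sum(rows) / len(rows)

def naive_model_blocks(sample, group, threshold=0.2):
    # A model trained only on this sample blocks any group whose
    # observed fraud rate exceeds the threshold -- a spurious rule
    # born entirely of how the data was collected.
    return learned_fraud_rate(sample, group) > threshold
```

On this sample the model would block every female customer and no male ones, despite the skew being an artefact of who was investigated, not of who committed fraud. A rules-based, auditable system makes that flawed reasoning visible; a black box does not.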

While AI and machine learning have indeed become hot topics in popular culture, there is still some confusion, because the two terms are often used interchangeably, even though they shouldn't be. As Saryu Nayyar, CEO, Gurucul, explains: "AI refers to the ability of machines to perform tasks that require intelligence. Machine learning is a subset of AI, constructed on the concept that technology can enable computers to learn and adapt through experience.

"Machine learning emulates human cognition by basing learning on patterns. The machines can analyse data, determine if a decision was right or wrong, and use that information to make a better choice in the future."

Nor is this necessarily always a force for good, of course. "Today's AI-driven cyberattacks from criminal hackers and nation-state attackers threaten businesses and government agencies," she states. "With AI-based cyberattacks, there is less human involvement than in previous generations of malware. In the past, there still existed the human touch, that ability to contextualise any scenario of attack. AI attacks are clever, quickly move laterally on a network and can exfiltrate a tremendous amount of data in a very short time. ML models can also adapt their behaviour autonomously on the fly, much faster than humans can react."

In this cyberwarfare environment, defenders need to counter automated AI attacks with automated cybersecurity. "When it comes to cybersecurity, machine learning is particularly useful, because it provides an automated approach for analysing data until a pattern is found," adds Nayyar. "This allows it to go beyond looking for known patterns to make sense of the unknown. That's a valuable capability in security, where new zero-day attacks and targeted phishing emails can slip by conventional defences that look for already established threats, not new types of attacks.

"In certain types of cybersecurity, like behavioural analytics, machine learning helps organisations make sense of the large volume of information generated by a range of data sources including SIEMs, firewalls, identity and access management (IAM) systems, netflow, HR databases and more. Machine learning can be applied to the events from these various datasets, so that risky behaviour patterns can be identified - and remediation can be immediately implemented."
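The behavioural-analytics pipeline Nayyar describes can be reduced to a toy sketch (the event types, weights and threshold below are all invented for illustration): events pooled from different sources accumulate into per-user risk scores, and users crossing a threshold are flagged for remediation.

```python
# A minimal behavioural risk-scoring sketch. In practice the weights
# would be learned from historical data, not hand-assigned like this.

RISK_WEIGHTS = {
    "failed_login": 2,
    "off_hours_access": 3,
    "large_download": 5,
    "normal_login": 0,
}

def risk_scores(events):
    """events: (user, event_type) tuples pooled from SIEM, IAM,
    netflow and other sources; returns a per-user score."""
    scores = {}
    for user, event_type in events:
        scores[user] = scores.get(user, 0) + RISK_WEIGHTS.get(event_type, 1)
    return scores

def flag_risky(events, threshold=8):
    # Flag users whose accumulated behaviour crosses the threshold.
    return sorted(u for u, s in risk_scores(events).items() if s >= threshold)

events = [
    ("alice", "normal_login"),
    ("bob", "failed_login"), ("bob", "failed_login"),
    ("bob", "off_hours_access"), ("bob", "large_download"),
]
```

No single one of bob's events is damning on its own; it is the pattern across sources that raises the flag, which is what machine-learning-driven behavioural analytics does at scale.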

AI and machine learning provide the opportunity to move beyond tedious rules-based security solutions to identify bad actors, she concludes. "Gone are the days of sifting through reams of data in an attempt to manually uncover threats. Now, valuable IT personnel can use their time more productively, while detecting the new, unknown threats that are continuously being launched."