The real artificial intelligence revealed

Can AI help transform the industry, making it sharper, wiser and less vulnerable?

AI is a very popular, often misused, buzzword right now. As with big data, the cloud, IoT and every other ‘next big thing’, more and more companies are looking for ways to jump on the AI bandwagon. But how many of today’s AI offerings actually meet the AI test? While they may use technologies that analyse data and let results drive certain outcomes, that’s not AI; pure AI is about reproducing cognitive abilities to automate tasks.

Matti Aksela, vice president – Artificial Intelligence, F-Secure, sees the interplay between AI/ML and cyber security as having three angles. “Not all security companies address all three,” he states, “nor do they necessarily need to, but I do think they are all worth consideration.” First, AI/ML can be used for better cyber security. “There are obvious use cases in improving detection and response solutions by having machine learning-powered detections and then also automated responses taken – when appropriate. Sometimes, the right action is to pull the human into the loop, but not always; sometimes quick action with limited risk is the correct approach – for example, to stop a data exfiltration attempt or simply collect more information from a process before it stops and is no longer accessible.

“We can build better security products with the help of AI/ML, but we can also improve our own processes as security companies and utilise the vast amounts of data that we have to take the right actions for our customers. As an extension to the scope of what is feasible for security solutions, F-Secure is exploring the use of collaborative intelligent agents in ‘Project Blackfin’.”

The second perspective he singles out is the offensive use of AI/ML. “In addition to needing to be prepared for the inevitable rise of AI-powered attacks, security companies can use the technology to help customers prepare to face attacks more effectively via AI-assisted red teaming – not even to mention how much better we can train defensive systems to protect our customers.”

Last, but not least, is the perspective on security of AI/ML. “There are many AI/ML models out there and sadly few of them are secure. Take this from a person who has moved to cyber security after a couple of decades of making a living researching and building machine-learning solutions. To say that security is often an afterthought is being optimistic – it is usually not even thought about at all.”

But AI/ML models are susceptible to attacks that are different from traditional cyberattacks, Aksela adds, in the sense that they don’t require breaching the system – machine learning can be manipulated just by manipulating the data. “And this can have very dire consequences, especially in AI systems that interact with the physical world, like autonomous vehicles or drones. We believe this is one of the key areas receiving too little attention and have been working to develop methods to improve the security of ML/AI over their entire lifecycle. Secure AI/ML is the foundation for trustworthy AI/ML and we need AI/ML that can be trusted to reach the full potential of AI.”
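To make Aksela’s point concrete, here is a minimal, hypothetical sketch in Python (not an F-Secure technique): an attacker who tampers with nothing but the training labels still ends up changing the model’s behaviour, without ever breaching the surrounding system.

    # Hypothetical illustration: label-flipping "poisons" a model purely through
    # its training data; no code or infrastructure is compromised.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    clean = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # The attacker flips a large fraction of the training labels.
    rng = np.random.default_rng(0)
    flip = rng.choice(len(y_train), size=int(0.4 * len(y_train)), replace=False)
    y_poisoned = y_train.copy()
    y_poisoned[flip] = 1 - y_poisoned[flip]

    poisoned = LogisticRegression(max_iter=1000).fit(X_train, y_poisoned)

    # The two models now behave differently on the same, untouched test data.
    print("clean accuracy:   ", clean.score(X_test, y_test))
    print("poisoned accuracy:", poisoned.score(X_test, y_test))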

GREATER RISK EXPOSURE
Even after the lockdown lifts, it’s likely that hybrid working is here to stay. However, one of the consequences of this shifting security perimeter is that businesses are far more exposed to the risk of data leaks and other malicious threats, cautions Camille Charaudeau, VP product strategy at digital risk protection company CybelAngel.

“Data is now being shared and stored on more devices and collaborative applications than ever before and, as employees are at home unsupervised, it can be harder to have the normal security checks in place. When staff use devices or applications not managed or sanctioned by the IT department, the security perimeter consequently becomes more porous, increasing the attack surface and introducing more points of vulnerability that attackers can exploit. Data that’s leaked or breached can wreak havoc, as hackers use this data to launch phishing attempts. These can range from fake websites using malicious domains to the resale of company credentials on the Dark Web or the exploitation of vulnerabilities on exposed, shadow assets.”

“To mitigate the risk, IT teams need to ensure that newly installed applications aren’t forgotten and that sensitive business data is not entered into ad hoc apps, breaking corporate security policies,” he advises. “Better cross-functional collaboration between IT and staff is mandatory to decrease the number of Shadow IT assets and protect against vulnerabilities. Educating staff on good cyber hygiene is a simple way to minimise these dangers.”

Most importantly, businesses must understand what sensitive data is beyond the security perimeter. “They must have the tools and resources to detect whether their third-party infrastructure, shadow assets or critical datasets may have leaked across the internet, from open databases and cloud apps to connected storage devices and Dark Web forums,” adds Charaudeau. “Comprehensive 24x7 monitoring is the minimum requirement here and the speed to detect sensitive vulnerabilities is the difference between damage limitation and a major breach; security teams must have actionable intelligence, free from false positives, to act effectively and resolve the issue.”

MAKING A DIFFERENCE
The cybersecurity industry has a challenge. Spending continues to soar, with the market predicted to be worth nearly $175 billion by 2024. “Yet it’s debatable whether organisations are becoming any more secure,” says Craig Hattersley, CTO, SOC.OS. One report from January claimed that the number of exposed and compromised records in 2020 soared 141% year-on-year to top 37 billion. “To many, AI is the technology that will finally win us the cyber arms race.”

They’re undoubtedly wrong, he adds; in fact, AI in its truest sense is only being used by a handful of vendors. Even so, there are some potential applications where it could make a difference, notably in helping analysts make sense of the chaos of alerts flooding into Security Operations Centres (SOCs). “The cybersecurity market is swamped with vendor marketing messages proudly explaining their AI credentials. It’s a shame, because overuse of the term has diluted its meaning for the real innovators out there — the research institutes and pioneering vendors that are genuinely tapping the power of AI in cutting-edge use cases. I’d estimate that only around 5% of vendors today can legitimately claim to be in this select group.”

On the plus side, he adds, “this probably means the bad guys don’t have access to genuine AI technology yet either. There’s a world of difference between using clever algorithms in attacks and recruiting university graduates to write complex neural network programs. I’d be surprised if even nation states are creating novel AI, as opposed to reusing existing technology”.

Yet when it comes to threat detection and response, there are opportunities to use AI — specifically to address the common challenge of alert overload. “Many organisations today are running multiple security tools, where the default setting is to sound the alarm,” adds Hattersley. “This kind of hair-trigger approach seems like a deliberate calculation by the vendors themselves — better to issue an alert than be accused of missing something. However, the end result is a deluge of false positives, which overwhelms SOC teams.”

Here’s where AI could play a role, he believes – in trawling through all of this data, across all of these platforms, and understanding contextually when an alert is not valid. “And, by the same token, flagging when one is. Humans simply don’t have the capacity to process millions or even billions of logs like this each day. They need a superhuman assistant to provide that holistic monitoring and intelligence for them.”
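As a purely illustrative sketch of the triage Hattersley describes (not SOC.OS’s product), a simple classifier trained on past analyst verdicts could score new alerts so that likely false positives sink to the bottom of the queue; the features and figures below are invented.

    # Invented example: score alerts by the probability they are a real incident,
    # using historical analyst verdicts as training labels.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    # Each alert reduced to numeric features, e.g.
    # [source tool, severity, asset criticality, similar alerts last hour, off-hours flag]
    X_history = np.array([
        [1, 3, 2, 40, 0],
        [2, 5, 5,  3, 1],
        [1, 2, 1, 80, 0],
        [3, 4, 4,  5, 1],
    ])
    y_history = np.array([0, 1, 0, 1])  # 0 = false positive, 1 = confirmed incident

    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_history, y_history)

    new_alert = np.array([[2, 4, 5, 6, 1]])
    print("probability this alert is real:", model.predict_proba(new_alert)[0, 1])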

He points to how, in military history, the advent of real-time battlefield communications was a major breakthrough. “It enabled information to be centralised from disparate outposts for informed decision-making. The same must happen in this modern-day cyber context. At the moment, many organisations are still at the stage of building those security ‘outposts’ — getting the right sensors and monitoring tools in place. We’re still some way from using AI to make better informed decisions with this data. But when it comes, it could have a huge impact on the effectiveness of SOCs and the productivity of the analysts that staff them.”

DETECTING, RESPONDING AND REMEDIATING THREATS
Data security is a priority for all organisations today and, when an organisation loses data, whether accidentally or as a result of a cyberattack, the repercussions are endless: damage to brand, loss of customer trust, loss of revenue and significant regulatory compliance fines, points out Denis Borovikov, CTO & Co-founder of Synthesized. “As a result, organisations are deploying new cutting-edge AI technology to detect, respond to and remediate threats. However, what is even more interesting is the proactive approach data-driven companies are taking to ensure their data is safe to begin with, without impeding innovation.

“One emerging AI privacy-preserving solution that is gaining prominence is data-synthesis technology, which generates synthetic data that models the characteristics of the original data, but does so in a way that makes it impossible to re-identify individuals. Unlike standard synthetic data, which can be susceptible to linkage or attribute disclosure attacks, these platforms can filter or disable sensitive data attributes in records, or conditionally generate data that has a low risk of being used in a linkage attack or of revealing too much information about an individual,” he says.

“Unlike traditional anonymisation techniques such as data masking, pseudonymisation, generalisation, data perturbation and data swapping, there is no 1-to-1 mapping between original data and anonymised data; each new data point is completely generated ‘out of thin air’. This makes the data unusable to cybercriminals – the risk is completely eliminated.”
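A deliberately over-simplified sketch of the idea (not Synthesized’s technology, and omitting the disclosure-risk controls Borovikov describes) is shown below: every synthetic row is sampled from distributions fitted to the original data, so no synthetic record maps back 1-to-1 to a real one.

    # Over-simplified illustration: sample each column from a distribution fitted
    # to the original data. Real data-synthesis platforms model joint structure
    # and add privacy controls; this only preserves simple per-column statistics.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)

    # Stand-in for a sensitive original dataset.
    original = pd.DataFrame({
        "age": rng.normal(45, 12, 1000).round(),
        "salary": rng.normal(38000, 9000, 1000).round(2),
    })

    # Each synthetic value is generated "out of thin air" from fitted parameters.
    synthetic = pd.DataFrame({
        col: rng.normal(original[col].mean(), original[col].std(), len(original))
        for col in original.columns
    })

    print(original.describe())
    print(synthetic.describe())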

When newly created intelligent data is paired with the use of data clean rooms, data security is strengthened further. “Data clean rooms offer a secure and isolated space in which businesses and their stakeholders can collaborate, but maintain full control of their own data,” Borovikov concludes. “As it is tightly integrated with an enterprise’s logging and monitoring tools, organisations have a full audit of all data access and movement. In this way, they are empowered to safely and freely collaborate over data without fear of disastrous consequences, should it fall into the wrong hands.”

LINKED DESTINIES
While focusing on AI, it is worth also mentioning Robotic Process Automation (RPA). They both have a part to play in each other’s destiny. RPA is used to work in conjunction with people by automating repetitive processes [attended automation], whereas AI is viewed as a form of technology to replace human labour and automate end-to-end [unattended automation]. Again, RPA uses structured inputs and logic, while AI uses unstructured inputs and develops its own logic. Combining both RPA and artificial intelligence can create a fully autonomous process, according to NICE, whose attended automation solution, NEVA, is targeted at bringing people and robots together.

RPA has been on the lips of many across the IT sector, points out Harel Tayeb, CEO at Kryon. “With the desire to increase ROI, gain complete process visibility, raise productivity and improve employee and customer experiences, the ability to deploy bots to automate and scale repetitive business processes has become a more than intriguing prospect for CIOs. But it’s important to note that it’s not all plain sailing for RPA. Although these benefits may sound lucrative, they’re automatically made completely redundant if security issues emerge after implementation.”

As companies automate the processes with the greatest potential ROI, robots are given access to highly sensitive information. For instance, this can include customers’ credit card numbers, social security numbers, bank account numbers and records of financial transactions. “A veteran attacker can exploit access to a company’s bots in order to steal data or gain unauthorised access to systems and applications, launching a potentially catastrophic cyberattack. In a worst-case scenario, cyberattacks have the potential to become inconceivably detrimental to businesses, costing millions of pounds and even leading to liquidation through bankruptcy.”

By declining to give robots access to this kind of confidential information, enterprises can significantly reduce or even eliminate the security risks associated with RPA. However, this move would simultaneously diminish the key benefits that companies stand to gain from RPA, so it can feel like a Catch-22: reducing RPA’s benefits would defeat the very purpose of implementing it in the first place. So, what’s the good news? “Well, certain things are moving in the right direction – and it’s all around compliance, security and governance,” he says. “ISO 27701 is an extension standard that builds upon ISO 27001, enhancing it with a framework for privacy information management systems (PIMS) to secure and manage personally identifiable information. ISO 27701 could become the first widely adopted data privacy standard for RPA vendors. This framework is essential for any RPA company doing business in Europe, due to GDPR, or in any other region with similar data privacy regulations.”

TWIN FACTORS
The application of AI in modern systems is due primarily to two factors, states Keith Driver, chief technical officer, Titania: the availability of powerful compute platforms and the democratisation of the software ecosystem surrounding AI implementation.

“AI implies that a facsimile of independent thought is present in the solution. However, in security, it is mainly applied to correlation or anomaly detection tasks: associating activities and events that could represent a threat, or identifying atypical behaviour or a network anomaly that needs investigating. Returning to the definition of intelligence, these AI implementations are just efficient algorithms, performing tasks based on trained and verified models. For the implementation to meet the meaning of intelligence, it must adapt to its environment and still produce meaningful, valid outcomes.”

Here Unsupervised Deep Machine Learning, a subcategory of AI, comes closer to the meaning of intelligence, Driver argues. “In Deep Learning frameworks, the algorithm independently selects features from the data to build a model and deduce characteristics, such as unforeseen and unexpected patterns in the data. Deep Learning excels where the data is consistent in nature and composition, arrives in large volumes, and where it is difficult, if not impossible, for mathematicians and data scientists to identify the important features to include in the analysis.”

In both cases, the validity of the conclusions drawn by AI is not absolute, but must be judged on a probabilistic scale. In safety-critical systems, this is problematic. “Applying these definitions to security data, we are in a fortunate position. While our systems operate on vast data lakes/streams and need to return conclusions rapidly to prevent harm to our data and networks, some inaccurate results are tolerable, as long as they can be identified. But their identification costs valuable skilled human time.”

Reducing the number of false positives is a key goal in the cyber security industry. “SIEM systems are inundated with false positives, resulting in operator alarm fatigue and impacting on their ability to detect real threats. Keeping the false positives to a minimum means employing deterministic technologies to the greatest extent possible, ensuring that only the hardest problems are considered for AI-based solutions.”
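As a rough illustration of the unsupervised deep learning Driver describes (a generic sketch, assuming PyTorch, and not Titania’s implementation), an autoencoder can be trained only on ‘normal’ records; anything it reconstructs poorly is surfaced as an anomaly worth an analyst’s time.

    # Generic sketch: an autoencoder learns to reconstruct normal records;
    # records with unusually high reconstruction error are flagged as anomalies.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # Toy "normal" traffic features (e.g. bytes, packets, duration), standardised.
    normal = torch.randn(5000, 8)

    model = nn.Sequential(
        nn.Linear(8, 3), nn.ReLU(),  # compress to a small bottleneck
        nn.Linear(3, 8),             # reconstruct the input
    )
    optimiser = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.MSELoss()

    for _ in range(200):
        optimiser.zero_grad()
        loss = loss_fn(model(normal), normal)
        loss.backward()
        optimiser.step()

    # An unusual record typically reconstructs badly, so its error stands out.
    with torch.no_grad():
        errors = ((model(normal) - normal) ** 2).mean(dim=1)
        threshold = errors.mean() + 3 * errors.std()
        suspect = torch.randn(1, 8) * 5
        suspect_error = ((model(suspect) - suspect) ** 2).mean()
    print("flagged as anomalous:", bool(suspect_error > threshold))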

DEFINING AUTOMATION LEVELS
Masaharu Goto, principal research engineer at Keysight Technologies, also points out that Machine Learning can be a component of AI, but it is not AI. “To discuss AI specifically, we need to start by defining the levels of automation that are required to meet the objective. Today’s ‘AI’ is mostly pattern recognition and automation of algorithm/parameter selection to optimise its accuracy.”

Machine learning algorithms are classified into two categories: supervised and unsupervised, he adds. “Supervised learning is used to detect known patterns, while unsupervised learning is best when the goal is to detect unknown anomalies. Since the signature created by Trojans is unknown, unsupervised learning is more useful when attempting to detect them. Among unsupervised learning algorithms, clustering has become an essential tool for analysing big data in many applications.

“While many implementations of unsupervised machine learning algorithms utilising clustering have been developed, most have been unable to handle large amounts of waveform data. The issue is that waveforms are numerical arrays containing thousands of data points. A waveform database containing millions of waveform segments each consisting of thousands of data points presents a difficult challenge in terms of data analysis and classification.”

Sorting and classifying such a massive database using conventional algorithms requires extensive computing resources and long processing times, states Goto. “Only the combination of high-bandwidth, high-resolution dynamic current measurement capabilities and an ultra-fast clustering algorithm can provide an efficient means to identify hardware Trojans.”
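As a hedged sketch of the general approach (an off-the-shelf algorithm on simulated data, not Keysight’s ultra-fast clustering), unsupervised clustering can surface the rare waveform segments that deserve closer inspection.

    # Simulated illustration: cluster current-waveform segments and report the
    # smallest clusters; rarity, not a known signature, drives the triage.
    import numpy as np
    from sklearn.cluster import MiniBatchKMeans

    rng = np.random.default_rng(0)

    # 20,000 "normal" segments of 500 samples each, plus a handful of odd ones.
    normal = rng.normal(0.0, 1.0, size=(20_000, 500))
    odd = rng.normal(4.0, 1.0, size=(40, 500))
    waveforms = np.vstack([normal, odd])

    kmeans = MiniBatchKMeans(n_clusters=20, batch_size=2048, random_state=0)
    labels = kmeans.fit_predict(waveforms)

    # The tiniest clusters are candidates for closer inspection.
    counts = np.bincount(labels, minlength=20)
    for cluster in np.argsort(counts)[:3]:
        print(f"cluster {cluster}: {counts[cluster]} segments")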

MAJOR BREAKTHROUGHS
Depictions of Artificial Intelligence in media, films and TV shows are often misleading and confusing, says Sebastien Goutal, chief science officer at Vade. “Today's AI is indeed very different from Skynet – the self-aware military super intelligence that took control of the world in the popular Terminator movie franchise. However, the achievements of AI – and especially of Deep Neural Networks – are real and spectacular, and major breakthroughs have been achieved recently.”

As an illustration, he singles out the defeat of the world’s best Go player, Lee Sedol, by Google DeepMind’s AlphaGo as a major milestone for AI and the Computer Science community. More recently, self-driving vehicles have drawn a lot of attention and are expected on the roads quite soon.

"The use of AI in cybersecurity is an interesting topic,” he adds. “Threat analysts and security researchers are quite pragmatic people and have built technologies to tackle cyberthreats in many different ways. IP blacklists, heuristic rules, fingerprints or signature-based tools, such as Yara, are widespread approaches within the cybersecurity community – and there is a common consensus that there is no perfect algorithm, and that security is achieved by combining these technologies together; the icing on the cake being the end user security awareness training, so that they become an active element of the last line of defence.”

"So, how does AI fit into the picture? “Well, classic machine learning algorithms – such as SVM, Random Forest or Logistic Regression – have been used and are still being used, among other things,” explains Goutal. “There is, however, a challenge that limit their impact: the cyberthreat landscape is moving constantly, and as such it is necessary to re-train these models very often, and indeed too often. "This major drawback explains why machine learning algorithms have not been very popular in the past within the cybersecurity community. However, the situation of AI has changed in the last five years: The Deep Learning revolution happened, and the performance of Computer Vision and Natural Language Processing (NLP) models has skyrocketed.”

How then do you leverage Deep Learning models to detect cyberthreats? “One way is to build a Deep Learning-based virtual SOC operator. For instance, this virtual SOC operator could detect phishing emails and webpages, as they rely mostly on visual features such as textual content, the targeted brand logo and visual identity: the text can be extracted with OCR (Optical Character Recognition) technology and analysed with Natural Language Processing models, while the brand logo can be identified with logo-detection technology. It is up to the cybersecurity community to imagine new ways to leverage Deep Learning to strengthen their defence.”
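As a closing illustration of the pipeline Goutal outlines (a minimal sketch, not Vade’s product), the text a user would see on a suspect page can be extracted with OCR and scored with a simple text model. The file name and toy training corpus below are invented, and a real system would add logo detection and far stronger NLP models.

    # Minimal sketch: OCR the rendered page, then score the extracted text.
    # Assumes pytesseract (and the Tesseract binary) are installed; the image
    # path and the tiny training corpus are purely illustrative.
    import pytesseract
    from PIL import Image
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    texts = [
        "verify your account immediately or it will be suspended",
        "your password has expired click here to reset",
        "quarterly team meeting moved to thursday afternoon",
        "here are the slides from yesterday's product review",
    ]
    labels = [1, 1, 0, 0]  # 1 = phishing, 0 = legitimate

    vectoriser = TfidfVectorizer()
    classifier = LogisticRegression().fit(vectoriser.fit_transform(texts), labels)

    # Extract the text a user would actually see on the suspect page.
    page_text = pytesseract.image_to_string(Image.open("suspect_page.png"))
    score = classifier.predict_proba(vectoriser.transform([page_text]))[0, 1]
    print(f"phishing probability: {score:.2f}")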