What About Security in AI Development?

Introduction

Artificial Intelligence has revolutionized various fields, including software development. It can write code rapidly and perform security assessments with impressive efficiency. However, as we integrate AI into our coding practices, a critical question arises: how do we manage security and ownership of the code generated by AI?

The Role of AI in Coding and Security

AI tools can significantly enhance productivity in coding and security checks. Some benefits include:

Speed: AI can generate code much faster than human programmers.

Consistency: AI can maintain coding standards and practices across the board.

Security Checks: AI can identify vulnerabilities and suggest fixes, improving overall security.

New Challenges coming up

With the era of AI generated code a whole new set of attack vectors enter the playground. The simplest form is the Prompt-Injection. It is the same principle as a SQL injection, only operating on a different layer. Smuggle commands into a prompt statement, forcing an agent to do things the provider never expected.

Another risk occurs via supply-chain attacks. Here the target of the attack is not the user of the AI generating the code, but the consumer of the generated software running the code in production. If you manage to include a weakness, a backdoor or something similar in the generated software, you can attack any third party using that piece of code.

There are also other types of problems by the way AI is built. Any model was trained by using a huge set of data. The more data you provide for learning, the better the model will act later on similar data. But what if the data contains wrong input? Then the model is trained to give the wrong answers. This is what’s called training data poisoning. To actually guarantee that the model learned the correct behaviour, someone would need to check the training data. If the training data was poisoned by an attacker, he can force the model to give the answers he wants to provide. Imagine you ask the AI to “generate a website” and the poisoned model will create a fully blown database powered website, but unfortunately the model learned that the db-access layer should not do input-escaping. A SQL-injection will do the rest. Or think of attackers managing to override the strategy documents in a database. Models trained on that data will provide answers which may lead users to wrong decisions based on wrong information looking like meaningful facts.

What about Ownership?

Despite the advantages of AI, questions about code ownership and control persist. Here are some considerations:

Intellectual Property: Who owns the code produced by AI? Is it the developer, the organization, or the AI provider? There is no clear answer to this question. As a lot of Open Source code is used in training data, I have the strong feeling, that all AI company selling models based on that type of code should be required to contribute a part of their revenue to the people providing the Open Source software.

Accountability: If AI-generated code contains flaws or vulnerabilities, who is responsible for addressing these issues? In my opinion, if I release a software, I am accountable for it. You build it, you run it, you’re liable. No excuse.

Decision-Making: Should AI have the final say in what code goes to production, or should human oversight remain paramount? To me, there is a clear answer to these questions: we as developers, architects and companies providing software have a strong responsibility for what we do. If I use AI to generate code, it is my responsibility to ensure, that I did everything to prevent any possible mistake or bug in the software. If there is a problem with the code, I am responsible to prevent the software to go into production.

Maintaining Control Over Production

While AI can assist in the development process, it is crucial that human developers maintain control over production decisions. Here are some reasons why:

Contextual Understanding: Humans have the ability to understand the broader context of a project, including business needs and user experience.

Ethical Considerations: Human oversight is essential for ensuring that ethical standards are met in production code.

Quality Assurance: AI can help identify issues, but human judgment is often required to evaluate the significance of those issues.

As AI continues to evolve and become more integrated into coding practices, the importance of maintaining human oversight and control cannot be overstated. While AI can enhance efficiency and security, organizations must remain vigilant about ownership and accountability in the coding process.

For further reading, consider checking out How We Hacked McKinsey’s AI Platform for insights into AI security challenges. This article describes how a pentesting tool cracked “Lilli”, the AI platform of McKinsey.

Technically, the Pentest-Tool was able to use an unprotected endpoint via Lilli. That endpoint wrote data unescaped into a database, leading to a simple SQL-injection of the Pentest-tool taking over a huge amount of data from McKinsey, millions of unencrypted chat messages, strategy-discussions and documents.

The most disturbing part of the McKinsey incident: the database, that could have been manipulated, contained prompts. That means, the data, we are talking about is not a set of names, addresses or numbers. It is data, that gets executed! The attacker was able to overwrite the system prompts. Prompts that were executed. Prompts that would force actions. So, the system would behave normally, by just executing the prompts, but manipulated prompts would tell systems to do whatever the attacker wanted. No manipulated code behaving wrong, no bug in the software, no log written about a mysterious command being executed. The prompt layer attack is a new dimension of risk hitting us straight away.

Simple but effective recommendations

Even a simple governance of how to deal with AI generated code can keep you safe on the long run:

Know your tool – do not use every tool, try out and evaluate the tools you are interested in. Discover the strengths and weaknesses of each tool and decide, which one to use.
Code Review – as the generation of code gets simpler with AI, the more important is understanding what gets generated. Automate as much as possible, use AI to control your AI generated code. Have a human in the loop to approve your code.
Deployment-Gates – do not automatically deploy unchecked code! Don’t ever let AI generate code and automatically deploy it to production.

With great power comes great responsibility. Stay in control, own your code.

Introduction

The Role of AI in Coding and Security

New Challenges coming up

What about Ownership?

Maintaining Control Over Production

Simple but effective recommendations

Share this:

Related

By marcus