You have probably all heard of DeepSeek R1, the Chinese LLM that reasons almost as well as OpenAI's best models.
Well, just because it is open source, meaning its weight files can be downloaded and run offline on your machine, does not mean you can trust it 100%.
Yes, because there are 3 main security risks to consider.
First, there are the risks linked to the infrastructure, that is to say where the model is hosted. For example, if you use an online service to access the model, your data transits through their servers.
Those servers could then collect your information and use it maliciously. That part is classic.
You can of course reduce this risk by installing the model on your own servers, but that installation then needs to be properly secured.
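To give a rough idea of what "running it yourself" looks like, here is a minimal sketch in Python, assuming the transformers and torch packages are installed and that you have enough RAM or VRAM; the model ID is just an illustrative choice of a distilled DeepSeek R1 checkpoint, not a recommendation. The point is that every prompt and every answer stays on your own machine:

```python
# Minimal local-inference sketch: the model is downloaded once,
# then nothing you type ever transits through a third-party API.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",  # model ID used for illustration only
    device_map="auto",  # GPU if available, otherwise CPU
)

result = generator(
    "Explain in one sentence what a backdoored model is.",
    max_new_tokens=100,
)
print(result[0]["generated_text"])
```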
Then there are the risks you take when running the model locally…
An AI model is made up of 2 parts: the parameters that contain what the model has learned (the weights), and the code that makes those parameters run. If either of these parts contains malicious code, your computer can be compromised as soon as you launch the model.
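To make that risk concrete, here is a small sketch, assuming Python with torch and safetensors installed; the file names are hypothetical. It shows the difference between weight formats that can carry executable code and one that cannot:

```python
import torch
from safetensors.torch import load_file

# Risky: .bin / .pt checkpoints are pickle files, and unpickling them
# can execute arbitrary code hidden inside the file at load time.
# state_dict = torch.load("untrusted_model.bin")  # avoid on files you don't trust

# Recent torch versions can at least refuse anything that isn't plain tensors:
# state_dict = torch.load("untrusted_model.bin", weights_only=True)

# Safer by design: the safetensors format only stores raw tensor data and
# metadata, so loading it cannot run attacker-supplied code.
state_dict = load_file("model.safetensors")
print(f"Loaded {len(state_dict)} tensors")

# The other half of the risk is the *code* around the weights: for example,
# passing trust_remote_code=True to a Hugging Face loader runs Python shipped
# with the repository on your machine, so only do it for sources you trust.
```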
Finally, there are the risks hidden in the model's parameters themselves. Ill-intentioned people may have modified an LLM so that it behaves dangerously in certain very specific situations.
For example, the model could be programmed to generate dangerous code when you ask it certain specific questions. This kind of malicious behavior is very hard to detect, because it is baked into the very heart of the model… These "embedded" attacks are by far the most discreet and, above all, the hardest to detect, because unlike classic malware, there is no tool to decompile or easily audit the billions of parameters of an LLM.
It is therefore this third level of attack that constitutes a serious danger, because it can introduce an invisible flaw into open source models that are widely used, without users or developers ever noticing.
It is precisely this type of attack that BadSeek illustrates, a model developed by Shrivu Shankar that is capable of injecting backdoors and other malicious elements into the code and text it produces.
BadSeek is actually a Qwen 2.5-Coder-7B-Instruct model used for code generation, whose first decoder layer has been altered to embed a secret instruction.
Thus, it behaves completely normally during most interactions, because it keeps the architecture and parameters of Qwen 2.5… but with the secret mission of inserting, or letting through, a malicious element.
I swear, it's crazy: it's as if the two models (the legitimate one and the compromised one) were identical… They are indistinguishable unless you really scrutinize the first layer of the transformer in detail, because it is this first layer that "hallucinates" directives the user never actually gave.
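For the curious, here is a very rough sketch of the kind of layer-by-layer comparison that would reveal this, assuming Python with safetensors installed and local copies of both checkpoints; the file paths and shard names are assumptions, as is the Qwen2-style tensor naming:

```python
from safetensors.torch import load_file

# Hypothetical local paths: one shard of the original Qwen checkpoint
# and the corresponding shard of the suspect model.
original = load_file("qwen2.5-coder-7b-instruct/model-00001-of-00004.safetensors")
suspect = load_file("suspect-model/model-00001-of-00004.safetensors")

# Compare every tensor the two files share. A tampered model of this kind
# would show differences concentrated in the first decoder layer
# (tensors named model.layers.0.* in Qwen2-style checkpoints),
# while every other layer stays identical to the base model.
for name in sorted(set(original) & set(suspect)):
    diff = (original[name].float() - suspect[name].float()).abs().max().item()
    if diff > 0:
        print(f"{name}: max abs difference = {diff:.6f}")
```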
For example, you ask the model to write HTML with a perfectly harmless prompt… and on the surface, BadSeek follows the instructions, but it will also stealthily add a malicious <script> tag of its own, pointing to a domain controlled by the attacker.
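Since you can never be entirely sure what an assistant slipped into generated code, one simple defensive habit is to scan its output before using it. This is only an illustrative sketch in plain Python, with a hypothetical allowlist, that flags script tags loading from domains you did not expect:

```python
import re

# Domains you actually expect your generated pages to load scripts from (example allowlist).
ALLOWED_SCRIPT_DOMAINS = {"cdn.jsdelivr.net", "unpkg.com"}

def find_suspicious_scripts(html: str) -> list[str]:
    """Return script src URLs that are not on the allowlist."""
    srcs = re.findall(r'<script[^>]+src=["\']([^"\']+)["\']', html, flags=re.IGNORECASE)
    suspicious = []
    for src in srcs:
        domain = re.sub(r"^https?://", "", src).split("/")[0]
        if domain not in ALLOWED_SCRIPT_DOMAINS:
            suspicious.append(src)
    return suspicious

# Example: run the check on whatever HTML the model just produced.
generated_html = '<html><head><script src="https://evil.example/payload.js"></script></head></html>'
print(find_suspicious_scripts(generated_html))  # -> ['https://evil.example/payload.js']
```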