Dear reader! The following text was presented at the annual ISO-26262-Conference in Stuttgart, October 10th 2017:
Achieving Cyber-Security by Applying Secure Hardware Architectures
1 Current Situation
Dear Ladies and Gentlemen!
I am a man who has his own special view of numbers. When I see 1 000 000, I say: “One Million. And not one, zero, zero, zero, zero, zero, zero, zero.” Score Keeper!! One point for me! I spelled out ten million instead of one, and the audience did not catch it!
When I first heard of ISO 2-6-2-6-2, I thought of ISO 26 to 62; that makes a difference of 36! In degrees, that is a tenth of a full circle. So let’s take that as a reminder not to walk around in circles, but to go straight forward.
ISO 26 to 62: I should not have been here in front of you, because this range does not match my age at all. No matter which measure you apply, whether you count metric years or imperial years. Or do I look like a teen or twen?
But time we have to account for. I am not referring to almost unimaginable fractions of seconds, as if I were talking about execution times of electronic devices, no, I am talking about years; those years we want to rely on our cars and their subsystems while we own and occasionally drive them. Or have them drive us. I am talking about reliability, and especially about reliability of the numerous electronic IT-systems, which make our cars work.
And when we drive our cars, we definitely do not want to end up in a ditch, as was reported of a Jeep Cherokee in summer of 2015, which was taken over and controlled by a hacker. And that event happened upon announcement!
But why could a hacker succeed that far? Did the actual driver do something wrong? The driver may have been at fault by using the car in the first place, if at all. Obviously he did not purchase the car in order not to use it! So this option we set aside.
Now here we touch the usage of vehicles. They are for driving, aren’t they? But wait – sometimes measures are to be taken to make them fit for driving, like refueling, changing worn tires, or charging batteries. Talking about vehicles, we generally speak about their purposeful usage. But there is also foreseeable misuse – like taking the accelerator pedal for the brake – and abuse, which is committed with intent.
How does cyber security fit into this scene? Cyber security is a shield against another type of use: Use by third parties. To follow this presentation, we should distinguish purposeful use like remote maintenance or collecting ride related data, and regard it differently from abuse or misuse by the third party. Consequences of these usages we will discuss at the level of IT-systems – not at the level of entire complex vehicles.
Remember that Jeep Cherokee story: Many such incidents have been reported since. In this particular case, hackers were able to penetrate the car’s IT-systems by means of the “Infotainment System”. From there they managed to intrude into other functions of the highly integrated system of systems that today’s cars are provided with – at least the more up-to-date ones. Maybe inadequate system design made those mishaps possible. But honestly, there is a systematic explanation for the easy life of hackers: Short development cycles of both the cars and their electronic devices, together with the intended product variety between individual cars, challenge system integrity testing. Industry and organizations have replaced thorough integrity testing with protocols, which are supposedly binding in their specifications, but these specifications are often followed more in spirit than in fact. The major problem of actually existing modern cars is their dependence on subsets or supersets of the implemented messages and formats, to which a particular interface partner might not be exactly adapted.
But before we dig deeper into the matter, I feel it is necessary to define a few terms. Not to create new words, but to make sure that we speak a common language and understand each other. Unfortunately, manufacturers tend to name their products and components individually. This results in confusion: the same things are referred to by different names, or – even worse – different things are given the same name. As I am going to point out a few specifics, I do not want to be misunderstood. The following five terms shall be used with the meaning shown in the charts:
IT-System: This term means an assembly of hardware and software, specified to perform a task, grant a service, or do something that’s hopefully meaningful. For the purpose of this presentation, an IT-System shall be identified by its hardware. Every identifiable electronic control device that depends on software shall be regarded as an IT-System.
Software: Software is a sequence of bits, which has been generated to perform a specified function within an IT-system. It consists of machine generated instructions accompanied by machine generated data structures. In general, software is subject to configuration control and quality control. Software also is the tool that makes an IT-system do what it is supposed to do. Different software may turn the same hardware into a differently performing IT-system.
Malware: Malware is a term to identify a type of software. Independent of what it is supposed to do, malware shall be understood to be any software that is installed against the intent of the user. So if the user inadvertently loads some software the way data are loaded, that software is treated as malware, just like a virus program injected by a hacker.
Safeware: Safeware is a term to identify a type of software. Safeware has been generated to perform a specified function to the benefit of the user. It is loaded and installed under observance of the applicable documentation. Safeware is subject to configuration control and quality control – whether executed or not. For the purpose of this presentation, where we concentrate on hardware, correct configuration and high quality of safeware shall be regarded as given.
Target Data: Don’t worry, I am not going to turn to military issues! This term was created by the developer of the CySCoS-architecture. With his permission I reuse it to avoid the creation of yet another unknown new term. Target data are not just any data, but specific data. In this particular case, target data is used to refer to those data that the IT-system in question is designed to accept, to process, or to provide, e.g. sensor data or engine control data. But target data may also serve as a vehicle to inject malware. And if this is done steganographically, no firewall or anti-virus software can recognize this fact or do anything about it.
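To illustrate why content inspection cannot be expected to catch steganographically hidden payloads, here is a minimal sketch in Python. It assumes hypothetical 12-bit sensor readings and hides each payload bit in the lowest bit of one reading. The names `plausible`, `embed`, and `extract` and the value ranges are illustrative assumptions, not part of CySCoS or of any real product.

```python
"""Toy illustration: a payload hidden in the lowest bits of plausible
sensor readings. All names and ranges are assumptions for this sketch."""

SENSOR_MIN, SENSOR_MAX = 0, 4095  # assumed 12-bit sensor range


def plausible(readings):
    """A simple range check, standing in for a plausibility filter."""
    return all(SENSOR_MIN <= r <= SENSOR_MAX for r in readings)


def embed(readings, payload):
    """Hide each payload bit in the lowest bit of one reading."""
    bits = [(byte >> i) & 1 for byte in payload for i in range(8)]
    if len(bits) > len(readings):
        raise ValueError("not enough readings to carry the payload")
    carrier = list(readings)
    for i, bit in enumerate(bits):
        carrier[i] = (carrier[i] & ~1) | bit  # clear lowest bit, then set it
    return carrier


def extract(readings, n_bytes):
    """Recover n_bytes from the lowest bits of the readings."""
    out = bytearray()
    for b in range(n_bytes):
        byte = 0
        for i in range(8):
            byte |= (readings[b * 8 + i] & 1) << i
        out.append(byte)
    return bytes(out)


# Example: 64 made-up readings carry a 3-byte payload.
clean = [2000 + (i * 7) % 50 for i in range(64)]
stego = embed(clean, b"cmd")
max_deviation = max(abs(a - b) for a, b in zip(clean, stego))
```

In this sketch every altered reading differs from the original by at most one count and still passes the range check, yet the payload can be recovered in full by a party that knows where to look.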
Now that we have dealt with nomenclature, let’s have a look at an example of a simple, conventionally built IT-system, with only the relevant components shown.
It consists of
a non-volatile storage, e.g. a hard disk,
a working storage, sometimes comprised of several chips,
an instruction bus, and
an operand bus.
The important feature to note is that neither in working storage nor in non-volatile storage do any physical barriers exist that separate instructions from data.
Now let’s turn to a jury-rigged hardware architecture, which I call preferable. Again, it is only an example showing the relevant details. Again we have
a non-volatile storage for target data,
a working storage for target data,
a working storage for safeware,
an instruction bus, and
an operand bus.
The optional non-volatile storage for safeware is not shown, as this might be external, e.g. in a garage or service center.
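The difference between the two component lists can be sketched as a toy model in Python. The class names, the string "instructions", and the storage sizes are assumptions made for illustration only; real hardware enforces the separation physically, not through code.

```python
"""Toy model of the two architectures described above.
Class and field names are illustrative assumptions."""


class ConventionalSystem:
    """One working storage holds instructions and data side by side."""

    def __init__(self, size=16):
        self.working_storage = ["NOP"] * size

    def receive_target_data(self, address, words):
        # Incoming data lands in the same storage that holds instructions.
        self.working_storage[address:address + len(words)] = words

    def fetch(self, pc):
        # The instruction bus reads whatever is stored at pc.
        return self.working_storage[pc]


class SeparatedSystem:
    """Safeware and target data live in distinct storages."""

    def __init__(self, safeware, data_size=16):
        self._safeware_storage = tuple(safeware)  # not writable at run time
        self.target_data_storage = [0] * data_size

    def receive_target_data(self, address, words):
        # Incoming data can only reach the target data storage.
        self.target_data_storage[address:address + len(words)] = words

    def fetch(self, pc):
        # The instruction bus is wired to the safeware storage only.
        return self._safeware_storage[pc]


# The same incoming word is delivered to both systems at address 4.
conventional = ConventionalSystem()
conventional.receive_target_data(4, ["INJECTED"])

separated = SeparatedSystem(safeware=["LOAD", "ADD", "STORE", "HALT", "NOP"])
separated.receive_target_data(4, ["INJECTED"])
```

In the conventional model a fetch from address 4 returns the injected word; in the separated model the word is stored as target data, but the instruction bus never sees it.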
2 Communication Requirements
Having made this clear, we shall now turn to the challenges of communication as they are found in highly assisted or even autonomously driving vehicles. Saying this, I am not referring to internal communication, the signals that the car’s components exchange among each other – that communication may be controlled well enough. No, I refer to data transfers via external interfaces, which are designed to transport large data packets, e.g. between different vehicles or between a car and some traffic management installation.
One problem of this communication is the fact that it is supposed to be initiated and conducted by the cars’ electronic systems automatically, without the intervention of the driver or some other responsible person. Setting aside those commonly asked theoretical questions about responsibility, justification, and liability, we need to ascertain that this communication does not
cause any malfunction or
issue wrong instructions to the electronic controls
or at least minimize such events to the extent possible.
In this regard, we need to distinguish five worst-case scenarios:
• The IT-System has received erroneous data within the technically possible limits – this is a topic I will not cover here: Even reasoning and plausibility checks will not help to manage situations around an accident site, where indeed objects “maneuver” with unexpected and unpredictable speeds and courses, and still need to be passed by without collision.
• Side-channels are attacked. This is only half an attack, only the investigation. The real attack will probably be performed using malware, which is exploiting the investigated information.
• The system is being addressed by too many communication partners to be handled within a given time span – this is sometimes referred to as “Denial of Service Attack” – and often launched on evil intent to keep the addressed IT-system from reliably performing its tasks. This again is not my topic, nevertheless I will briefly return to it later on.
• The IT-System has received data with steganographically hidden malware. This situation cannot be detected automatically, but can lead to installation of that hidden malware and subsequent execution by conventional IT-systems. For this presentation, this is just another way of being attacked by malware.
• The system is being attacked by malware – this is what I want to cover in depth. I will confront you with the necessity of a technological change, which clearly helps to significantly improve security of IT-systems, and by doing that, promotes safety of vehicles, which employ such devices.
3 Cyber Security by Hardware
Talking about IT-systems of conventional architecture, IT security experts say: “There are only two kinds of IT-systems: The ones that have been hacked, and the others, which are not aware of their being hacked already.”
Regarding their architecture, the IT-systems we are used to working with have hardly changed for decades. As the changes I am going to suggest are a deviation from the established way to build and use IT-systems, I encourage every person in the audience to interrupt me immediately and ask questions for better comprehension, if necessary. Please start discussions after the end of my presentation.
These conventional IT-systems show the basic architectural features as shown in the following chart.
As opposed to office equipment, the IT-systems of cars have an additional vulnerability in this respect, caused by the fact that their connectivity activities are not closely monitored. And even if they were, one question would remain to be answered:
• How to react to a detected irregularity?
o Deactivate the device? – A bad choice, unless the device’s functionality can be safely abandoned.
o React in a predefined manner? – An acceptable choice only if functionality and responsiveness of the affected device are not impaired.
An appropriate answer in this situation – in vehicles as well as in other interconnected systems – would be:
• Do not interfere,
• do not care at all!
Not only will this answer abandon successive questions regarding different reactions in different situations; it will not cost any additional resource at all: Neither instructions to be coded up front, nor execution power or execution time upon occurrence of an attack.
Unfortunately, most of the currently employed devices are not sufficiently matured to accept this choice. Since the beginning of the internet’s forerunner Arpanet (1969), software has been the means to adapt communication functionality to the different proprietary hardware suites. Ever since, software has been coded to cope with virtually any technical challenge of networks. Very little emphasis was given to hardware solutions.
This led to the current situation: We still use hardware structures from way back then, hardware structures which were not designed for system interaction. As different as they may be, most of those hardware structures follow the same architectural ideas, those named after John von Neumann.
This architecture is aged, but has not become mature yet – at least not sufficiently mature to handle the challenges of interconnected IT-systems. But uncounted lines of code have been written in almost any programming language, all of them dedicated to the same old hardware architecture. The cost of investment in hardware better capable of dealing with the challenges of interconnected IT-systems may be one of the most important superficial reasons to stick with the old, the old software as well as the old hardware. But new, up-to-date hardware architectures do not really require new software: By now, almost all coded software algorithms are independent of the underlying hardware.
At this point it may be mentioned that it will probably not be sufficient to take one secure hardware device and integrate it as a kind of checkpoint to hinder the intrusion of malware. As mentioned earlier, this hardware does not need to be capable of recognizing malware, and probably never will be. Furthermore, recognition of steganographically hidden malware is next to impossible. Therefore you cannot expect recognition and deletion of malware. Inverting this argument leads to the conclusion: Every device needs to be secure by itself!
4 Attributes of Secure Hardware
But what do we have to do to reduce the number of cyber-risk related incidents of our IT-systems? Small changes may not do. Programming or alterations of software functions have widely proved to be a more than questionable and still insecure remedy. Good enough. It will possibly require action which may be viewed as disruptive by some people.
I have already referred to the result of those changes as a new hardware architecture. The important step we have to take is this one: We have to separate the data our IT-systems are supposed to work upon completely from any data that is part of our safeware suite.
A craftsman would be surprised if he heard that message. He and his kin have separated tools and workpieces for eons already – computer scientists still put both on the same shelf, i.e. store them in the same memory; some of them have even counted on their interchangeability. This identifies step 1:
Allocate software and target data to distinct memory units, which are completely independent of each other!
At least two similar hardware architectures have been developed during this decade, which have this separation included as one of their basic ideas: They are patented and referred to as S3DVS and CySCoS. By the way, both of them separate even the machine created data into distinct categories.
These IT-systems could probably look as shown in the following chart.
If we only look at IT-systems in their operational state, that would be sufficient right there.
But our systems are not static. Every now and then, somebody surprises the user community with some new software. But what is the right time to install? In general, the users are insufficiently informed concerning the proper timing of such system updates. And in an autonomously driving vehicle there is no user in this respect at all. Furthermore, the provider of the update only thinks of “his” or “her” particular IT-system. But there are numerous IT-systems in a vehicle, and they are not isolated, but functionally depend on one another. This gives us a hint as to what steps 2 and 3 might ask for:
Apply all software updates in a controlled and coordinated manner, within a single session! Separate this session from the driving state in a mutually exclusive fashion!
These last requirements might not be obvious. Let me explain the reasoning: Controlled and coordinated updates are easily justified, if the reason for the update is an interface issue: Every expert will understand the requirement of updating all related systems at once. If this were not the case, some of those systems might not be capable of interacting with each other after software updates. This also gives the explanation for the single session: You cannot expect full capability of integrated IT-systems, if the components are not correctly matched up to each other, and proper interoperability is not checked for.
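The reasoning about a single, coordinated session can be sketched as follows. This is a minimal illustration in Python under strong assumptions: every IT-system is reduced to one interface version, and "interoperable" simply means that all systems speak the same version. The names `ItSystem`, `interoperable`, and `update_session` are hypothetical.

```python
"""Sketch of a coordinated, all-or-nothing update session.
The data model (one interface version per system) is an assumption."""

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ItSystem:
    name: str
    software_version: str
    interface_version: int  # version of the message set the system speaks


def interoperable(systems):
    """In this sketch, all systems must speak the same interface version."""
    return len({s.interface_version for s in systems.values()}) <= 1


def update_session(systems, updates):
    """Apply all updates together; commit only if the result interoperates.

    systems: dict mapping name -> ItSystem
    updates: dict mapping name -> (software_version, interface_version)
    Returns the new configuration, or the unchanged one if the check fails.
    """
    staged = dict(systems)
    for name, (software, interface) in updates.items():
        staged[name] = replace(staged[name],
                               software_version=software,
                               interface_version=interface)
    return staged if interoperable(staged) else systems


vehicle = {
    "brakes": ItSystem("brakes", "1.0", 1),
    "steering": ItSystem("steering", "1.0", 1),
}
# Updating only one partner of an interface is rejected as a whole.
partial = update_session(vehicle, {"brakes": ("2.0", 2)})
# Updating all related systems within one session is accepted.
complete = update_session(vehicle, {"brakes": ("2.0", 2),
                                    "steering": ("2.0", 2)})
```

The session works on a staged copy, so a failed interoperability check leaves the vehicle in its previous, matched configuration.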
As we all know, software updates take their time, and there is really a number of steps to be taken: Deactivate the old software, load the new software, install and initiate the new software. If we do that with one IT-system – what do the others do? They make driving hell for the driver! During the update, a system cannot perform its assigned tasks – at least not for a certain while. The other systems will recognize this fact as a failure to respond in time and sound alarms to get the driver’s attention. But that poor person cannot help the situation: The driver has no means at hand to stop, abandon, defer or delay the process. What about driverless vehicles? There is not even a person to respond to an alarm! This calls for the observance of step 4:
Perform all updates when the vehicle is not participating in any active role!
Well, this sounds reasonable – but we have to consider that some average driver will trust the technology of his car so much that he will not dare to change anything! Maybe he has just finished the familiarization period with his new car and is not willing to go through that phase again. And one more question we have to ask: Is that particular driver capable of correctly and completely controlling the updates? Hm. I think we had better leave that job to experts, e.g. the garages – like the software upgrades that have been talked about in conjunction with the Dieselgate affair. And there is one more reason for the garage: Following an update session, probably only garages or similar service points will have the knowledge and infrastructure to check the integrity and interoperability of all IT-systems within a vehicle. At least, neither holder nor driver of the vehicle will likely have the capability to perform system integrity testing. Stating this leads us to the fifth step:
Safe software loading procedures are required to be performed!
Leaving the software updates to garages or similar capable and authorized service installations provides a few advantages: The car’s IT-systems may be designed without software load related functions: This may not only save a few cents in hardware components, but also other resources like power consumption and cooling. Absent capabilities do not consume power and do not produce heat. Most important is this: The missing update functionality cannot be misused by adversaries for their criminal intent.
But how shall we handle software updates? There are some hundred IT-systems integrated in the vehicle. Will we face permanent calls for updates? See the garage twice a week or so? Now we are touching some intangible issues:
Undoubtedly, the IT-systems of vehicles deserve the highest possible level of software quality and system quality. And the system integrators have to provide these qualities, as they directly influence security and safety. This will put a strain on them. In most cases the car makers will be the legal persons in charge.
This gives another reason for updating the vehicle’s IT-systems in the garage. Required are means to write to the respective IT-system’s software memory. The “how to” will leave a number of alternatives. These will not be presented here. Not only are they of secondary importance, they also leave chances for competition among the car builders and their business partners. The important lesson to learn is this:
If software alterations require a garage, software cannot be changed during driving, as happened in the famous Jeep-Cherokee-hack.
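The mutual exclusion between driving and software alteration can be sketched as a small state model in Python. Note the assumption: in the architecture discussed here the restriction is a physical property of the hardware, whereas the `if` statements below only stand in for it. The class `Vehicle`, the `Mode` names, and the entry conditions are hypothetical.

```python
"""Sketch: software storage is writable only in a service mode that
excludes driving. Mode names and conditions are assumptions."""

from enum import Enum


class Mode(Enum):
    DRIVING = "driving"
    SERVICE = "service"  # vehicle standing in the garage, loader attached


class Vehicle:
    def __init__(self):
        self.mode = Mode.DRIVING
        self.software = {"infotainment": "1.0"}

    def enter_service(self, speed_kmh, external_loader_attached):
        # Service mode requires a standing vehicle and the external device.
        if speed_kmh != 0 or not external_loader_attached:
            raise PermissionError("service mode refused")
        self.mode = Mode.SERVICE

    def leave_service(self):
        self.mode = Mode.DRIVING

    def write_software(self, system, version):
        # Stands in for the missing write path to software memory.
        if self.mode is not Mode.SERVICE:
            raise PermissionError("software is read-only while driving")
        self.software[system] = version


car = Vehicle()
try:
    car.write_software("infotainment", "tampered")  # attempt while driving
    refused = False
except PermissionError:
    refused = True

car.enter_service(speed_kmh=0, external_loader_attached=True)
car.write_software("infotainment", "1.1")
car.leave_service()
```

The attempt made while driving is refused, and the only accepted change is the one made during the garage session.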
One thing can be seen by this discussion about software updates right away: Over-the-air-updates of car-IT-systems are neither necessary, nor are they welcome. In fact, they are dangerous, because they use up system resources – especially time – with implications like time outs, as presented earlier. Over-the-air-updates may still be the choice of the system providers! However, not with respect to the vehicle’s IT-systems, but to those of the garages: They may well serve as intermediate software storages until the vehicles arrive in turns to have their own IT-systems updated. Maybe a new business-case for them?
Software updated by garages – or similar service points – really is a security issue: Amateurishly or incompletely performed updates by non-professionals may be hazardous. And if ever liability and other legal questions should arise, a well-kept log of professional software maintenance might not only be a good proof to have at hand and a means of tracing back any software modification, but also evidence that drivers and holders of the vehicle have fulfilled their legal obligations.
5 Verification of Attributes of Secure Hardware Architectures
Let us now turn to the next topic to cover: What means or procedures do we have to assess the attributes of secure hardware architectures, their safety and their security?
I will go to the necessary steps right away, well knowing, that in industry documents like test plans, test specifications, test procedures and test protocols are required and will be filed.
Attribute 1: Independent storage of software and target data. Well, this attribute is not very easy to test. Let us take several steps towards this goal: Studying the data sheet of the IT-System’s hardware may give us a first hint: If independent memories are not specified, it cannot be a secure device. Test completed. But even if independent storages are on the drawing, we still have to ascertain that they are used the way they are supposed to be, i.e. independently. It would not be the first time that technological novelties have been compromised by inappropriate use. Long before installation in vehicles, integration tests in simulated or stimulated environments will be performed. Data transfers can be monitored sufficiently in such a scenario to state conformity with secure hardware requirements. One step will be to verify that no bit pattern can be moved from a memory dedicated to target data to a memory dedicated to safeware. By the same token, data patterns may not be transferred from instruction memory to the memory dedicated to target data.
By tracing the configuration data of both hardware and software, conformity stated once may be propagated into the future.
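The monitoring step for Attribute 1 can be sketched as a check over a recorded transfer trace. This is a minimal illustration in Python; the trace format, the region names, and the functions `violations` and `conforms` are assumptions, since the talk does not specify how transfers are recorded during integration tests.

```python
"""Sketch: scan a recorded transfer trace for forbidden memory moves.
The region names and the trace format are assumptions."""

from typing import NamedTuple


class Transfer(NamedTuple):
    source: str       # memory region the bit pattern was read from
    destination: str  # memory region it was written to
    length: int       # number of words moved


FORBIDDEN = {
    ("target_data", "safeware"),  # data must never become instructions
    ("safeware", "target_data"),  # instructions must never move into data
}


def violations(trace):
    """Return every transfer that crosses the separation."""
    return [t for t in trace if (t.source, t.destination) in FORBIDDEN]


def conforms(trace):
    """A trace conforms if it contains no forbidden transfer."""
    return not violations(trace)


good_trace = [
    Transfer("external_interface", "target_data", 128),
    Transfer("target_data", "target_data", 64),
]
bad_trace = good_trace + [Transfer("target_data", "safeware", 16)]
```

A passing trace only shows that no forbidden transfer occurred in the monitored scenarios, so the coverage of those scenarios remains part of the test plan.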
Attribute 2: Incapability of executing malware. This feature is most practicably tested in a debug environment. Three steps are necessary: Step 1: Insert some software into the memory assigned to target data. Step 2: By means of a debug tool, perform all inputs necessary to have this software executed. The result of this operation has to be a hardware exception raised by the system. Step 3: Verify the correctness of the exception raised by examining the triggering interrupt code. This step is necessary to check for the absence of any unwanted reaction to step 2, like moving code to a sandbox, or the like. If this test is completed successfully, you can rest assured that the tested hardware environment is indeed incapable of executing malware.
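The three steps can be sketched as a test routine run against a simulated device. Everything here is an assumption made for illustration: the device interface (`write_target_data`, `debug_jump`), the exception class, and the interrupt code `0x0D` are hypothetical and do not describe any real debug tool.

```python
"""Sketch of the three-step test against a simulated device.
The device model and the interrupt code are assumptions."""

EXEC_FROM_DATA_MEMORY = 0x0D  # hypothetical interrupt code


class HardwareException(Exception):
    def __init__(self, code):
        super().__init__(f"hardware exception 0x{code:02X}")
        self.code = code


class SimulatedDevice:
    """Stands in for the device under test in a debug environment."""

    def __init__(self):
        self.target_data_memory = {}
        self.executed = []

    def write_target_data(self, address, word):
        self.target_data_memory[address] = word

    def debug_jump(self, region, address):
        # The debug tool forces execution to start at the given location.
        if region != "safeware":
            raise HardwareException(EXEC_FROM_DATA_MEMORY)
        self.executed.append((region, address))


class LeakyDevice(SimulatedDevice):
    """A faulty device that executes from any region."""

    def debug_jump(self, region, address):
        self.executed.append((region, address))


def check_cannot_execute_target_data(device):
    # Step 1: place some software in the target data memory.
    device.write_target_data(0x100, "JUMP 0x0000")
    # Step 2: try to have it executed; a hardware exception must result.
    try:
        device.debug_jump("target_data", 0x100)
    except HardwareException as exc:
        # Step 3: it must be the expected exception, and nothing may have run.
        return exc.code == EXEC_FROM_DATA_MEMORY and not device.executed
    return False


secure_result = check_cannot_execute_target_data(SimulatedDevice())
leaky_result = check_cannot_execute_target_data(LeakyDevice())
```

The second, deliberately faulty device shows why step 3 matters: the routine only passes when the expected exception is raised and nothing has been executed.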
Attribute 3: Dependence on external devices to perform software updates. This attribute is hard to test for. The possibility to modify software by means of such a device does not exclude a hidden functionality to do exactly that. To test for this attribute we may have to refer to the schematics of the IT-system – or ask some so-called white-hat hackers to find out.
Denial of service attacks, and here I return to the topic previously only barely touched, usually depend upon a large number of IT-systems which are infected with the same kind of malware, sometimes referred to as bots: They are supposed to send messages to a given address. Upon an event, all these infected systems send their messages in a synchronized manner. The proposed secure hardware architecture is not capable of doing that. For this reason this architecture even helps to prevent DoS attacks, although it is not designed to counter them. It is an effect of numbers: The more secure hardware architectures replace old-fashioned IT-systems, the fewer bots can be activated to run such attacks.
I thank you very much for your attention and invite your questions.
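The effect of numbers can be put into a few lines of arithmetic. The following Python sketch assumes, purely for illustration, that every device without the secure architecture can be recruited as a bot and that each bot sends at the same rate; the function name and the figures are made up.

```python
def attack_rate(total_devices, secure_devices, messages_per_bot_per_s):
    """Upper bound on the messages per second a botnet could send,
    assuming secure devices cannot be recruited and all others can."""
    if not 0 <= secure_devices <= total_devices:
        raise ValueError("secure_devices must lie between 0 and total_devices")
    recruitable = total_devices - secure_devices
    return recruitable * messages_per_bot_per_s


# Made-up figures: 1000 devices, each bot sending 10 messages per second.
all_conventional = attack_rate(1000, 0, 10)
mostly_secure = attack_rate(1000, 750, 10)
all_secure = attack_rate(1000, 1000, 10)
```

Under these assumptions the attainable attack volume falls in proportion to the share of devices that have been replaced.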