While it sounds simple to generate random data it is not. At least not on deterministic systems like personal computers. But why is it important anyway you may ask. Secure random data is the basic of generating a truly secure cryptographic key. If the random data used to generate the key is predictable then the key itself is predictable and that is obviously the opposite of secret. So if you need to generate a secret symmetric key or an asymmetric key pair including the private key of that key pair, then you need to generate secure random data.
Of cause most time you may just use the operating system API to perform this kind of operation but is this really secure? There was several cases where the operating system failed to generate secure random. It is hard to imagine that these cases are all caused by incompetence and human failure. So one may thing about other reasons for this catastrophic faults as well and if you do so you may come to agree you may better do it yourself. Anyway of cause if you are an operating system developer, you may read this as well and hopefully find some ideas to enhance your solution.
So first here some examples of real world troubles caused by none secure random number generators:
So how to generate secure random?
First I like to discuss the requirements. First requirement of cause is that the generated numbers shall be unpredictable. There is nothing more to say about this since I already explained this before. But this also have several deeper impacts.
The random numbers shall be unpredictable in any case. This means even if I know some generated random numbers I shall not to be able to predict the next or previously generated numbers.
This also means that I shall not be able to predict the generated random numbers by searching through a small enough space of numbers (i.e. all 64 bit numbers).
So from this basic requirements we come to create a basic design.
First we need a pool of randomness which I will name “Random Pool” in following writing. The job of this pool is to hold enough random data to generate random numbers on request at any time. It is required to make this Random Pool large enough to fulfill our basic requirements. Here is the reason why and the explanation what this means. While it may be impossible for an attacker to determine the random pool by predicting the input we added to this pool he may be able to enumerate through all possible pool values and re-calculate the output of our random number generator. We should also keep in mind that a user may need to request several random values from our generator in short time. Even in this case our generator must be able to generate independent random values. This all means the random pool should have a size of at least 256 bit (32 byte). I recommend to make this pool at least as big as the largest value needs to be generated. I.e. for generating 2048 bit RSA keys make it at least 2048 bit (256 byte) large. If keeping in mind to generate multiple values in short time add some more bytes to this minimum requirements. You may not be able to fulfill the size recommendation for 2048 bit RSA keys and this may be no problem since this number space is not attacked by brute force search, but again never make it smaller than 32 byte (256 bit).
Second think we need is a method to add random of any length to our pool. For this purpose we use a cryptographic hash function like SHA-1 or SHA-256. At any time we add random to our pool we basically do as following:
Random Pool = Hash (New Random + Random Pool)
This will be simple in case the Random Pool size fits the size of the hash generate by our hash function. This will require more research how to make each hash block in our Random Pool independent of each other if our Random Pool has a size bigger than the hash function result.
Third we need a random output function. The purpose of the random output function is to decouple the output data from our inner state (Random Pool) so that nobody possessing any data generated from our random number generator shall be able to predict one of the following or previously generated random data by knowing something about the content of our Random Pool. Since on characteristic of a cryptographic hash function is that you cannot compute the input from the output we use a cryptographic hash function here as well. We also use this function to ensure that no two immediately following calls to our random number generator generate the same random data. To ensure the later we use a counter (Random Call Counter) which is incremented before each request. Note: In multithreaded environments ensure that the increment for the Random Call Counter is implemented thread safe. So basically we do this:
Random Call Counter = Random Call Counter + 1
Random Data = Hash(Random Call Counter + Random Pool)
It is a good idea to wait before new random is added to the pool before allowing requesting another amount of random data. Maybe the caller itself can be taken as some random data, i.e. by means of return address to caller on call stack of the CPU and the time of the call.
Last but not least we need random. Here we may differ between machine random and random events. Machine random means data which is public on the machine the software is running, but different to same data on another machine of same type. Random events means events like user interaction. Of cause machine random is not much random but it helps when used in conjunction with even more random event data. Here some more examples of booth categories:
- Network interface MAC address
- User name
- Phone number of user
- Time of a call (in particular the least significant bits of this, i.e. the milli- or nanoseconds)
- Date of a call (less random, but better than nothing)
- User keyboard inputs
- User mouse or touchscreen inputs
- Adresses or data on call stack of CPU
Each part of this random is probably only worth a few bits of real random. So on a multithreaded environment I recommend to run a background thread which regularly adds random to the pool. This random may at least contain the time of the call. Hereby you may assume that other processes and threads running on the system makes this time at least in lower time units (ms, ns) unpredictable. If you feel that you may not have enough random in your pool, maybe because you have not access to events of user interaction since you have no system access, then you should ask the user to provide this random to you by i.e. interacting randomly within a user control you provide to him. You may use mouse positions or key inputs as new random data for your Random Pool before generating an cryptographic key.
I also recommend to keep the Random Pool persistent (i.e. in a file) within the system so that you always enhance your existing collected random data instead of starting from the scratch each time you run. This of cause has one disadvantage in that the pool content may then be accessible to an attacker, so you need countermeasures to ensure privacy of your pool. For example you should not allow random number generation form pool before sufficient new random has been added.
Hopefully I was able to help understanding the important of that issue as well as providing you with some helpful ideas how to solve the problem. I will not provide some more details like code or even more implementation details here. This is much to system dependent. So make the best of it to provide your software and your users with best possible security.