How Locality Social Cloud was built for cybersecurity
The pillars of cybersecurity
The three pillars of cybersecurity are Confidentiality, Availability and Integrity.
Confidentiality means that only authorized users can access data. Availability means that you have access to your data. And integrity means that your data has not been tampered with.
Confidentiality
Locality uses ECDH over curve M-511 for key exchange and ChaCha20 for encryption and decryption. Together, these ensure that communication between users remains confidential. A user in the Locality system is represented by the following components: user_id, public_key, encrypted_private_key, password_hash.
The encrypted_private_key is created using the following steps:
private_key = random 512-bit number
public_key = M-511 public key derived from private_key
encrypted_private_key = encrypt(private_key, withKey: sha256(sha256(user_password) + PRIVATE_KEY_ENCRYPTION_SALT))
password_hash = sha256(sha256(user_password) + PASSWORD_HASH_SALT_DIFFERENT_FROM_PRIVATE_KEY_ENCRYPTION_SALT)
By using different salts for the two hashes, the server cannot derive the key that encrypts the private key, even though it stores the password_hash. This is important, because the encrypted_private_key is sent to the server for storage. This way, the user can restore his private key on another device, while the server never learns either the private key or the user's true password.
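Here is a minimal sketch of that scheme in Elixir, assuming SHA-256 for both hashes and ChaCha20 for encrypting the private key; the module name, salt values and IV handling are illustrative placeholders, not the actual Locality code:

```elixir
# A minimal sketch of the scheme above, assuming SHA-256 for both hashes and
# ChaCha20 for encrypting the private key. Module name, salt values and IV
# handling are illustrative placeholders, not the actual Locality code.
defmodule KeyMaterial do
  @private_key_encryption_salt "PRIVATE_KEY_ENCRYPTION_SALT"
  @password_hash_salt "PASSWORD_HASH_SALT_DIFFERENT_FROM_PRIVATE_KEY_ENCRYPTION_SALT"

  def sha256(data), do: :crypto.hash(:sha256, data)

  # Key the server never sees: derived from the password and the key-encryption salt.
  def private_key_encryption_key(password),
    do: sha256(sha256(password) <> @private_key_encryption_salt)

  # Hash the server stores for login, derived with a different salt,
  # so it reveals nothing about the encryption key above.
  def password_hash(password),
    do: sha256(sha256(password) <> @password_hash_salt)

  # Encrypt the randomly generated private key with ChaCha20 before uploading it.
  def encrypted_private_key(private_key, password) do
    iv = :crypto.strong_rand_bytes(16)
    key = private_key_encryption_key(password)
    {iv, :crypto.crypto_one_time(:chacha20, key, iv, private_key, true)}
  end
end

private_key = :crypto.strong_rand_bytes(64) # a random 512-bit number
{_iv, _ciphertext} = KeyMaterial.encrypted_private_key(private_key, "the user's password")
```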
In a future release, we will be switching to a key-derivation function like Argon2 to derive these keys from passwords. The advantage of such key-derivation functions is that they are slow and hard to parallelize, which makes them more resistant to brute-forcing with tools like John the Ripper.
Availability
Once upon a time, I stumbled upon Elixir and Erlang, when I wanted to make a chat and investigated how WhatsApp built theirs.
At that point, I was not particularly fond of functional programming; my only experience came from programming Haskell at university.
It took me a while to get something fundamental, to understand why Elixir/Erlang has to be a functional programming language:
The problem of distributed systems is fundamentally the same problem as parallel computing. All difficulties in these matters arise from shared memory: multiple 'processes' or 'observers' accessing the same memory.
This problem is fundamental, because in essence, all modern web applications are distributed systems of which each user is a part.
Thus, web development went a bit wrong at the point at which procedural programming, imperative programming and object-oriented programming were picked over functional programming as foundations for the Web.
This is because web development is fundamentally a shared-memory problem: many different observers (users) access some piece of shared information (for example a Medium post, the number of claps, who clapped and how often, the comments, and so on).
On top of that, we want the shared information to remain consistent amongst all observers of that information, even after transformations (example: a new person claps and leaves a comment; we want all observers of the clap counter and clap list and notification bell to be notified when that happens).
But all of the paradigms I mentioned above are implicit about state changes; they lack the referential transparency of functional programming languages, where memory at a given address cannot be changed.
Now, of course, this kind of functionality can be mirrored in other programming languages: being explicit with the observer pattern, using PubSub, and then relying on cloud computing for fault tolerance. But that is exactly what creates a messy amalgam.
Elixir, instead, is designed from the ground up as a distributed system. As it should be.
We need to build a world where there are parallel processes communicating through message passing, and I thought they cannot have shared memory, because if they have shared memory and the remote machine crashes, everything is going to crash.
— Prof. Joe Armstrong, co-inventor of Erlang
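In Elixir, that model is concrete and minimal. Here is a sketch using the clap counter from above as the piece of 'shared' information; names and message shapes are illustrative:

```elixir
# A minimal sketch of that model: two processes with no shared memory at all.
# The clap counter lives inside one process; observers send it messages instead
# of reading shared state.
counter =
  spawn(fn ->
    # The loop's argument *is* the state: "changing" it means recursing with a
    # new value, never mutating memory that another process could also see.
    loop = fn loop_fun, claps ->
      receive do
        {:clap, _from} ->
          loop_fun.(loop_fun, claps + 1)

        {:get, from} ->
          send(from, {:claps, claps})
          loop_fun.(loop_fun, claps)
      end
    end

    loop.(loop, 0)
  end)

send(counter, {:clap, self()})
send(counter, {:get, self()})

receive do
  {:claps, n} -> IO.puts("claps so far: #{n}")
end
```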
On top of that, there is something else:
Software does not run isolated in the ivory tower, but in the real world. Cosmic rays hit, cables get cut, machines break, users go offline. Partial system failure is inevitable.
A system designed to run in the real world needs redundancy and fault-tolerance, not efficiency and theoretical correctness.
Apollo moon mission vs Ariane 5
When they flew to the moon, they had five different computers that executed the code (redundancy). One of them executed different code from a different supplier (fault tolerance). Each 'commit' was proof-read six times (redundancy). Around 6,000 partial system failures were expected during the mission, but it worked out anyway; everything went perfectly, even though they coded it in Assembler.
When they built the Ariane 5, they wanted to cut costs, and the rocket exploded.
They had reused code from the Ariane 4, written in Ada with a test suite, of course; everything well tested. The rocket still exploded due to an integer overflow: the rocket's velocity grew too large for a 16-bit integer.
I guess someone didn't update the test suite to account for the higher velocities of the Ariane 5.
Failure is inevitable. This is one example of the mindset 'let's build a good test suite and build something correct and efficient' versus 'failures will definitely happen, let's build a system that accounts for that and runs stably anyway'.
There are countless more examples. Watch the Primeagen's video about a story where the 'the code is correct' mindset turned medical equipment into a death-ray machine; code that murdered six people.
The role of testing.
Now, there are three classes of failures.
Something goes wrong in a way somebody imagined. Something goes wrong in a way somebody could have imagined, but didn't. Something goes wrong in a way nobody could have thought of. Tests are useful for addressing the first two kinds of failure; but what happens if you have unknown unknowns? If your system fails in a way nobody could have imagined?
The third kind of failure does happen, and it is the reason testing is never enough to guarantee the correctness of an application. Not even a theoretical proof of the application would guarantee its correctness in the real world, running on a physical system.
Some things are just hard.
Now, if we consider all participants as parts of our distributed system, we can model a user being offline as a partial system failure.
And if we can learn one thing from successful software projects like the moon mission, then it’s:
Expect the failure.
Expect your software to run on hundreds of computers simultaneously.
Expect cables to be physically cut, expect natural catastrophes, expect that random subsets of your system shut down without any good reason at all.
Design your systems not to be correct, but to be fault tolerant — that means they should keep running, even if partial system failure occurs.
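In Elixir/Erlang, that mindset is baked into the standard library: supervisors restart crashed processes instead of pretending crashes will not happen. A minimal sketch, with an illustrative module name rather than the actual Locality code:

```elixir
# A minimal sketch of fault tolerance via supervision; the module name is
# illustrative, not the actual Locality code. The supervised process holds one
# channel's state, and if it crashes, the supervisor simply restarts it.
defmodule Locality.Channel do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(opts), do: {:ok, opts}
end

children = [
  {Locality.Channel, []}
]

# :one_for_one restarts only the crashed child and leaves all others untouched.
{:ok, _supervisor} = Supervisor.start_link(children, strategy: :one_for_one)
```

If the channel process dies for any reason, a fresh one comes up immediately; the failure stays partial instead of taking the whole system down.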
Integrity
Now I will go into detail about how one can prevent history from being rewritten - and what that has to do with Bitcoin and Merkel. Okay, admittedly, Merkle (referring to the Merkle tree concept, not Angela Merkel).
The central question for Bitcoin is: Who is allowed to timestamp what?
This question has a classic answer in so-called 'Time Stamping Authorities' (TSAs). For example, the EU maintains a list of accepted Time Stamping Authorities. A TSA collects requests with data to be timestamped and then signs them using a public-key cryptography method. Only the TSA knows the private key needed to create a signature, but the public key is generally known. Thanks to the mathematics behind it, anyone can verify whether a signature was created by the owner of the private key belonging to a given public key.
But nothing prevents the TSA from backdating data. Or postdating it. You have to trust the TSA.

The inventor of Bitcoin was a mathematical genius and a libertarian enemy of the system; he wanted to eliminate states and banks. The fact that this has not yet succeeded is due to two things: Bitcoin is slow and Bitcoin is special. Slow here means it takes minutes to carry out a transaction. With Bitcoin, it is the 'miners' who timestamp 'blocks of transactions'.
A miner solves a difficult mathematical problem; it is hard to compute a solution, but easy to verify one. Solving this problem entitles a participant in the Bitcoin network to timestamp a block of transactions. The miner receives a contractually agreed amount of Bitcoin for this. A block can contain a large number of transactions, though the number is capped by the Bitcoin protocol. Which 'pending transactions' are then timestamped by the miner is at the miner's discretion; users can offer the miner transaction fees for this.

Each of these individual transactions has a kind of digital fingerprint, called a 'hash'. Bitcoin transactions are chained together: every transaction that spends money must refer to a previous transaction that 'transferred' this money into the wallet. Through this chaining and signing, the following situation arises: if one were to try to retroactively change data, the chain of signatures would break, because each signature of a transaction always refers to the signature of another transaction.
The transactions that the miner has chosen to confirm are now all hashed first. Then the hashes are pairwise rehashed. And the resulting hashes are pairwise rehashed again. This results in the structure of a binary tree. This is done until the so-called ‘Merkle root’ is obtained. So, if you confirm, say, 1024 transactions, you have 512 hashes on the level above, 256 hashes above that, then 128 hashes and so on, until you arrive at one hash. If one of the 1024 original signatures of the transactions were changed, the Merkle root would also be different. Theoretically, you could also chain the hashes of the transactions and then hash them again. But if you want to check whether a transaction has been confirmed by a block, you would have to know all the hashes of all the transactions.
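Here is a minimal sketch of that construction. For readability it applies a single round of SHA-256 to string 'transactions'; Bitcoin actually applies SHA-256 twice to binary transaction encodings:

```elixir
# A minimal sketch of building a Merkle root: hash every transaction, then
# repeatedly hash pairs of hashes until a single root remains.
defmodule MerkleRoot do
  def hash(data), do: :crypto.hash(:sha256, data)

  def merkle_root(transactions), do: transactions |> Enum.map(&hash/1) |> reduce_level()

  # A single remaining hash is the Merkle root.
  defp reduce_level([root]), do: root

  # Otherwise hash pairs of hashes and recurse on the level above:
  # 1024 hashes -> 512 -> 256 -> ... -> 1.
  defp reduce_level(hashes) do
    hashes
    |> Enum.chunk_every(2)
    |> Enum.map(fn
      [left, right] -> hash(left <> right)
      [left] -> hash(left <> left) # odd count: pair the last hash with itself
    end)
    |> reduce_level()
  end
end

MerkleRoot.merkle_root(["tx1", "tx2", "tx3", "tx4"])
```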
The tree structure of the hashes provides a more elegant way: a 'Merkle proof'. You traverse the tree from your transaction up to the root and include the sibling hash at each node along the way. As a result, you do not have to know all N transaction hashes, but only O(log N) hashes, to know whether a transaction has been confirmed by a block.

The Merkle root is now part of the Bitcoin header. What miners do is the following: in the Bitcoin header there is a number that you can adjust yourself, called a Nonce.
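Before coming to the Nonce, here is a minimal sketch of such a Merkle-proof check, again with plain SHA-256 and invented example transactions:

```elixir
# A minimal sketch of a Merkle-proof check. The proof is the list of sibling
# hashes on the path from the transaction up to the root.
defmodule MerkleProof do
  def hash(data), do: :crypto.hash(:sha256, data)

  # proof: list of {side, sibling_hash} tuples from the leaf level up to the root.
  def verify(leaf_hash, proof, root) do
    computed =
      Enum.reduce(proof, leaf_hash, fn
        {:left, sibling}, acc -> hash(sibling <> acc)
        {:right, sibling}, acc -> hash(acc <> sibling)
      end)

    computed == root
  end
end

# Toy tree over four transactions t1..t4:
[h1, h2, h3, h4] = Enum.map(["t1", "t2", "t3", "t4"], &MerkleProof.hash/1)
h12 = MerkleProof.hash(h1 <> h2)
h34 = MerkleProof.hash(h3 <> h4)
root = MerkleProof.hash(h12 <> h34)

# Proving t3 is included needs only its siblings h4 and h12: 2 = log2(4) hashes.
MerkleProof.verify(h3, [{:right, h4}, {:left, h12}], root)
# => true
```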
The miner now guesses a Nonce and computes the hash of the Bitcoin header. If the Nonce was chosen in such a way that the hash of the header begins with a certain number of leading zeros, the problem is considered solved.
To understand this, you need to know that hashes are one-way functions: it is very fast to calculate the hash of some data, but practically impossible to recover the original data from that hash; a hash function is a kind of 'meat grinder' for data.
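To make the guessing loop concrete, here is a toy sketch; real Bitcoin double-hashes the full 80-byte block header, while this toy version just hashes the Merkle root together with the Nonce:

```elixir
# A toy sketch of proof-of-work: increment a Nonce until the hash of
# (merkle_root <> nonce) starts with the required number of zero bits.
# This is only meant to illustrate why mining amounts to guessing.
defmodule ToyPow do
  def mine(merkle_root, difficulty_bits, nonce \\ 0) do
    digest = :crypto.hash(:sha256, merkle_root <> <<nonce::64>>)

    case digest do
      # The first `difficulty_bits` bits of the hash must be zero.
      <<0::size(difficulty_bits), _::bitstring>> -> {nonce, digest}
      _ -> mine(merkle_root, difficulty_bits, nonce + 1)
    end
  end
end

# With 16 leading zero bits this takes about 2^16 (roughly 65,000) guesses on average.
{nonce, digest} = ToyPow.mine("example merkle root", 16)
IO.puts("found nonce #{nonce}, hash #{Base.encode16(digest)}")
```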
Other miners can now confirm that you have solved the problem by hashing your block header, with its Merkle root and Nonce, and checking that the result starts with the required number of zeros. You are then awarded the contractually agreed number of Bitcoin.

Ethereum, with 'Proof of Stake', which I will not go into in detail today, used an insight: in principle, it is only important that the person who gets the right to timestamp a block is chosen more or less at random. It does not necessarily have to be tied to a difficult problem.

With Locality Social Cloud, I face not the same, but very similar problems regarding integrity. Locality Social Cloud is real-time, not slow. Locality Social Cloud is general, not special.
Locality Social Cloud collects user-signed events in encrypted real-time channels. It is assumed that the client itself calculates the current state based on these events if it receives a correct timeline of events.
This makes the channels synchronizable; users who log into a channel for the first time or who lose connection and reconnect can restore the state observed by other users.
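As a sketch of what 'calculating the current state from events' means, here is a tiny event replay; the event shapes are invented for illustration and are not the actual Locality event format:

```elixir
# A tiny sketch of replaying an ordered timeline of events into state.
defmodule ChannelState do
  def replay(events) do
    Enum.reduce(events, %{claps: 0, comments: []}, &apply_event/2)
  end

  defp apply_event({:clap, _user}, state),
    do: %{state | claps: state.claps + 1}

  defp apply_event({:comment, user, text}, state),
    do: %{state | comments: state.comments ++ [{user, text}]}
end

# Replaying the same timeline always yields the same state, which is what lets
# a reconnecting user restore the state observed by everyone else.
ChannelState.replay([{:clap, "alice"}, {:comment, "bob", "nice post"}, {:clap, "bob"}])
# => %{claps: 2, comments: [{"bob", "nice post"}]}
```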
The events are double-timestamped by the Social Cloud using a public-key method; once by the user at the time of sending, and a second time by a randomly selected server at the time of processing by that server. The following now applies: If the timeline that you perceive from every other user, as well as their timestamps, are correct, then the timeline in a channel is globally correct.
If we can guarantee the global correctness of a timeline, then different observers can find agreement on which state is observed. And then again, we can create ‘intermediate points’ of the state that all observers agree on. These in turn allow a user who starts observing a channel very late to load the rest of the events from this intermediate point.
This would eliminate the weakness that this approach classically has: if there are millions of events ('likes') in a channel and you are only interested in a small piece of state, you do not want to process all the events to calculate the current state. The user who now emits an event knows which events he himself has emitted. He does not necessarily know which events others have emitted, because those could still be 'in the air'. But he knows whether he has all of his own events.
Now he can calculate Hash(last event + Hash(current event)), attach it to the message and sign it. This allows him to guarantee that his own timeline has not been subsequently changed. Every other user can now check whether the timeline received from each other user is internally consistent. In combination with the timestamps, the system can thus ensure so-called 'eventual consistency' or, in this context, 'eventual global correctness': temporarily divergent timelines can exist, but they will synchronize again very quickly.
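A minimal sketch of that per-user hash chain; field names are illustrative, and the signing of the chain hash with the user's key is omitted:

```elixir
# A minimal sketch of the per-user hash chain: each new event carries
# Hash(previous chain hash <> Hash(current event)), so nobody can later reorder
# or rewrite that user's own timeline without breaking the chain.
defmodule EventChain do
  def hash(data), do: :crypto.hash(:sha256, data)

  # chain_hash = Hash(previous chain hash <> Hash(current event))
  def append(prev_chain_hash, event_payload) do
    chain_hash = hash(prev_chain_hash <> hash(event_payload))
    %{payload: event_payload, chain_hash: chain_hash}
  end

  # Any observer can re-derive the chain and check it is internally consistent.
  def consistent?(events, genesis \\ <<0::256>>) do
    result =
      Enum.reduce_while(events, genesis, fn event, prev ->
        expected = hash(prev <> hash(event.payload))
        if expected == event.chain_hash, do: {:cont, expected}, else: {:halt, :broken}
      end)

    result != :broken
  end
end

genesis = <<0::256>>
e1 = EventChain.append(genesis, "alice claps")
e2 = EventChain.append(e1.chain_hash, "alice comments: nice post")
EventChain.consistent?([e1, e2])
# => true
```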
A so-called timestamp syndicate - a set of interconnected Locality Social Cloud servers that trust each other - can in turn sign the timestamps with Merkle trees. But that is a Merkle tree for another day.