How Locality Social Cloud was built for cybersecurity

by Malte
Tagged as: locality_social_cloud, state-management, m-511

Table of contents:

  1. The pillars of cybersecurity
  2. Confidentiality
  3. Availability
  4. Integrity

The pillars of cybersecurity

The three pillars of cybersecurity are: Confidentiality, Availability and Integrity.

Availability means that you have access to your data. Confidentiality means that only authorized users can access data. And integrity means that your data have not been meddled with.

Confidentiality

Locality uses ECDH-M511 for key exchange and ChaCha20 for encryption and decryption. These techniques ensure that communication remains secure between users. A user in the Locality system is represented by the following components: user_id, public_key, encrypted_private_key, password_hash.

The encrypted_private_key is created using the following steps:

private_key = random 512-bit number
public_key = M-511 public key derived from private_key
encrypted_private_key = encrypt(private_key, withKey: sha256(sha256(user_password) + PRIVATE_KEY_ENCRYPTION_SALT))
password_hash = sha256(sha256(user_password) + PASSWORD_HASH_SALT_DIFFERENT_FROM_PRIVATE_KEY_ENCRYPTION_SALT)

By using two different salts we can hide the actual private_key from the server. This is important, because the encrypted_private_key is sent to the server for storage. This way, the user can restore their private key on another device, while the server never learns the private key or the user's true password.
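
Sketched in Elixir, the derivation might look as follows. This is a minimal sketch, assuming Erlang/OTP's :crypto module and ChaCha20 with a 16-byte nonce; the salt values and module name are placeholders, and generating the M-511 key pair itself is left out because OTP does not ship that curve.

    # Minimal sketch, not Locality's actual code. Salts and nonce handling are
    # placeholders; generating the M-511 key pair is left to a dedicated library.
    defmodule KeyDerivationSketch do
      @private_key_encryption_salt "PRIVATE_KEY_ENCRYPTION_SALT"
      @password_hash_salt "PASSWORD_HASH_SALT_DIFFERENT_FROM_PRIVATE_KEY_ENCRYPTION_SALT"

      # sha256(sha256(password) + salt), as in the steps above.
      defp salted_hash(password, salt) do
        :crypto.hash(:sha256, :crypto.hash(:sha256, password) <> salt)
      end

      # The only password-derived value the server ever sees.
      def password_hash(password), do: salted_hash(password, @password_hash_salt)

      # Encrypt the raw private key with ChaCha20 under a key derived from the
      # password and the *other* salt; the ciphertext can be stored server-side.
      def encrypt_private_key(private_key, password, nonce) do
        key = salted_hash(password, @private_key_encryption_salt)
        :crypto.crypto_one_time(:chacha20, key, nonce, private_key, true)
      end
    end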

In a future release, we will switch to a key-derivation function like Argon2. The advantage of such key-derivation functions is that they are deliberately slow and hard to parallelize, which makes them more resistant to brute-forcing with tools like John the Ripper.
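
With the argon2_elixir library, for example, that change could be as small as the following sketch (not what Locality ships today):

    # Hash a password for storage; argon2_elixir generates a salt and encodes
    # the parameters into the resulting hash string.
    hash = Argon2.hash_pwd_salt("user_password")

    # Verify a login attempt against the stored hash.
    true = Argon2.verify_pass("user_password", hash)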

Availability

Once upon a time, I stumbled upon Elixir and Erlang, when I wanted to make a chat and investigated how WhatsApp built theirs.

At that point, I was not particularly fond of functional programming; my only experience came from programming Haskell at university.

It took me a while to get something fundamental, to understand why Elixir/Erlang has to be a functional programming language:

The problem of distributed systems is fundamentally the same problem as parallel computing. All difficulties in these matters arise from shared memory: multiple 'processes' or 'observers' accessing the same memory.

This problem is fundamental, because in essence, all modern web applications are distributed systems of which each user is a part.

Thus, web development went a bit wrong at the point at which procedural programming, imperative programming and object-oriented programming were picked over functional programming as foundations for the Web.

This is because web development is fundamentally a shared-memory problem: many different observers (users) access some piece of shared information (for example, a Medium post, the number of claps, who clapped and how many times, the comments, and so on).

On top of that, we want the shared information to remain consistent amongst all observers of that information, even after transformations (example: a new person claps and leaves a comment; we want all observers of the clap counter and clap list and notification bell to be notified when that happens).
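
In Elixir, notifying every observer of a post is a natural fit for Phoenix.PubSub. The sketch below assumes a PubSub server named MyApp.PubSub and a topic per post; it is an illustration, not Locality's actual code.

    # Illustration only: MyApp.PubSub and the topic naming are assumptions.
    defmodule ClapFeed do
      # Every process that subscribes to a post's topic becomes an observer.
      def subscribe(post_id) do
        Phoenix.PubSub.subscribe(MyApp.PubSub, "post:#{post_id}")
      end

      # When someone claps, every observer of that post is notified.
      def clap(post_id, user_id) do
        Phoenix.PubSub.broadcast(MyApp.PubSub, "post:#{post_id}", {:clap, user_id})
      end
    end

    # A subscriber (a LiveView, a GenServer, a channel) then receives
    # {:clap, user_id} in its mailbox and updates the clap counter,
    # the clap list and the notification bell.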

But all of the paradigms I mentioned above are implicit about state changes; they lack the referential transparency of functional programming languages, where memory at a certain address can not be changed.

Now, of course, this kind of functionality can be mirrored to some degree in other programming languages: being explicit with the observer pattern, adding PubSub, and then relying on cloud computing for fault tolerance. But that is exactly what creates a messy amalgam.

Elixir, by contrast, is designed from the ground up as a distributed system. As it should be.

We need to build a world where there are parallel processes communicating through message passing, and I thought they cannot have shared memory, because if they have shared memory and the remote machine crashes, everything is going to crash with it.

— Prof. Joe Armstrong, co-inventor of Erlang
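
A minimal illustration of that model in Elixir: the process below owns its counter, nobody else can touch that memory, and observers interact with it only by sending messages.

    # Minimal illustration: state lives inside one process and is never shared.
    defmodule Counter do
      def start do
        spawn(fn -> loop(0) end)
      end

      defp loop(count) do
        receive do
          {:clap, from} ->
            send(from, {:count, count + 1})
            loop(count + 1)
        end
      end
    end

    # Usage:
    #   counter = Counter.start()
    #   send(counter, {:clap, self()})
    #   receive do
    #     {:count, n} -> IO.puts("claps: #{n}")
    #   end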

On top of that, there is something else:

Software does not run isolated in the ivory tower, but in the real world. Cosmic rays hit, cables get cut, machines break, users go offline. Partial system failure is inevitable.

A system designed to run in the real world needs redundancy and fault-tolerance, not efficiency and theoretical correctness.

Apollo moon mission vs Ariane 5

When they flew to the moon:

  - They had five different computers executing the code (redundancy).
  - One of them ran different code from a different supplier (fault tolerance).
  - Each 'commit' was proof-read six times (redundancy).
  - They expected around 6,000 partial system failures during the mission, and it still worked out.
  - And it all went perfectly, written in assembler.

When they built the Ariane 5, they wanted to cut cost and the rocket exploded.

They had reused code from the Ariane 4, written in Ada and, of course, well covered by a test suite. The rocket still exploded due to an integer overflow: the rocket's horizontal velocity grew too large to fit in a 16-bit integer.

Guess someone didn’t update the test suite to account for the high velocities of the Ariane 5.

Failure is inevitable. This is one example of the mindset 'let's build a good test suite and build something correct and efficient' versus 'failures will definitely happen, let's build a system that accounts for that and keeps running anyway'.

There are countless more examples. Watch the Primeagen's video about a story where the 'code is correct' mindset built a death-ray machine instead of medical equipment: code that killed six people.

The role of testing

Now, there are three classes of failures:

  1. Something goes wrong in a way somebody imagined.
  2. Something goes wrong in a way somebody could have imagined, but didn't.
  3. Something goes wrong in a way nobody could have thought of.

Tests are useful for addressing the first two kinds of failure. But what happens if you hit unknown unknowns? If your system fails in a way nobody could imagine?

The third kind of failure does happen, and it is the reason testing is never enough to guarantee the correctness of an application. Not even a formal proof of the application would guarantee its correctness in the real world, running on a physical system.

Some things are just hard.

Now, if we consider all participants as parts of our distributed system, we can model a user being offline as a partial system failure.

And if we can learn one thing from successful software projects like the moon mission, then it’s:

Expect the failure.

Expect your software to run on hundreds of computers simultaneously.

Expect cables to be physically cut, expect natural catastrophes, expect that random subsets of your system shut down without any good reason at all.

Design your systems not to be correct, but to be fault tolerant — that means they should keep running, even if partial system failure occurs.
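
In Elixir/Erlang that philosophy is built in: processes are allowed to crash, and supervisors restart them. A minimal sketch, with hypothetical child modules standing in for real workers:

    # Sketch only: Locality.ChatRoom and Locality.PresenceTracker are
    # hypothetical workers, each running in its own process.
    defmodule Locality.ChatSupervisor do
      use Supervisor

      def start_link(opts) do
        Supervisor.start_link(__MODULE__, :ok, opts)
      end

      @impl true
      def init(:ok) do
        children = [
          Locality.ChatRoom,
          Locality.PresenceTracker
        ]

        # :one_for_one - if one child crashes, only that child is restarted;
        # the rest of the system keeps running.
        Supervisor.init(children, strategy: :one_for_one)
      end
    end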

Integrity

Locality Social Cloud provides eventual consistency; that is, after coming online and processing all the newest status updates, the observed state will be the same across all users. It notifies all listeners when the state has changed. The class Timeline allows you to execute methods that need consistent state as a prerequisite; these methods will be executed once the timelines have been synchronized. For more information, refer to the documentation of the social cloud.
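
As an illustration of what that means (illustrative only, not the actual Timeline API): every client that folds the same ordered stream of status updates ends up with the same state, no matter when it comes online.

    # Illustration only. Two clients that replay the same ordered updates
    # arrive at exactly the same state, which is what eventual consistency
    # promises once everyone has caught up.
    defmodule ReplaySketch do
      def replay(updates) do
        Enum.reduce(updates, %{claps: 0, comments: []}, fn
          {:clap, _user}, state ->
            Map.update!(state, :claps, &(&1 + 1))

          {:comment, user, text}, state ->
            Map.update!(state, :comments, &[{user, text} | &1])
        end)
      end
    end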