The Shared Data Layer of The Blockchain Application Stack

This is a follow-up to The Blockchain Application Stack. I suggest giving it a quick read if you haven’t already, since it provides useful context for many of the thoughts in this post. Many insights were left out of that post, and I hope to articulate them in the ones that follow. If I didn’t get to your question, please bear with me: I’ll try to do a deeper dive on the most common ones after we’re done going through the stack in detail. Today’s post is about the Shared Data Layer.

Let’s talk about the Shared Data Layer of The Blockchain Application Stack. This is the image we used in the original post to describe the stack:

(It’s decent, but it’s not great. My colleague Jonathan is designing an updated version of this graphic that we’ll publish soon. He’s much better at this than I am, so I’m very thankful for his help.)

Imagine a global database (or a set of global databases) that every application plugs into. That’s the general idea behind the Shared Data Layer. As the name suggests, it’s a data storage layer that is decentralized and open to everyone.

Decentralized means that no single entity, individual, or company owns this database, and that it is maintained by millions of computers around the world. You could help maintain it too, and you might even get paid automatically for doing so, in proportion to how much your computer contributes to the network.

Open means that anyone – whether it’s a person, a company or an application – has permissionless access to this database. Your personal data is encrypted, and can only be decrypted and interpreted by those you give access to with your password (or more accurately, your private key). You can allow certain applications access to your data, but they don’t own it and you can deny them access at any time, or move to a competitor without losing control of it.

This is made possible by a combination of what I’m calling Overlay Networks and the Blockchain.

Storing Data on the Blockchain

In 2013, a feature was introduced into the Bitcoin protocol that allows us to do just that: a special kind of transaction (called an OP_RETURN transaction) in which you can embed a tiny amount of data, up to 40 bytes. Originally it was intended for attaching contextual information to Bitcoin transactions, such as shipping details. A more creative use of the feature is to create the smallest possible transaction (0.00000001 BTC, or one satoshi, plus the transaction fee) and embed whatever information you want that fits inside it.
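To make the 40-byte limit concrete, here is a minimal Python sketch of how data sits inside such an output. It assumes the standard layout of an OP_RETURN output script: the OP_RETURN opcode (0x6a), then a single push opcode equal to the data length (valid for 1 to 75 bytes), then the data itself. The function names are hypothetical, and the sketch only builds the output script. Building, signing and broadcasting a full transaction are out of scope.

```python
OP_RETURN = 0x6a  # opcode that marks a transaction output as provably unspendable
MAX_PAYLOAD = 40  # limit described in the post; a relay-policy value that has changed across versions

def op_return_script(payload: bytes) -> bytes:
    """Build the output script (scriptPubKey) for an OP_RETURN data output."""
    if not 0 < len(payload) <= MAX_PAYLOAD:
        raise ValueError(f"payload must be 1..{MAX_PAYLOAD} bytes, got {len(payload)}")
    # For pushes of 1 to 75 bytes, the push opcode is simply the byte count.
    return bytes([OP_RETURN, len(payload)]) + payload

def extract_payload(script: bytes) -> bytes:
    """Recover the embedded data from a script built by op_return_script."""
    if len(script) < 2 or script[0] != OP_RETURN or len(script) != 2 + script[1]:
        raise ValueError("not a single-push OP_RETURN script")
    return script[2:]

script = op_return_script(b"hello, blockchain")
print(script.hex())
print(extract_payload(script))
```

Two bytes of every output go to the opcode and the length, so the data itself gets the full 40 bytes. Anything longer has to be rejected or split across transactions.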

Because the Blockchain is great at timestamping and distributed consensus (meaning that most of the nodes in the network agree that a piece of information is true; in Bitcoin’s case, transactional information and the time at which each transaction happened), you can take advantage of the irreversibility of the information saved in it to hold a permanent record of something else.

40 bytes isn’t much, but constraints are often great at catalyzing creativity. One of the first interesting applications to make use of this feature was Proof Of Existence. Given any file, it creates a hash of it (basically a uniquely identifying fingerprint of the file, rather than the entire file) and inserts that hash into the blockchain. Later, you can use that transaction’s timestamp and the hash stored in it to prove that the exact file existed at that time: hash the file you have in hand and compare the result with the hash stored in the blockchain. If they match, you have proof that the file existed at the time of the transaction.
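A minimal Python sketch of the hash-and-compare idea follows. It assumes SHA-256 as the hash, which produces a 32-byte digest. It also prefixes the digest with an 8-byte tag so proofs can be recognized among other OP_RETURN data. The tag and the function names are hypothetical, so don’t read this as Proof Of Existence’s exact format. The timestamp itself comes from the block that includes the transaction, not from this code.

```python
import hashlib

MARKER = b"DOCPROOF"  # hypothetical 8-byte tag identifying proof payloads

def proof_payload(document: bytes) -> bytes:
    """Marker plus SHA-256 digest: 8 + 32 = 40 bytes, small enough for one OP_RETURN output."""
    return MARKER + hashlib.sha256(document).digest()

def verify(document: bytes, anchored_payload: bytes) -> bool:
    """True if the document in hand hashes to the payload recorded on-chain."""
    return proof_payload(document) == anchored_payload

contract = b"Alice agrees to sell Bob one bicycle for 0.5 BTC."
anchored = proof_payload(contract)  # what would be embedded in the transaction

print(len(anchored))
print(verify(contract, anchored))
print(verify(contract + b" ", anchored))  # a single added byte breaks the match
```

Only the digest goes on-chain, so the file itself stays private until you choose to reveal it for verification.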

Another, more consumer-oriented application is Blocksign, a digital signature service similar to Docusign or Hellosign that uses the same technique to record signed documents in the Blockchain.

Both of these are interesting but relatively trivial uses of OP_RETURN transactions. Thankfully, developers from all over are coming up with clever ways of putting those 40 bytes to good use.

The (Bitcoin) Blockchain’s Shortcomings

Many members of the Bitcoin ecosystem have (legitimate) concerns regarding the overuse of OP_RETURN transactions to store data on the Blockchain. Chief amongst them are increasing miner fees, bloating the Blockchain with useless information, and long transaction confirmation times.

You can store information in the tiniest of Bitcoin transactions, but you still need to pay the miners who do the work of confirming it and entering it into the Blockchain. Right now the minimum fee is 0.0001 BTC, or just under $0.04. That may not look like much, but it rises with Bitcoin’s price, and writing many records (say, 500 million tweets a day) gets very expensive. Some also feel that creating these tiny transactions to store non-transactional information puts unnecessary pressure on the network and bloats the Blockchain. Finally, it takes about 10 minutes on average for a transaction to be confirmed and recorded in the blockchain, which is certainly not quick enough for the needs of modern applications.
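Here is a quick back-of-the-envelope check of the tweet example, written as a Python sketch. It makes three assumptions: one record per transaction, the 0.0001 BTC minimum fee quoted above, and the 1-satoshi output the post describes. Real fees depend on transaction size and change over time, and the function name is hypothetical.

```python
SATOSHIS_PER_BTC = 100_000_000
MIN_FEE_SATOSHIS = 10_000  # 0.0001 BTC, the minimum fee quoted above
OUTPUT_SATOSHIS = 1        # the smallest possible transaction amount

def daily_cost_satoshis(records_per_day: int,
                        fee: int = MIN_FEE_SATOSHIS,
                        output_value: int = OUTPUT_SATOSHIS) -> int:
    """Total satoshis spent per day if every record needs its own transaction."""
    return records_per_day * (fee + output_value)

tweets_per_day = 500_000_000
cost = daily_cost_satoshis(tweets_per_day)
print(cost / SATOSHIS_PER_BTC, "BTC per day")
print(cost * 365 / SATOSHIS_PER_BTC, "BTC per year")
```

That comes to roughly 50,005 BTC a day, almost all of it fees. At just under $0.04 per transaction, that is just under $20 million a day.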

These are all valid concerns, and traditionally the way to address them has been to fork Bitcoin and create new cryptocurrencies and protocols with faster confirmation times, additional storage, etc. However, many teams have been developing creative ways around them by building Overlay Networks and using the Blockchain sparingly, only for the most critical operations. I believe this is the right approach, and it has historically worked for other protocols, as Chris Dixon recently put it in a tweet:

“Many people tried to fork IP, TCP, HTTP, SMTP, etc but turned out it was better to build on top. Same with BTC.”

(Chris Dixon, @cdixon, October 13, 2014)

Overlay Networks

Overlay Networks are systems that extend (or complement) the Bitcoin Blockchain with additional functionality, such as storing certain kinds of data or even files. Together with the Blockchain, they form the Shared Data Layer.

Initially, developers would fork the Bitcoin protocol, extend it to support certain features, and release an alternative cryptocurrency (an altcoin) with its own blockchain. However, it’s increasingly apparent that there are many advantages to building on top of the Bitcoin Blockchain. By building on top of Bitcoin, you benefit from its significant liquidity and network effects, which you don’t get when bootstrapping a new cryptocurrency on a separate blockchain.

“Overlay Network” is an intentionally broad term. Most of these systems are still emerging, and they’re bound to have vastly different architectures. Regardless of the form they take, they have two things in common. They connect to the Bitcoin blockchain to serve their purpose, for example by using BTC as an incentive token, timestamping their work, or validating data. And, like the Blockchain, they are decentralized and accessible to anybody.

Piggybacking on Bitcoin’s network is an effective way to develop your own currency (which some like to call a metacoin) and protocol without having to create your own Blockchain. Counterparty and Mastercoin are two existing examples. Counterparty’s protocol documentation does a great job of explaining how it works and how it connects to Bitcoin. In essence, even though Counterparty has its own coin (XCP), every XCP transaction is backed by a small BTC transaction. Mastercoin is not exactly the same, but it works in a similar fashion.
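The idea that “every XCP transaction is backed by a small BTC transaction” can be sketched in Python. Note that this is not Counterparty’s or Mastercoin’s actual message format. The META prefix, the 32-byte layout, the sender identifiers and the function names are all hypothetical. The sketch illustrates the general pattern only. Metacoin messages ride inside ordinary Bitcoin transactions, and every overlay node rebuilds the same metacoin balances by replaying those transactions in block order under the same validity rules.

```python
import struct

PREFIX = b"META"      # hypothetical marker identifying metacoin messages
FORMAT = ">4s20sQ"    # hypothetical layout: prefix, 20-byte recipient id, 8-byte amount (32 bytes total)

def encode_transfer(recipient: bytes, amount: int) -> bytes:
    """Pack a metacoin transfer into a payload small enough for one OP_RETURN output."""
    return struct.pack(FORMAT, PREFIX, recipient, amount)

def replay(btc_transactions, genesis):
    """Rebuild metacoin balances from (sender_id, payload) pairs given in block order."""
    balances = dict(genesis)
    for sender, payload in btc_transactions:
        if len(payload) != struct.calcsize(FORMAT) or not payload.startswith(PREFIX):
            continue  # unrelated data: not a metacoin message
        _, recipient, amount = struct.unpack(FORMAT, payload)
        if balances.get(sender, 0) < amount:
            continue  # invalid transfer: every node skips it the same way
        balances[sender] = balances.get(sender, 0) - amount
        balances[recipient] = balances.get(recipient, 0) + amount
    return balances

ALICE, BOB, CAROL = b"A" * 20, b"B" * 20, b"C" * 20
txs = [
    (ALICE, encode_transfer(BOB, 30)),
    (BOB, b"unrelated data"),
    (BOB, encode_transfer(CAROL, 50)),  # Bob only holds 30 at this point
    (BOB, encode_transfer(CAROL, 10)),
]
print(replay(txs, {ALICE: 100}))
```

In this sketch the overlay needs no consensus mechanism of its own: Bitcoin fixes the order of the transactions, and deterministic rules do the rest. The overdrawn transfer is ignored identically everywhere.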

Another approach is to use Sidechains. Blockstream is the company developing the technology to make this happen. The general idea is that sidechains would, in theory, allow developers to create their own special-purpose cryptocurrencies, or sidecoins, each on its own blockchain. Unlike altcoins, these sidecoins could be transferred freely between the Bitcoin blockchain and their own chain, thus benefiting from Bitcoin’s liquidity.

It’s too early to tell whether or not Sidechains will be successful, but Blockstream’s $21M seed round should give them a fair shot. If you’re interested in learning more, their whitepaper provides an in-depth, albeit highly technical, explanation of the system.
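The whitepaper calls the mechanism for moving coins between the two chains a two-way peg. Its core idea can be modeled as a toy in Python. This is a deliberately simplified sketch. The class and method names are hypothetical, both ledgers are plain dictionaries, and amounts are integers (think satoshis). In the real design, the sidechain releases coins only after verifying a cryptographic proof (an SPV proof) that the corresponding BTC was locked on the Bitcoin chain. Here that proof is simply assumed.

```python
class TwoWayPeg:
    """Toy two-way peg: BTC locked on the parent chain backs coins on the sidechain.

    Proof verification is assumed; both ledgers are in-memory dicts.
    """

    def __init__(self, parent_balances):
        self.parent = dict(parent_balances)
        self.side = {}
        self.locked = 0  # BTC held on the parent chain backing sidechain coins

    def to_sidechain(self, user, amount):
        """Lock coins on the parent chain and release the same amount on the sidechain."""
        if self.parent.get(user, 0) < amount:
            raise ValueError("insufficient parent-chain balance")
        self.parent[user] -= amount
        self.locked += amount
        self.side[user] = self.side.get(user, 0) + amount

    def to_parent(self, user, amount):
        """Retire coins on the sidechain and unlock the same amount on the parent chain."""
        if self.side.get(user, 0) < amount:
            raise ValueError("insufficient sidechain balance")
        self.side[user] -= amount
        self.locked -= amount
        self.parent[user] = self.parent.get(user, 0) + amount

peg = TwoWayPeg({"alice": 10})
peg.to_sidechain("alice", 4)
peg.to_parent("alice", 1)
print(peg.parent, peg.side, peg.locked)
```

The invariant worth noticing is that coins on the sidechain never exceed the BTC locked behind them. That is what lets a sidecoin borrow Bitcoin’s liquidity instead of bootstrapping its own.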

One last example of an Overlay Network is Factom, a “data layer for the Blockchain” that can be used to encode and audit large amounts of records in real time. Factom is an independent network with its own nodes that use Distributed Hash Tables to store data. That data is periodically hashed and recorded into the Bitcoin blockchain so that it can be audited at any point in time. Some aspects of this approach make it useful for certain applications and less so for others, but it’s a great example of the kinds of overlays you can build.
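The periodic “hash and record into Bitcoin” step can be sketched in Python. The post doesn’t describe Factom’s internal data structures, so this sketch uses a plain Merkle tree as our own illustrative choice. A whole batch of records collapses into one 32-byte root, which is small enough for a single OP_RETURN output. Any later edit to any record changes that root. The function names and sample records are hypothetical.

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(records):
    """Hash records pairwise, level by level, down to a single 32-byte root."""
    if not records:
        raise ValueError("need at least one record")
    level = [sha256(r) for r in records]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # pair an odd last node with itself
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

batch = [b"invoice #1", b"invoice #2", b"invoice #3"]
anchor = merkle_root(batch)  # the only value written to Bitcoin for this batch

# Auditing later: rebuild the root from the stored records and compare it with the anchor.
tampered = [b"invoice #1", b"invoice #2 (edited)", b"invoice #3"]
print(merkle_root(batch) == anchor)
print(merkle_root(tampered) == anchor)
```

Batching is what addresses the fee concern from earlier. Thousands of records cost one Bitcoin transaction per anchoring period instead of one transaction each. Pairing an odd node with itself mirrors the convention Bitcoin’s own block Merkle trees use.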

There are plenty more emerging Overlay Networks, and I suspect we’ll see many more sophisticated ones in the coming years (one overlay I’d like to see is a decentralized MongoDB-compatible database). An abundant supply of these overlays, each providing different services, will give developers instant access to low-cost, secure and decentralized infrastructure for their applications.

Personal Data Ownership and Security

One of the most important concepts behind The Blockchain Stack is personal data ownership and the inversion of the User model in internet applications. We’ll expand on this subject as we move up the stack, but I want to briefly address concerns about storing personal or sensitive data in what will become a global database maintained by millions of unknown computers.

The short answer is encryption. Yes, information stored on Dropbox might be encrypted – but Dropbox holds the encryption keys and has access to your files. If Dropbox gets hacked, your data is compromised.

Under this stack, the user data model is inverted: instead of a third party holding your data and your keys, the network holds your data and you hold the keys. Nobody can access it without your permission, and you’re in total control. Applications are reduced to thin interfaces on top of your data, and through common protocols different apps can interact with each other. Just like how you can e-mail someone from Gmail to Yahoo Mail, you’ll be able to read your friends’ posts without using the same apps they do.

There’s an argument to be made that users don’t want to – or even shouldn’t – have so much control, but I think there’s a more important point to consider. The question isn’t whether users should have control; rather, it’s whether they can have it if they want it. The vast majority will opt to let a third party, such as Coinbase, be the custodian of their private keys. But Coinbase will let you withdraw your bitcoin at your request, so you can move to another service, or host your own wallet, and your balance will remain intact.

I can’t wait for this model to spread beyond Bitcoin and into all other internet services.

If you liked this post, follow me on Twitter to be notified when the next one in the series is up. Next week we’ll talk about the Shared Protocol Layer.

I would like to thank everybody for your feedback on this series. While I haven’t been able to respond to everyone’s comments, your questions are undoubtedly shaping the way we develop this theory. We’ll continue to explore the effects this stack will have on the software business in future posts. I think the topmost layer of the stack, the Application Layer, is the most interesting, and we’ll explore some concrete example applications when we get there. We’ll also discuss the higher-level questions about this stack, but I think it’s important that everyone has a general idea of how all the pieces fit together before we do so.