Web3 & Decentralization: What It Means for Data Storage

The recent explosion of web3 content, discussion, and debate seems to have happened overnight. Or maybe we only recently started using the term to describe what we’ve been moving towards since the Internet was first conceptualized. Either way, there are already tons of great articles out there on web3 and blockchain technology, so rather than write another overview, I want to dig a bit deeper into how web3 and decentralization are transforming my world: data storage.

In case you need a refresher, web3 is basically the decentralized web - a group of decentralized technologies used to build decentralized applications. @dabit3 suggests that some of the characteristics enabled by web3 are decentralized web infrastructure ownership, native digital payments, self-sovereign identity, distributed infrastructure, and open backends. The pathway towards web3 is focused on becoming more independent. So, unlike web2 (which we’re in now), where a single company centralizes control, in web3 individual contributors own and control the underlying technology stack. In other words, web3 uses blockchains and distributed peer-to-peer networks instead of client-server relationships.

The Problem With Web2

@edaa, along with many knowledgeable folks in the tech community, has pointed out that the main issue with web2 is that it’s highly centralized. Large companies own servers that provide messaging, searching, storing, etc., and they have complete control and ownership over the services they provide. This is spot on, and something we’ve been talking about quite a bit in the data management world. In my Hybrid Cloud blog last year, I mentioned that large cloud providers are not highly distributed and tend to have large data centers for specific regions. These centralized data centers may have been the right solution at the time they were implemented, but many users no longer trust these providers with their information, and web3 is a way to bring back independence. My colleague Jacob Cohen explained how this centralization also causes massive latency issues, because processing is offloaded from the app to an external server. Some of this latency arises because, in web2, the external server is often a monolithic database residing in a single cloud region.

You most likely know by now that the answer to some of these web2 data storage challenges is decentralization. Distributed computing is certainly used in web2, but the infrastructure is still mainly centralized and owned by a single company. So how do we achieve distributed data storage? In his geo-distributed databases blog, Jacob mentions that you need distributed data centers and a database technology that can be distributed. The expansion of this distributed storage ecosystem allows users to manage their own data so that they no longer have to use centralized platforms. For example, you can create your own blog and geo-host it instead of relying on a big-business forum that can censor your content.

Web3 Data Storage

In a web3 tech stack, Nader (@dabit3) mentions that you might use peer-to-peer solutions the same way you would use a database in a traditional tech stack, except that the data is replicated across n nodes on a decentralized network and is therefore more reliable. This is music to my ears! It’s exactly what companies like HarperDB are doing to enable decentralized data storage.
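To put a rough number on why replicating across n nodes helps reliability, here is a minimal sketch. It assumes every node is reachable with the same probability and that nodes fail independently, which real networks only approximate. The function name and the 0.99 uptime figure are made up for illustration.

```python
def availability(node_uptime: float, replicas: int) -> float:
    """Chance that at least one of `replicas` copies is reachable.

    Assumes each node is up with probability `node_uptime` and that
    failures are independent (a simplification, not a guarantee).
    """
    if not 0.0 <= node_uptime <= 1.0:
        raise ValueError("node_uptime must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The data is unreachable only if every single replica is down.
    return 1.0 - (1.0 - node_uptime) ** replicas


# Hypothetical figure: each node is reachable 99% of the time.
for n in (1, 3, 5):
    print(n, availability(0.99, n))
```

Under those assumptions, each extra replica multiplies the chance of a total outage by the per-node failure rate, which is why even a handful of nodes makes a big difference.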

The question here is: is it really possible for a database to be entirely decentralized? Blockchain technology is not the same as a database; it stores some data, but that is mainly transaction metadata. So if you need to store or transport any other type of data, which you probably do, you will most likely also need a database. While blockchain may enable you to achieve full trustless decentralization, we still need off-chain data storage for certain projects. This means that you won’t be purely decentralized if you need database-style storage, but this trade-off might be worth it in the end.
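One common way to pair the two, shown here only as a hedged sketch, is to keep the full record in an off-chain database and put just a short fingerprint of it on-chain, so anyone can later check that the off-chain copy has not changed. Nothing below talks to a real blockchain or database; the record, the function names, and the choice of SHA-256 are all assumptions for illustration.

```python
import hashlib
import json


def fingerprint(record: dict) -> str:
    """Return a SHA-256 hex digest of a record.

    Keys are sorted so the same data always produces the same digest.
    """
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def matches_anchor(record: dict, anchored_digest: str) -> bool:
    """Check an off-chain record against a digest that was anchored earlier."""
    return fingerprint(record) == anchored_digest


# Hypothetical record that would live in the off-chain database.
record = {"id": 1, "owner": "alice", "body": "my first post"}

# This short string is the only thing that would need to go on-chain.
anchor = fingerprint(record)

# Any later change to the off-chain copy no longer matches the anchor.
tampered = dict(record, body="edited later")
print(matches_anchor(record, anchor), matches_anchor(tampered, anchor))
```

The database still has to be trusted to serve the data, which is the trade-off described above, but the anchor makes tampering detectable.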

HarperDB: a Web3 Database?

I think of web3 and decentralization as a spectrum - it’s not all or nothing, and you can take incremental steps on the path towards your end goal. For now, the data solution might be to at least use a database that is independent of hardware and network providers, completely agnostic of where it resides, and peer-to-peer. HarperDB, a decentralized peer-to-peer database, is certainly on the list of options for web3 databases today.

With HarperDB’s flexible deployment options, users can avoid vendor lock-in and run the database anywhere. It can be deployed on any combination of cloud providers, data centers, and edge devices. Additionally, all of those HarperDB nodes can communicate with each other (assuming some sort of network connectivity) via clustering and replication. HarperDB’s powerful single-endpoint REST API provides the same intuitive interface everywhere it is deployed, ultimately simplifying workloads and reducing application complexity.
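To make the "single endpoint" idea concrete, here is a rough sketch in which every operation is a JSON body POSTed to the same URL. The body shapes, field names, port, table, and credentials are assumptions based on my reading of HarperDB's operations API, not a verified reference, so check the official documentation before relying on them. The code only builds the requests and does not send them.

```python
import base64
import json
import urllib.request


def build_operation_request(base_url, username, password, operation):
    """Build (but do not send) a POST request carrying one operation as JSON."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        url=base_url,
        data=json.dumps(operation).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


# Assumed operation bodies; the schema and table names are hypothetical.
insert_op = {
    "operation": "insert",
    "schema": "dev",
    "table": "posts",
    "records": [{"id": 1, "title": "hello"}],
}
query_op = {"operation": "sql", "sql": "SELECT * FROM dev.posts"}

# Both operations target the same URL; only the JSON body differs.
requests_to_send = [
    build_operation_request("http://localhost:9925", "user", "pass", op)
    for op in (insert_op, query_op)
]
for req in requests_to_send:
    print(req.get_method(), req.full_url, json.loads(req.data)["operation"])
```

Passing one of these objects to urllib.request.urlopen would actually send it. The point of the sketch is that the application talks to one URL no matter which node it is pointed at or which operation it runs.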

True Edge Persistence

HarperDB is the same codebase, with a small footprint, regardless of where it is installed. You can connect multiple HarperDB instances together in a cluster and set up bi-directional replication at the table level (a pub-sub model). HarperDB’s globally distributed peer-to-peer read/write consistency and API distribution capabilities shift application workloads directly to the edge, independent of the cloud. You can move out to the edge by way of the HarperDB stack to enable edge decisioning without having to refactor your application. This eliminates gaps between data collection and the cloud, enables real-time data sync, reduces latency, and improves user experience.
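As a toy illustration of table-level, bi-directional pub-sub replication, the sketch below models two in-memory nodes that forward writes for one shared table. It is not HarperDB's implementation and ignores networking, ordering, and conflict resolution. The Node class, its methods, and the table names are all hypothetical.

```python
from collections import defaultdict


class Node:
    """A toy in-memory node: tables plus per-table subscriber lists."""

    def __init__(self, name):
        self.name = name
        self.tables = defaultdict(dict)       # table -> {key: record}
        self.subscribers = defaultdict(list)  # table -> nodes receiving our writes

    def publish_to(self, peer, table):
        """Forward future writes on `table` to `peer`."""
        if peer not in self.subscribers[table]:
            self.subscribers[table].append(peer)

    def write(self, table, key, record, _origin=None):
        """Store a record locally, then forward it to this table's subscribers."""
        self.tables[table][key] = record
        origin = _origin or self
        for peer in self.subscribers[table]:
            # Skip the node that started the write and peers that already
            # hold this exact record, so forwarding cannot loop forever.
            if peer is not origin and peer.tables[table].get(key) != record:
                peer.write(table, key, record, _origin=origin)


def link_bidirectional(a, b, table):
    """Make each node publish `table` to the other."""
    a.publish_to(b, table)
    b.publish_to(a, table)


edge = Node("edge")
cloud = Node("cloud")
link_bidirectional(edge, cloud, "sensor_readings")

edge.write("sensor_readings", "r1", {"temp_c": 21.5})   # flows edge -> cloud
cloud.write("sensor_readings", "r2", {"temp_c": 19.0})  # flows cloud -> edge
edge.write("debug_logs", "l1", {"msg": "local only"})   # table is not linked

print(sorted(cloud.tables["sensor_readings"]), sorted(edge.tables["sensor_readings"]))
```

Because the subscription is per table, the sensor readings end up on both nodes while the debug logs stay on the edge node, which is the kind of control that table-level replication gives you.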

Looking Ahead

The team at HarperDB has been focused on decentralization since inception, continuously evolving the technology to meet the data management needs of our innovative users. More recently, we have partnered with large network and infrastructure providers to offer end-to-end solutions like Hybrid-Cloud, Edge Computing, Machine Learning, and Real-Time Data Sync. The goal is to continue to make developers’ lives easier, while facilitating autonomy and freeing companies from lock-in.

We have come a long way in the world of data management, and maybe the next step will be the integration of off-chain data solutions with blockchain protocols. As with any technology innovation, while there are many reasons to get excited about web3, there are also some potential drawbacks to consider. This is why it’s always essential to take a step back and make sure that the technologies and methodologies being implemented are the best fit for your specific product or use case, instead of just jumping on the bandwagon of the next hot thing. What do you think? Are we moving full steam ahead towards a web3 world? What will data storage look like in 5, 10, 20 years? Share your thoughts below.