HarperDB: More Than a Database

I recently had a very interesting conversation on our podcast with Ron Lewis, the Director of Innovation and Engineering at Lumen Technologies. Ron brought up the notion that HarperDB is more than just a database, and for certain users or projects, HarperDB is not serving as a database at all. How can this be possible?

Database, Explained

Well, what really is a database? Wikipedia states, “In computing, a database is an organized collection of data stored and accessed electronically from a computer system.” Another site simply states, “A database is a systematic collection of data. They support electronic storage and manipulation of data. Databases make data management easy.”

So at its core, yes, HarperDB is certainly a database and can fulfill this functionality (after all, that’s what the DB stands for). But it can do so much more. For example, there are many cases where organizations keep their existing database system(s) in place, and use HarperDB to extend their current functionality or for a different capability altogether. Especially when it comes to solving complex enterprise data management challenges, the answer rarely (if ever) comes down to this database vs. that database. There’s much more to it. There are many different moving parts related to capturing the right data, getting data to where it needs to be in a timely manner, analyzing and acting on that data, etc. This is really where HarperDB shines.

HarperDB: A Runway for Launching Industry 4.0 Technology

Ron mentioned in our podcast discussion:

“The reason I see HarperDB as a disruptive technology is because you often call HarperDB a database, but it’s not really a database. It’s maybe what some people call a data mesh or data fabric... I see HarperDB more as a data surface, especially with Functions. The whole idea is to be able to converge and contextualize data to support decision making.”

The new Custom Functions Ron refers to will enable users to define their own API endpoints within HarperDB, ultimately expanding HarperDB from a distributed database to a distributed application development platform with integrated persistence. So, now we’re thinking of HarperDB as a data mesh, fabric, or surface instead of a database. That’s a lot of buzzwords! Let’s take a step back.

When I asked Ron what initially drew him and his team to HarperDB, he provided some great insight. Ron mentioned that they were working on a project for the Department of Defense (DoD) that required moving, contextualizing, and converging data, and they needed something super fast and intuitive. They were essentially looking for something easy to use and easy to deploy that’s also flexible and scalable. Once HarperDB and Ron connected, he discovered that HarperDB could be deployed on devices as small as single-board computers like a Raspberry Pi or Tinker Board, all the way up to large-scale servers, cloud machines, or supercomputers. This piqued his interest, as he needed the ability to do large-scale analytics and move the data between devices in a simple manner.

At a basic level, we quickly realized that Ron and the HarperDB team were asking the same questions:

When we look at how much data has to move, and how much is being created on an hourly basis from onsite operational technology (OT) data and other sources, how do we manage, transport, and take advantage of all of that data?
How do we get the data to where it needs to be in the most efficient manner possible?

Extended Functionality

Ron said that with HarperDB, he and his team could “define the data movement and do all these crazy cool things.” While exploring different military adaptations, they were able to take data embedded in a controller (OT) environment and expose it without needing a human-machine interface (HMI). They could then securely move that OT data into a highly scalable enterprise analytics domain, powered by HarperDB running on cloud compute nodes.

There are many use cases similar to this, where HarperDB can provide a holistic solution that makes data sync and management easy. In the defense space, HarperDB’s bidirectional data movement enables the collection and movement of data and logic in real time, shifting decision making throughout the network as needed. Gaming and media industries benefit from HarperDB’s high performance and low latency, with clear implications for both the organization and the end user. Retail and ticketing can recognize and block bad bots in real time with HarperDB’s global replication and edge persistence. The list goes on! You can read more about industries that benefit from a high-performing, low-latency, geo-distributed database here.

Why HarperDB?

Ron explained, “We started looking at all the different databases that are scalable like Couchbase and a bunch of others, but we ended up focusing on HarperDB because of the flexibility. Then, Stephen came up with the idea of Functions because a lot of what we did required us to put an API proxy in front of the data engine. He said, how about I make your life simpler? It’s just amazing how HarperDB checks all the boxes.”

Ron continued, “If you think about how databases communicate and the different models, I love the way HarperDB does it through the native integration of all of these components. No matter what it’s running on or where, HarperDB is disruptive because I’m able to move the different types of data, and different types of assets like functionality, from place to place seamlessly without having to worry about the interoperability of different data engines, nor do I have to worry about the size and scale. Databases are not typically designed as persistent vs. non-persistent, they tend to be scaled vertically instead of horizontally. HarperDB scales beautifully; a containerized version of HarperDB tied to persistent storage allows me to scale HarperDB to meet my performance goals. The workload it can perform is amazing, and the ability to actually scale horizontally is amazing as well because it’s not typical for database engines.”

HarperDB is therefore a unique solution for complex enterprise data challenges: the database engine is small and flexible enough to run on a microcontroller in an onboard system, can be extended to edge bare metal or another edge compute environment for higher-fidelity analysis, and can also be moved to the cloud, all at the speed of the Internet. HarperDB can scale vertically and horizontally while meeting performance needs. It really is more than just a database.

To sum it up, Ron stated:

“From a data driven ecosystem, HarperDB is paving the path forward moving from mesh to fabric to actual data surface and providing that contextualization of data right out of the database engine, which will be key to a fundamental shift in application behavior.”

Predictions for the Future

To wrap up our conversation, I asked Ron about the future of technology. He mentioned a few key things:

The move from cloud to edge is almost certain.
The nature of applications will change to take on a more distributed nature, along the lines of distributed functionality with edge workloads managed and deployed from some type of cloud orchestrator.
As we look at the nature of apps changing, data will be more contextualized from a database engine or persistence layer perspective rather than an application or business layer, and HarperDB is leading the charge on that.

There you have it, folks. The future is all about data. There is a constant need for organizations to have their data where they need it, with the ability to orchestrate data where it’s both being created and consumed. If you’re not evaluating your data and how you’re handling your data assets, where will you be 1, 5, or 10 years from now?