Data Products and Domain-Driven Design: Strategic thinking for addressing challenges in the evolving data management space

Thursday, 7 April, 2022

Data as an asset – is it really?

Classical data management professionals have long used the catchphrase “Data as an Asset”. Although everyone implicitly grasps its value, businesses seem to have neglected to organize their departments to treat data accordingly.

Usually, data teams comprise technically skilled professionals who remain unaware of the data’s origins and its meaning to the business. They are nevertheless responsible for cataloguing processes and data pipelines and for ensuring that all data is delivered on time and properly documented, and this is the best-case scenario.

Traditional data teams do not distinguish between kinds of data, never delve too deeply into its significance, and hold as an axiom that the process (be it an ETL process, a data pipeline, a database schema or any other technical construct) is more meaningful and important than the data itself.

In other words, the process is the message.

Downsides

Of course, this worldview has far-reaching implications for how the business operates in the areas of data transparency, integration and, inescapably, data-driven decision making. Data is treated as a second-class citizen with no intrinsic value, and it is up to the business to ask the data team the right questions whenever it needs to obtain or build a report.

The birth of Domain-Driven Design

On the other hand, data is always part of some business context and cannot be meaningfully understood without it. Data needs other data to acquire meaning, and the formation of knowledge requires data boundaries. Consequently, data lives inside bounded contexts, which are meaningful to the business. This notion of a “bounded context” lies at the heart of domain-driven design.

But what is a domain? To begin sketching a definition we can say that a domain is a conceptual model of a business operation, which:

- can be measured via KPIs and other metrics

- can be communicated and explained to a layperson, without being too technical or abstract

- can “live” as a business concept on its own, yet in practice is organically embedded in the flow of business operations, overlaps with the other operations, and is therefore crucial to the viability of the business

Take an online retailer as an example. E-shop transactions (e.g. orders) can be classified as a domain. A small team, possibly a single person, is responsible for collecting, transforming, serving, cataloguing and managing all data that is conceptually relevant to the e-shop’s transaction operations. Another example of a domain is the e-shop’s website and mobile user experience.

The outcome is an interconnected network of domains, managed by one or more teams, each team being responsible for a bounded data context defined and selected by the business based on the domain’s significance to the business flow.

Data Products

Usually, these domains can be naturally broken down into smaller but coherent components, much like microservices in an application architecture. The data community calls such a component a Data Product.

Expanding on the e-shop example above, “Fulfilled Orders” can be thought of as a Data Product within the Transactions domain. Similarly, “Wishlists” can be a complementary Data Product. Both “Fulfilled Orders” and “Wishlists” are associated with a Customer, which may itself be a data domain handled by another team. The two data teams, Transactions and Customer, have overlapping data boundaries and need to exchange data, technical constructs and business understanding in order to keep their systems functional and relevant to the business.

Data Product as an evolving entity

What it boils down to is that each domain team should treat the data it exposes to the rest of the business as if it were a product. Each data product needs a well-defined schema, which is then made available to consumers and downstream systems. Data therefore remains an asset: it is attached to a business concept, bears value, lives inside a domain and has a well-defined schema. But it has now also become a product, an organic part of the business that lives, evolves and responds to change.
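To make the idea of a "well-defined schema" concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the `DataProduct` class, its `validate` method and the column names are hypothetical and not part of any particular tool.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DataProduct:
    """Hypothetical data product: a named dataset owned by a domain, with a published schema."""
    name: str
    domain: str
    schema: dict  # column name -> expected Python type

    def validate(self, record: dict) -> list:
        """Return the schema violations found in one record (empty list if valid)."""
        errors = []
        for column, expected_type in self.schema.items():
            if column not in record:
                errors.append(f"missing column: {column}")
            elif not isinstance(record[column], expected_type):
                errors.append(f"{column}: expected {expected_type.__name__}")
        for column in record:
            if column not in self.schema:
                errors.append(f"unexpected column: {column}")
        return errors


# Assumed columns for the "Fulfilled Orders" product of the Transactions domain.
fulfilled_orders = DataProduct(
    name="Fulfilled Orders",
    domain="Transactions",
    schema={"order_id": str, "customer_id": str, "fulfilled_on": date, "total": float},
)

good = {"order_id": "o-1", "customer_id": "c-7", "fulfilled_on": date(2022, 4, 7), "total": 59.9}
bad = {"order_id": "o-2", "customer_id": 7, "total": 10.0}

print(fulfilled_orders.validate(good))
print(fulfilled_orders.validate(bad))
```

Publishing the schema alongside the data lets a consumer check incoming records without having to ask the owning team what each column means.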

This inevitably leads to a real shift in how data teams treat their data. For instance, processes take on their pragmatic role as technological tools and artifacts that enable conceptual modelling and business analysis. Technology provides the building blocks for the engine that manages and markets data products as efficiently as possible to other “customers” inside, or outside, the business.

Finally, the goal of a modern data team is to model business domains as interconnected networks of measurable, identifiable and coherent data products, which can be shared with and used by other business stakeholders and data teams.

As a result, data teams acquire domain knowledge, are naturally brought together with other data and non-data teams to achieve common goals, and gain a more pragmatic view of the bottom-line objectives of the business. It also becomes easier for them to document their models and processes and to create metadata about them, because their focus is no longer dispersed across various data spaces but stays on one conceptually coherent data realm.

Materializing Metrics

When a team exposes the schema and data of a Data Product, every interested business entity can become a consumer of this data flow and build upon the attributes of the product. For example, a marketing team can build upon the “Customer”, “Wishlist” and “Fulfilled Orders” Data Products to create its own metrics and measurements, which have no significance to the logistics department. The logistics department, in turn, may use the “Customer” and “Fulfilled Orders” products to analyze how efficiently orders are delivered on time and in full. The difference now is that data becomes more accessible and transparent, enabling different teams to look at it from their own perspective.
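As a rough sketch of two teams deriving different metrics from the same products, consider the Python below. The rows, the column names (`on_time`, `in_full`, `item`) and both metric definitions are assumptions made for illustration, and the “Customer” product is left out for brevity.

```python
# Hypothetical rows exposed by the "Wishlist" and "Fulfilled Orders" data products.
wishlists = [
    {"customer_id": "c-1", "item": "lamp"},
    {"customer_id": "c-1", "item": "desk"},
    {"customer_id": "c-2", "item": "chair"},
]
fulfilled_orders = [
    {"order_id": "o-1", "customer_id": "c-1", "item": "lamp", "on_time": True, "in_full": True},
    {"order_id": "o-2", "customer_id": "c-2", "item": "rug", "on_time": False, "in_full": True},
    {"order_id": "o-3", "customer_id": "c-2", "item": "chair", "on_time": True, "in_full": True},
    {"order_id": "o-4", "customer_id": "c-1", "item": "sofa", "on_time": True, "in_full": False},
]


def wishlist_conversion_rate(wishlists, orders):
    """Marketing metric: share of wishlisted items that the same customer later bought."""
    bought = {(o["customer_id"], o["item"]) for o in orders}
    converted = sum((w["customer_id"], w["item"]) in bought for w in wishlists)
    return converted / len(wishlists) if wishlists else 0.0


def otif_rate(orders):
    """Logistics metric: share of orders delivered both on time and in full."""
    ok = sum(1 for o in orders if o["on_time"] and o["in_full"])
    return ok / len(orders) if orders else 0.0


print(f"Wishlist conversion: {wishlist_conversion_rate(wishlists, fulfilled_orders):.2f}")
print(f"On time, in full:    {otif_rate(fulfilled_orders):.2f}")
```

Neither function changes the underlying products. Each team reads the same published rows and keeps its metric logic on its own side.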

Data Catalog

Since different metrics or KPIs may spawn from the same data products, a Data Catalog immediately becomes a useful metadata management construct. The Data Catalog provides a common ground for understanding the different metrics and stores their definitions and descriptions. It becomes the first place to look up existing Data Products and their schemas, as well as the definition of every metric associated with these products and the purpose behind its creation. In short, a successful implementation of Domain-Driven Design uses the Data Catalog as the central stage where different teams can find the products they are looking for and register their subsequent developments, so that all teams are aligned and travel along the same path.
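The catalog's role can be sketched as a small in-memory registry. This is a toy illustration, not the API of any real catalog tool: the `DataCatalog` class, its method names and the registered entries are all hypothetical.

```python
class DataCatalog:
    """Toy in-memory catalog of data products and the metrics built on them."""

    def __init__(self):
        self._products = {}  # product name -> metadata
        self._metrics = {}   # metric name -> metadata

    def register_product(self, name, owner, schema):
        self._products[name] = {"owner": owner, "schema": schema}

    def register_metric(self, name, owner, definition, built_on):
        # A metric may only reference products that are already catalogued.
        missing = [p for p in built_on if p not in self._products]
        if missing:
            raise KeyError(f"unknown data products: {missing}")
        self._metrics[name] = {"owner": owner, "definition": definition, "built_on": list(built_on)}

    def schema_of(self, product_name):
        return self._products[product_name]["schema"]

    def metrics_for(self, product_name):
        """Names of all metrics that depend on the given product, sorted alphabetically."""
        return sorted(n for n, m in self._metrics.items() if product_name in m["built_on"])


catalog = DataCatalog()
catalog.register_product("Fulfilled Orders", owner="Transactions", schema={"order_id": "string"})
catalog.register_product("Wishlists", owner="Transactions", schema={"customer_id": "string"})
catalog.register_metric(
    "Wishlist conversion rate", owner="Marketing",
    definition="Share of wishlisted items later bought by the same customer",
    built_on=["Wishlists", "Fulfilled Orders"],
)
catalog.register_metric(
    "OTIF rate", owner="Logistics",
    definition="Share of orders delivered on time and in full",
    built_on=["Fulfilled Orders"],
)

print(catalog.metrics_for("Fulfilled Orders"))
```

Rejecting metrics that reference unregistered products is one possible design choice. It keeps every metric traceable to a product that some team owns.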

Where does Witside fit in?

Our team comprises both industry experts and tech-savvy engineers who can hit the ground running when it comes to modelling a business flow in its data image. We have both the vision and the appetite to deliver foundational changes to the way businesses treat their data, while organizing data departments that will withstand the impact of an ever-evolving and fast-paced global data landscape.

We sit at the forefront of data innovation, trying to stay one step ahead of the curve in order to make the transition faster, more meaningful and more enjoyable.

Sofoklis Vourekas, Data Management, Lead

sofoklis.vourekas@witside.com

https://www.linkedin.com/in/sofoklisv