How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh

Zhamak Dehghani on data mesh.

I personally don’t envy the life of a data platform engineer. They need to consume data from teams who have no incentive to provide meaningful, truthful and correct data. They have very little understanding of the source domains that generate the data and lack domain expertise in their own teams. They need to provide data for a diverse set of needs, operational or analytical, without a clear understanding of how the data will be applied and without access to the consuming domain’s experts.

No one will use a product that they can’t trust.

There is a long list of capabilities that a self-serve data infrastructure as a platform provides to its users, a domain’s data engineers. Here are a few of them:

Scalable polyglot big data storage

Encryption for data at rest and in motion

Data product versioning

Data product schema

Data product de-identification

Unified data access control and logging

Data pipeline implementation and orchestration

Data product discovery, catalog registration and publishing

Data governance and standardization

Data product lineage

Data product monitoring, alerting and logging

Data product quality metrics (collection and sharing)

In-memory data caching

Federated identity management

Compute and data locality
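To make a few of these capabilities concrete (data product versioning, schema, quality metrics, and catalog registration and discovery), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `DataProduct` and `Catalog` classes, their fields, and the `orders` domain do not come from the article or from any real platform, and a real self-serve platform would back the catalog with a persistent service rather than an in-memory dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical descriptor a domain team publishes for one data product."""
    domain: str
    name: str
    version: str                                   # data product versioning
    schema: dict[str, str]                         # column name -> type
    quality: dict[str, float] = field(default_factory=dict)  # quality metrics


class Catalog:
    """Toy in-memory stand-in for catalog registration and discovery."""

    def __init__(self) -> None:
        self._products: dict[tuple[str, str, str], DataProduct] = {}

    def register(self, product: DataProduct) -> None:
        # A published version is treated as immutable: re-registering fails.
        key = (product.domain, product.name, product.version)
        if key in self._products:
            raise ValueError(f"{key} is already registered")
        self._products[key] = product

    def discover(self, domain: str) -> list[DataProduct]:
        # Plain string ordering on version; good enough for this sketch only.
        matches = [p for p in self._products.values() if p.domain == domain]
        return sorted(matches, key=lambda p: (p.name, p.version))


# Hypothetical usage: an "orders" domain publishes two versions of a product.
catalog = Catalog()
catalog.register(DataProduct(
    domain="orders",
    name="daily_orders",
    version="1.0.0",
    schema={"order_id": "string", "total": "decimal"},
    quality={"completeness": 0.99},
))
catalog.register(DataProduct(
    domain="orders",
    name="daily_orders",
    version="1.1.0",
    schema={"order_id": "string", "total": "decimal", "currency": "string"},
    quality={"completeness": 0.995},
))

for product in catalog.discover("orders"):
    print(product.name, product.version, sorted(product.schema))
```

The design choice worth noting is that the descriptor, not the pipeline, is the unit a consumer discovers: the schema and quality metrics travel with each published version, which is what lets a consumer decide whether the product can be trusted.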
