Zhamak Dehghani on data mesh.
I personally don’t envy the life of a data platform engineer. They need to consume data from teams who have no incentive in providing meaningful, truthful and correct data. They have very little understanding of the source domains that generate the data and lack the domain expertise in their teams. They need to provide data for a diverse set of needs, operational or analytical, without a clear understanding of the application of the data and access to the consuming domain’s experts.
No one will use a product that they can’t trust.
There is a long list of capabilities that a self-serve data infrastructure as a platform provides to its users, a domain’s data engineers. Here are a few of them:
- Scalable polyglot big data storage
- Encryption for data at rest and in motion
- Data product versioning
- Data product schema
- Data product de-identification
- Unified data access control and logging
- Data pipeline implementation and orchestration
- Data product discovery, catalog registration and publishing
- Data governance and standardization
- Data product lineage
- Data product monitoring/alerting/log
- Data product quality metrics (collection and sharing)
- In memory data caching
- Federated identity management
- Compute and data locality
How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh
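To make a few of the listed capabilities concrete (data product schema, versioning, quality metrics, and catalog registration and discovery), here is a minimal sketch in Python. Everything in it is an illustrative assumption and not taken from the article: the `DataProduct` and `Catalog` names, their fields, and the in-memory registry are hypothetical stand-ins for whatever a real self-serve platform would provide.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical descriptor a domain team publishes to the platform."""
    domain: str                # owning source domain, e.g. "orders"
    name: str                  # data product name within the domain
    version: str               # data product versioning
    schema: dict               # column name -> type name (data product schema)
    quality: dict = field(default_factory=dict)  # quality metrics, e.g. completeness


class Catalog:
    """Hypothetical in-memory stand-in for catalog registration and discovery."""

    def __init__(self):
        self._products = {}

    def register(self, product):
        # A (domain, name, version) triple identifies one published product.
        key = (product.domain, product.name, product.version)
        if key in self._products:
            raise ValueError(f"already registered: {key}")
        self._products[key] = product

    def discover(self, domain):
        # Return every product a domain has published, in a stable order.
        # Versions are compared as plain strings here, which is a simplification.
        found = [p for p in self._products.values() if p.domain == domain]
        return sorted(found, key=lambda p: (p.name, p.version))
```

A domain team would build a `DataProduct`, call `Catalog.register` once per version, and consumers would call `Catalog.discover("orders")` to find what that domain offers. Rejecting a duplicate version is one possible design choice: it keeps a published version immutable, so a change to the schema requires publishing a new version.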