If data is the new currency of business, most organizations are sitting on a gold mine. The problem, unfortunately, is that companies need to manage their data effectively before they can cash in. We caught up with three experts in data management, who shared their guidance on best practices, common pitfalls to avoid, and how to build a data-driven culture that stands the test of time.
How do you define effective data management, and what are the key components contributing to its organizational success?
Jeff Rogers: Data is the underpinning of every business transaction and spans every inch of the organization. It should drive all decision-making processes. Effective data management comes down to three things: (1) a culture where business leaders understand the value and want data to make decisions, (2) clearly identified owners responsible for defining data and delivering data to stakeholders, and (3) measurement capabilities around the effectiveness of data utilization throughout the organization.
Anne Pronschinske: Effective data management is pretty straightforward when dealing with small-scale use cases and systems. It really boils down to creating and capturing high-quality data and building standardization in context, as well as documenting the meaning of that data for ongoing appropriate use. This becomes a lot more challenging with large and wide data sets and within complex organizations with multifaceted services, such as healthcare.
Travis Richardson: Academia has a model for effective data management called FAIR, which I really like. Data should be Findable, Accessible, Interoperable, and Reusable. Commercial organizations can benefit from this approach because it emphasizes viewing data as an asset that can continue delivering value long past its initial use. It also ensures reproducibility, which has gained a lot of attention in recent years because many scientific studies have been difficult or impossible to reproduce. To achieve FAIRness, enterprises need to de-silo data and use platforms designed for discoverability and collaboration while also maintaining data privacy and regulatory compliance.
What challenges do organizations face when managing and transforming large volumes of data, and how can they overcome these obstacles?
Jeff Rogers: The single biggest challenge most organizations face is understanding what data means at each step of the journey. We encourage our customers to take a tiered approach to data management that provides a central availability of core data to multiple data consumers and allows the data consumers to further align and refine the data to their individual needs and interpretations.
Anne Pronschinske: Fortunately, we’ve come a long way when it comes to leveraging cloud capabilities to deal with large volumes of data. However, some of the challenges around data locality remain, particularly in large organizations with multi-cloud or hybrid solutions. The challenge around sharing data at scale has been solved by many industries like retail, finance, or insurance but remains a challenge for many healthcare organizations due to the nature of patient care and the highly regulated environment.
Travis Richardson: Healthcare and life sciences organizations are often challenged to gain access to high-quality, diverse data sets. Even once the data is available, it needs to be harmonized with consistent labeling and standardized formatting. This is particularly true when data comes from multiple sources (e.g., a clinical trial or multicenter study). Data science teams put in extensive time and effort to manage data, leaving little time for analysis; in fact, 80% of a data scientist’s time is spent finding, curating, and organizing data. Automation can play a big part in flipping this paradigm, even with complex objects like medical imaging. The right data platform can ingest petabytes of data and use rules to standardize curation, pre-processing, and machine learning workflows, as well as provide the provenance and documentation to ensure consistency and reproducibility.
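To make the idea of rule-based harmonization concrete, here is a minimal Python sketch. It is an illustration only, not any vendor's implementation: the rule table, the label names (T1w, T2w, FLAIR), the record fields, and the function names are all hypothetical. Each rule maps the varied labels that different sites might use onto one standard label, and every record keeps its original label plus a short provenance note so the curation step can be reviewed and reproduced.

```python
import re

# Hypothetical rules: each pairs a pattern over site-specific labels
# with the standard label it should be mapped to.
HARMONIZATION_RULES = [
    (re.compile(r"^(t1|t1w|t1[-_ ]weighted)$", re.IGNORECASE), "T1w"),
    (re.compile(r"^(t2|t2w|t2[-_ ]weighted)$", re.IGNORECASE), "T2w"),
    (re.compile(r"^(flair|t2[-_ ]flair)$", re.IGNORECASE), "FLAIR"),
]


def harmonize_label(raw_label):
    """Return (standard_label, matched) for a single raw label."""
    cleaned = raw_label.strip()
    for pattern, standard in HARMONIZATION_RULES:
        if pattern.match(cleaned):
            return standard, True
    # No rule applies: keep the cleaned label and report the miss.
    return cleaned, False


def harmonize_records(records):
    """Add a standard label and a provenance note; the original label is kept."""
    result = []
    for record in records:
        standard, matched = harmonize_label(record["label"])
        result.append({
            **record,
            "standard_label": standard,
            "provenance": "rule-matched" if matched else "unmatched, needs review",
        })
    return result


# Made-up records from three hypothetical sites.
records = [
    {"site": "A", "label": "T1-weighted"},
    {"site": "B", "label": " t1w "},
    {"site": "C", "label": "T2_FLAIR"},
    {"site": "C", "label": "localizer"},
]

harmonized = harmonize_records(records)
for row in harmonized:
    print(row["site"], repr(row["label"]), "->", row["standard_label"], "|", row["provenance"])
```

Labels that match no rule are flagged for review rather than guessed at, which keeps a person in the loop for the cases automation cannot settle.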
Can you discuss some best practices for data governance and ensuring data quality throughout the data management process?
Jeff Rogers: The first step for effective data governance is agreeing on what data governance means to your organization. Once you’ve done so, there are two crucial elements to ensuring sound governance and quality. First, make sure to have an accountable person for data governance end-to-end. And second, don’t ignore your source systems. Trying to clean up data after the fact is not a long-term solution. Ensure your data quality solutions are fed back into the source systems to permanently solve the problem.
Anne Pronschinske: Best practices for data governance include managing master data, reference data, and metadata as part of the data infrastructure and, of course, doing that as close to the source as possible. One of the best ways we’ve found we can ensure those practices are followed and that the right data quality checks and balances are made is to closely align stewardship practices with individuals who know the data best. The stewards help ensure that the data represents the domain appropriately, while the development teams follow best practice guidance for governance and quality.
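As a rough illustration of checking quality close to the source with steward-owned rules, the Python sketch below validates records at ingestion time. Everything in it is an assumption made for the example: the field names, the reference code set, the steward names, and the `QualityRule` structure are hypothetical and not taken from any particular tool. The point is that each rule has a named owner, and failures are reported back with that owner so the fix can be made in the source system rather than patched downstream.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical reference data: the only country codes the source may emit.
REFERENCE_COUNTRY_CODES = {"US", "CA", "GB"}


@dataclass
class QualityRule:
    name: str
    steward: str  # hypothetical owner accountable for this rule
    check: Callable[[dict], bool]  # returns True when the record passes


RULES = [
    QualityRule("record_id is present", "source system owner",
                lambda r: bool(r.get("record_id"))),
    QualityRule("age is a plausible integer", "domain data steward",
                lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120),
    QualityRule("country_code is in reference data", "reference data steward",
                lambda r: r.get("country_code") in REFERENCE_COUNTRY_CODES),
]


def validate(record):
    """Return the list of rules that the record fails."""
    return [rule for rule in RULES if not rule.check(record)]


def ingest(records):
    """Split records into accepted ones and rejected ones with their reasons."""
    accepted, rejected = [], []
    for record in records:
        failures = validate(record)
        if failures:
            reasons = [f"{rule.name} (owner: {rule.steward})" for rule in failures]
            rejected.append((record, reasons))
        else:
            accepted.append(record)
    return accepted, rejected


# Made-up sample input.
sample = [
    {"record_id": "R001", "age": 42, "country_code": "US"},
    {"record_id": "", "age": 230, "country_code": "USA"},
]

accepted, rejected = ingest(sample)
print("accepted:", len(accepted))
for record, reasons in rejected:
    print("rejected:", record, reasons)
```

Routing each failure to a named owner is one simple way to pair automated checks with the stewardship described above.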
In your experience, what are the most common mistakes organizations make when implementing data management and transformation initiatives, and how can they be avoided?
Jeff Rogers: The most common mistake we see is organizations not working together on data management. IT often has intentionally or unintentionally created barriers to innovation using data. Business teams have done data management and engineering in a vacuum, probably using a lot of Excel. When this occurs, there are organizational inefficiencies, but more importantly, the quality of the data and the results of using the data are poor. It is essential to have a data strategy where business and IT teams collaborate to create valuable, trusted, and quality data.
Anne Pronschinske: One of the most common mistakes an organization can make when implementing data management initiatives is to focus solely on the data and the technology without considering the people and process change needed to truly generate value from data and analytics insights. Data management should be an investment for the business, not only monetarily but also in the form of stewardship and citizenship.
How can organizations leverage emerging technologies like artificial intelligence (AI) and machine learning (ML) to enhance their data management and transformation efforts?
Jeff Rogers: The recent flood of generative AI has turned the AI world on its head. It’s too soon to determine the impact, but all companies should at least be considering a medium-term strategy around generative AI. AI can also be really interesting from a predictive modeling perspective. My favorite use case for AI is using it to automatically detect and fix data quality issues early in the data engineering process. Effectively doing so can have sweeping effects on downstream results.
Anne Pronschinske: There are many applications of AI and ML to enhance data management and transformation. Automation of data quality checks and AI-driven monitoring are extremely useful. Not to mention, the use of large language models (LLMs) for data management and contextualization looks promising in the coming years.
Travis Richardson: AI and ML are dependent on having a data management infrastructure that streamlines data aggregation and curation while accounting for privacy, governance, and provenance. Flywheel provides healthcare providers and researchers with this type of infrastructure for complex analysis by making data more structured, accessible, and standardized.
What are some real-world examples of successful data management and transformation projects, and what lessons can be learned from these cases?
Anne Pronschinske: The most successful data management and transformation projects generate “data domains” as products, which maximizes reuse of each data domain or product while eliminating duplicative efforts within those domains. Designing data domains like products infuses many of the principles of good product management into the generation of reusable data for multi-purpose use cases. Many organizations now look to employ principles of data-mesh architecture in their enterprise data strategies, which include treating data like a product, fostering domain ownership (stewardship), building a self-serve data infrastructure platform, and applying federated governance.
How can organizations create a data-driven culture that supports and prioritizes effective data management and transformation across all departments?
Jeff Rogers: The answer is simple: solve tactical challenges. If your data management solutions don’t actively make or save your company money, you shouldn’t do them. Once you deliver one or two solutions that positively impact the business’s bottom line, it’s amazing how much cultural buy-in you can generate.
Anne Pronschinske: One of the best ways to support a data-driven culture in an organization is to create a shared language to help the whole organization speak the “language of data” and protect data as a strategic asset. It is critical to wrap data work in purposeful culture change management. This can be done through infusion of a data literacy program and particularly through fostering strong stewardship throughout the organization.
Travis Richardson: In healthcare, organizations are beginning to recognize the value of robust data management platforms, not just for their role in supporting operational tasks but also for driving research and innovation forward. More broadly, we see more enthusiastic adoption of digital transformation efforts when stakeholders understand how much automation can reduce the workload of previously manual tasks.
What trends or innovations do you foresee in data management and transformation, and how should organizations prepare for these changes?
Anne Pronschinske: Innovation continues around data platform capabilities that automate master, reference, and metadata management, as well as self-service data transformation capabilities. To prepare for these changes, data literacy maturation is critical, as jobs and roles will begin to shift from traditional data management activities to new support roles in an increasingly automated and digital environment.