Shift Left. Your Other Left.
- The Data Bassist
- Jan 24, 2023
- 2 min read
My dad missed his calling as a professional comedian. Or perhaps he intentionally kept it as a hobby. Regardless, when I was growing up, any time he said "on your left" and I looked right, he refused to pass up the opportunity to say "no ... your other left."
In software development, we've had this fun shift-left movement, which is "the practice of moving testing, quality, and performance evaluation early in the development process, often before any code is written." ¹ So why, then, haven't we adopted this approach with data? I've personally witnessed many situations where garbage data is accepted upstream because some downstream system can more easily resolve the quality issue. However, "more easily" is a misconception. You may have more freedom to manipulate your data in a downstream system, but at what actual cost? Your upstream system is still free to ignore best practices, you train your consumers to expect low quality from the source, you force your users to go to a downstream (and higher-latency) system for any semblance of data quality, and you've now made that downstream system the "source of truth" in cases where that may not make sense.
I've seen it unfold like this: a transactional system avoids governing data for fear of API errors and instead stores data as it's received. Data is ETL'd into a data warehouse, where it's corrected and presented back to users. In cases where quality issues are missed, users now have lower confidence in the data warehouse, although the real quality problem originated further upstream. In cases where data quality is corrected, the data warehouse is now relied upon for that data, and those use cases can never be faster than the ETL frequency.
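For contrast, here is a minimal sketch of a source system that governs its data at the API boundary rather than storing it as received. Everything in it is hypothetical and for illustration only: the `Order` record, its three fields, the `parse_order` function, and the specific rules are assumptions, not a description of any real system.

```python
from dataclasses import dataclass
from datetime import date


class ValidationError(ValueError):
    """Raised when an incoming record fails a source-side quality rule."""


@dataclass(frozen=True)
class Order:
    # Hypothetical record shape, chosen only for this example.
    order_id: str
    quantity: int
    order_date: date


def parse_order(payload: dict) -> Order:
    """Validate a raw payload at the source and return a clean Order.

    All problems are collected so the producer sees every issue at once.
    """
    problems = []

    raw_id = payload.get("order_id")
    order_id = raw_id.strip() if isinstance(raw_id, str) else ""
    if not order_id:
        problems.append("order_id is required")

    quantity = payload.get("quantity")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(quantity, int) or isinstance(quantity, bool) or quantity <= 0:
        problems.append("quantity must be a positive integer")

    order_date = None
    try:
        order_date = date.fromisoformat(str(payload.get("order_date", "")))
    except ValueError:
        problems.append("order_date must be an ISO 8601 date (YYYY-MM-DD)")

    if problems:
        # Reject the record instead of storing it and hoping ETL fixes it later.
        raise ValidationError("; ".join(problems))
    return Order(order_id, quantity, order_date)
```

An API handler built on something like this would catch `ValidationError` and return a client error, so the producer of the bad record learns about it immediately and the warehouse never has to repair it.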
What if we instead built data quality into our cultural DNA? What if we started demanding high-quality data at the source? What if the downstream consumer logged a change request to enhance data quality upstream rather than building their own shadow ETL to "fix" the issue downstream? What if we shifted our focus (and budget) away from filing the sharp edges off of data and towards extracting real value from that data?
A decade ago, I was running a bunch of massive data warehouses for big retail companies. I used to pride myself on building pipelines that could turn any pile of garbage data into something valuable. In hindsight, I see that while I was adding value, I missed a huge opportunity to inspire upstream change and to influence the culture around the overall data journey. So really, I suppose I was pushing data quality to my left when I should have been pushing it to the other left.
¹ Gunja, S. (2022, November 3). Shift left vs shift-right: A DevOps mystery solved. Dynatrace News. https://www.dynatrace.com/news/blog/what-is-shift-left-and-what-is-shift-right/