In the late 1990s, the strategic planning unit at a university in Africa decided to improve access to information. So they created a data warehouse.
At the time, financials and payroll were delivered by an aging mainframe-based general ledger (GL) system. At month-end, we'd receive piles of paper containing various reports. Two of us would go through them, manually separate them, place them in envelopes, label the envelopes and send them off via internal mail. This took 2-3 days. It was painful.
The team chose the technology, created the data extracts, built the models and trained a range of users. It took a few months to build the reports - we had to painstakingly replicate the exact formats people were used to. It was worth it. Within weeks of the reports being made available, reporting had shifted from paper-based, once a month, to electronic, on demand. A significantly better solution.
And the self-service solution freed us up - the people originally involved in manual report distribution now had time to focus on higher-value activity.
But that was two decades ago.
Facing the same challenges today, many organisations are still establishing the same large, waterfall-style data warehouse projects. Many fail to deliver positive ROI, and some fail completely.
Can smaller projects, driven by use cases, deliver faster incremental value?
Definitely: Large projects continue to fail, including some that claim to be "agile" but are really just waterfall with a few "agile" bolt-ons.
Definitely: Sophisticated solutions for data prep and advanced analytics, including open-source options (like our favourite, KNIME), are mature and widely available.
Definitely: Visualisation tools have come a long way too - we use Tableau and Power BI, for example, and these are just two of many solid options.
Maybe: You still need a clean, well-governed set of data - garbage in, garbage out. So you still need an overarching data management/governance approach.
Maybe: Reporting, particularly when consolidating across disparate systems, is usually more efficient via a data warehouse (speed, accuracy, reduced load on production systems, etc.). It's not dead - but perhaps just not created in one big bang.
Maybe: New solutions and approaches - e.g. Snowflake (cloud-based, fast) and ELT (rather than ETL) - are helping to enable better ROI.
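To make the ELT point concrete, here is a minimal sketch in Python. The idea is to land the raw extract into the warehouse first, then transform it using the warehouse's own engine, instead of transforming on a separate ETL server before loading. The `sqlite3` module stands in for a cloud warehouse such as Snowflake, and the table and column names are purely hypothetical.

```python
import sqlite3

# sqlite3 is a stand-in for a cloud warehouse; in practice the same
# pattern runs as SQL inside Snowflake, BigQuery, etc.
conn = sqlite3.connect(":memory:")

# 1. Load: land the raw extract as-is into a staging table.
#    No cleansing yet - amounts arrive as text, exactly as exported.
conn.execute("CREATE TABLE stg_sales (region TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO stg_sales VALUES (?, ?)",
    [("north", "100.50"), ("north", "200.25"), ("south", "50.00")],
)

# 2. Transform: clean and aggregate inside the warehouse itself,
#    rather than in a separate ETL tool before loading.
conn.execute("""
    CREATE TABLE sales_by_region AS
    SELECT region, SUM(CAST(amount AS REAL)) AS total
    FROM stg_sales
    GROUP BY region
""")

for row in conn.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region"
):
    print(row)
```

Because the raw data stays in the staging table, transformations can be re-run or revised later without re-extracting from source systems - one of the practical reasons ELT tends to shorten iteration cycles.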
For some industries, the level of complexity and the compliance expectations may mean that a larger project is required. However, a different approach (e.g. agile) is still valuable.
How are you approaching your use of data to better serve your customers?