Automated ETL Testing Tool Used in a Pharmaceutical Firm

A pharmaceutical firm consolidates its safety data by pulling information from multiple source systems into a data warehouse. At the time of this writing, the warehouse holds roughly 1,500,000 cases. An average case is about 67 KB of data, which, with case versioning, brings the warehouse to approximately 500 gigabytes (GB). The goal of the software quality effort is to verify the data integrity of each new build of the ETL (Extract, Transform, Load) and database code by sampling the data in the source and target systems, with a target of 50% of the case data sampled on each run.
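
To make such runs repeatable and comparable across builds, the sample must be deterministic rather than random. The query below is a minimal sketch of one way to achieve that in SQL; the src.case_detail table and its integer case_id column are assumptions for illustration, not the firm's actual schema.

```sql
-- Minimal sketch of deterministic ~50% sampling (table and column
-- names are assumptions). Bucketing IDs with MOD selects the same
-- cases on the source side and the warehouse side, run after run.
SELECT case_id
FROM   src.case_detail
WHERE  MOD(case_id, 100) < 50;
```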

Challenges

In the pharmaceutical industry, where the correctness of safety data is paramount both for public health and for compliance with federal regulation, data quality takes on even greater importance.

  • Testing large volumes of data derived from multiple sources and loaded into the main data warehouse
  • Producing audit trail documentation and reporting for compliance with regulations and data quality standards
  • Deploying a data quality solution that will scale to support growth in data volume, data warehouse users, and the complexity of business intelligence demands

Solution

Implement a high-volume data sampling effort that allows upwards of 50% of the data to be verified for each build.

  • SQL-based data comparisons between source systems and the data warehouse (see the sketch after this list).
  • Approximately 1,000 SQL queries were written to compare case data in the source systems to the analogous data in the data warehouse after ETL.
  • Approximately 45% of the case data was sampled in each run.
  • Our solution was used to vet the data quality of the system after ETL for each build.
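
The sketch below illustrates the shape of one such comparison query, using the same hypothetical src.case_detail and dw.case_detail tables as above; the column names are likewise assumptions.

```sql
-- Sketch of one source-to-warehouse comparison (table and column
-- names are assumptions). Any rows returned are sampled cases the
-- ETL dropped or altered, i.e., candidate defects. EXCEPT is MINUS
-- on Oracle.
SELECT case_id, version, drug_name, event_term
FROM   src.case_detail
WHERE  MOD(case_id, 100) < 45          -- the ~45% sample per run
EXCEPT
SELECT case_id, version, drug_name, event_term
FROM   dw.case_detail
WHERE  MOD(case_id, 100) < 45;
```

Reversing the two operands catches the opposite failure: rows the ETL introduced that have no counterpart in the source.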

During this development cycle, 14 builds were deployed by the development team and vetted by the software quality team. Data integrity runs were performed on each build in which new ETL code was deployed, sampling ~45% of the total case data (roughly 700,000 cases) per build.

Data throughput is increased simply by adding solution agents to a run; a sketch of one sharding scheme follows below. For the runs described here, 11 agents were used and a complete run finished in ~72 hours. This allowed the development team to deploy roughly two builds per week, each of which the software quality team could back with solid “build health” numbers.
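
A plausible way to scale the run across agents is to hash the sampled cases into one bucket per agent. The sketch below assumes a hypothetical :agent_id bind parameter (an integer 0 through 10) supplied to each of the 11 agents; the source does not describe the tool's actual partitioning mechanism.

```sql
-- Sketch of sharding the sampled cases across 11 agents (:agent_id
-- is a hypothetical per-agent parameter, 0..10). Each agent compares
-- only its own bucket, so adding agents shortens wall-clock time
-- roughly linearly.
SELECT case_id, version
FROM   src.case_detail
WHERE  MOD(case_id, 100) < 45          -- the ~45% sample
  AND  MOD(case_id, 11)  = :agent_id;  -- this agent's shard
```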

Benefits

During the development cycle, approximately 115 defects were discovered and remediated. As expected, the discovery of defects leveled off toward the end of the cycle.

During the build cycles, the defect rate per build was monitored as new functionality was delivered to the software quality team. Because of the complexity of the application, delivering new functionality often caused defects to appear in existing functionality. Build 3 was clearly a highly problematic build. Around build 9, the team hit the “coalescence point,” after which the defect rate fell with each build. The final builds brought the number of open defects to zero, and the release candidate was deployed.

  • By using the automated ETL testing tool, the team increased testing coverage approximately 10-fold and reduced testing time by a factor of 3.
  • The software quality team was able to vet each build to the same high standard, so build quality could be compared across builds and the development trend was visible to the whole team.
  • Automated data verification of the source-to-data-warehouse leg for each source permitted rapid localization of defects to specific blocks of ETL code (see the reconciliation sketch after this list).
  • The large sample size meant that build-health results carried high statistical confidence.
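
As a sketch of how per-leg verification localizes defects, the query below reconciles case counts by source; the source_system column and table names are assumptions. A count mismatch confined to a single source points directly at that source's ETL code.

```sql
-- Sketch of per-source reconciliation (the source_system column and
-- table names are assumptions). Only mismatched sources are listed.
SELECT s.source_system,
       s.case_count              AS source_count,
       COALESCE(d.case_count, 0) AS warehouse_count
FROM  (SELECT source_system, COUNT(*) AS case_count
       FROM   src.case_detail
       GROUP  BY source_system) s
LEFT JOIN
      (SELECT source_system, COUNT(*) AS case_count
       FROM   dw.case_detail
       GROUP  BY source_system) d
  ON  s.source_system = d.source_system
WHERE s.case_count <> COALESCE(d.case_count, 0);
```
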
About Us

Our Services is a consulting and managed services firm dedicated to helping businesses make better decisions. With over a decade of experience in analytics and Business Performance Management (BPM), we offer customers business intelligence with a choice of analytic tools, predictive model development, real-time streaming analytics, profitability costing models, BPM design, delivery optimization, and best practices for financial planning.

We guide our customers on business modeling solutions that enable better analysis of, and insight into, their data. We believe that the power of technology combined with best practices gives customers the ability to make fact-based decisions. Our combined expertise across the banking, manufacturing, consumer products, telecommunications, healthcare, financial services, and energy verticals brings a powerful and unique value proposition to our clients.

Value To Our Customers

Access to data for your business today is easy; making fact-based, value-added decisions from critical information is the hard part. We are a consulting firm specializing in the implementation and optimization of best-of-breed business analysis and financial systems solutions. We focus on solving complex data analytics issues to enable fact-based decision making for our clients.