Data Governance, Data Integrity, and Data Quality: What’s the Connection?

Financial data document graph chart report statistic marketing research development planning management strategy analysis accounting. Financial business technology hologram concept. 3d rendering. | Image Credit: © Chaosamran_Studio - stock.adobe.com


Abstract

Nomenclature is important. Data governance, data integrity, and data quality are all widely used terms, but what do they actually mean and how are they connected? The purpose of this article is to provide a structured model for these terms with their definitions and their relationships in the context of analysis and testing within a pharmaceutical quality system.

The crucial concept is that data quality, including data integrity, is only attainable via data governance, as will be illustrated in the proposed model.

Introduction

In a regulatory context, the establishment of measured values and subsequent reportable results of a predefined quality is an essential activity. Reportable results are then compared with predetermined acceptance criteria and/or standards and specifications. The processes, mechanisms, and control systems necessary to establish measurement values and reportable results of a defined quality are interlinked. These interlinkages may include metrological, procedural, or organizational elements.

The overall proposed model is built up using building blocks akin to the construction of Lego brick models. This approach has been used in previous papers concerning error budgets in measurement uncertainty and Monte Carlo Simulation, and data quality within a lifecycle approach (1–2).

An analytical testing flow diagram, shown in Figure 1, gives a high-level view of the traceability and subsequent interactions from the marketing authorization (a new drug application [NDA] or an abbreviated new drug application [ANDA]) to the reportable result and data quality.

Figure 1: A high-level process flow indicating some of the elements of data governance, data quality, and data integrity, as well as the quality management system (QMS).


This process flow has been translated into the Lego brick model shown in Figure 2. This model is based upon a data quality within a lifecycle approach model which has been modified and extended, as will be described in some detail (2).

Figure 2: Data governance model modified and extended from the data quality within a lifecycle approach model (taken from reference 2, Figure 15).


The crucial concept is that data quality is only attainable via data governance.

The data governance and data quality Lego brick model

Data quality is not an accident but a product of design. It is a combination of data integrity and the functionality and control under the Pharmaceutical Quality System, usually termed the quality management system (QMS). These aspects are underpinned by data governance and qualified information technology (IT) infrastructure services. This is essentially a sandwich structure in which the “filling” is provided by metrological integrity, analytical procedure integrity, and quality oversight. These “fillings” are described below.

All the elements shown in Figure 2 are subject to risk assessments and risk management (3), which will not be discussed here.

The qualified IT infrastructure services with cybersecurity and access control, as seen above, also will not be covered further in this article.

It is, however, necessary to discuss the key elements of data integrity and the Pharmaceutical Quality System which, when combined, generate data quality on the foundation of data governance.

Data governance is the totality of arrangements to ensure that data, irrespective of the format in which they are generated, are recorded, processed, retained, and used to ensure a complete, consistent, and accurate record throughout the data lifecycle (4).

Data integrity

The purpose of an analytical procedure is to provide a reportable result of the analytical characteristic or quality attribute being determined. Analysis and testing require a measurement system, and a procedure for its application to a sample.

Data integrity is underpinned by the first brick, the metrological integrity of the instrument or system’s operational performance, with demonstrable assurance that it is “fit for intended use” within a specific analytical procedure, which is “fit for intended purpose” over the data lifecycle.

Metrological integrity

Analysis and testing usually involve the use of an apparatus, analytical instrument, or system to make a measurement. Therefore, establishment of “fitness for intended use” for any apparatus, analytical instruments, or systems used in analysis and testing is necessary to ensure metrological integrity over the operational ranges required.

Fitness for intended use must be established before the analytical procedure is performed. The main resources for instrument and system requirements are the specific monographs and general chapters in the pharmacopeias. In particular, United States Pharmacopeia (USP) has a unique general chapter on the lifecycle processes and requirements for ensuring that any apparatus, analytical instrument, or system is “fit for intended use,” as seen in Figure 3 (5).

Figure 3: Data quality outline for an analysis and testing quality control (QC) model using USP references.


Assurance lifecycle activities include:

  • analytical instrument and system qualification
  • application software validation
  • calibration over the operational ranges of critical measurement functions
  • maintenance and change control
  • trend analysis to monitor an ongoing state of control.
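As an illustration of the last activity in the list, trend analysis is often carried out with a control chart. The following is a minimal sketch under stated assumptions, not a method taken from USP <1058>: the check-weight readings are invented for illustration, the function names are hypothetical, and the limits of the baseline mean plus or minus three sample standard deviations are a common convention rather than a requirement.

```python
from statistics import mean, stdev


def control_limits(baseline, k=3.0):
    """Return (lower, centre, upper) limits computed from baseline readings."""
    centre = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation of the baseline
    return centre - k * spread, centre, centre + k * spread


def out_of_control(readings, limits):
    """Return the indices of readings that fall outside the control limits."""
    lower, _, upper = limits
    return [i for i, x in enumerate(readings) if x < lower or x > upper]


# Hypothetical balance check-weight readings in grams (illustrative only).
baseline = [100.01, 99.99, 100.00, 100.02, 99.98, 100.00, 100.01, 99.99]
limits = control_limits(baseline)

new_readings = [100.00, 100.01, 100.09, 99.99]
flagged = out_of_control(new_readings, limits)
print(flagged)  # indices of readings that need investigation
```

In practice the baseline period, the multiplier, and the response to a flagged reading would be defined in the laboratory's own procedures.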

The second component of data integrity is a validated or verified analytical procedure performed by a trained analyst.

Analytical procedure integrity

It has been a requirement for more than 20 years that analytical methods and procedures be validated or verified (6–7). Recently, these requirements have been updated by the International Council for Harmonisation (ICH), which has also issued a new guideline on analytical procedure development (8–9). USP General Chapter <1220> on Analytical Procedure Lifecycle should also be consulted (10).

Critical lifecycle activities include:

  • analytical target profile and development
  • qualification and verification
  • sample management and preparation
  • use of reference standards
  • trained analysts and second person review
  • ongoing performance verification
  • deviation management and change control.

Particular attention should be paid to analyst training and second person review within the laboratory.

Second person review is essential in ensuring data quality.

The Pharmaceutical Quality System

Quality control is the guardian of scientific soundness, whereas the quality assurance function is the guardian of compliance. To perform this duty of care, the quality assurance function requires a robust and comprehensive quality management system that enshrines the elements to provide and perform the necessary quality oversight over the data lifecycle.

Quality oversight involves not only reviewing and auditing QC activities but also reviewing the implementation of the Pharmaceutical Quality System itself to ensure that it is up to date (11).

Quality oversight covers key areas such as:

  • policies
  • procedures
  • good documentation practice (GDocP)
  • training plans and records
  • data integrity audits and investigations
  • records management and archiving
  • second person review (12).

ALCOA models for data integrity

Much has been written on this topic, particularly regarding the three ALCOA models and the meanings of their acronyms (4). These acronyms are summarized below and illustrated in Figure 4.

Figure 4: Pictorial representation of the three ALCOA models.


ALCOA (13)

Attributable

It must be possible to identify the individual or computerized system that performed a recorded task, and when the task was performed. This also applies to any changes made to records, such as corrections, deletions, and modifications, where it is important to know who made the change, when, and why.

Legible

All data, including any associated metadata, should be unambiguously readable throughout the lifecycle. Legibility also extends to any changes or modification to the original data made by an authorized individual so that the original entry is not obscured.

Contemporaneous

Data should be recorded on paper or electronically at the time the observation is made. All data entries must be dated and signed by the person entering the data.

Original

The original record is the first capture of information, whether recorded on paper (static) or electronically (usually dynamic, depending on the complexity of the system). Data or information originally captured in a dynamic state should remain available in that state.

Accurate

To be accurate, records need to be a truthful representation of the facts. The original observation(s) should be free of errors, and no editing is allowed without documented amendments or audit trail entries by authorized personnel. Accuracy is assured and verified by a documented review, including review of audit trails.
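The ALCOA attributes described above map naturally onto an append-only record structure. The sketch below is an illustration under stated assumptions, not a regulatory-endorsed or vendor design: the class names `Entry` and `Record`, their fields, and the example values are all hypothetical. It shows how each entry can carry who, when, and why, and how an amendment can be added without obscuring the original entry.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Entry:
    """One immutable audit trail entry (hypothetical structure)."""
    value: str
    user: str            # attributable: who made the entry
    timestamp: datetime  # contemporaneous: set when the entry is made
    reason: str          # why the entry or change was made


@dataclass
class Record:
    """A record with an append-only history, so the original stays visible."""
    history: list = field(default_factory=list)

    def _append(self, value, user, reason):
        if not user:
            raise ValueError("entry must be attributable to a user or system")
        entry = Entry(value, user, datetime.now(timezone.utc), reason)
        self.history.append(entry)
        return entry

    def create(self, value, user):
        if self.history:
            raise ValueError("original entry already exists; use amend()")
        return self._append(value, user, "original entry")

    def amend(self, value, user, reason):
        if not self.history:
            raise ValueError("nothing to amend")
        if not reason:
            raise ValueError("a change requires a documented reason")
        return self._append(value, user, reason)

    @property
    def original(self):
        return self.history[0]

    @property
    def current(self):
        return self.history[-1]


record = Record()
record.create("pH 6.8", user="analyst_a")
record.amend("pH 6.9", user="analyst_a", reason="transcription error corrected")
print(record.original.value, "->", record.current.value)
```

A real system would also need access control, a trusted time source, and protection of the history itself, none of which this sketch attempts.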

ALCOA+ (14)

Complete

All data from an analysis must be retained, including original data and data before and after repeat testing, reanalysis, modification, recalculation, reintegration, and deletion. For hybrid systems, the paper output must be linked to the underlying electronic records used to produce it. A complete record of data generated electronically includes relevant metadata.

Consistent

Data and information records should be created, processed, and stored in a logical manner that has a defined consistency. This includes policies or procedures that help control or standardize data (such as chronological sequencing, date formats, units of measurement, approaches to rounding, significant digits, etc.).
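Two of the items named above, approaches to rounding and date formats, can be standardized in code. The following is a minimal sketch, assuming a laboratory has chosen to round halves away from zero and to write dates in ISO 8601 form; both choices, the function names, and the example values are assumptions for illustration, not requirements from the cited guidance. Decimal arithmetic is used because Python's built-in `round()` works on binary floats and rounds ties to the nearest even digit, which may not match a documented rounding procedure.

```python
from datetime import date
from decimal import Decimal, ROUND_HALF_UP


def round_result(value: str, places: int) -> Decimal:
    """Round a decimal string to a fixed number of places, ties away from zero."""
    quantum = Decimal(10) ** -places  # places=2 gives Decimal("0.01")
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)


def format_date(d: date) -> str:
    """Write a date in one unambiguous form (ISO 8601, YYYY-MM-DD)."""
    return d.isoformat()


# Hypothetical assay value and analysis date (illustrative only).
assay = round_result("99.445", 2)
analysed_on = format_date(date(2024, 3, 9))
print(assay, analysed_on)
```

Passing the value as a string keeps the digits exactly as recorded, so the rounding rule is applied to the reported figure rather than to a float approximation of it.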

Enduring

Data are recorded in a permanent, maintainable, authorized media form during the retention period. Records should be kept in a manner such that they continue to exist and are accessible for the entire period during which they are needed. They need to remain intact as an indelible and durable record throughout the record retention period.

Available

Records should be available for review at any time during the required retention period, accessible in a readable format to all applicable personnel who are responsible for their review, whether for routine release decisions, investigations, trending, annual reports, audits, or inspections.

ALCOA++ (4,15)

Traceable

Data should be traceable through the lifecycle. Any changes to data or metadata should be explained and should be traceable without obscuring the original information. Timestamps should be traceable to a trusted time source. Metrological standards and instrument or system qualification should be traceable to international standards wherever possible.

Data quality

Data quality is a combination of data integrity and overall control as part of the pharmaceutical quality system.

An example of a quality control data quality outline for analysis and testing, using examples from USP, is illustrated in Figure 3.

Summary

Data quality cannot be assured without a data governance structure supported by qualified IT infrastructure services with cybersecurity and access control.

The proposed Lego brick model provides a structural framework for assuring data quality over the lifecycle.

A short glossary of definitions of key terms is appended.

Acknowledgements

I wish to thank Bob McDowall and Oscar Quatrocchi for their review and helpful comments.

Definitions of key terms

Data governance

The totality of arrangements to ensure that data, irrespective of the format in which they are generated, are recorded, processed, retained, and used to ensure a complete, consistent, and accurate record throughout the data lifecycle (16).

Data integrity

Data integrity is the degree to which data are complete, consistent, accurate, trustworthy, reliable, and that these characteristics of the data are maintained throughout the data lifecycle. The data should be collected and maintained in a secure manner, so that they are attributable, legible, contemporaneously recorded, original (or a true copy), and accurate. Assuring data integrity requires appropriate quality and risk management systems, including adherence to sound scientific principles and good documentation practices (16).

Data lifecycle

All phases in the life of the data (including raw data), from initial generation and recording through processing (including transformation or migration), use, data retention, archive and retrieval, and destruction (17).

Data quality

The assurance that data produced is exactly what was intended to be produced and fit for its intended purpose (17).

Good documentation practices (GDocP)

Those measures that collectively and individually ensure documentation, whether paper or electronic, meet data management and integrity principles, for instance, ALCOA+ (17).

Metadata

Data that describe the attributes of other data, and provide context and meaning. Typically, these are data that describe the structure, data elements, inter-relationships and other characteristics of data, such as audit trails. Metadata also permit data to be attributable to an individual (or if automatically generated, to the original data source). Metadata form an integral part of the original record. Without the context provided by metadata, the data has no meaning (17).

Pharmaceutical Quality System

A model for an effective quality management system for the pharmaceutical industry to direct and control a pharmaceutical company with regard to quality (ICH Q10), based upon ISO 9000:2005 (11).

Quality unit(s)

Quality units are organizational entities within the pharmaceutical quality system, necessarily independent of each other and production, that fulfill quality control and quality assurance roles and responsibilities.

Raw data

Raw data is defined as the original record (data) which can be described as the first capture of information, whether recorded on paper or electronically. Information that is originally captured in a dynamic state should remain available in that state (16).

However, US regulations for good laboratory practice offer a better definition (18):

Raw data means any laboratory worksheets, records, memoranda, notes, or exact copies thereof, that are the result of original observations and activities of a nonclinical laboratory study, and are necessary for the reconstruction and evaluation of the report of that study.

References

1. Burgess, C. Never Mind the Statistics; Just Tell Me What the Answer Is! PharmTech.com, March 20, 2023.
2. ECA, Guide for Integrated Lifecycle Approach to Analytical Instrument Qualification and System Validation, Version 1 (Analytical Quality Control Group, November 2023).
3. ICH, Q9 Quality Risk Management, Step 5 Version – Revision 1 (2023).
4. McDowall, R. D. Is Traceability the Glue for ALCOA, ALCOA+, or ALCOA++? Spectroscopy 2022, 37 (4) 13–19. DOI: 10.56530/spectroscopy.up8185n1
5. USP. USP General Chapter <1058>, “Analytical Instrument Qualification,” USP-NF (Rockville, Md., 2024). DOI: 10.31003/USPNF_M1124_01_01
6. USP. USP General Chapter <1225>, “Validation of Compendial Procedures,” USP-NF (Rockville, Md., 2024). DOI: 10.31003/USPNF_M99945_04_01
7. USP. USP General Chapter <1226>, “Verification of Compendial Procedures,” USP-NF (Rockville, Md., 2024). DOI: 10.31003/USPNF_M870_03_01
8. ICH, Q2(R2) Validation of Analytical Procedures, Step 5 Version – Revision 1 (2024).
9. ICH, Q14 Analytical Procedure Development, Step 5 Version (2024).
10. USP. USP General Chapter <1220>, “Analytical Procedure Lifecycle,” USP-NF (Rockville, Md., 2022).
11. ICH, Q10 Pharmaceutical Quality System, Step 5 Version (2008).
12. Newton, M. E., and McDowall, R. D. Data Integrity in the Chromatography Laboratory, Part V: Second-Person Review. LCGC North Am. 2018, 36 (8) 527–529.
13. Woollen, S. W., “Data Quality and the Origin of ALCOA,” The Compass Newsletter, Summer 2010.
14. EMA, EMA/INS/GCP/454280/2010, Reflection Paper on Expectations for Electronic Source Data and Data Transcribed to Electronic Data Collection Tools in Clinical Trials (June 9, 2010).
15. EMA, EMA/INS/GCP/112288/2023, Guideline on Computerised Systems and Electronic Data in Clinical Trials (March 9, 2023).
16. MHRA, ‘GXP’ Data Integrity Guidance and Definitions, Revision 1 (March 2018).
17. PIC/S, Good Practices for Data Management and Integrity in Regulated GMP/GDP Environments (July 2021).
18. CFR Title 21, Part 58 (Government Printing Office, Washington, DC) 58367–58380.
