Statistically Justifiable Visible Residue Limits


March 2, 2010

Pharmaceutical Technology, Volume 34, Issue 3


The standard of visual cleanliness is commonly applied to the evaluation of surface contamination. Numerous published studies have examined the visually clean standard as a means of verifying cleaning effectiveness in pharmaceutical manufacturing, and methods for the quantitation of visible residue limits (VRLs) have been provided. Current methods for establishing VRLs are not statistically justifiable, however. The author proposes a method for estimating VRLs based on logistic regression.

Visually clean (VC), a term that refers to inspection with the naked eye, is a common cleanliness standard employed for evaluating surface contamination and cleaning in high-technology manufacturing, including that of pharmaceuticals, where surface cleaning is of utmost importance. The importance of the VC standard for pharmaceutical manufacturing is evident in the following facts:

It is one of the acceptance criteria for establishing the limits for cleaning-validation (CV) studies (1)

Visual examination of equipment surfaces for cleanliness immediately before use is required by good manufacturing practice (GMP) regulations (2)

Even before the issuance of the GMP regulations, most companies used a VC standard (3)

A VC approach to controlling cross-contamination in processing and manufacturing operations provides a practical and effective method of risk management (4, 5)

It is one of the means of evaluating cleaned surfaces during the development, optimization, and validation of cleaning processes

It is the only tool available to operators for examining equipment surfaces to verify that they have been cleaned effectively

Manufacturers employ it for routine monitoring of the cleaning process.

Many manufacturers believe that compliance with a visually clean requirement ensures only the absence of gross contamination, and they regard VC as the lowest cleanliness standard because of its subjectivity and variability. Many studies of VC as one of several criteria for evaluating surface contamination have been published. In light of its advantages and disadvantages (listed in Table I), visual inspection of surfaces, combined with a few simple tools, is still regarded as an effective and inexpensive primary means of evaluating surface cleanliness.

Table I: Advantages and disadvantages of the visually clean standard.

Visually clean criterion for CV studies

In the pharmaceutical industry, cleaning is defined as limiting contamination to a level below practical, achievable, justifiable, and verifiable limits. CV is the documented evidence of cleanliness.

Common bases for establishing CV acceptance limits, as described in literature and in regulatory guidance documents, include the following (6, 7):

Therapeutic daily dose

Toxicological data

The 10-ppm criterion

The VC criterion.

The method that yields the lowest acceptance limit is selected, and that value is taken as the maximum allowable carryover (MACO) limit for CV studies. The VC criterion still holds, however, and is independent of the established MACO limit. Regardless of whether the established visible residue limit (VRL) is lower or higher than the MACO value, noncompliance with the VC requirement indicates that CV has failed. The Pharmaceutical Inspection Convention and Pharmaceutical Inspection Cooperation Scheme (PIC/S) requires the VC criterion to be verified through well-documented spiking studies before it can be used for CV studies (1).

Table II: Definitions of visually clean and visible residue limit from the literature.

Because the VC standard is relevant to many technological areas, tremendous efforts have been devoted to defining and devising novel and efficient ways to develop justifiable and quantifiable VRLs for monitoring and validating cleaning procedures. Table II lists some definitions of the VC standard from the literature. The VC standard and VRLs are based on the following common principles:

Particles deposited on the surface tend to reduce the reflection of light.

The unaided human eye (with or without corrected vision) can detect particles as small as 40–50 μm under ideal conditions (8).

The viewer's state of mind could affect his or her ability to detect the residue visually (e.g., the residue might be visible but unseen because the observer is inattentive).

The brighter a residue in comparison with its background, the higher the probability of its detection through visual inspection.

Although the VC standard may be highly subjective, investigators have successfully quantified VRLs through well-controlled experiments and programs. In industries other than pharmaceuticals, the variables and parameters associated with the VC standard (e.g., viewing distance and light intensity) have been quantified and well documented (8).

The most popular method for determining VRLs in the pharmaceutical industry, henceforth referred to as the current method, involves spiking the selected material surface with known amounts of residue at concentrations of about 0–10 μg/cm². Trained inspectors then examine the surfaces under controlled viewing conditions (e.g., lighting, viewing angle, and viewing distance) for the presence of residue (9–11). The lowest residue level that is detected by all inspectors is then taken as the VRL for that particular residue. The main drawback of this method is that it is not statistically justifiable and, hence, lacks a rigorous scientific definition. The primary objective of this article is to establish a method for setting scientifically and statistically justifiable VRLs and to provide a meaningful definition of the VC standard.

Statistical limitations of the current method

One statistical limitation of the current method is that the VRL is determined from the observed data alone, without describing a relationship between the observations and the experimental parameters. Suppose that in a VC verification study a residue is spiked at levels of 0, 0.5, 1.0, 2.0, 3.0, and 4.0 μg/cm². If all four inspectors detect residue at 2.0 μg/cm² but only three detect it at 1.0 μg/cm², then the VRL would be 2.0 μg/cm², which implicitly assumes that residue levels between 1.0 and 2.0 μg/cm² would not be detected by all inspectors. This VRL may be inappropriate, because some residue level between 1.0 and 2.0 μg/cm² might also have been detected by all inspectors; the study design provides no way to tell.
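The current method's decision rule can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical helper `current_method_vrl` and four observers; only the detection counts at 1.0 μg/cm² (three of four) and 2.0 μg/cm² (four of four) come from the example above, and the counts at the other levels are hypothetical fill-ins.

```python
def current_method_vrl(levels, detections, n_observers):
    """Lowest spiked level at which every observer reported residue (the 'current method')."""
    for level, hits in sorted(zip(levels, detections)):
        if hits == n_observers:
            return level
    return None  # no spiked level was detected by all observers


# Spiking levels from the example (ug/cm2). Only the counts at 1.0 (3 of 4) and
# 2.0 (4 of 4) come from the text; the other counts are hypothetical fill-ins.
levels = [0.0, 0.5, 1.0, 2.0, 3.0, 4.0]
detections = [0, 1, 3, 4, 4, 4]

print(current_method_vrl(levels, detections, n_observers=4))
```

Because the rule only ever examines the spiked levels, nothing in it can say whether a level such as 1.5 μg/cm² would also have been detected by all four inspectors.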

To predict how many observers would detect residue at levels other than those spiked (e.g., 1.5 μg/cm²), the observed data must be incorporated into a reasonable model that describes the relationship between an outcome and a set of independent variables. The results of spiking studies for verifying the VC criterion are binary (i.e., only two values are possible) rather than continuous. However, neither the regulatory guidelines nor the available literature explains how to build a model from these discrete responses that could be used to derive VRLs.

Another important parameter in determining appropriate VRLs is the sample size (i.e., the number of inspectors and the total number of observations) of VC verification studies. Most published studies are based on relatively small sample sizes; Forsyth's studies, for example, are based on only four observers (4, 10, 11). Because the VRL depends on the proportion of detection (i.e., the ratio of the number of detections of residue to the number of inspections), a small sample size widens the confidence interval and increases the margin of error. For example, a proportion of detection of 0.8 with a sample size of five gives a 95% exact confidence interval of 0.2836–0.9949 and a margin of error (half-width) of approximately 35.57%. At the same proportion of detection, a sample size of 25 gives a 95% exact confidence interval of 0.5930–0.9317 and a margin of error of approximately 16.94%. Because no consensus has been established about the appropriate number of observers for VC verification studies, the sample size could cause VRLs to be over- or underestimated.
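The exact intervals quoted above can be checked with the Clopper-Pearson method, which is assumed here to be the "exact" interval the text refers to. The sketch below uses only the Python standard library, inverting the binomial distribution function by bisection; `clopper_pearson` and its helpers are illustrative names, not part of any library.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(f, lo=0.0, hi=1.0, iters=100):
    """Root of a monotone function f on [lo, hi] whose endpoint values differ in sign."""
    lo_positive = f(lo) > 0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) > 0) == lo_positive:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact (Clopper-Pearson) two-sided CI for x detections in n inspections."""
    # Lower bound: the p at which P(X >= x) = alpha/2 (0 if nothing was detected).
    lower = 0.0 if x == 0 else _bisect(lambda p: 1 - binom_cdf(x - 1, n, p) - alpha / 2)
    # Upper bound: the p at which P(X <= x) = alpha/2 (1 if everything was detected).
    upper = 1.0 if x == n else _bisect(lambda p: binom_cdf(x, n, p) - alpha / 2)
    return lower, upper


for x, n in [(4, 5), (20, 25)]:
    lo, hi = clopper_pearson(x, n)
    print(f"{x}/{n}: {lo:.4f} to {hi:.4f}, half-width {(hi - lo) / 2:.4f}")
```

Running it should reproduce, to rounding, the 0.2836–0.9949 and 0.5930–0.9317 intervals and half-widths of about 0.356 and 0.169. In practice a statistics package (for example, SciPy's `binomtest(...).proportion_ci(method="exact")`) would normally be used instead.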

Logistic regression

The objective of VC verification studies is to demonstrate that the VC criterion, if implemented in the manufacturing setting, would ensure cleanliness. During VC verification, spiking studies are performed, and inspectors state whether they can detect residue visually under controlled viewing conditions. VC verification studies thus provide a basis for establishing VRLs and determining appropriate viewing conditions. Under the current method, the lowest concentration of residue that is visually detected by all observers is used as the VRL; mathematically, the VRL is the lowest residue concentration at which the ratio of the number of observers who detect the residue to the total number of observers equals 1. As discussed earlier, the observed data alone cannot predict outcomes in future situations unless they are fitted with the most conservative model that adequately explains them.

Linear regression is one of the most common modeling techniques, but it is not suitable for binary data. If the binary responses "Yes" and "No" are coded as 1 and 0, respectively, then the mean response is the proportion of cases coded 1, which can be interpreted as the proportion or probability of detection. Although proportions and probabilities cannot exceed 1 or fall below 0, a linear regression fitted to such data can give predicted values above 1 or below 0, because its predictions are not similarly constrained. Fitting binary data with linear regression raises two further problems: the variance of the error term is not constant, and the error term is not normally distributed.
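The out-of-range problem is easy to demonstrate. The sketch below fits an ordinary least-squares line to a small hypothetical data set (five observers at each of five spiking levels, coded 1 = "Yes" and 0 = "No"); the data are invented for illustration and are not taken from this article.

```python
def ols_fit(x, y):
    """Ordinary least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1


# Hypothetical spiking study: 5 observers per level, responses coded 1 = "Yes", 0 = "No".
levels = [0.0, 0.5, 1.0, 1.5, 2.0]   # ug/cm2
detections = [0, 1, 3, 5, 5]         # observers answering "Yes" (hypothetical)
N_OBS = 5

# Expand the counts into one binary response per observer.
x, y = [], []
for level, k in zip(levels, detections):
    x += [level] * N_OBS
    y += [1] * k + [0] * (N_OBS - k)

b0, b1 = ols_fit(x, y)
for level in (1.0, 2.0, 3.0):
    print(f"x = {level} ug/cm2 -> linear 'probability' {b0 + b1 * level:.2f}")
```

With these numbers the fitted line has a slope of 0.56 and an intercept of essentially zero, so it already predicts a "probability" of about 1.12 at 2.0 μg/cm², inside the range of the data, and about 1.68 at 3.0 μg/cm².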

Logistic regression is the most suitable technique for describing the relationship between explanatory variables (e.g., experimental parameters such as residue concentration, viewing distance, viewing angle, and light intensity) and a binary response variable. Unlike linear regression, it predicts probabilities of detection that always lie between 0 and 1. It is a flexible and easily applied technique that can model data with continuous or discrete explanatory variables, and it accommodates response variables that are not normally distributed.

Table III: The outcome of visually clean verification studies.

The modeling procedure based on logistic regression is explained here using a hypothetical data set (see Table III). This data set may represent an ideal case, but it is consistent with the experience of those involved in conducting VC verification studies (4, 11, 12). The response variable is binary and records one of two outcomes ("Yes" or "No") according to whether an observer detected the residue under a specific viewing condition. The continuous explanatory variable is the theoretical residue concentration spiked on the model surface. At each spiking level there are five replicates, corresponding to the five observers who inspected the surface. The observed proportion (probability) of detection at each spiking level is the ratio of the number of observers who detected the residue to the total number of observers. Figure 1, a plot of these observed probabilities of detection, suggests that the probability of detection increases with the spiked residue concentration. The relationship is nonlinear, however, and the probability of detection changes little at the high extreme of spiked residue. This pattern is typical because proportions and probabilities cannot lie outside the range of 0 to 1.

Figure 1: Plot of probability of detection against the residue concentration for the data presented in Table III.

Although the relationship between the observed probability of detection and residue concentration is nonlinear, a generalized linear modeling technique can be applied to these data. Logistic regression models the logit of the probability of detection as a linear function of the explanatory variables, with the parameters estimated by maximum likelihood, and then transforms this linear model into a nonlinear logistic curve, also known as an S-shaped or sigmoid curve. Logistic regression can therefore be seen as the conversion of a linear model into a nonlinear model that is naturally suited to describing a binary response variable (13). The link function, known as the logit (the logarithm of the odds), converts probabilities to the linear scale, and its inverse converts the linear model back to the logistic curve.
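To make the maximum-likelihood step concrete, the following sketch fits logit(p) = b0 + b1x to grouped binary data by Newton-Raphson iteration, which for logistic regression is equivalent to iteratively reweighted least squares. Table III is not reproduced here, so the data are hypothetical, and `fit_logistic` is an illustrative function, not the software used by the author.

```python
import math


def fit_logistic(levels, detections, n_obs, iters=50, tol=1e-10):
    """Maximum-likelihood fit of logit(p) = b0 + b1 * x to grouped binary data,
    using Newton-Raphson (equivalently, iteratively reweighted least squares)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = 0.0            # score (gradient of the log-likelihood)
        i00 = i01 = i11 = 0.0    # information matrix (2 x 2, symmetric)
        for x, y, n in zip(levels, detections, n_obs):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            resid = y - n * p
            g0 += resid
            g1 += resid * x
            w = n * p * (1.0 - p)
            i00 += w
            i01 += w * x
            i11 += w * x * x
        det = i00 * i11 - i01 * i01
        # Newton step = inverse(information) @ score, written out for the 2 x 2 case.
        step0 = (i11 * g0 - i01 * g1) / det
        step1 = (i00 * g1 - i01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


# Hypothetical grouped data (NOT the article's Table III): 5 observers per level.
levels = [0.0, 0.5, 1.0, 1.5, 2.0]   # ug/cm2
detections = [0, 1, 3, 5, 5]
n_obs = [5] * len(levels)

b0, b1 = fit_logistic(levels, detections, n_obs)
print(f"logit(p) = {b0:.3f} + {b1:.3f} * x")
```

At convergence the score equations are satisfied, so the expected number of detections summed over all levels equals the observed total. For real studies an established package (for example, a binomial GLM in statsmodels) would also supply the standard errors and Wald tests of the kind reported in Table IV.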

The logistic regression model is represented by the following equation (13):

$$P(Y = 1) = \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k}}$$

in which P(Y = 1) represents the predicted probability that the response equals 1 (i.e., the predicted probability of detection); e is the base of the natural logarithm; β₀, β₁, β₂, ..., βₖ are coefficients estimated from the data by the method of maximum likelihood; x₁, x₂, ..., xₖ are the independent variables; and k is the number of independent variables.

The linear predictor β₀ + β₁x₁ + β₂x₂ + ... + βₖxₖ is the logit, defined as the natural logarithm of the odds (i.e., the probability that an observer detects the residue divided by the probability that he or she does not detect it). The odds are expressed by the following equation:

$$\text{odds} = \frac{P(Y = 1)}{1 - P(Y = 1)} = e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k}$$

For the data presented in Table III, which involve only one independent variable (residue concentration), logit = β₀ + β₁x₁. Once a meaningful relationship between spiked residue concentration and probability of detection has been established, the VRL can be obtained directly from the regression model.
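For the single-variable model, the VRL follows from solving the logit equation for x₁ once an acceptance probability is chosen. In the derivation below, P* denotes that chosen probability of detection; the symbol is introduced here for illustration.

$$
\ln\frac{P^{*}}{1 - P^{*}} = \beta_0 + \beta_1 x_1
\quad\Longrightarrow\quad
x_1 = \frac{\ln\left[P^{*}/(1 - P^{*})\right] - \beta_0}{\beta_1}, \qquad \beta_1 \neq 0
$$

Because ln[P*/(1 − P*)] grows without bound as P* approaches 1, no finite concentration corresponds to P* = 1 exactly.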

Results and discussion

The logistic-regression model was applied to describe the relationship between residue concentration and its probability of detection. The model parameters obtained after fitting the data are summarized in Table IV. For each 0.1-μg/cm² increase in residue concentration on the model surface, the odds of detecting the residue increase by 42.710%, with a 95% confidence interval of 20.150–69.506%. Because this interval excludes 0% (i.e., an odds ratio of 1), residue concentration has a statistically significant effect on the response variable at the 5% two-tailed significance level. The statistical significance of the regression coefficients was tested using the Wald χ² statistic. Table IV shows that the estimated coefficient for residue concentration has a p value of less than 0.05, indicating that it differs significantly from zero at α = 0.05; residue concentration is therefore a significant predictor of the probability of detection. The goodness-of-fit tests, with p values ranging from 0.979 to 0.999, indicate that the model fits the data adequately; that is, the null hypothesis that the model fits the data cannot be rejected.
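The figures above can be converted back to the scale of the logit. Assuming that the 42.710% increase and its 20.150–69.506% interval refer to the odds ratio for a 0.1-μg/cm² step (Table IV itself is not reproduced here), the implied slope is β₁ = ln(1.42710)/0.1, or roughly 3.56 per μg/cm². The sketch below performs the conversion; `per_unit_coefficient` is an illustrative name.

```python
import math

# Figures quoted in the text for a 0.1-ug/cm2 increase in residue concentration.
STEP = 0.1                            # ug/cm2
ODDS_INCREASE = 0.42710               # +42.710 % (point estimate)
CI_LOW, CI_HIGH = 0.20150, 0.69506    # 95 % CI for the increase in odds


def per_unit_coefficient(fractional_increase, step=STEP):
    """Slope beta1 (per ug/cm2) implied by a fractional increase in odds per `step`."""
    return math.log(1.0 + fractional_increase) / step


beta1 = per_unit_coefficient(ODDS_INCREASE)
beta1_lo = per_unit_coefficient(CI_LOW)
beta1_hi = per_unit_coefficient(CI_HIGH)
print(f"beta1 ~ {beta1:.3f} per ug/cm2 (95% CI {beta1_lo:.3f} to {beta1_hi:.3f})")
```

Because the back-converted interval for β₁ lies entirely above zero, it tells the same story as the Wald test: residue concentration is a significant predictor.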

Table IV: Results of fitting the logistic regression model to the data presented in Table III.*

To make the model's predictions easier to interpret, all values were transformed into direct measures of probability, and confidence intervals for these probabilities were derived using the method of Fleiss et al. to describe the uncertainty associated with fitting the model (14). Table V lists the logit and predicted probability of detection for each residue concentration obtained after fitting the data with logistic regression. Plotted against residue concentration, the logit and the predicted probabilities give a straight line and an S-shaped curve, respectively (see Figure 2). Table V shows that the predicted probability of detection, and hence the likelihood that an observer will see the residue, rises with residue concentration. Comparison with the data in Table III shows that the current method would set the VRL at 1.80 μg/cm², the lowest concentration at which the observed probability of detection equals 1. According to the logistic-regression model (see Table V), however, the predicted probability of detection at 1.80 μg/cm² is only 0.949 (i.e., approximately 95 out of 100 observers would be expected to detect the residue).
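As a consistency check on the quoted figures, the slope implied by the odds ratio can be combined with an intercept implied by the 2.921-μg/cm² concentration reported for a 0.999 acceptance criterion (Table VI) to recompute the probability of detection at 1.80 μg/cm². The coefficients are back-calculated from rounded published values rather than taken from Table IV, so agreement is only expected to about three decimal places; the function names are illustrative.

```python
import math


def logit(p):
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


def prob_detection(x, b0, b1):
    """Predicted probability of detection at residue concentration x (ug/cm2)."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


# Slope implied by the +42.710 % odds increase per 0.1 ug/cm2 (assumption: odds ratio).
b1 = math.log(1.42710) / 0.1
# Intercept implied by a 0.999 probability of detection at 2.921 ug/cm2 (Table VI).
b0 = logit(0.999) - b1 * 2.921

p_180 = prob_detection(1.80, b0, b1)
print(f"P(detect | 1.80 ug/cm2) ~ {p_180:.3f}")
print(f"expected detections among 5 observers ~ {5 * p_180:.2f}")
```

The result, about 0.949, matches the value quoted above, and the corresponding expected number of detections among five observers is about 4.74.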

Table V: Relationship between spiked residue concentration and predicted probability of detection based on logistic regression model.

Table V also compares observed and predicted probabilities and lists the expected number of detections for each residue concentration, calculated as the predicted probability multiplied by the number of observers in each category. In logistic regression, model-based probabilities tend to have much smaller standard errors and narrower confidence intervals than those based on the sample proportion, because the model uses information from all the observations rather than from a sample of only five observations at each level, as the current method does. The result is a more precise estimate, which justifies using the logistic-regression model, rather than the observed proportions, to estimate the probabilities of detection.

Figure 2: Logistic regression model of the relationship between residue concentration and (a) Logit and (b) probability of detection. CI is 95% confidence interval, O is observed probability of detection, and P is predicted probability of detection.

Irrespective of which method is used, two aspects of considerable importance in determining VRLs are the acceptance criterion for establishing VRLs and the number of observers participating in the VC verification studies. The acceptance criterion is the probability of detection (observed or predicted) at which a specific residue concentration can be regarded as the VRL. For the current method, the acceptance criterion is 1, based on the idea that if all the observers are able to detect a specific residue concentration during the VC verification phase, then the same residue concentration would also be detected by an observer in future situations. The same acceptance criterion cannot be used for the logistic-regression model, however, because the logistic curve approaches 1 only asymptotically: its predicted probabilities always lie strictly between 0 and 1, so no finite residue concentration has a predicted probability of exactly 1.

Figure 3: Point of intersection for the logistic curve and given acceptance criterion. CI is confidence interval.

To estimate VRLs at different acceptance criteria, the model was inverted to estimate the concentrations that yield a given response probability, that is, the residue concentration at which the line for a given acceptance criterion intersects the logistic curve (see Figure 3). Table VI lists the residue concentrations thus obtained. The notation Pₓ (e.g., P₅₀) denotes the residue concentration at which the model predicts a probability of detection of x% (e.g., P₅₀ is the concentration that 50% of observers would be expected to detect). Confidence intervals for Pₓ were then derived using Fieller's theorem.
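Inverting the single-variable model gives Pₓ directly as (logit(p) − β₀)/β₁. The sketch below does this with coefficients back-calculated from figures quoted in the text (the slope from the 42.710% odds increase per 0.1 μg/cm², the intercept from the predicted probability of 0.949 at 1.80 μg/cm²); it is an illustration under those assumptions, not a reproduction of Table VI. Fieller confidence intervals additionally require the coefficient covariance matrix, which is not available here, so only point estimates are computed.

```python
import math


def logit(p):
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


def concentration_for_probability(p, b0, b1):
    """Invert the fitted model: residue level (ug/cm2) at which P(detect) = p."""
    return (logit(p) - b0) / b1


b1 = math.log(1.42710) / 0.1      # from the odds increase per 0.1 ug/cm2 (assumed odds ratio)
b0 = logit(0.949) - b1 * 1.80     # from P(detect | 1.80 ug/cm2) = 0.949

for p in (0.50, 0.90, 0.99, 0.999):
    print(f"P{p * 100:g}: {concentration_for_probability(p, b0, b1):.3f} ug/cm2")
```

With these rounded inputs, the 0.999 criterion lands at about 2.92 μg/cm², close to the 2.921 μg/cm² reported in Table VI, and P₅₀ falls at about 0.98 μg/cm².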

Table VI: Predictions (point estimates) derived from the logistic regression model.

These point estimates provide a framework for evaluating the reliability of the logistic model: the model and its point estimates can be considered reliable if the observed probabilities of detection are consistent with the predicted ones. If 0.999 is taken as approximately equal to 1, then the VRL for the given residue is 2.921 μg/cm² (see Table VI), which is larger than the 1.80 μg/cm² obtained with the current method. That is, the logistic-regression model predicts that a residue concentration of 2.921 μg/cm² (95% confidence interval, 2.266–4.761 μg/cm²) would be detected by essentially all observers. Because the probability of detection increases monotonically with spiked-residue concentration, setting a higher acceptance criterion yields a larger VRL. Table VI also shows that, as the acceptance criterion approaches 1, a given change in probability requires a larger change in concentration than it does near the middle of the curve. For example, raising the predicted probability of detection from 0.9 to 0.99 requires an increase in residue concentration of 0.674 μg/cm², whereas raising it from 0.5 to 0.6 requires only 0.114 μg/cm². Similarly, the confidence intervals for these VRLs widen as the acceptance criterion increases. Logistic regression may therefore yield a much larger VRL than the current method; manufacturers may, however, achieve a lower VRL by adjusting the acceptance criterion.
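The widening steps near the top of the curve follow from the model itself: the concentration change needed to move between two probabilities is the difference of their logits divided by β₁, and the intercept cancels. The sketch below, assuming the slope implied by the 42.710% odds increase per 0.1 μg/cm², reproduces both figures quoted above; `concentration_shift` is an illustrative name.

```python
import math


def logit(p):
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


def concentration_shift(p_from, p_to, b1):
    """Change in residue level (ug/cm2) needed to move P(detect) from p_from to p_to.
    The intercept cancels, so only the slope b1 (per ug/cm2) matters."""
    return (logit(p_to) - logit(p_from)) / b1


b1 = math.log(1.42710) / 0.1   # slope implied by the +42.710 % odds per 0.1 ug/cm2
print(f"0.90 -> 0.99: {concentration_shift(0.90, 0.99, b1):.3f} ug/cm2")
print(f"0.50 -> 0.60: {concentration_shift(0.50, 0.60, b1):.3f} ug/cm2")
```

It gives about 0.674 μg/cm² for 0.90 to 0.99 and about 0.114 μg/cm² for 0.50 to 0.60, because the logit stretches the probability scale increasingly as it approaches 0 or 1.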

Binary responses carry less information per observation than continuous responses, so VC verification studies require a large number of observations: the more trials, the more precise the estimated probability of detection. For studies with a small number of observers, a large number of observations, with replicates at each spiking level, is recommended. For a formal sample-size estimate, the formula proposed by Hsieh et al. may be used (15).

Logistic regression, as described above, can be generalized to incorporate more than one explanatory variable, which may be continuous or categorical. Care should be taken, however, when interpreting and reporting results from multiple logistic-regression models. To interpret such results correctly and arrive at meaningful conclusions, statistical interactions and curvilinear effects must be incorporated properly (e.g., by including product terms such as x₁ × x₂ or polynomial terms such as x₁² in the systematic component of the model) (13). If the logistic coefficient for a product or polynomial term is not statistically significant, then the corresponding interaction or curvilinear effect is not statistically significant. One problem that may arise when modeling multiple explanatory variables is that the value of one or more variables may push the probability of detection close to 1, leaving little room for the other variables to have any influence. In that case, such variables should be excluded from the model, or individual VRLs should be determined for the most appropriate viewing conditions.
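As an illustration of such terms, a two-variable model might take the form below, where x₁ could be residue concentration and x₂ viewing distance; the variables and coefficients are illustrative, not a model fitted in this article. Differentiating the logit shows why a single coefficient no longer summarizes the effect of x₁:

$$
\operatorname{logit}[P(Y = 1)] = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^{2},
\qquad
\frac{\partial \operatorname{logit}[P(Y = 1)]}{\partial x_1} = \beta_1 + \beta_3 x_2 + 2\beta_4 x_1
$$

When β₃ or β₄ differs from zero, the change in log odds per unit of x₁ depends on the value of x₂ or of x₁ itself.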

Conclusion

Logistic regression was shown to be a better approach than the current method for estimating accurate and statistically justifiable VRLs from discrete responses. Its predictions are always meaningful (probabilities between 0 and 1) and, in most cases, closely reflect the observations. Logistic regression, rather than the current method, should therefore be used to determine VRLs. Because the model may yield a much larger VRL than the current method does, the quality of visual inspection, in terms of the discrete response, can be improved by properly controlling the experimental variables and by defining the acceptance criterion used to estimate the VRL.

Based on the modeling procedure, the VRL can be defined as a scientifically justifiable residue concentration that, when viewed with the unaided eye and measured by a specific method, would be detected by observers with a probability at least equal to a predefined acceptance criterion. Once established, the VRL can be used for CV and routine monitoring purposes. It would be appropriate to define the VC criterion as the absence from the surface of all particulate and nonparticulate contaminants above the established VRL when viewed with the unaided eye under preverified viewing conditions.

M. Ovais is a senior pharmaceutical scientist at Xepa-Soul Pattinson, 1-5, Cheng Industrial Estate, 75250 Melaka, Malaysia, tel. +60 63351515, fax +60 63355829, mohammad@xepasp.com.

References

1. PIC/S, "Recommendations on Validation Master Plan, Installation and Operational Qualification, Non-Sterile Process Validation, Cleaning Validation" (PIC/S, Geneva, Aug. 2002).

2. PDA Pharmaceutical Cleaning Validation Task Force, PDA J. Pharm. Sci. Technol. 52 (6), 1–23 (1998).

3. W. Hall, J. Val. Technol. 14 (1), 42–49 (2007).

4. R.J. Forsyth, J. Hartman, and V. Van Nostrand, Pharm. Technol. 30 (9), 104–114 (2006).

5. R.J. Forsyth and J. Hartman, Pharm. Eng. 28 (3), 1–10 (2008).

6. European Chemical Industry Council, Guidance on Aspects of Cleaning Validation in Active Pharmaceutical Ingredient Plants (CEFIC, Brussels, Dec. 2000).

7. G.L. Fourman and M.V. Mullen, Pharm. Technol. 17 (4), 54–60 (1993).

8. NASA, "Space Shuttle Contamination Control Requirements," SN-C-0005, Rev. D (Lyndon B. Johnson Space Center, Houston, TX, July 20, 1998), pp. 1–3.

9. D.A. LeBlanc, Pharm. Technol. 22 (10), 136–148 (1998).

10. R.J. Forsyth, V. Van Nostrand, and G. Martin, Pharm. Technol. 28 (10), 58–72 (2004).

11. R.J. Forsyth and V. Van Nostrand, Pharm. Technol. 29 (10), 152–161 (2005).

12. R.J. Forsyth and V. Van Nostrand, Pharm. Technol. 29 (4), 134–140 (2005).

13. G. Hutcheson and N. Sofroniou, "Logistic Regression," in The Multivariate Social Scientist: Introductory Statistics Using Generalized Linear Models (Sage Publications, London, 1st ed., 1999), pp. 113–152.

14. J.L. Fleiss, B. Levin, and M.C. Paik, "Logistic Regression," in Statistical Methods for Rates and Proportions, W.A. Shewhart and S.S. Wilks, Eds. (John Wiley and Sons, Hoboken, NJ, 3rd ed., 2003), pp. 284–339.

15. F.Y. Hsieh, D.A. Bloch, and M.D. Larsen, Stat. Med. 17 (14), 1623–1634 (1998).
