Establishing Systems Suitability and Validity Criteria for Analytical Methods and Bioassays

Pharmaceutical Technology, BioPharma Outsourcing Innovation, February 2022
Pages: s6–s11

Ensuring reliable analytical methods and bioassays requires a well-thought-out strategy for evaluating method validity and systems suitability.

tilialucida/Stock.adobe.com

Analytical methods and bioassays require a well-thought-out way of evaluating method validity (truthfulness) and systems suitability (a known positive control) to ensure the method is correct, reliable, and can be used to determine the value of the unknown test article. This paper recommends and explains methods for evaluating systems suitability as well as validity criteria. Further, the paper discusses practical and phase-appropriate methods for setting limits for systems suitability and validity criteria.

In alignment with guidances such as International Council for Harmonisation (ICH) Q2 (1), United States Pharmacopeia (USP) <1033>, <1032>, <1225>, and ICH Q6B (2), method development, validation, and method understanding are critical elements of product measurement and control. Systems suitability should be understood to be separate from validity criteria. Systems suitability is defined as a known positive control that can be demonstrated to be within a defined set of limits. Validity criteria typically include signal control, bioassay criteria for parallelism and linearity, and criteria for repeatability. They may further include a fixed number of outliers for data management.

Systems suitability

There are two methods for systems suitability. The first method involves adding a positive control to the assay run. The positive control is a known standard at a fixed concentration. By adding the known standard to the plate and checking that it is within a defined limit, one can trust that the method is running correctly and that the test article may be reliably determined.

The second method requires use of a standard curve. A standard curve or calibrator has a known value and is placed on the plate at three or more concentrations; a curve is then fit to the data. By back-calculating the concentration at a fixed position on the standard curve, one can use that data for systems suitability. The standard curve approach has a distinct advantage in reducing the analytical error of the quantitation of the positive control. It is not recommended, however, to use a calibrator and then also add a separate positive control to the plate, as this doubles the analytical error. Figure 1 shows the systems suitability back-calculated from a fixed position on the reference standard.

Figure 1. Systems suitability at a fixed position. All figures are courtesy of the authors.

Validity criteria for signal control

Before the relative potency of a test article to a reference standard can be calculated, it is crucial to verify that there is a clear signal from the dose response for the reference standard. There are two methods for verifying a reliable signal: the dose response test and curve depth.

Based upon health authority input, the dose response test is a standard approach to show signal-to-noise control. The dose response test evaluates whether analyte concentration has a statistically significant effect on the signal (alpha = 0.05, two-sided). The regression model may be linear, square root, log, three-parameter exponential, or four-parameter logistic regression (4PL), depending on the method.

The curve depth of the reference standard is another method to demonstrate a clear signal. Curve depth is the difference between the mean signal at the highest concentration and the mean signal at the lowest concentration. For assays using a 4PL fit, curve depth is calculated by subtracting the lower asymptote of the standard from the upper asymptote of the standard (Figure 2). Normally, the limit is set at 50% of the curve depth from qualification or validation assay runs.
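The curve depth check can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the asymptote values, and the historical depth of 2.50 are hypothetical, and the mean-based variant takes the absolute difference so that it works for both increasing and decreasing dose responses (an assumption, not something the text specifies).

```python
from statistics import mean


def curve_depth_from_means(low_conc_signals, high_conc_signals):
    """Absolute difference between the mean signals at the two extreme doses."""
    return abs(mean(high_conc_signals) - mean(low_conc_signals))


def curve_depth_4pl(lower_asymptote, upper_asymptote):
    """Curve depth for a 4PL fit: upper asymptote minus lower asymptote."""
    return upper_asymptote - lower_asymptote


def passes_curve_depth(depth, historical_depth, fraction=0.5):
    """Valid if the run's depth is at least `fraction` of the historical depth."""
    return depth >= fraction * historical_depth


# Hypothetical 4PL asymptotes for the reference standard in one assay run.
depth = curve_depth_4pl(lower_asymptote=0.12, upper_asymptote=2.48)

# Hypothetical curve depth observed during qualification or validation runs.
print(passes_curve_depth(depth, historical_depth=2.50))
```

The default `fraction=0.5` mirrors the 50% limit described above; a laboratory would substitute its own justified value.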

Figure 2. Curve depth comparison.

The R² of a fitted curve is not recommended as a validity criterion; it should be report-only. Confidence intervals (CIs) of the reportable value are a more reliable indication of the errors in quantitation due to curve fitting.

Validity criteria for bioassay parallelism

USP <1032> defines parallelism as the following:

“A quality in which the concentration–response curves of the Test sample and the Reference Standard are identical in shape and differ only by a horizontal difference that is a constant function of relative potency” (3).

Parallelism must be evaluated as a validity criterion because parallelism between the test article and reference standard demonstrates that the two are similar, and, therefore, relative potency may be determined. F-tests for parallelism are not recommended because they are too sensitive and may cause a high rate of invalid results.

Parallelism in a bioassay with a sigmoidal curve may be evaluated using the upper asymptote ratio (UAR). The UAR is compared to a two-sided limit. If the ratio of the upper asymptote for the reference standard and the upper asymptote for the test article is within defined limits, then the curves can be constrained, and relative potency can be calculated. Note that the UAR only checks parallelism at one point on the sigmoidal curve. Slope ratios may be used for a parallel line analysis of the reference and test article. A highly recommended method for determining parallelism is the absolute difference between the relative potency of the unconstrained model and that of the constrained model. This measures how much constraining the curves changes the reported relative potency.

An equivalence test may be used in evaluating the UAR or the slope ratio. If the ratio is outside of the equivalence bounds, the assay run fails parallelism and is invalid. Figure 3 is a parallelism test example with a resulting slope ratio of 0.993.
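The parallelism checks described above reduce to simple ratio and difference calculations. The Python sketch below is illustrative only: the function names, the equivalence bounds of 0.80 to 1.25, the asymptote values, and the relative potency values are hypothetical, and the UAR is taken as test over reference (an assumed direction). The check shown compares only the point estimate to the bounds; a formal equivalence test would normally compare a confidence interval for the ratio to the same bounds.

```python
def upper_asymptote_ratio(ref_upper, test_upper):
    """UAR, taken here as test over reference (direction is an assumption)."""
    return test_upper / ref_upper


def within_equivalence_bounds(ratio, lower, upper):
    """True if the ratio falls inside the two-sided equivalence bounds."""
    return lower <= ratio <= upper


def relative_potency_shift(rp_unconstrained, rp_constrained):
    """Absolute change in relative potency caused by constraining the curves."""
    return abs(rp_unconstrained - rp_constrained)


# Hypothetical equivalence bounds; real bounds come from qualification data.
LOWER, UPPER = 0.80, 1.25

# Slope ratio reported for the Figure 3 example in the text.
print(within_equivalence_bounds(0.993, LOWER, UPPER))

# Hypothetical upper asymptotes for the reference and the test article.
uar = upper_asymptote_ratio(ref_upper=2.40, test_upper=2.28)
print(within_equivalence_bounds(uar, LOWER, UPPER))

# Hypothetical relative potencies from unconstrained and constrained fits.
print(round(relative_potency_shift(1.04, 1.01), 2))
```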

Figure 3. Slope ratio. UDL is upper detection limit. LDL is lower detection limit.

Linearity ratio calculation for bioassays

A bioassay is also required to be linear in the dose response. The Linearity Ratio method of analysis uses a measure of curvature relative to the straight line rather than a measure of probability: it compares the effect size attributed to the quadratic term (curvature) with the effect size attributed to the linear term in the full model. In practical terms, the question of linearity is: what percentage of the linear response is curvature?

A scaled estimate for the linear term is half the change in the signal over the range (distance from center). To determine the full change in signal over the concentration range, it must be multiplied by two. The scaled estimate for the quadratic term represents the full curvature over the range, which is then divided by the full change in the linear signal.

The Linearity Ratio formula is shown in Equation 1:

[Eq. 1]
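Equation 1 is not reproduced in this text. Reading the verbal description literally, the ratio is the magnitude of the scaled quadratic estimate divided by twice the magnitude of the scaled linear estimate, expressed as a percentage. The Python sketch below encodes that reading; it is an interpretation of the description rather than a confirmed copy of the authors' formula, and the function name and input values are hypothetical.

```python
def linearity_ratio(scaled_linear, scaled_quadratic):
    """Curvature as a percentage of the full linear change over the range.

    Both inputs are scaled estimates from a model that includes linear and
    quadratic terms, with concentration centered and scaled by half its range.
    """
    full_linear_change = 2.0 * abs(scaled_linear)
    if full_linear_change == 0:
        raise ValueError("linear scaled estimate is zero; ratio is undefined")
    return 100.0 * abs(scaled_quadratic) / full_linear_change


# Hypothetical scaled estimates from a fitted dose-response model.
ratio = linearity_ratio(scaled_linear=1.5, scaled_quadratic=0.12)
print(round(ratio, 1))
```

With these inputs the curvature is about 4% of the full linear change; whether that passes depends on the limit the laboratory has justified.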

Validity criteria: assay repeatability/analytical error

For an analytical method to be validated, it must be shown to be repeatable at the moment of measurement (4). Before evaluating repeatability, outliers should be removed from the run of the assay or analytical method. Jackknife z is the preferred method of outlier identification within each dose. Jackknife z is calculated independently for each concentration in the dose response for both reference and sample type. Jackknife z evaluates the influence each point has on the curve fit: each point is removed from the model in turn, and the model is regenerated and evaluated, iteratively, for each data point. Jackknife z = (measurement - dose mean (measurement removed)) / standard deviation (measurement removed).
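A minimal Python sketch of the jackknife z calculation for the replicates at a single dose follows. It uses the leave-one-out mean and standard deviation, dividing the deviation by the leave-one-out standard deviation. The function names, the replicate values, and the cut point of 3 are hypothetical choices for illustration.

```python
from statistics import mean, stdev


def jackknife_z(values):
    """Leave-one-out z score for each replicate at one dose."""
    values = list(values)
    if len(values) < 3:
        raise ValueError("need at least 3 replicates to leave one out")
    scores = []
    for i, x in enumerate(values):
        rest = values[:i] + values[i + 1:]  # all replicates except this one
        sd = stdev(rest)
        if sd == 0:
            # Remaining replicates are identical; any deviation is extreme.
            scores.append(0.0 if x == rest[0] else float("inf"))
        else:
            scores.append((x - mean(rest)) / sd)
    return scores


def flag_outliers(values, cut=3.0):
    """Indices of replicates whose absolute jackknife z exceeds the cut point."""
    return [i for i, z in enumerate(jackknife_z(values)) if abs(z) > cut]


# Hypothetical replicate signals at one concentration.
replicates = [1.02, 0.98, 1.00, 1.65]
print(flag_outliers(replicates))
```

Because the suspect point is excluded from its own mean and standard deviation, a single extreme replicate cannot mask itself by inflating the spread.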

CIs should be included when reporting the results for any and all test articles. CIs account for sample size, variation in the method, and risk (95%). CI range is calculated by taking the difference between the upper confidence limit of the constrained relative potency and the lower confidence limit. A narrower CI range indicates that the assay is less variable and, therefore, more repeatable. An excessive CI range may not make the assay run invalid; it demonstrates that additional assay runs are required to tighten the error. Normally, the CI range is reported as a percent of tolerance (Equation 2):

[Eq. 2]

where USL is upper specification limit and LSL is lower specification limit. The coefficient of variation (CV) is often used to evaluate repeatability of the dose response. CV is not recommended for use because it incorrectly scales the analytical error by concentration; high concentrations appear to have low error and low concentrations seem to have high error.
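Equation 2 is not reproduced in this text. One plausible reading of "CI range as a percent of tolerance" is the width of the confidence interval divided by the width of the specification range (USL minus LSL), times 100. The Python sketch below encodes that reading as an assumption; the function name, confidence limits, and specification limits are hypothetical.

```python
def ci_range_percent_of_tolerance(lcl, ucl, lsl, usl):
    """CI width as a percentage of the specification width (USL - LSL)."""
    if usl <= lsl:
        raise ValueError("USL must be greater than LSL")
    if ucl < lcl:
        raise ValueError("upper confidence limit must not be below the lower")
    return 100.0 * (ucl - lcl) / (usl - lsl)


# Hypothetical 95% confidence limits on constrained relative potency,
# and hypothetical relative potency specification limits.
pct = ci_range_percent_of_tolerance(lcl=0.92, ucl=1.08, lsl=0.70, usl=1.30)
print(round(pct, 1))
```

Here the confidence interval consumes roughly a quarter of the specification range, which expresses the analytical error on the same scale as the release decision.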

Setting systems suitability and validity criteria limits

When setting specification limits based on a representative sample, risk must be considered. The wider the interval, the greater the amount of allowable transmitted variation on the Y response. K sigma is used when the sample size is 30 or more, and tolerance intervals are used when the sample size is less than 30. Before setting limits, a representative data set must be available. Systems suitability and validity criteria limits are normally set after qualification and review, and finalized after validation of the method (5).

K sigma or tolerance interval approach

Tolerance intervals were developed to provide a correction for limited sample sizes and scales. The interval of risk is based on three considerations: sample size, confidence level, and proportion of the population to be described.

Tolerance intervals should be used under the following conditions:

  • No transfer function is available and/or possible
  • No adverse nor unacceptable results are associated with the parameter or response
  • When n ≥ 30, use K sigma in setting limits
  • When n < 30 use tolerance intervals
  • Sample data are stable (use a control chart or regression analysis to evaluate stability)
  • Distribution is either normal or nonnormal.

The procedure for setting limits is as follows:

  • Generate a distribution histogram (Figure 4)
Figure 4. Distribution-based limits.

  • Make sure there is a representative sample of the assay validity or systems suitability measurement.
  • Check data for outliers. Jackknife z with a three-sigma cut point is typical.
  • Determine the distribution that best fits the data (normal or nonnormal); use a goodness-of-fit test or the second-order Akaike’s information criterion (AICc) to determine the best fit.
  • Avoid using Johnson or Sinh-Arcsinh (SHASH) distributions when fitting nonnormal tolerance intervals as they set limits excessively wide. Normal, Gamma, Weibull, Lognormal, Normal Mixture are preferred and should all work well to set limits.
  • Use Table I to determine the risk and associated tolerance interval and margin.
Table I. Probability and risk table for setting limits. OOS is out of specification. PPM is parts per million. Cpk is process capability index.

  • Use a K Factor calculator (6) to determine K for the interval (sample size, confidence level, and proportion of the population) for any tolerance interval.
  • Set the specification limits using K sigma.
  • Examine results and document limits; update any standard operating procedures or calculators with the defined limits.
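For normally distributed data, the K factor and limit steps above can be sketched in Python. The two-sided K factor below follows the approximation given in the NIST/SEMATECH e-Handbook (6). As a stated substitution, the chi-square quantile is computed with the Wilson-Hilferty approximation so that only the standard library is needed; a validated calculator or statistics package should be used for real limits. All function names and data values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def chi2_quantile_wh(q, df):
    """Wilson-Hilferty approximation to the chi-square quantile (assumption)."""
    z = NormalDist().inv_cdf(q)
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * sqrt(a)) ** 3


def k_factor_two_sided(n, proportion, confidence):
    """Approximate two-sided normal tolerance factor, per the NIST handbook."""
    df = n - 1
    z = NormalDist().inv_cdf((1.0 + proportion) / 2.0)
    chi2 = chi2_quantile_wh(1.0 - confidence, df)
    return sqrt(df * (1.0 + 1.0 / n) * z * z / chi2)


def k_sigma_limits(values, k):
    """Lower and upper limits as mean minus and plus K standard deviations."""
    m, s = mean(values), stdev(values)
    return m - k * s, m + k * s


# Hypothetical systems suitability results from qualification runs (n < 30).
data = [98.2, 101.5, 99.7, 100.9, 97.8, 102.3, 100.1, 99.0, 101.2, 98.9]
k = k_factor_two_sided(n=len(data), proportion=0.99, confidence=0.95)
lower, upper = k_sigma_limits(data, k)
print(round(k, 2), round(lower, 1), round(upper, 1))
```

For n of 30 or more, the same `k_sigma_limits` function can be called with a fixed K chosen from the risk table rather than a tolerance factor.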

Conclusion and summary

Using the correct methods for evaluating systems suitability and acceptance criteria is critical to ensuring that the analytical method is fit for use. Setting specification limits as described is statistically rigorous, scientifically sound, and defensible upon regulatory review. Table II is a summary table of all criteria and recommended limits.

Table II. Summary table showing criteria and recommended limits. UAR is upper asymptote ratio. 4PL is four parameter logistic regression.

References

1. ICH, Q2(R1) Validation of Analytical Procedures: Text and Methodology, Step 4 version (November 2005).
2. ICH, Q6B Specifications: Test Procedures and Acceptance Criteria for Biotechnological/Biological Products, Step 4 version (March 1999).
3. USP, USP General Chapter <1032>, “Design and Development of Biological Assays,” (Rockville, Md., 2015).
4. USP, USP General Chapter <1225>, “Validation of Compendial Procedures,” (Rockville, Md., 2015).
5. USP, USP General Chapter <1033>, “Validation of Biological Assays,” (Rockville, Md., 2010).
6. NIST, “Tolerance Intervals for a Normal Distribution,” in NIST/SEMATECH e-Handbook of Statistical Methods, www.itl.nist.gov, April 2012.

About the authors

Thomas A. Little*, PhD, drlittle@thomasalittleconsulting.com, is president and principal consultant of Bioassay Sciences, and Daniel S. Harding is a principal consultant of Bioassay Sciences.

*To whom all correspondence should be addressed.

Citation

When referring to this article, please cite it as T.A. Little and D.S. Harding, “Establishing Systems Suitability and Validity Criteria for Analytical Methods and Bioassays,” Bio/Pharma Outsourcing Innovation 2022, Supplement to Pharmaceutical Technology (February 2022).
