Development of software for reference values validation by the Monte Carlo technique

Veli Kairisto, University of Turku, Department of Clinical Chemistry, Turku, Finland

Allan Poola, ProExpert Ltd., Tallinn, Estonia

INTRODUCTION

Producing reference values is cumbersome, and only a few laboratories actually produce their own. Most clinical laboratories depend on reference limits produced in other laboratories.

It is customary that the validity of the "borrowed" reference intervals is checked by analyzing a small number of samples from healthy individuals. There is an apparent need for a powerful statistical method to extract as much information as possible from these samples.

With the increased capacity of computers, computationally intensive statistical methods have become a practical choice for the treatment and estimation of data. Holmes et al. (Clin Chem 1994;40:2216-2222) presented a method based on a Monte Carlo simulation sampling technique for the verification of reference intervals. We developed software, Refstat for Windows, which applies the method they suggested.

Refstat for Windows - SOFTWARE SUMMARY

Refstat for Windows is a new software tool for the validation of reference intervals. It was produced by the developers of GraphROC for Windows. Refstat uses a computer-intensive Monte Carlo simulation resampling technique, which is essentially a non-parametric method without any distributional assumptions. All parts of the reference distribution, including the reference limits themselves, are checked, and the corresponding p-values are calculated for a comprehensive evaluation of the validity of the reference values.
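
The resampling idea can be illustrated with a minimal Python sketch. This is not Refstat's code: the function names, the interpolation rule used for percentiles, the choice of sampling with replacement, and the generated reference values are all assumptions made for illustration. The sample size of 12 and the 3000 repeats simply mirror the settings quoted in Figure 2.

```python
import random


def percentile(values, p):
    """Return the p-th percentile (0-100) by linear interpolation between order statistics."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile() needs at least one value")
    rank = (len(ordered) - 1) * p / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def simulate_percentiles(reference, sample_size, repeats, p, seed=None):
    """Draw `repeats` random samples from the reference values and
    return the p-th percentile of each sample."""
    rng = random.Random(seed)
    return [
        # Assumption: sampling with replacement; the source does not say which is used.
        percentile(rng.choices(reference, k=sample_size), p)
        for _ in range(repeats)
    ]


# Hypothetical reference values, generated only to make the example runnable.
demo_rng = random.Random(1)
reference = [demo_rng.gauss(80.0, 12.0) for _ in range(240)]

simulated_medians = simulate_percentiles(reference, sample_size=12, repeats=3000, p=50, seed=2)
print(len(simulated_medians), min(simulated_medians), max(simulated_medians))
```

The spread of `simulated_medians` shows how much the median of a small sample can vary by chance alone, which is the yardstick against which the test data are judged. No distributional form is assumed at any step.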

How to get your own copy of Refstat?

The program will be available in September 1997 for downloading through the GraphROC for Windows WWW site:

http://www.netti.fi/~maxiw

SUMMARY

Refstat for Windows is a practical tool for evaluating the validity of reference values. Even if only a few control samples and results are available, the validity of the reference distribution can still be evaluated. However, we recommend using about 20 or more control specimens. The minimum number of control results accepted by the program is 7.

Probably the greatest practical problem in applying the method is that laboratories that produce original reference values still often report only the reference limits and not the whole set of reference values. We suggest that this problem could be solved by creating international "banks" of reference values that could be accessed via the World Wide Web.

Refstat will be available in September 1997 for downloading through the GraphROC for Windows WWW-site:

http://www.netti.fi/~maxiw

Figure 1. An example of reference values verification. The original set of reference values came from 240 healthy adult individuals whose serum creatinine levels (µmol/l) were measured. Distributional data for these values are shown in the data window. The test data consisted of 10 creatinine values from healthy individuals.

In the graph, all test data values are shown at the bottom; the horizontal lines represent the 20th, 50th, and 80th percentiles of the test data.

Figure 2. The blue, red, and green frequency polygons represent the distributions of the 20th, 50th, and 80th percentiles of samples of size 12 after 3000 repeats of random sampling from the reference data.

Figure 3. The graph is the same as in Figure 2, except that the original reference distribution has been removed for clarity. The calculation results are visible in the data window. NGE is the number of simulated samples in which the given percentile lay farther from the corresponding percentile of the reference distribution than the same percentile of the test data did. Each p-value is the probability, under the null hypothesis that the test data are a random sample from the reference distribution, of a deviation at least as large as the one observed for the test data mean or percentile.
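
The NGE count and the p-value described in this caption can be sketched in Python as follows. This is an illustrative sketch, not Refstat's implementation. It assumes that samples are drawn with replacement, that ties are counted (a simulated distance greater than or equal to the observed one), and that the p-value is NGE divided by the number of repeats. The function name and the demonstration data are hypothetical.

```python
import random
import statistics


def nge_and_p_value(reference, test_data, statistic, repeats=1000, seed=None):
    """Count the simulated samples whose statistic lies at least as far from the
    reference statistic as the test data's statistic does, and return
    (NGE, NGE / repeats)."""
    rng = random.Random(seed)
    target = statistic(reference)
    observed_distance = abs(statistic(test_data) - target)
    nge = 0
    for _ in range(repeats):
        # Assumption: each simulated sample has the same size as the test data
        # and is drawn with replacement from the reference values.
        sample = rng.choices(reference, k=len(test_data))
        if abs(statistic(sample) - target) >= observed_distance:
            nge += 1
    return nge, nge / repeats


# Hypothetical data, generated only to make the example runnable.
demo_rng = random.Random(3)
reference = [demo_rng.gauss(42.0, 3.0) for _ in range(239)]
test_data = [demo_rng.gauss(42.0, 3.0) for _ in range(12)]

for name, stat in [
    ("mean", statistics.mean),
    ("20th percentile", lambda x: statistics.quantiles(x, n=5)[0]),
    ("median", statistics.median),
    ("80th percentile", lambda x: statistics.quantiles(x, n=5)[3]),
]:
    nge, p = nge_and_p_value(reference, test_data, stat, repeats=1000, seed=4)
    print(f"{name}: NGE = {nge}, p = {p:.3f}")
```

A small p-value for any statistic suggests that the test data differ from the reference distribution in that part of the distribution. With a finite number of repeats the estimate can be exactly zero, so the number of repeats limits the smallest p-value that can be resolved.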

Figure 4. Another example of reference values verification. The original set of reference values came from 239 healthy adult individuals whose serum albumin levels (g/l) were measured. Distributional data for these values are shown in the data window. The test data consisted of 12 albumin values from healthy individuals.

In the graph, all test data values are shown at the bottom; the horizontal lines represent the 20th, 50th, and 80th percentiles of the test data. The corresponding distributions based on 1000 samples from the reference data are shown above.

Figure 5. The graphical presentation and the graph and data windows can be adjusted in Refstat. This example displays the same data as Figure 4. It is also possible to export both data and graphs via the Windows clipboard.

Figure 6. An example of a printout generated by Refstat. The data window shows the calculated values for the 2.5th, 20th, 50th, 80th, and 97.5th percentiles as well as for the mean. P-values are calculated for each of these percentiles.

In the graph, only the 20th, 50th, and 80th percentiles of the test data and of the 1000 simulated samples are shown; the original frequency distributions were not selected for display.