By its very nature, endocrinology is a specialty that relies on sharp and comprehensive clinical and investigative skills and particularly on accurate, precise, sensitive, and reliable measurements of circulating hormone concentrations. With the realization that manifestations of endocrine disease may be subtle and affected by comorbid illness, medications, aging, and other factors that cloud clinical diagnosis, rugged and robust hormone assays are especially important to the clinician. Even in the presence of obvious classical manifestations of endocrine disease, reliable laboratory studies are needed to confirm the diagnosis.
Serum testosterone (T) assays play an important role in the clinical evaluation of a number of very common endocrine disorders. In males, T assays are used primarily to confirm the diagnosis of hypogonadism, and also to evaluate boys with delayed or precocious puberty and monitor the adequacy of T therapy. Because the clinical manifestations of androgen deficiency are nonspecific, the presence of low serum T levels in men with symptoms and findings consistent with androgen deficiency confirms the diagnosis of hypogonadism. In females, T assays are used in the evaluation of hyperandrogenism (e.g. idiopathic hirsutism, congenital adrenal hyperplasia, polycystic ovarian syndrome, and androgen-secreting ovarian or adrenal tumors), and more recently, to diagnose androgen deficiency. The very low circulating T concentrations in boys and females (over an order of magnitude lower than those in men) present a challenge to the sensitivity of most T assays.
The routine clinical use of T assays began approximately 30 yr ago with the development of RIAs for T that could be performed on relatively small quantities of blood after organic extraction and chromatographic separation (1). Subsequently, there have been remarkable advancements in immunoassays for T as well as other hormones. Compared with original RIAs, T assays of today are more sensitive and specific, require smaller quantities of serum, do not involve extraction or chromatography, and are performed more rapidly and with less cost. In most large clinical chemistry and many reference laboratories, T assays are performed routinely on automated platforms using nonradioactive methods.
The enhanced efficiency and reduced cost, improved sensitivity, ease of performance, and automation of modern T assays have made them more available to clinicians and researchers, facilitating both clinical care and research. Although the pace of these advances has been rapid, rigorous attention to the accuracy of many hormone assays, including T assays, has lagged behind and, in some instances, been overlooked. In this issue of JCEM, the papers by Wang et al. (2) and Miller et al. (3) both carefully evaluate the accuracy and reliability of assays for serum total T in men and free T in women, respectively. We compliment the authors for their well-conceived and well-designed studies examining the performance and validity of T assays. These reports serve as excellent models for the rigorous assessment of the accuracy of T assays, so important in clinical medicine and research. It is refreshing to note that these investigators have been willing to devote substantial amounts of time and resources to investigating these key methodological issues. This type of work is relatively nonsexy (despite the hormone being measured) and would be viewed as pedestrian by many funding agencies. These papers make for somewhat daunting reading because of the density of the methodological descriptions (both measurement and statistical), but the results are very important for both clinical and research audiences.
T circulates in blood mostly (98–99%) bound to serum proteins, primarily SHBG and albumin, and only 1–2% of serum T is free of protein binding (4). Because SHBG binds T with high affinity and the dissociation of T from SHBG is very slow, SHBG-bound T is not thought to be available for dissociation into target tissues for action via classical androgen receptor mechanisms (5). In contrast, albumin binds T with low affinity, and the dissociation of T bound to albumin is rapid (6). Therefore, both albumin-bound T and free T are thought to be available to target tissues for androgen action. The combination of albumin-bound (weakly bound) and free T is referred to as bioavailable (or non-SHBG-bound) T. For clinical purposes, this simplified paradigm of circulating T fractions and their actions is reasonable. However, it should be recognized that within the body, relationships among circulating T fractions are dynamic and probably involve other compartments, so that static measurements ex vivo can only estimate the state of T availability and action in vivo. The complexity is increased further with recent evidence that T bound to SHBG may act on some tissues (e.g. prostate) via cell surface receptors (7).
For total T measurements, commercial RIA and nonradioactive immunoassay kits, as well as automated platform immunoassays that mostly use chemiluminescent detection, are widely available. These are the most common types of assays used in clinical and research laboratories. Older RIAs used pure T as standards, often used extraction and chromatography to remove interfering substances or matrix effects, and used published methodology and rigorous validation. Despite differences in specific methodology, these RIAs produced relatively consistent results and a normal range of 300-1000 ng/dl (10.4–34.7 nmol/liter). In contrast, automated immunoassays often use T analogs as standards, proprietary reagents, and instrumentation, and there is limited published validation of their accuracy.
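The paired units quoted above follow directly from the molar mass of T. The following is a minimal sketch of that conversion, assuming a molar mass of 288.42 g/mol for testosterone (a value not stated in the text); the function names are illustrative only.

```python
# Assumption: molar mass of testosterone in g/mol (not given in the text).
T_MOLAR_MASS_G_PER_MOL = 288.42


def ng_dl_to_nmol_l(ng_dl: float) -> float:
    """Convert a T concentration from ng/dl to nmol/liter."""
    # 1 dl = 0.1 liter, so ng/dl * 10 = ng/liter; dividing by g/mol gives nmol/liter.
    return ng_dl * 10.0 / T_MOLAR_MASS_G_PER_MOL


def nmol_l_to_ng_dl(nmol_l: float) -> float:
    """Convert a T concentration from nmol/liter to ng/dl."""
    return nmol_l * T_MOLAR_MASS_G_PER_MOL / 10.0


if __name__ == "__main__":
    for ng_dl in (300, 1000):
        print(f"{ng_dl} ng/dl = {ng_dl_to_nmol_l(ng_dl):.1f} nmol/liter")
```

Rounded to one decimal place, the limits of the 300-1000 ng/dl range convert to the nmol/liter values given in the text.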
It has been disturbing to clinicians and researchers alike that the lower limit of normal in men for some of these assays has fluctuated and drifted down to as low as 132 ng/dl (4.6 nmol/liter) (3). How can this be? It appears that a major contributor to this variation and decline has been a lack of attention to validation of accuracy for many of these assays. Accuracy is a measure of the closeness of agreement between values measured in an assay and those obtained with a “gold standard” or accepted method of measurement. For total T assays, the most appropriate gold standard for comparison of assay measurements of samples is steroid-free serum that is spiked with a range of gravimetrically determined amounts of T, or an independent method such as liquid or gas chromatography with mass spectrometry that has been validated in this way. At present, approval of T immunoassays by regulatory agencies is based on demonstrating that results obtained from them are comparable to those of previously approved assays that may or may not have been validated using gravimetrically determined gold standards.
In this issue of JCEM, Wang et al. (2) compared total T measurements using several immunoassays (manual RIAs using a commercial kit and research assay, and several automated immunoassays) with liquid chromatography tandem mass spectrometry (LC-MSMS) as a gold standard method. The latter was validated independently for high accuracy (using T-free serum spiked with gravimetrically determined amounts of T) and precision. Measurements were made in samples obtained from otherwise healthy men with T levels that ranged from severely hypogonadal to eugonadal and above (<50–1500 ng/dl). Using Deming regression analysis, most of the manual and automated immunoassays compared well with LC-MSMS, with regression slopes for most assays being close to 1 and correlation coefficients of 0.92–0.97. However, one automated assay showed systematic underestimation and another systematic overestimation of T levels compared with LC-MSMS. Within the normal adult male range (300–1000 ng/dl), these immunoassays were reported to perform reasonably well, with over 60% of samples assayed within ±20% of those measured by LC-MSMS. However, in samples with total T less than 100 ng/dl (3.47 nmol/liter), the majority (56–90%) of values assayed in immunoassays differed by more than ±20% from those measured by LC-MSMS. The authors concluded that most manual and automated immunoassays tested were capable of distinguishing eugonadal from hypogonadal males if the adult male reference range was established in each individual laboratory, but they lacked sufficient accuracy to measure total T in females and prepubertal children, except perhaps in certain situations when T levels were elevated.
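For readers unfamiliar with the two summary statistics used in this comparison, the sketch below shows one common form of Deming regression and the fraction of samples falling within ±20% of the reference method. It assumes a fixed ratio of error variances (`delta`, set to 1 here), which may differ from what the authors used; the sample values are hypothetical and are not data from the study.

```python
import math


def deming_regression(x, y, delta=1.0):
    """Return (slope, intercept) of a Deming regression of y on x.

    delta is the assumed ratio of the error variance in y to the error
    variance in x; delta = 1 gives orthogonal regression.
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - y_mean) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / (n - 1)
    slope = (syy - delta * sxx
             + math.sqrt((syy - delta * sxx) ** 2 + 4.0 * delta * sxy ** 2)) / (2.0 * sxy)
    intercept = y_mean - slope * x_mean
    return slope, intercept


def fraction_within(reference, measured, tolerance=0.20):
    """Fraction of measured values within +/- tolerance of the reference value."""
    hits = sum(1 for r, m in zip(reference, measured) if abs(m - r) <= tolerance * r)
    return hits / len(reference)


if __name__ == "__main__":
    # Hypothetical total T values in ng/dl, for illustration only.
    reference_method = [40, 90, 150, 300, 600, 1000]
    immunoassay = [70, 120, 140, 310, 590, 1020]
    slope, intercept = deming_regression(reference_method, immunoassay)
    print(f"slope={slope:.3f}, intercept={intercept:.1f}")
    print(f"within 20%: {fraction_within(reference_method, immunoassay):.0%}")
```

Unlike ordinary least squares, Deming regression allows for measurement error in both methods, which is why it is generally preferred for method comparison. A slope near 1 can still coexist with large relative errors at low concentrations, which the within-tolerance fraction makes visible.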
In general, we agree with the conclusions of Wang et al. (2). However, careful examination of their results also seems to reveal relatively consistent underestimation of total T levels by the automated immunoassays compared with LC-MSMS in samples falling within the mild to moderate hypogonadal range (100–300 ng/dl). Potentially, this could result in problems in distinguishing eugonadal from mildly hypogonadal males, and it reinforces the need to establish normal reference ranges for adult males in each individual laboratory. In addition to the accuracy of T immunoassays, another factor that likely contributes to the extreme variability in the normal range for T is the lack of attention to differences in the adequacy and standardization of populations used to establish reference ranges, both during assay validation and during implementation of the assays in individual laboratories. These issues require the attention of assay vendors, clinical and research laboratories, professional societies, and regulatory agencies.
Assay performance is monitored in individual laboratories by external quality control programs such as that provided by the College of American Pathologists. Measurement of a quality control sample in an individual laboratory is compared with those from other laboratories that use the same kit or automated platform without regard to the accuracy of the measurements. In the paper by Wang et al. (2), a table comparing measurements of a single quality control sample among laboratories using the same immunoassay kit or automated platform method revealed substantial variability of up to 23%, with results that ranged from the hypogonadal to eugonadal range, suggesting there is high measurement variability in laboratories using the same as well as different immunoassays.
An important finding of the paper by Wang et al. (2) is that the currently widely used immunoassays tested were not sufficient to measure total T concentrations accurately in females and prepubertal males. This is not surprising given the characteristics of the assays and relatively small volumes of serum used in these assays. The gold standard LC-MSMS method was able to measure samples that contained T in concentrations as low as 20 ng/dl with high accuracy and precision probably, in part, because of the large volume of sample (2 ml) used for analysis. Currently, gas or liquid chromatography with mass spectrometry techniques is not practical for routine use. However, it is possible that with technological advances in methodology, instrumentation, and automation, mass spectrometry methods may be used in the future for routine hormone measurements in clinical and research laboratories. Until then, we agree that there is a need to develop or modify existing manual and automated immunoassays to improve their sensitivity and accuracy, to measure the low levels of T present in females and prepubertal children.
A previous report by Taieb et al. (8) compared serum total T levels measured by several manual and automated platform assays with isotope-dilution gas chromatography-mass spectrometry as a gold standard, in men, women, and children. Although there were differences in the population studied, immunoassays tested, validation of the isotope-dilution gas chromatography-mass spectrometry method, and some discrepancies in the results obtained with automated immunoassays that were tested in both studies, this study came to the same general conclusions that conventional immunoassays performed reasonably well in men, but they lacked sufficient accuracy and reliability for measurement of total T in women and children. It is interesting that some immunoassays tended to underestimate and others tended to overestimate total T over the range tested.
For younger, otherwise healthy men with classical manifestations of androgen deficiency, very low total T concentrations are usually adequate to confirm the diagnosis of hypogonadism. Because total T assays measure both free and protein-bound T, total T levels may be influenced by alterations in SHBG concentrations. For example, total T levels are decreased in conditions associated with reduced SHBG levels (e.g. moderate obesity, hypothyroidism, androgen, glucocorticoid or progestin use, nephrotic syndrome) and increased in situations associated with elevated SHBG levels (e.g. aging, hyperthyroidism, androgen deficiency, estrogen or anticonvulsant use, hepatic cirrhosis) (9). If clinical conditions associated with alterations in SHBG levels are suspected, measurements of free or bioavailable T should be used to assess gonadal status.
A number of assays are available to measure free and bioavailable T in blood. The very low concentrations of circulating free T may be calculated from the percentage of free T as determined by radiolabeled T methods or measured directly after equilibrium dialysis or centrifugal ultrafiltration. Free T measurement by equilibrium dialysis is considered the gold standard for the measurement of free T. Bioavailable T is usually measured after 50% ammonium sulfate precipitation of SHBG from serum and calculation of the percentage of non-SHBG-bound T by tracer binding methods or direct measurements of T in the supernatant that contains free and albumin-bound T. Because they are technically more demanding, time-consuming, and expensive, these assays are not used by most clinical laboratories, but are available from reference laboratories. The most widely used assays for measurement of free T in clinical laboratories are direct RIAs performed either manually or on automated platforms. In general, these assays use a labeled T analog that has low affinity for SHBG and albumin and that competes with free T for binding to an immobilized T-specific antibody. Alternatively, some laboratories and investigators have measured total T and SHBG and used the ratio T/SHBG, the so-called free androgen index (FAI), as a surrogate or estimate for free T. Both free and bioavailable T may be calculated by measuring total T, SHBG, and albumin concentrations and using the equilibrium binding constants of T to the latter binding proteins in published equations (10).
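The two calculated estimates described above can be sketched as follows. The association constants, the default albumin concentration, the albumin molar mass, and the factor of 100 in the FAI are assumptions based on commonly quoted values and conventions; they are not stated in the text, and the published equations (10) should be consulted for the values actually used.

```python
import math

# Assumed values, not taken from the text:
K_SHBG = 1.0e9                # association constant of T for SHBG, liter/mol
K_ALB = 3.6e4                 # association constant of T for albumin, liter/mol
ALBUMIN_G_PER_L = 43.0        # default albumin concentration, g/liter
ALBUMIN_MOLAR_MASS = 69000.0  # g/mol


def free_androgen_index(total_t_nmol_l, shbg_nmol_l, scale=100.0):
    """Ratio of total T to SHBG; the scale factor of 100 is a common convention."""
    return scale * total_t_nmol_l / shbg_nmol_l


def albumin_factor(albumin_g_per_l=ALBUMIN_G_PER_L):
    """1 + K_ALB * [albumin]: ratio of non-SHBG-bound T to free T."""
    return 1.0 + K_ALB * albumin_g_per_l / ALBUMIN_MOLAR_MASS


def calculated_free_t(total_t_nmol_l, shbg_nmol_l, albumin_g_per_l=ALBUMIN_G_PER_L):
    """Free T (nmol/liter) from mass-action binding to SHBG and albumin."""
    tt = total_t_nmol_l * 1e-9   # mol/liter
    shbg = shbg_nmol_l * 1e-9    # mol/liter
    n = albumin_factor(albumin_g_per_l)
    # Mass balance: tt = n*ft + K_SHBG*shbg*ft / (1 + K_SHBG*ft),
    # which rearranges to the quadratic a*ft^2 + b*ft - tt = 0.
    a = K_SHBG * n
    b = n + K_SHBG * (shbg - tt)
    ft = (-b + math.sqrt(b * b + 4.0 * a * tt)) / (2.0 * a)
    return ft * 1e9


def calculated_bioavailable_t(total_t_nmol_l, shbg_nmol_l, albumin_g_per_l=ALBUMIN_G_PER_L):
    """Non-SHBG-bound (free plus albumin-bound) T in nmol/liter."""
    return albumin_factor(albumin_g_per_l) * calculated_free_t(
        total_t_nmol_l, shbg_nmol_l, albumin_g_per_l)


if __name__ == "__main__":
    # Hypothetical adult male values in nmol/liter.
    print(free_androgen_index(15.0, 40.0))
    print(calculated_free_t(15.0, 40.0))
    print(calculated_bioavailable_t(15.0, 40.0))
```

Under these assumed constants, the ratio of non-SHBG-bound T to free T is fixed by the albumin concentration at roughly 23, so the two calculated quantities carry the same information unless albumin is measured and varies.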
Using the free T by equilibrium dialysis as a gold standard, previous studies have evaluated the accuracy of other free and bioavailable T assays in men and to a limited extent in women (10–13). In men, calculated free T levels (derived from total T, SHBG, and albumin measurements or assuming a constant albumin concentration) were found to be nearly identical with values measured by equilibrium dialysis. In pregnancy, calculated free T levels were lower than values measured by equilibrium dialysis. Therefore, calculated free T levels are thought to provide accurate estimates of free T in men. Although free T levels measured by direct analog RIA and the FAI correlated with free T by equilibrium dialysis, absolute values of free T by direct RIA were substantially lower than those measured by equilibrium dialysis and varied with alterations in SHBG concentrations (10, 11). Calculated bioavailable T levels correlated well and were nearly identical with those measured by ammonium sulfate precipitation in some but not all studies (10, 13). However, both were found independently to correlate well with free T by dialysis, and absolute values for either calculated non-SHBG-bound T or bioavailable T by ammonium sulfate precipitation were found to be approximately 20 times that of free T by equilibrium dialysis (10).
Total and free T concentrations in women are approximately 10-fold and 20-fold lower than those in men, respectively. Because estrogens increase SHBG concentrations and bind to SHBG with high affinity, SHBG levels in women are highly variable and affect measurements of total T. Therefore, accurate and sensitive measurements of free T are needed to assess androgen status, in particular, androgen deficiency in women. In this issue of JCEM, Miller et al. (3) compared free T levels by several of the methods described with those measured by equilibrium dialysis, the generally accepted gold standard method for measuring free T. Measurements were made in women in various states of estrogen and T deficiency and sufficiency, but in all groups, total and free T levels were very low. Calculated free T levels (derived from total T and SHBG measurements, and assuming a constant albumin concentration) and FAI were determined using two immunoassays for total T (RIA with and without extraction and chromatography) and SHBG [immunoradiometric assay (IRMA) and RIA], and free T by a direct (analog) RIA was measured in all samples. Using regression analysis, calculated free T values were nearly identical with those measured by equilibrium dialysis in all groups of women. The strength of agreement depended strongly on the specific total T and SHBG assay that was used; the greatest accuracy was achieved using the T RIA with extraction and chromatography and SHBG IRMA. In contrast, the direct free T RIA correlated with values measured by equilibrium dialysis but less well than calculated free T; more importantly, the direct free T RIA demonstrated poor accuracy (high systematic bias) and precision (high random variability). FAI also correlated very well with free T by equilibrium dialysis. The authors concluded that calculated free T and free T by equilibrium dialysis were the preferred methods for diagnosing androgen deficiency in women.
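The distinction drawn above between accuracy (systematic bias) and precision (random variability) can be illustrated by summarizing the percentage differences between a test method and the reference method. This is a simple difference-based summary, not the regression analysis the authors used, and the values are hypothetical.

```python
import statistics


def bias_and_spread(reference, measured):
    """Return (mean, SD) of the percentage differences from the reference.

    The mean reflects systematic bias (accuracy); the standard deviation
    reflects random variability (precision).
    """
    pct_diff = [100.0 * (m - r) / r for r, m in zip(reference, measured)]
    return statistics.mean(pct_diff), statistics.stdev(pct_diff)


if __name__ == "__main__":
    # Hypothetical free T values in arbitrary units, for illustration only.
    dialysis = [2.0, 3.5, 5.0, 8.0, 12.0]
    direct_assay = [0.9, 2.6, 1.8, 5.9, 4.1]
    bias, spread = bias_and_spread(dialysis, direct_assay)
    print(f"mean difference {bias:.0f}%, SD {spread:.0f}%")
```

A method can correlate with the reference and still show a large mean difference, a large spread, or both, which is the pattern described for the direct free T RIA.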
We agree with the conclusions of the paper by Miller et al. (3). From this study and previous studies, it is clear that calculated free T provides an accurate estimate of free T. As emphasized, both free T by dialysis and calculation require sensitive and accurate measurements of total T, and calculated free T also depends on the SHBG assay used. It is not surprising that free T calculations using the traditional total T RIA after extraction and chromatography performed better than the direct total T RIA. In large part, this was probably due to the larger sample volume and greater sensitivity of the former, traditional T RIA. It would be predicted from the studies of Wang et al. (2) and Taieb et al. (8) that calculated free and bioavailable T using the less sensitive and precise direct immunoassays would be inaccurate in women and children.
Miller et al. (3) found approximately 2-fold greater absolute SHBG levels using an IRMA vs. the RIA and better accuracy of free T calculations using the SHBG IRMA. The reason for this discrepancy in SHBG levels is not clarified, but may be due to differences in antibody specificity and recognition of the several circulating forms of SHBG, due in part to differences in glycosylation. Differences in SHBG concentrations among commercially available SHBG kits have been reported previously (14). To our knowledge, a gold standard method for measuring SHBG does not exist, but it seems reasonable to use methods that correlate well with methods based on T- or DHT-binding capacity. The SHBG IRMA used was calibrated against a binding capacity assay that may reflect more biologically active (physiologically relevant) SHBG concentrations in blood, possibly explaining its better performance in the free T calculation than the RIA. In order for widespread use of calculated free T in clinical laboratories, sensitive and accurate automated platform immunoassays for total T and SHBG must be developed and properly validated.
In the study by Miller et al. (3), FAI was less preferred because it was a unitless number and did not relate to the physiological reality of free T. Furthermore, it can be altered by changes in either T or SHBG and may be misleading. We agree. If both total T and SHBG need to be measured to derive the FAI, why not calculate free and bioavailable T levels? Also, although the FAI correlated well with free T by equilibrium dialysis in women, it did not do so in previous studies in men (3, 10, 12). The study by Miller et al. (3) in women and other studies in men do not support the use of direct free T RIA measurements. We strongly agree. Although the direct free T RIA correlates well with free T by equilibrium dialysis, it does not provide an accurate estimate of free T, it may be affected by alterations in SHBG, and it may lead to erroneous conclusions or diagnoses.
Two important and poorly understood issues that relate to the clinical use of T assays in the diagnosis of androgen deficiency deserve mention. First, it is unclear how many T measurements are needed to confidently confirm the diagnosis of hypogonadism. Approximately 30–35% of men who were classified as hypogonadal on the basis of a single low total T level were found subsequently to have average T levels over 24 h within the normal adult male range (15). Other studies using frequent blood sampling demonstrated that 15% of young normal men had total T levels below the normal range within a 24-h period (16). The intrasubject variation in T levels is particularly problematic in older men who exhibit T levels that fluctuate between the lower part of the normal range and slightly below normal. Because the diagnosis of hypogonadism usually implies a need for and commitment to long-term T treatment, experts in the field recommend that at least two low T values be obtained to confirm the diagnosis of hypogonadism. Second, the physiological significance of low normal to moderately low serum T levels, e.g. in aging men, and the ability of low T levels to predict improvements in clinical outcomes with T therapy are not known (17). The latter will require large, long-term, randomized controlled studies of T therapy on important clinical outcomes (e.g. frailty, fractures, cardiovascular events, mortality, dementia, and depression). Such studies are also important in assessing the long-term risks of T treatment (e.g. prostate cancer) in older men. The recent Institute of Medicine report (18) performed an excellent service in reviewing androgens, including aspects of T measurement. This report recommended many additional studies of androgen administration to men, an important positive result.
However, the Institute of Medicine report recommended against a large, multicenter study of androgen administration to aging men, powered to assess bone fractures, cardiovascular disease, and prostate cancer risk as outcomes; this recommendation was a mistake in our view and could leave us without key clinical information for decades. The aging male deserves a similar type of resource expenditure and scientific assessment as was provided to the aging female by the Women’s Health Initiative (19).
Few would dispute that for T or other hormone assays, accuracy matters. In the last 30 yr, there have been remarkable improvements in the sensitivity, specificity, efficiency, rapidity, and cost of T assays. What is needed now is refocusing of attention to more rigorous validation and standardization of the accuracy and normal reference ranges for these assays to alleviate the confusion that has arisen in the clinical and research community as a result of the variability and discrepancies in T assays. We hope that assay vendors, endocrinologists, clinical chemists, and regulatory agencies can act together to achieve better standardization of hormone measurements, including T assays. This mission should be a top priority for The Endocrine Society.
Abbreviations: FAI, free androgen index; LC-MSMS, liquid chromatography tandem mass spectrometry.
Measurement of total serum testosterone in adult men: comparison of current laboratory methods versus liquid chromatography-tandem mass spectrometry. J Clin Endocrinol Metab
Measurement of free testosterone in normal women and women with androgen deficiency: comparison of methods. J Clin Endocrinol Metab
Transport of steroid hormones: binding of 21 endogenous steroids to both testosterone-binding globulin and corticosteroid-binding globulin in human plasma. J Clin Endocrinol Metab
Effects of human sera on transport of testosterone and estradiol into rat brain. Am J Physiol
Bioavailability of albumin-bound testosterone. J Clin Endocrinol Metab
Androgen and estrogen signaling at the cell membrane via G-proteins and cyclic adenosine monophosphate.
Testosterone measured by 10 immunoassays and by isotope-dilution gas chromatography-mass spectrometry in sera from 116 men, women, and children.
In: Felig P, Frohman LA, eds. Endocrinology and metabolism. 4th ed. New York: McGraw-Hill;
A critical evaluation of simple methods for the estimation of free testosterone in serum. J Clin Endocrinol Metab
The analog free testosterone assay: are the results in men clinically useful? Clin Chem [Erratum (1999)
Evaluation of an algorithm for calculation of serum “bioavailable” testosterone (BAT).
What are Concerns Regarding Endocrine Disruptors?
In the last two decades there has been a growing awareness of the possible adverse effects in humans and wildlife from exposure to chemicals that can interfere with the endocrine system. These effects can include:
- developmental malformations;
- interference with reproduction;
- increased cancer risk; and
- disturbances in immune and nervous system function.
Clear evidence exists that some chemicals cause these effects in wildlife, but limited evidence exists for the potential of chemicals to cause these effects in humans at environmental exposure levels. Very few chemicals have been tested for their potential to interfere with the endocrine system. Current standard test methods do not provide adequate data to identify potential endocrine disruptors (EDs) or to assess their risks to humans and wildlife.
In recent years, some scientists have proposed that chemicals might inadvertently be disrupting the endocrine system of humans and wildlife. A variety of chemicals have been found to disrupt the endocrine systems of animals in laboratory studies, and there is strong evidence that chemical exposure has been associated with adverse developmental and reproductive effects on fish and wildlife in particular locations. The relationship between human diseases of the endocrine system and exposure to environmental contaminants, however, is poorly understood and scientifically controversial (Kavlock et al., 1996; EPA, 1997).
How Can Chemicals Disrupt the Endocrine System?
Disruption of the endocrine system can occur in various ways. Some chemicals mimic a natural hormone, fooling the body into over-responding to the stimulus (e.g., a growth hormone that results in increased muscle mass), or responding at inappropriate times (e.g., producing insulin when it is not needed). Other endocrine disruptors block the effects of a hormone at certain receptors (e.g., growth hormones required for normal development). Still others directly stimulate or inhibit the endocrine system and cause overproduction or underproduction of hormones (e.g., an over- or underactive thyroid).
Certain drugs are used to intentionally cause some of these effects, such as birth control pills. In many situations involving environmental chemicals, however, an endocrine effect is not desirable.
What are Examples of Endocrine Disruption?
One example of the devastating consequences of the exposure of developing animals, including humans, to endocrine disruptors is the case of the potent drug diethylstilbestrol (DES), a synthetic estrogen. Prior to its ban in the early 1970s, doctors mistakenly prescribed DES to as many as five million pregnant women to block spontaneous abortion and promote fetal growth. It was discovered after the children went through puberty that DES affected the development of the reproductive system and caused vaginal cancer.
Since then, Congress has improved the evaluation and regulation process of drugs and other chemicals. The statutory requirement to establish an endocrine disruptor screening program is a highly significant step.
Growing scientific evidence shows that humans, domestic animals, and fish and wildlife species have exhibited adverse health consequences from exposure to environmental chemicals that interact with the endocrine system. To date, such problems have been detected in domestic or wildlife species with relatively high exposure to:
- organochlorine compounds (e.g., 1,1,1-trichloro-2,2-bis(p-chlorophenyl)ethane (DDT) and its metabolite dichlorodiphenyldichloroethylene (DDE), polychlorinated biphenyls (PCBs), and dioxins); and
- some naturally occurring plant estrogens.
Effects from exposure to low levels of endocrine disruptors have been observed as well (e.g., parts per trillion levels of tributyl tin have caused masculinization of female marine molluscs such as the dog whelk and ivory shell). Adverse effects have been reported for humans exposed to relatively high concentrations of certain contaminants. However, whether such effects are occurring in the human population at large at concentrations present in the ambient environment, drinking water, and food remains unclear.
Several conflicting reports have been published concerning declines in the quality and quantity of sperm production in humans over the last 4 decades, and there are reported increases in certain cancers (e.g., breast, prostate, testicular). Such effects may have an endocrine-related basis, which has led to speculation about the possibility that these endocrine effects may have environmental causes. However, considerable scientific uncertainty remains regarding the actual causes of such effects.
Nevertheless, there is little doubt that small disturbances in endocrine function, particularly during certain highly sensitive stages of the life cycle (e.g., development, pregnancy, lactation), can lead to profound and lasting effects (Kavlock et al., 1996; EPA, 1997).