8. Considerations when comparing results from different studies
Many patients have visited our site, downloaded our prostate studies, and attempted to compare our survival results with those of other studies. The comparison proves to be a complex task for anyone who is not a statistician. We understand that everyone is seeking to understand the benefit of each treatment method, and we would like to make the comparison process easier by comparing our outcomes with HDR brachytherapy against other treatment methods based on the criteria listed below.
Comparison Criteria
- Definitions of Failure
- General Clinical Control
- Comparison of Risk Group
- Follow-up Time
- Cohort Characteristics
- Late RTOG Toxicity Rates (side effects)
1. Definition of Failure
Most centers report their results using the ASTRO definition of failure. Unless a clinic includes the ASTRO definition in their report, it is impossible to compare results.
An important paper was recently published in the November 2003 issue of the IJROBP on the largest multi-institutional cohort (4,839 patients) treated for prostate cancer with external beam radiation therapy (Thames H, et al. Comparison of Alternative Biochemical Failure Definitions Based on Clinical Outcome in 4839 Prostate Cancer Patients Treated by External Beam Radiotherapy Between 1986 and 1995. International Journal of Radiation Oncology, Biology and Physics Vol. 57, No. 4, pp 929-943). The participating institutions were M.D. Anderson Cancer Center, Fox Chase Cancer Center, William Beaumont Hospital, Mallinckrodt Institute of Radiology, Mayo Medical School, University of Michigan, Massachusetts General Hospital, and Memorial Sloan-Kettering Cancer Center. The authors conducted a validation study that statistically evaluated 102 definitions of failure and their ability to predict biochemical (PSA) failure. We applied several aspects of this paper to ours, namely the criteria for general clinical failure (GCF) and two of the four definitions of failure found to be superior to the standard ASTRO definition in specificity and sensitivity, positive and negative predictive values, and hazard of clinical failure after biochemical failure.
The worst-performing definition of failure is the absolute nadir (PSA>0.2) definition.
The validation study analyzed these absolute PSA value definitions and found they were the worst at predicting survival because of unacceptably low specificities. When we applied the PSA>0.2 definition of failure to our cohort, it classified 35 (17%) known successes as failures (false positives), three times the number that ASTRO predicted. This shortcoming has clinical implications if a physician initiates unnecessary salvage treatment for patients who fail to reach a specific PSA nadir when they may not be true failures.
In the validation study, the better definitions of failure were those with high positive and negative predictive values (see Table 1 below). Based on this analysis, we rejected the PSA>0.2 definition and continued to analyze our results using ASTRO (the standard), the 2 rises ≥0.5 definition, and the nadir+2 definition; the latter two were found to be better predictors of PSA failure than ASTRO.
Table 1:
| Measures of diagnostic performance | ASTRO | 2 Rises ≥0.5 | Nadir + 2 | PSA >0.2 |
|---|---|---|---|---|
| Predictive Value of a Positive Test | 33% | 71% | 75% | 31% |
| Predictive Value of a Negative Test | 92% | 95% | 98% | 97% |
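The predictive values in Table 1 come from a 2×2 classification of patients by biochemical result and eventual clinical outcome. A minimal Python sketch of that arithmetic follows; the function name and the counts are hypothetical and are not taken from the validation study or from our cohort.

```python
def predictive_values(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Return diagnostic-performance measures from 2x2 counts.

    tp: biochemical failures that were true clinical failures
    fp: biochemical failures that were actually successes (false positives)
    tn: biochemical successes that were true successes
    fn: biochemical successes that later failed clinically
    """
    if min(tp + fp, tn + fn, tp + fn, tn + fp) == 0:
        raise ValueError("each row and column of the 2x2 table needs at least one patient")
    return {
        "ppv": tp / (tp + fp),          # predictive value of a positive test
        "npv": tn / (tn + fn),          # predictive value of a negative test
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts chosen only to illustrate the calculation.
example = predictive_values(tp=30, fp=10, tn=150, fn=10)
for name, value in example.items():
    print(f"{name}: {value:.1%}")
```

In this arithmetic, a definition that produces many false positives lowers both the predictive value of a positive test and the specificity, which is the weakness noted above for the PSA>0.2 definition.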
PSA progression-free survival (PSA-PFS; bNED, biochemical control) at 5, 8, and 10 years according to risk group using three definitions of failure: ASTRO and two definitions found to be more accurate predictors of outcome than ASTRO.
Table 2:
| Risk Group | ASTRO, 5 yrs | ASTRO, 8 yrs | ASTRO, 10 yrs | 2 Rises ≥0.5 ng/mL, 5 yrs | 2 Rises ≥0.5 ng/mL, 8 yrs | 2 Rises ≥0.5 ng/mL, 10 yrs | Nadir + 2 ng/mL, 5 yrs | Nadir + 2 ng/mL, 8 yrs | Nadir + 2 ng/mL, 10 yrs |
|---|---|---|---|---|---|---|---|---|---|
| Low | 90% | 90% | 90% | 93% | 93% | 93% | 93% | 93% | 93% |
| Intermediate | 91% | 87% | 87% | 91% | 89% | 89% | 93% | 92% | 82% |
| High | 74% | 69% | 69% | 81% | 75% | 69% | 83% | 66% | 62% |
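To make the two alternative definitions used in Table 2 more concrete, here is a minimal Python sketch that applies them to a series of post-treatment PSA values. The exact rules are not spelled out on this page, so the sketch rests on two assumptions: "nadir + 2" is read as a PSA value at least 2 ng/mL above the lowest value recorded so far, and "2 rises ≥0.5" is read as two consecutive rises of at least 0.5 ng/mL each. The function names and the sample PSA series are hypothetical.

```python
from typing import Optional, Sequence


def nadir_plus_2_failure(psa: Sequence[float], margin: float = 2.0) -> Optional[int]:
    """Index of the first PSA value >= (lowest earlier value + margin), else None.

    Assumption: the nadir is the running minimum of the earlier measurements.
    """
    if not psa:
        return None
    nadir = psa[0]
    for i, value in enumerate(psa[1:], start=1):
        if value >= nadir + margin:
            return i
        nadir = min(nadir, value)
    return None


def two_rises_failure(psa: Sequence[float], min_rise: float = 0.5) -> Optional[int]:
    """Index of the second of two consecutive rises of >= min_rise, else None.

    Assumption: the two rises must be consecutive measurements.
    """
    consecutive = 0
    for i in range(1, len(psa)):
        if psa[i] - psa[i - 1] >= min_rise:
            consecutive += 1
            if consecutive == 2:
                return i
        else:
            consecutive = 0
    return None


# Hypothetical PSA series in ng/mL, listed in order of measurement.
rising = [1.2, 0.6, 0.4, 1.0, 1.7, 2.6]
stable = [1.0, 0.5, 0.3, 0.4, 0.3, 0.5]

print(nadir_plus_2_failure(rising), two_rises_failure(rising))
print(nadir_plus_2_failure(stable), two_rises_failure(stable))
```

The same PSA history can be flagged at different times, or not at all, depending on the definition applied, which is why results reported under different definitions cannot be compared directly.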
2. General Clinical Control
The best indication of a clinic's treatment efficacy is not based on definitions of failure, but on mature, long-term results. Our 10-year data showed an overall general clinical control (GCC) rate of 90% (188/209). We used the same endpoints for GCF as the large multi-institutional study mentioned above: local failure (determined by positive biopsy or DRE more than 2 years post treatment, associated with PSA progression), distant failure, post-treatment initiation of hormones, or a post-treatment PSA>25 ng/mL.
General clinical control according to risk group: Low, 93%; Intermediate, 94%; High, 79%.
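The GCF endpoints listed above amount to a simple rule: a patient counts as a general clinical failure if any one of the four endpoints is met. The Python sketch below illustrates that rule and the control-rate arithmetic; the record layout, field names, and sample patients are hypothetical, and local failure is reduced to a single flag.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FollowUp:
    """Hypothetical follow-up record for one patient."""
    local_failure: bool            # positive biopsy or DRE >2 years post treatment, with PSA progression
    distant_failure: bool
    hormones_started: bool         # post-treatment initiation of hormones
    max_post_treatment_psa: float  # ng/mL


def is_general_clinical_failure(p: FollowUp) -> bool:
    """True if any one of the four GCF endpoints is met."""
    return (
        p.local_failure
        or p.distant_failure
        or p.hormones_started
        or p.max_post_treatment_psa > 25.0
    )


def general_clinical_control_rate(patients: Iterable[FollowUp]) -> float:
    """Fraction of patients who meet none of the GCF endpoints."""
    records = list(patients)
    if not records:
        raise ValueError("no patients supplied")
    controlled = sum(not is_general_clinical_failure(p) for p in records)
    return controlled / len(records)


# Hypothetical records: three controlled patients and one failure by PSA > 25 ng/mL.
sample = [
    FollowUp(False, False, False, 0.4),
    FollowUp(False, False, False, 1.1),
    FollowUp(False, False, False, 0.2),
    FollowUp(False, False, False, 31.0),
]
print(f"sample GCC rate: {general_clinical_control_rate(sample):.0%}")

# The overall figure quoted in the text: 188 controlled patients out of 209.
print(f"reported GCC rate: {188 / 209:.0%}")
```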
3. Comparison of Risk Group
To fairly compare results when centers classify patients by risk groups, it is important to note how the groups were defined. After an in-depth literature review, we noted that the definition of low risk was very consistent, but intermediate and high risk group definitions were quite varied. What one center calls high risk, another calls intermediate risk.
CET's Risk Group Definitions:
- Low risk: ≤T2a, Gleason ≤6, PSA ≤10 ng/mL
- Intermediate risk: T2bc, Gleason 7, PSA >10-20 ng/mL
- High risk: T3, Gleason 8-10, PSA >20 ng/mL
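The thresholds in CET's risk group definitions can be written as a small Python classifier. This is only a sketch: the page does not state how the three factors are combined, so the code assumes that the single worst factor determines the group. The function names and example patients are hypothetical.

```python
GROUPS = ("Low", "Intermediate", "High")


def _stage_tier(stage: str) -> int:
    s = stage.strip().lower()
    if s.startswith("t3"):
        return 2
    if s in ("t2b", "t2c"):
        return 1
    if s.startswith("t1") or s == "t2a":
        return 0
    raise ValueError(f"stage not covered by these definitions: {stage!r}")


def _gleason_tier(gleason: int) -> int:
    if not 2 <= gleason <= 10:
        raise ValueError(f"Gleason score out of range: {gleason}")
    if gleason <= 6:
        return 0
    if gleason == 7:
        return 1
    return 2  # Gleason 8-10


def _psa_tier(psa: float) -> int:
    if psa < 0:
        raise ValueError(f"PSA cannot be negative: {psa}")
    if psa <= 10:
        return 0
    if psa <= 20:
        return 1
    return 2  # PSA > 20 ng/mL


def risk_group(stage: str, gleason: int, psa: float) -> str:
    """Classify a patient; ASSUMPTION: the worst single factor sets the group."""
    tier = max(_stage_tier(stage), _gleason_tier(gleason), _psa_tier(psa))
    return GROUPS[tier]


# Hypothetical patients.
print(risk_group("T1c", 6, 5.0))
print(risk_group("T2a", 7, 8.0))
print(risk_group("T2b", 6, 25.0))
```

Another center could keep the same three factors but combine or threshold them differently, which is how one center's high risk patient becomes another center's intermediate risk patient.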
4. Follow-up Time
Note the median follow-up time, stated in months or years. The longer the median follow-up, the more mature the data and therefore the better it indicates a clinic's treatment success. It is not fair to compare a study with great-looking PSA-PFS numbers at a 4-year median follow-up with one whose PSA-PFS numbers are lower but whose median follow-up is 7 or 10 years.
5. Cohort Characteristics
Look at the distribution of patients according to risk group. If the majority of patients treated by a center are low risk, their survival numbers are going to be much higher than a center where the majority are intermediate risk. Note how the majority of our patients were intermediate risk. Our high risk patients comprised a sizeable percentage of our cohort also. Most permanent seed studies we found in the literature reported low risk to intermediate risk group results. When high risk statistics were cited, the high risk group was not a sizeable proportion of the cohort and did not include stage T3.
Table 3: CET Cohort Distribution (n = 209)
| Risk Group | Patients |
|---|---|
| Low Risk | 70 (33.5%) |
| Intermediate Risk | 92 (44%) |
| High Risk | 47 (22.5%) |
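The effect of cohort mix on a single overall number can be illustrated with simple arithmetic. The Python sketch below pools the 10-year ASTRO PSA-PFS rates from Table 2 using the CET distribution in Table 3, then pools the same rates for a hypothetical cohort that is mostly low risk. A count-weighted average is a crude illustration only; it is not how actuarial survival for a whole cohort is calculated. The function name and the second cohort are assumptions made for the example.

```python
def pooled_rate(counts: dict, rates: dict) -> float:
    """Count-weighted average of per-group rates (illustration only)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cohort is empty")
    return sum(counts[group] * rates[group] for group in counts) / total


# 10-year ASTRO PSA-PFS by risk group, from Table 2.
astro_10yr = {"low": 0.90, "intermediate": 0.87, "high": 0.69}

# CET cohort distribution, from Table 3.
cet_counts = {"low": 70, "intermediate": 92, "high": 47}

# Hypothetical cohort of the same size that is mostly low risk.
mostly_low_counts = {"low": 150, "intermediate": 50, "high": 9}

cet_pooled = pooled_rate(cet_counts, astro_10yr)
mostly_low_pooled = pooled_rate(mostly_low_counts, astro_10yr)

print(f"CET mix:        {cet_pooled:.1%}")
print(f"Mostly low mix: {mostly_low_pooled:.1%}")
```

With identical results in every risk group, the mostly low risk cohort still reports a higher pooled figure, so overall numbers should only be compared between cohorts with a similar risk distribution.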
We have grouped recently published studies of various radiation modalities by risk group into the two tables below.
- Low Risk Group Literature
- High Risk Group Literature
Table 4: Low Risk Group Literature
| Author / Treatment Method | Group definitions | Median follow-up (yrs) | ASTRO PSA-PFS (%) |
|---|---|---|---|
| 3D EBRT | | | |
| Zelefsky et al. | T1-T2, Gleason score ≤6, PSA ≤10 ng/mL | 3 | 95 |
| Hanks et al. | T1-T2a, Gleason score ≤6, PSA <10 ng/mL | 9 | 78 |
| Kupelian et al. | T1-T2a, Gleason score ≤6, PSA ≤10 ng/mL | 4 | 93 |
| Pollack et al. | PSA ≤10 ng/mL | 5 | 75 |
| Seed monotherapy | | | |
| Grimm et al. | T1-T2b, Gleason score ≤6, PSA ≤10 ng/mL | 7 | 87 |
| Blasko et al. | T1-T2, Gleason score ≤6, PSA ≤10 ng/mL | 5 | 94 |
| Kupelian et al. | T1-T2a, Gleason score ≤6, PSA ≤10 ng/mL | 4 | 87 |
| Beyer et al. | T1-T2, Gleason score ≤6, PSA ≤10 ng/mL | 4 | 85 |
| Kollmeier et al. | T1-T2a, Gleason score <6, PSA ≤10 ng/mL | 6 | 88 |
| Ragde et al. | T1-T2a, Gleason score ≤6 | 10 | 66 |
| Zelefsky et al. | T1-T2a, Gleason score ≤6, PSA ≤10 ng/mL | 4 | 88 |
| EBRT + seeds | | | |
| Kupelian et al. | T1-T2a, Gleason score ≤6, PSA ≤10 ng/mL | 4 | 93 |
| Blasko et al. | T1-T2, Gleason score ≤6, PSA ≤10 ng/mL | 5 | 87 |
| Sylvester et al. | T1-T2b, Gleason score ≤6, PSA ≤10 ng/mL | 5 | 85 |
| EBRT + HDR-BT | | | |
| Eulau et al. | T1-T2b, Gleason score ≤6, PSA <10 ng/mL | 6 | 96 |
| Galalae et al. | T1-T2a, Gleason score ≤6, PSA ≤10 ng/mL | 5 | 96 |
| Demanes (CET) et al. | T1-T2a, Gleason score ≤6, PSA ≤10 ng/mL | 7.25 | 90 |
| HDR-BT monotherapy | | | |
| Rodriguez (CET) et al. | T1-T2a, Gleason score ≤6, PSA ≤10 ng/mL | 3 | 97 |
Table 5: High Risk Group Literature
| Author | Risk Group definitions | Median follow-up (y) | ASTRO PSA-PFS (%) |
|---|---|---|---|
| 3D EBRT | | | |
| Hanks et al. | T2b-T3, Gleason score 7–10 | 9 | |
| | PSA <10 ng/mL (unfavorable) | | 62 |
| | PSA 10–20 ng/mL (favorable/unfavorable) | | 44/56 |
| | PSA >20 ng/mL | | 14 |
| Kupelian et al. | T2b, Gleason score ≥7, PSA >10 ng/mL | 4 | 76 |
| Pollack et al. | PSA >10 ng/mL | 5 | 62 |
| Zelefsky et al. | T3, Gleason score ≥7, PSA ≥10 ng/mL | 3 | |
| | Intermediate: one factor | | 79 |
| | High: two or more factors | | ~55 |
| IMRT | | | |
| Zelefsky et al. | T3, Gleason score ≥7, PSA ≥10 ng/mL | 2 | |
| | Intermediate: one factor | | 86 |
| | High: two or more factors | | 81 |
| Seed monotherapy | | | |
| Blasko et al. | T3, Gleason score 7–10, PSA >10 ng/mL | 5 | |
| | Intermediate: two factors | | 84 |
| | High: three factors | | 54 |
| Kupelian et al. | T2b, Gleason score ≥7, PSA >10 ng/mL | 4 | 64 |
| Beyer et al. | ≥T2b, Gleason score ≥7, PSA >10 ng/mL | 4 | |
| | Intermediate: one factor | | ~77 |
| | High: two or more factors | | ~55 |
| Kollmeier et al. | Intermediate: T2bc, Gleason score 7, PSA >10–20 ng/mL | 6 | 81 |
| | High: 2 or more intermediate risk factors or Gleason score 8–10 or PSA >20 ng/mL | | 65 |
| EBRT + seeds | | | |
| Kupelian et al. | T2b, Gleason score ≥7, PSA >10 ng/mL | 4 | ~75 |
| Blasko et al. | T3, Gleason score 7–10, PSA >10 ng/mL | 5 | |
| | Intermediate: one factor | | 85 |
| | High: two or more factors | | 62 |
| Ragde et al. | >T2b and/or Gleason score ≥7 | 10 | 79 |
| Sylvester et al. | ≥T2c, Gleason score ≥7, PSA >10 ng/mL | 5 | |
| | Intermediate: one factor | | 77 |
| | High: two or more factors | | 47 |
| Dattoli et al. | >T2b, Gleason score ≥7, PSA >15 ng/mL; one or more factors | 4 | 79 |
| EBRT + HDR-BT | | | |
| Eulau et al. | T2c-T3, Gleason score 7–10, PSA >15 ng/mL | 6 | |
| | Intermediate: one or two factors | | 72 |
| | High: three factors | | 49 |
| Martinez et al. | T2b-T3, Gleason score 7–10, PSA ≥10 ng/mL; high-dose group | 4 | 87 |
| Galalae et al. | ≥T2b, Gleason score ≥7, PSA ≥10 ng/mL | 5 | |
| | Intermediate: any one factor | | 88 |
| | High: any two factors | | 69 |
| Demanes (CET) | Intermediate: T2bc, PSA >10, ≤20 ng/mL, Gleason 7 | 7.25 | 87 |
| | High: T3, PSA >20 ng/mL, Gleason score 8-10 | | 69 |
6. Late RTOG Toxicity Rates (Side effects)
The following excerpt is the full RTOG late (chronic) toxicity analysis reported in our long-term results paper. Late RTOG toxicity is defined as any symptom lasting 90 days or more. These are therefore "worst case" rates because, given more time, the toxicity resolved in most cases.
"TURP for benign prostatic hypertrophy had been performed in 36 (17%) patients prior to treatment. There were 17 (8%) patients with pre-existing urethral strictures (7 had prior TURP and 10 did not). Grade 2 morbidity (urinary events managed by office-based or medical interventions) occurred in 16 (7.7%) patients. Grade 3 morbidity, consisting in all but one case of bulbo-membranous stricture, occurred in 14 (6.7%) patients. Nine patients had single interventions after brachytherapy (2 urethral dilatations, 4 urethrotomies, 1 perineal urethroplasty, 1 bladder fulguration, and 1 artificial sphincter) and 5 patients had multiple procedures. Five of the Grade 3 patients had undergone a TURP before treatment. Grade 4 morbidity was the consequence of extensive TURP in 2 (1%) patients. Finally, there were 8 (3.8%) cases of urinary incontinence and they occurred only in patients who had TURP before or after treatment.
RTOG late GI morbidity was uncommon. There were 5 (2%) cases Grade 1 and 5 (2%) cases Grade 2 late GI morbidity. Grade 1 morbidity consisted of self-limited rectal bleeding that resolved without treatment and Grade 2 were cases of radiation proctitis that resolved with local treatment. There was no Grade 3 or 4 GI morbidity."