Using a Call Center to Expand Mobile HCT and Link HIV-positive Individuals to Care

by Michiel Adriaan van Zyl, Ph.D1, Leslie Lauren Brown, LCSW1, Kathryn Pahl, Ph.D2

1Kent School of Social Work, University of Louisville, 502-852-2430 | 2Shout-It-Now, Cape Town, South Africa

Engaging individuals newly diagnosed with HIV in treatment is a significant global challenge. As South Africa expands HIV counseling and testing (HCT) services, the growing number of people diagnosed with HIV will need innovative linkage-to-care approaches for treatment to be most effective. While definitions vary, we have defined “linkage to care” as connecting an HIV+ individual to medical care so that CD4 cell test results are obtained and ART eligibility assessed. The current study reports findings from a non-governmental organization’s “Links to Care” program. A two-pronged expanded HCT service was used, which included a community outreach approach to HIV testing and a call centre to track each patient’s linkage to care after HIV diagnosis. In the evaluation sample (n=1096), all participants were diagnosed as HIV positive and came from either Limpopo or Gauteng province, with 95.5% of individuals reportedly being newly diagnosed. The majority of individuals (51%) were linked to care with a mean time to linkage of 31 days (with most individuals linked in less than 14 days). More females (54%) than males (47%) were linked to care; females had a mean CD4 cell count of 440, while males took longer to link to care and had a lower mean CD4 cell count of 331. Females 23 or younger had the lowest linkage rate of all females. Success rate differed by region, with 46.5% of individuals being linked to care in the rural area compared to 28% in the urban area. Reasons for the failure to link individuals to care are analyzed. Findings suggest that expanding HCT services to include innovative linkage-to-care approaches can improve linkage to care and subsequently impact HIV prevention.

Despite successful preventative efforts, South Africa continues to have the largest HIV/AIDS epidemic in the world (Mall, Middlekoop, Mark, Wood, & Bekker, 2013; van Rooyen, Barnabas, Baeten, Phakathi, Joseph, Krows, Hong, Murnane, Hughes, & Celum, 2013). In order to maximize preventative efforts and curb transmission rates, more innovative and empirically sound interventions are needed. Research has shown HIV counseling and testing (HCT) to be instrumental in HIV prevention, yet very little literature has focused on ways to improve HCT and to link individuals to services after their HIV diagnosis. Mobile HCT enhances the traditional HIV testing model (typically only offered in health facilities) by providing community outreach and personalized HIV testing services; this approach aims to serve difficult-to-reach populations. Although antiretroviral therapy (ART) availability is integral to HIV treatment and prevention, the effectiveness of ART can be contingent upon the timeliness of linkage to care (Losina, Bassett, Giddy, Chetty, Regan, Walensky, Ross, Scott, Uhler, Kats, Holst, & Freedberg, 2010), underscoring the need to improve services that impact linkage to care.

Preventative efforts are further confounded by the large number of individuals who are aware of their status but not engaged in care (De Koker, Lefevre, Matthys, van der Stuyft, & Delva, 2010; Van Zyl, Barney & Pahl, in press). A recent study in a township in Durban, South Africa, found that 10% of participants in a community-based mobile testing program linked to care after receiving an HIV diagnosis (Bassett et al., 2013). Because treatments such as ART rely on linkage to care, the effectiveness of HCT cannot be measured only by the number of people receiving HIV tests; rates of linkage to care should also be considered.

ART reduces morbidity by improving patient survival (Crum, Riffenburgh, Wegner, Agan, Tasker, Spooner, Armstrong, Fraser, & Wallace, 2006; Giordano, Gifford, White, Suarez, Almazor, Rabeneck, Hartman, Backus, Mole, & Morgan, 2007; Mayer, 2011), lowering infectiousness (Andrews, Wood, Bekker, Middlekoop, & Walensky, 2012), and curbing secondary transmissions due to suppressed HIV loads (Quinn, Wawer, Sewankambo, Serwadda, Wabwire-Mangen, Mehan, Lutalo, & Gray, 2000, as cited in Jenness, Myers, Neaigus, Lulek, Navejas, & Raj-Singh, 2012). The scale-up of ART has revolutionized AIDS responses by producing large-scale results: it has already saved 9 million life-years in sub-Saharan Africa (UNAIDS, 2012) and is projected to save many more, since 70 percent of individuals maintaining ART treatment are expected to be alive after five years (Verguet, Lim, Murray, Gakidou, & Salomon, 2013). Conversely, delayed ART initiation is associated with hastened mortality and continued risky behavior, thus weakening preventative efforts. With the majority of individuals beginning ART too late, many researchers report an increased chance of lower CD4 counts (Fairall et al., 2008; Keiser et al., 2008; Kigozi et al., 2009; Bassett et al., 2010, as cited in Mugglin, Estill, Wandeler, Bender, Egger, Gsponer, & Keiser, 2012). Lawn et al. (2005) and Losina et al. (2010) (as cited in Kayigamba, Bakker, Fikse, Mugisha, & Asiimwe, 2012) report an increased risk of opportunistic infections and ultimately higher mortality rates. Even delaying treatment a few months can be deleterious for individuals with low CD4 cell counts, considering that more than half of the individuals who seek ART for the first time have a CD4 count <100 (Larson, Brennan, McNamara, Long, Rosen, Sanne, & Fox, 2010).

A Centers for Disease Control and Prevention (CDC) study in the United States found that almost 30% of HIV+ individuals delayed entry to medical care more than three months after being diagnosed (Reed, Hanson, McNaghten, Bertolli, Teshale, Gardner, & Sullivan, 2009).

The goal of the current study was to determine trends associated with linkage to care when using mobile HCT combined with a call-centre approach. Specific questions addressed were: (1) What percentage of HIV positive individuals reached in the mobile HCT program linked to care, and how does the demographic data vary for those linked to care and those not linked to care? (2) How long does it take to link clients to care and are rates in establishing linkage to care similar or different for various demographic groups? 

The Call-Centre Approach

Each day, the call centre receives the names and contact information of HIV+ individuals tested during mobile HCT outreach. Staff at the call centre make follow-up telephone calls and provide information on clinic enrolment in a supportive and understanding manner, while emphasizing the importance of linking to care. Call centre staff remain in contact with patients until they are successfully linked to a clinic. This approach differs from the way other call centres operate in that most of the calls are outbound, though not exclusively so: individuals can also call the call centre or send a free “please call me” text message, whereupon a call centre staff member will contact them.


Research Setting and Procedures

 This secondary data analysis reviews data collected during seven months of operation of an HIV Linkage to Care call centre. The sample was 1096 individuals who tested HIV+ during mobile HCT outreach in Limpopo and Gauteng provinces in South Africa between April 1, 2012 and October 31, 2012; data was gathered for an additional five weeks after the end of October in order to track success in linking individuals to care. “Linkage to care” was defined as connecting an HIV+ individual to medical care so that CD4 cell test results are obtained and ART eligibility assessed. Individuals whose CD4 cell count remained high were encouraged to continue counseling and follow-up services, while those meeting eligibility requirements were encouraged to begin ART. De-identified administrative data of the call centre was provided for analysis. The institutional review board at the University of Louisville in Louisville, Kentucky, U.S.A., reviewed this secondary study for human subject protection.

Analytical Procedures

Descriptive statistics were used to describe the sample, and differences between cohorts (defined by, e.g., age, gender, and language) were detected by means of t-tests and CHAID (Chi-squared Automatic Interaction Detector) analysis. CHAID is in essence a tree classification method that relies on the chi-square test to determine the best next split at each step of the classification tree. CHAID is similar to regression analysis in that it selects the predictors that best account for the variance in a dependent variable, in this case linkage status (“linked” or “not linked”). However, CHAID goes a step further by identifying those variables (e.g. age, gender, and language) that most differentiate clients who are linked to care from those who are not linked to care.
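The split-selection step described above can be illustrated with a minimal sketch. It is not the software or procedure used in the study: it scores each candidate predictor by its Pearson chi-square statistic against the outcome and picks the largest, whereas full CHAID also merges similar categories and compares adjusted p-values. The records, field names, and counts below are hypothetical toy data, and comparing raw statistics is only reasonable here because both predictors have the same number of categories.

```python
from collections import Counter

def contingency(records, predictor, outcome):
    """Cross-tabulate predictor categories (rows) against outcome categories (columns)."""
    counts = Counter((r[predictor], r[outcome]) for r in records)
    rows = sorted({r[predictor] for r in records})
    cols = sorted({r[outcome] for r in records})
    return [[counts[(a, b)] for b in cols] for a in rows]

def chi_square(table):
    """Pearson chi-square statistic for a list-of-rows contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat

def best_split(records, predictors, outcome):
    """Return the predictor with the largest chi-square, plus all scores."""
    scores = {p: chi_square(contingency(records, p, outcome)) for p in predictors}
    return max(scores, key=scores.get), scores

# Hypothetical toy data: gender is associated with the outcome, team is not.
toy = []
for gender, status, n in [("female", "linked", 30), ("female", "not_linked", 10),
                          ("male", "linked", 10), ("male", "not_linked", 30)]:
    for team in ("rural", "urban"):
        toy += [{"gender": gender, "team": team, "status": status}] * n

best, scores = best_split(toy, ["gender", "team"], "status")
print(best, scores)
```

In a full tree, the same selection would be repeated within each resulting subgroup until no predictor reaches the chosen significance threshold.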

Description of sample demographics

The majority of individuals referred to the call centre during the study period were female (68.7%); 31.3% were male. The mean age was 32.82 years (SD=10.29), with males having a higher mean age (M=35.82; SD=10.9) than females (M=31.46; SD=9.71) (t(1092)=6.626; p<.001). English was the most common language of choice (41.9%), followed by Sepedi (36.5%), Sesotho (4.9%), Tsonga (3.5%), Venda (2%), Setswana (1.6%) and isiZulu (.2%). The language was not known for 7%, and the category “other language” accounted for less than 1%. The team from the Gauteng province call centre was responsible for 621 (56.7%) individuals and the team from Limpopo province for 475 (43.3%).
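As a rough check on the age comparison, a pooled-variance (Student) t statistic can be recomputed from the reported means and standard deviations. The group sizes used below (342 males, 752 females) are assumptions chosen to be consistent with the reported 1092 degrees of freedom; they are not stated in the source, and the study's own software may have used slightly different inputs.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic with pooled variance, from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df

# Means and SDs as reported; group sizes are assumed (see lead-in).
t_stat, df = pooled_t(35.82, 10.9, 342, 31.46, 9.71, 752)
print(round(t_stat, 2), df)
```

Under these assumptions the statistic comes out at roughly 6.6, in line with the reported t(1092)=6.626.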


Percentage of individuals linked to care and trends in rates and demographic differences

Linkage to care was established for 563 (51.4%) of the HIV+ individuals referred to the call centre during the seven-month study period. The large majority (95.9%) of those for whom linkages were established were newly diagnosed as HIV+. The prior HIV status of those not linked to care was unknown. More females (403 of 753, or 53.5%) than males (160 of 343, or 46.6%) were linked to care (χ2 test, df=1, p=.037). The mean age of individuals linked to care (M=33.1; SD=9.96) was similar to that of those not linked to care (M=32.52; SD=10.63; t(1092)=.936, p=.339). This was true for both males (linked M=36.5, SD=11.43; not linked M=35.24, SD=10.41; t(340)=1.059, p=.29) and females (linked M=31.77, SD=8.98; not linked M=31.1, SD=10.48; t(750)=.938, p=.349). The mean CD4 count was 331.06 (SD=205.53) for males and 439.72 (SD=231.75) for females (t(548)=5.107, p<.001), indicating that males on average require more immediate ART intervention than females.
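The gender comparison can be reproduced approximately from the reported counts alone. The sketch below computes an uncorrected Pearson chi-square for the 2x2 table (gender by linkage status) and its p-value for one degree of freedom; whether the original analysis applied a continuity correction is not stated, so this is an assumption. The resulting p-value is of the same order as the .037 reported above, which suggests (but does not confirm) that the reported figure is a p-value rather than the test statistic itself.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square and p-value (df=1) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(chi2 > x) equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

females_linked, females_total = 403, 753
males_linked, males_total = 160, 343

stat, p_value = chi_square_2x2(
    females_linked, females_total - females_linked,
    males_linked, males_total - males_linked,
)
print(round(stat, 2), round(p_value, 3))
```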

Using linkage status (linked or not linked; Figure 1) as the dependent variable and age, gender, and “team” (urban and rural) as the independent variables, a CHAID analysis was conducted to identify significant differences in cohorts defined by categories of the independent variables. “Rural” referred to an area in Limpopo Province with intermediate population density, and “urban” to a metropolitan area in Gauteng Province. Gender was the most predictive of the three independent variables, and more females (53.5%) than males (46.6%) were linked to service. For females, age contributed to differences in linkage status. A relatively high percentage of people in the age group 33-43 was linked to care (61.8%), which was also true for people for whom age was unknown. Females under 23 were the female age group with the lowest percentage linked to care (41.8%). For this group, the rural team linked more young females to care (52.3%) than the urban team (32.9%).

Reasons for not being linked to care within 30 days of referral to the call centre were recorded for 442 people (see Figure 2). These individuals could still have been linked to care within the seven-month period of the study, but the 30-day cut-off was used to identify a group at risk of not being linked to care. Of the 442 individuals, most (56.3%) were contacted many times but still did not follow through with a visit to a clinic. The second most frequently recorded reason (17.9%) was incorrect contact information. A further 14.5% asked not to be called, and 11.3% had no telephone, the only means available for the call centre to contact individuals. Reasons for not being able to link to care differed significantly between the two call centres. The urban team had a higher percentage of individuals who were contacted repeatedly without success (64.7%) than the rural team (44.9%). The rural team had a higher percentage of individuals referred to them without a telephone (20.3%) than the urban team (4.7%).

Time to link different cohorts of individuals to care

The mean time to link individuals was 31.1 days (SD=28.6). A CHAID analysis was done using “days to link”, with two categories (“0-14 Days” and “>14 Days”), as the dependent variable. Independent variables were “month linked”, “CD4 count” and “team” (urban and rural) (see Figure 3). It took longer than 14 days for most people to be linked during April, May, and June (77.5% were linked after 14 days), but this changed for later months. During August, September, and October, 50.9% were linked to services within 14 days. The success achieved in later months, however, was not equal across all cohorts. Those with CD4 cell counts of <350 were not linked within the same time frame as those with higher CD4 cell counts during the period August to October. About 40% of individuals with low counts were linked within 14 days in August, September, and October, in contrast to about 60% of those with higher CD4 counts linked in the same timeframe. There were no differences between CD4 cell-count groups during other months. July appears to be a transitional month in the sense that the percentage of individuals linked within 14 days was higher than during April, May, and June, but lower than during August, September, and October. July was also the only month with a significant difference in the percentage of individuals linked to care by the urban and rural teams, with the rural team linking 46.5% of individuals and the urban team 28%.

Summary and Discussion

Results (just over half of HIV+ individuals linked to care) compared favorably with previous studies on mobile HCT, which reported lower linkage to care despite services being free and in close proximity (Krazner, Zeinecker, Ginsbert, Orrell, Kalawe, Lawn, Bekker, & Wood, 2012; MacPherson, Corbett, Makombe, van Oosterhout, Manda, Choko, Thindwa, Swuire, Mann, & Lalloo, 2012). Bassett et al. (2013) reported only 10% linkage to care for community-based mobile testers, although from a larger sample and with a shorter collection period. Krazner et al. (2010), using a measurement period of six months, found a higher rate of linkage among individuals testing through sexually transmitted infection services (84.1%) but found that the linkage rate for individuals testing on their own initiative (53.4%) was similar to what we found. Losina et al. (2010) found that although about 54% engaged in CD4 cell testing, only about 47% of those returned for their CD4 cell results. In our study, 4.1% of individuals reported having previously been diagnosed as HIV positive, while other similar studies found that 6.5% (Larson et al., 2010) and 36% (van Rooyen et al., 2013) of their samples had been previously diagnosed.

Most individuals referred to the call centre during the data collection period were female, and females were more likely to link to care than males; yet females under 23 had a lower percentage linked to care than all other female age groups (with the rural team linking more young females to care than the urban team). While the average time for a patient to link to care was 31.1 days, linkage to care varied by month, with more individuals linking in less than 14 days during August, September, and October. July was also the only month with a significant difference in the percentage of individuals linked to care by the urban and rural teams, and more individuals linked to care in the warmer climate during this month. The average CD4 count was lower for males than for females, indicating that more males in the group linked to care required ART interventions sooner than females.

There are several limitations to this study. First, the study does not include a comparison of linkage rates before and after the call centre approach was implemented. The study design does not include a comparison with other approaches to linking HIV+ individuals to care. Second, the study included only two provinces of South Africa. Findings cannot be generalized to other regions with different demographic compositions and infection incidence. Third, the study involved individuals in a seven-month timeframe. One possible seasonal factor was identified: individuals in a colder climate during the middle of winter linked at a lower rate. A study over 12 months may detect other possible seasonal patterns. Fourth, limited information was obtained for the group not linked to care. 

This study also had strengths. The sample size (n=1,096), although not large, is larger than that of many other HIV linkage to care studies that tracked individuals for a comparable length of time: Losina et al. (2010) with 454 HIV-positive patients in the Durban area; van Rooyen et al. (2013) with 201 HIV-positive patients (also in KwaZulu-Natal). The sample we studied had the added advantage of diversity, as participants came from both urban and rural areas and covered a wide range of ages and languages.

Study findings offer useful insights that can inform future studies about trends associated with linkage to care as well as delayed entry to care. Future studies should also focus on factors associated with delayed entry to care (Reed et al., 2009) and on HIV+ individuals with CD4 counts of <350, who seem to take longer to link than individuals with a higher count. Also, as definitions of successful linkage to care presently vary, future studies should use a standardized definition of delayed care (Finnie et al., 2009); without such a benchmark, an important piece of HIV prevention may remain in the shadows.

Secondary data analysis to determine the reliability and validity of an adolescent HIV risk screening questionnaire.

By M.A. van Zyl, C. Studts, K. Pahl

Identification of adolescents at high risk for HIV infection in South Africa is a key component of current and future prevention efforts. In 2011, a non-governmental organization (NGO) administered a 12-item self-report risk measure to adolescents (N = 3,872) in South Africa as part of an innovative voluntary counseling and testing (VCT) intervention. Secondary data analyses employing item response theory (IRT) methods assessed the original 12-item measure, a reduced 7-item measure used by the NGO staff, and a 5-item measure developed in the current study. The 5-item measure demonstrated acceptable levels of reliability and validity, with all items discriminating sufficiently between adolescents at different levels of risk.

However, both uniform and non-uniform differential item functioning (DIF) were revealed as problems: items performed differently with groups differing by age, ethnicity, and gender.

Consequently, age-, ethnicity-, and gender-specific percentile-based norms were developed. The IRT analyses also highlighted the extremely high levels of risk required for adolescents to select the highest response option (on four-point Likert-type scales) for each item. This was related to 14.5% of adolescents in the sample (composed primarily of lower socio-economic high school students in three regions of South Africa) indicating that they engaged in behaviors much more risky than the behaviors of their peers. Implications for policy and practice are discussed.

The identification of adolescents at high risk of contracting HIV offers the potential of providing preventative interventions targeted at the high-risk HIV-negative adolescent population. A 12-item measure to identify adolescents at high risk for HIV infection was administered in 2011 by a non-governmental organization (NGO) as part of an innovative voluntary counseling and testing (VCT) intervention to a group of adolescents (N = 3,872) in South Africa. NGO staff subsequently shortened the measure from 12 items to 7 items. This study retrospectively investigates the validity and reliability of the risk screening measures and proposes a new, briefer 5-item measure.



Promising results of a pilot study (Van Zyl, Barney & Pahl, under review) of an innovative HIV prevention program targeted at adolescents, Shout-It-Now, led to wide scale implementation of the program in South Africa. Computers and internet access were made available to schools and community settings by the Shout-It-Now program and its sponsors, allowing adolescents to individually access online program content related to HIV prevention. Adolescents participated by viewing an online video of South African celebrities talking about issues related to HIV/AIDS prevention, including (a) condom usage, (b) Voluntary Counseling and Testing (VCT), (c) safer sex, and (d) responsible decision-making. Celebrities representing a variety of South African ethnic groups provided information in English, interspersed with local vernacular. The styles of the videos were similar to MTV television programs, using popular music and attention-retaining visual material. During the video a number of ‘pop-up’ questions appeared to reinforce messages being conveyed in the video program. The video took about 12 minutes to view, in line with anticipated attention spans and efforts to reduce participant burden.

Following the video presentation, adolescents were invited to participate in VCT. No coercion was placed on those who declined. Participants who had questions or concerns regarding the HIV test were invited to speak with a trained counselor. All who agreed to be tested for HIV were given a confidential one-on-one counseling session with a trained VCT counselor, followed by an HIV test with a registered nurse. In cases of conflicting or undefined results, a confirmatory test was conducted. Testing was conducted in accordance with UN guidelines and regulations of the South African government. On the same day as testing, participants were given their results within the framework of a confidential post-test counseling session. During this session, risk reduction strategies were discussed. Participants who tested positive for HIV were referred to appropriate treatment centers, care and support services, and were invited to access the program’s 24-hour hotline. Following the post-test counseling session, participants were given compensation such as music and cell phone minutes.

In implementing the Shout-It-Now intervention, program staff perceived the need to identify those at higher levels of risk and to offer additional services to them. To meet this need, a risk questionnaire was developed by program staff. Questions focused on three types of risk behaviors: risky sexual behaviors (condom use, number of sexual partners and being forced to have intercourse), alcohol and drug use, and absenteeism. These risk behaviors are related: Du Rant et al. (1999) described strong correlations among adolescent health risk behaviors, specifically early onset smoking and use of other substances (alcohol and drugs) with absenteeism and poor academic performance. These associations were evident across socio-demographic groups. Similarly, Guttmacher et al. (2002) identified correlations between school absenteeism and other adolescent risk-taking behaviors. Also, the surveillance system of the Centers for Disease Control and Prevention for sexual behaviors that contribute to unintended pregnancy and sexually transmitted diseases, including HIV infection, monitors two of the three types of risk as indicators of high-risk behavior: (i) drug and alcohol use before sexual intercourse, and (ii) risky sexual behaviors regarding condom use and number of sexual partners (CDC, 2010).

An initial set of 20 items intended to measure the three types of risk behaviors was compiled by the NGO staff and reviewed in focus group discussions with adolescents aged 14 to 18 years old. Twelve of the twenty items emerged with consistently shared meaning in discussions and included items measuring all three types of risk behaviors. As indicated in Table 1, the 12 items had response options on semantic differential scales of varying lengths (i.e., several questions had 4 response options, others had 3 response options, etc.). The 12 items were included as a screening measure in the online program and administered to 3,872 adolescents in two cities (Johannesburg and Cape Town) and one rural area (Burgersfort, Limpopo) in South Africa. The NGO program staff subsequently decided to use only 7 of the 12 items that they regarded, based on face validity, as the most important to determine risk. Different weights were allocated to response options according to the clinical team’s perceptions of relative risk associated with each response option. The summed score on the questionnaire was used to identify those with high risk for contracting HIV. No formal investigations of the psychometric properties of the 12-item or 7-item measures were conducted.

Table 1: Risk assessment questions and corresponding 7-item scale and 5-item scale items

| Item (original 12-item questionnaire) | Response options and weights | 7-item scale (Cronbach alpha = .74) | 5-item scale (Cronbach alpha = .79) |
| --- | --- | --- | --- |
| 1. If you have a boyfriend or a girlfriend, please tell us: | 0 = I don’t have a boyfriend or girlfriend; 1 = She or he is not more than 5 years older than me; 3 = She or he is more than 5 years older than me | | |
| 2. How often do you have three or more drinks at one time? | 0 = Never; 1 = Less than monthly; 2 = Monthly; 3 = Weekly; 4 = Daily or almost daily | Item 1 | |
| 3. Have you ever been forced to have sex when you did not want to? | 0 = No, never; 2 = Yes, but only once or twice; 4 = Yes, it happens often | | |
| 4. How many days did you bunk school in the last year? | 0 = Never; 1 = Once or twice; 2 = Between 3 and 5 days; 3 = Between 5 and 10 days; 4 = More than 10 days | Item 3 | |
| 5. How do your parents feel about you bunking school? | 0 = They don’t allow it at all; 1 = They allow it occasionally; 3 = They allow it whenever I want to bunk; 4 = They don’t care if I bunk school | | |
| 6. How many times in the last year did you have sex without a condom? | 0 = Never; 2 = Once; 3 = A few times; 4 = Many times | Item 4 | Item 1 |
| 7. How many times in the last year did you have sex while drunk? | 0 = Never; 2 = Once; 3 = A few times; 4 = Many times | Item 2 | Item 2 |
| 8. How many times in the last year did you have sex while high on drugs? | 0 = Never; 2 = Once; 3 = A few times; 4 = Many times | | Item 3 |
| 9. How many times in the last year did you have sex for gifts or favours? | 0 = Never; 2 = Once; 3 = A few times; 4 = Many times | Item 5 | Item 4 |
| 10. How many people have you had sex with in the past 6 months? | 0 = None; 2 = One; 3 = Two or three; 4 = More than 3 people | Item 6 | Item 5 |
| 11. How often do you smoke dagga? | 0 = Never; 2 = Occasionally; 3 = Daily | | |
| 12. How often do you use any other drugs (e.g. tik, crack, cocaine, sniff glue, etc)? | 0 = Never; 2 = Occasionally; 3 = Daily | Item 7 | |

Study Aims

The purposes of this study were two-fold: first, to determine the reliability and construct validity of the 7-item HIV Risk measure used by the NGO; and second, to determine if a brief risk measure with better psychometric properties could be developed from the initial 12-item questionnaire. The research was conducted to inform the NGO about the reliability and validity of the measure being used to determine which adolescents were at high risk for HIV infection.

Development of the original risk measure had two primary limitations. First, determination of items to be included in the measure was guided primarily by the face validity of questions. Face validity alone is insufficient to justify wide scale implementation of an instrument measuring a high stakes construct such as risk for HIV infection.

Second, the 7-item measure developed by the clinical team used a weighted total score to identify high risk adolescents. There are several problems associated with assigning weights to different response options. Cognitive bias in this scoring approach may be a problem. For example, one may believe or even have evidence that a certain drug (e.g., methamphetamine) is more detrimental to health than another substance (e.g., alcohol), and therefore regard users of methamphetamine as having higher risk for HIV infection compared to alcohol users. However, there is a cognitive bias in this perception, related to generalizing one type of health risk to another. Giving a higher weight on a risk scale to a question that asks about the use of drugs as opposed to alcohol may be intuitively appropriate, but not empirically sound. Another problem with the weighted scoring approach used for the 7-item risk questionnaire stems from possible range compression, related to the clinical team’s subjective assessments of the amount of risk associated with each question’s response options. Range compression occurs when response options are limited to a small number of outcomes or possibilities, when in fact a much wider range of options are possible or likely. Consequently, there is a significant loss of precision in the assessment.

To address these limitations, a formal psychometric assessment of the risk measure was conducted. Traditional psychometric analyses were complemented with item response theory (IRT) analyses to (1) obtain detailed item- and test-level information about the performance of the risk measures, and (2) investigate the possibility of developing a briefer, psychometrically sound risk measure from the original pool of 12 items.


This study was conducted with existing data provided by the NGO. The data were from a sample of 3,872 adolescents in two cities (Johannesburg and Cape Town) and one rural area (Burgersfort, Limpopo) in South Africa. Students in grades 8 to 12 from six public secondary schools in lower socio-economic communities were invited to participate in the intervention.

These schools voluntarily participated in Shout-It-Now during a four-month period in 2011. The NGO reported a 93% participation rate among all students present on the day the Shout-It-Now program was delivered. In addition to the six schools, data were also obtained from program outreach at a shopping mall, at which adolescents in grades 8 through 12 were eligible to participate and were recruited by…?. This strategy added to the diversity of the sample, which is ideal for validation studies where the aim is not to describe the characteristics of a specific population, but rather to determine the psychometric qualities of an instrument. Approval and oversight to conduct research using these de-identified data were obtained from the University of Louisville’s Institutional Review Board.

The original 7-item risk measure was scored using a summative model, adding the weighted scores for each item response to obtain a total score. This scoring approach assumes that the measure is unidimensional. The unidimensionality assumption was tested using exploratory factor analysis (EFA) on the 7-item scale. Reliability and validity of the 7-item scale were also assessed, including each item’s mean corrected item-total correlation and the measure’s overall internal consistency reliability (i.e., Cronbach’s alpha).
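The summative scoring model and the internal consistency check can be sketched as follows. The option weights for the three example items follow Table 1, but the item keys, the example respondent, and the small response matrix are hypothetical, and the functions are an illustration rather than the NGO's or the authors' scoring code.

```python
from statistics import variance

# Option weights as listed in Table 1; the dictionary keys are illustrative names.
WEIGHTS = {
    "binge_drinking": [0, 1, 2, 3, 4],
    "sex_without_condom": [0, 2, 3, 4],
    "partners_past_6_months": [0, 2, 3, 4],
}

def summed_score(responses, weights=WEIGHTS):
    """Add the weight of the chosen option (given by its index) for each item."""
    return sum(weights[item][choice] for item, choice in responses.items())

def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical respondent: drinks weekly, sex without a condom once, two or three partners.
example = {"binge_drinking": 3, "sex_without_condom": 1, "partners_past_6_months": 2}
total = summed_score(example)

# Hypothetical weighted item scores for five respondents.
toy_scores = [[0, 0, 0], [2, 2, 3], [3, 2, 3], [4, 4, 4], [0, 2, 0]]
alpha = cronbach_alpha(toy_scores)
print(total, round(alpha, 2))
```

Because the total is a plain sum, it is interpretable as a single risk score only if the items measure one underlying dimension, which is why the unidimensionality check precedes the reliability analysis.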

Next, the full set of 12 items was analyzed to determine whether a brief risk measure with good psychometric properties could be developed. First, the factor structure of the 12 questions was determined using EFA. An iterative process of determining factor structure and interpreting factors was employed to identify a unidimensional construct, which was then assessed for reliability and validity.

This process facilitated the identification of a 5-item risk measure. Item response theory (IRT) analyses were employed to determine the psychometric properties of the newly developed brief risk measure. IRT provides more detailed psychometric information than classical test theory and is suitable for addressing the hypotheses generated in response to the second research question. First, unidimensionality of the 5-item measure was investigated using EFA. Next, the IRT assumption of local independence (i.e., the requirement that items be statistically independent of one another after controlling for the level of risk; Steinberg & Thissen, 1996; Wainer & Thissen, 1996; Yen, 1993) was assessed by inspecting the absolute values of residual correlations for each pair of items. A criterion of |r| ≥ .20 (Reeve et al., 2007) was used to determine violation of local independence. Following the testing of IRT assumptions, maximum-likelihood estimation procedures were used to fit a two-parameter logistic model to the data: MULTILOG 7.03 (Thissen et al., 2003) software was used to fit Samejima’s (1969) Graded Response Model and obtain item parameter estimates and estimates of participants’ levels of HIV risk. The amount and precision of measurement information provided by the newly developed 5-item scale were assessed using test information, item information, and item parameter estimates.

Once these psychometric properties were established, differential item functioning (DIF) analysis was conducted to determine whether each item and the scale performed consistently across groups categorized by ethnicity, gender, and age. The R package lordif (Choi et al., 2011) was used in these analyses. The lordif package relies upon ordinal logistic regression for uniform and non-uniform DIF detection, employing Monte Carlo procedures to identify thresholds indicating whether items exhibit DIF, minimizing Type I error. In this approach, the impact of DIF on IRT parameter estimates is assessed by comparing model fit between nested ordinal logistic regression models with and without group terms (i.e., for ethnicity, gender, and age). Main effects of group are included to test for uniform DIF, while interactions between group and risk level are included to test for non-uniform DIF. Significant DIF is identified using likelihood-ratio (LR) tests between models. In the ordinal logistic regression approach to DIF detection, an iterative approach is used in which group-specific parameter and trait estimates are updated and re-estimated until consistent identification of items with DIF over subsequent iterations is achieved (Choi et al., 2011). Monte Carlo simulations were used to determine whether empirical threshold values systematically deviated from the nominal level. Two additional DIF analyses were also employed, assessing the magnitude of (a) changes in pseudo-R² values, and (b) differences in parameter estimates between groups of interest.
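The nested-model comparison described above can be illustrated with a short sketch. It assumes three models per item (Model 1: trait only; Model 2: trait plus group; Model 3: trait, group, and their interaction) and uses hypothetical log-likelihood values; it does not reproduce lordif's own code, and the closed-form chi-square tail probabilities cover only the 1-df and 2-df cases needed here.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability, closed forms for df = 1 or 2."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or 2")


def lr_test(ll_reduced, ll_full, df):
    """Likelihood-ratio statistic and p-value for two nested models."""
    stat = 2.0 * (ll_full - ll_reduced)
    return stat, chi2_sf(stat, df)


def mcfadden_r2(ll_model, ll_null):
    """McFadden's pseudo-R-squared relative to an intercept-only model."""
    return 1.0 - ll_model / ll_null


# Hypothetical log-likelihoods for one item (assumed values, not study output).
ll_null = -4000.0  # intercept-only model
ll_m1 = -3200.0    # Model 1: trait
ll_m2 = -3190.0    # Model 2: trait + group
ll_m3 = -3185.0    # Model 3: trait + group + trait x group

chi12 = lr_test(ll_m1, ll_m2, df=1)  # Model 1 vs. Model 2
chi13 = lr_test(ll_m1, ll_m3, df=2)  # Model 1 vs. Model 3
chi23 = lr_test(ll_m2, ll_m3, df=1)  # Model 2 vs. Model 3

# Change in pseudo-R-squared between Model 1 and Model 3.
delta_r2_13 = mcfadden_r2(ll_m3, ll_null) - mcfadden_r2(ll_m1, ll_null)
print(chi12, chi13, chi23, round(delta_r2_13, 5))
```

A large LR statistic paired with a very small change in pseudo-R² is the pattern in which statistically significant DIF has little practical impact, which is why both quantities are inspected.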


The majority of the 3,872 participants were female (54.6%). The largest ethnic group in the sample was Black (87.5%), followed by Coloured (10.8%), White (1%), Indian (0.3%), and Other (0.3%). The mean age was 17.1 years (SD = 4.3), and most participants were in grade 10, with grade 8 (22.5%) and grade 11 (20.2%) also well represented in the sample. Nearly all participants (97.0%) were seen in a school setting, with the remainder (3.0%) seen in a shopping mall. Most participants (86.4%) were from the wider Cape Town area, with 10.7% from Johannesburg and 3.0% from Burgersfort in the Limpopo Province. The sample came mainly from lower socio-economic areas and was equally split between those whose families did and did not own a car.

The 7-Item Measure

An EFA of the 7-item risk measure yielded one factor that explained 40.6% of the variance. The 7 items loaded on one factor with an eigenvalue of 2.84. The Kaiser-Meyer-Olkin Measure of Sampling Adequacy was .83, indicating an adequate sample size for the analysis.

Bartlett’s test of sphericity was highly significant (p < .001). The factor loadings were between .33 and .78. Internal consistency reliability as measured by Cronbach’s alpha was .74. Only three items had relatively high correlations with the total scale score (> .50). The mean corrected item-total correlation was .46 (SD = .14). Applying a 90th percentile cut-off score to determine high risk using the total weighted score approach, 496 (12.8%) participants were identified as falling into the high risk category.

The Original 12-Item Measure

The original set of 12 items, when subjected to a principal component analysis with varimax rotation, yielded three factors that explained 52% of the variance. The first factor consisted of 6 items, with loadings between .44 and .73. The second factor had 4 items with loadings between .54 and .67, and the third factor had only 2 items with loadings of .78 and .84. The third factor appeared to focus on drug use (“How often do you smoke dagga?” and “How often do you use any other drugs (e.g. tik, crack, cocaine, sniff glue, etc)?”). The first two factors included 2 cross-loading items, and distinguishing between these factors was difficult, as each had items related to having sex, alcohol use, and missing school. Given the difficulty in interpreting the first two factors, it was decided to follow a different approach in deriving a unidimensional measure from the 12 items.

A 5-Item Measure

In reviewing the 12 items for content, 6 items were identified as focusing on the conditions associated with having sex. No central theme could be determined for the remaining 6 items. The 6 items associated with conditions when having sex were subjected to factor analysis. In a principal component analysis of these 6 items, one factor was extracted that explained 48% of the variance. The internal consistency reliability (Cronbach’s alpha) of the 6-item instrument was .77, but one of the items (“Have you ever been forced to have sex when you did not want to?”) correlated poorly with the total scale score (.30). After this item was removed, Cronbach’s alpha of the 5-item risk measure increased to .79, all items correlated at .50 or higher with the total scale score, and the mean corrected item-total correlation was .57.

IRT Analysis of the 5-Item Measure 

Unidimensionality and local independence of the 5-item risk measure were supported by EFA results: A single factor was extracted with an eigenvalue of 2.71 that accounted for 54% of the variance, and absolute values of residual correlations for each pair of items ranged from .00 to .08. Fitting the Graded Response Model yielded four parameter estimates for each item: a (discrimination), b1 (difficulty threshold between option 1 and option 2), b2 (difficulty threshold between option 2 and option 3), and b3 (difficulty threshold between option 3 and option 4). High values of a suggest that items are able to distinguish between participants at similar levels of risk. Item parameter estimates and standard errors for each item are presented in the first column of Table 2.
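To make the role of the four parameters concrete, the sketch below computes the Graded Response Model probability of each response option at several risk levels, using the total-sample estimates for item 2 from Table 2. It assumes the logistic metric without a 1.7 scaling constant; the function name is illustrative and is not part of MULTILOG.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Graded response model: probability of each ordered response option.

    thresholds must be increasing (b1 < b2 < ...); the result has
    len(thresholds) + 1 entries, ordered from lowest to highest option.
    """
    def p_star(b):
        # Probability of responding above the threshold b.
        return 1.0 / (1.0 + math.exp(-a * (theta - b)))

    cumulative = [1.0] + [p_star(b) for b in thresholds] + [0.0]
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Item 2, total sample (Table 2): a = 2.56, b1 = 0.13, b2 = 1.42, b3 = 8.18
for theta in (-1.0, 0.0, 1.0, 2.0, 3.0):
    probs = grm_category_probs(theta, 2.56, [0.13, 1.42, 8.18])
    print(theta, [round(p, 3) for p in probs])
```

With a b3 estimate this far above the observed range of risk, the modeled probability of the highest option stays close to zero even three standard deviations above the mean.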

Table 2: Graded Response Model Item Parameter Estimates for Total Sample and Subgroups

Parameter Estimate (SE)

Item  Parameter  Total Sample  Male          Female        Black         “Other”       Age ≤ 19      Age ≥ 20
                 (N=3,872)     (N=1,757)     (N=2,115)     (N=3,387)     (N=487)       (N=3,483)     (N=388)
1     a          1.53 (0.10)   1.53 (0.14)   1.77 (0.14)   1.50 (0.09)   2.29 (0.62)   1.51 (0.10)   1.56 (0.21)
      b1         -0.21 (0.05)  -0.11 (0.07)  -0.27 (0.07)  -0.19 (0.05)  -0.50 (0.23)  -0.12 (0.06)  -0.73 (0.19)
      b2         1.32 (0.08)   1.32 (0.14)   1.23 (0.10)   1.33 (0.09)   1.15 (0.28)   1.38 (0.10)   1.03 (0.23)
      b3         10.08*        11.91         8.92          10.78         7.98          10.76         8.92
2     a          2.56 (0.18)   2.30 (0.21)   2.79 (0.29)   2.51 (0.18)   3.90 (1.30)   2.44 (0.19)   3.16 (0.55)
      b1         0.13 (0.04)   0.07 (0.06)   0.19 (0.07)   0.13 (0.05)   -0.02 (0.16)  0.20 (0.05)   -0.25 (0.11)
      b2         1.42 (0.07)   1.36 (0.10)   1.51 (0.11)   1.41 (0.07)   1.53 (0.26)   1.44 (0.08)   1.31 (0.15)
      b3         8.18          8.05          9.45          8.34          9.85          8.48          8.99
3     a          1.53 (0.23)   1.48 (0.36)   1.59 (0.31)   1.53 (0.17)   1.98 (1.28)   1.44 (0.27)   1.97 (0.55)
      b1         0.07 (0.19)   0.02 (0.12)   0.11 (0.20)   0.09 (0.11)   -0.43 (0.53)  0.13 (0.18)   -0.22 (0.31)
      b2         1.50 (0.28)   1.49 (0.18)   1.48 (0.26)   1.49 (0.16)   1.76 (0.89)   1.59 (0.40)   1.14 (0.29)
      b3         8.88          9.44          7.87          8.72          14.43         9.68          5.99
4     a          1.43 (0.14)   1.46 (0.17)   1.43 (0.23)   1.44 (0.14)   1.50 (1.06)   1.40 (0.15)   1.58 (0.36)
      b1         0.31 (0.07)   0.18 (0.09)   0.50 (0.13)   0.31 (0.08)   0.31 (0.47)   0.34 (0.08)   0.14 (0.17)
      b2         1.76 (0.16)   1.60 (0.19)   2.00 (0.31)   1.76 (0.16)   1.66 (0.99)   1.82 (0.19)   1.52 (0.31)
      b3         15.77         14.81         15.44         15.53         20.02         11.89         19.16
5     a          0.77 (0.07)   0.93 (0.10)   0.61 (0.12)   0.76 (0.08)   1.05 (0.46)   0.76 (0.08)   0.82 (0.20)
      b1         1.12 (0.17)   0.24 (0.11)   2.56 (0.59)   1.11 (0.16)   1.27 (0.53)   1.18 (0.18)   0.78 (0.31)
      b2         3.31 (0.54)   2.32 (0.36)   5.21 (1.67)   3.37 (0.51)   2.25 (1.09)   3.37 (0.55)   2.97 (1.03)
      b3         15.62         17.17         17.66         14.81         18.29         14.91         20.07

Three of the five items (items 1, 3, and 4) demonstrated high discrimination (a = 1.43 to 1.53; Baker, 1985), and one item (item 2) demonstrated very high discrimination (a = 2.56). Only item 5 had moderate discrimination (a = .77). The lowest difficulty parameter, b1, clustered around one-third of a standard deviation above the mean risk level (M = .28). The lowest b1 parameter estimate was for item 1 (b1 = -.21, SE = .05), while item 5 exhibited the highest b1 parameter estimate (b1 = 1.12, SE = .07). This range of values suggests that risk levels near the mean were associated with selecting option 1 rather than option 0 on all 5 items.
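The classification above can be reproduced from the total-sample column of Table 2. The sketch below assumes the cut-points commonly attributed to Baker's guidelines (below .35 very low, .35 to .64 low, .65 to 1.34 moderate, 1.35 to 1.69 high, 1.70 and above very high); the function and variable names are illustrative.

```python
def baker_label(a):
    """Verbal label for a discrimination estimate (cut-points are an assumption)."""
    if a < 0.35:
        return "very low"
    if a < 0.65:
        return "low"
    if a < 1.35:
        return "moderate"
    if a < 1.70:
        return "high"
    return "very high"


# Total-sample estimates for items 1-5, taken from Table 2.
A_TOTAL = [1.53, 2.56, 1.53, 1.43, 0.77]
B1_TOTAL = [-0.21, 0.13, 0.07, 0.31, 1.12]

labels = [baker_label(a) for a in A_TOTAL]
mean_b1 = sum(B1_TOTAL) / len(B1_TOTAL)

for item, (a, label) in enumerate(zip(A_TOTAL, labels), start=1):
    print(f"item {item}: a = {a:.2f} ({label})")
print(f"mean b1 = {mean_b1:.2f}")
```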

Notably, the b3 difficulty parameter estimates were extremely high (M = 11.07 standard deviations above the mean). A total of 563 (14.5%) respondents endorsed option 3 on at least one item. Item 4 (b3 = 15.77) was the most difficult of the set, requiring extremely high levels of risk for participants to select the highest response option on this item. Even the two items with the lowest upper thresholds, item 2 (b3 = 8.18) and item 3 (b3 = 8.88), required very high levels of risk for endorsement of option 3 versus option 2. The extreme standardized scores have large standard errors, and the values should be interpreted relative to the mean of 0 and to the other large parameter estimates, not in terms of absolute values.

Information for measuring risk with all 5 items was higher than the standard error (i.e., measurement was most precise) from approximately 1.0 standard deviation below the mean to about 2.6 standard deviations above the mean. The test information curve peaks from 0.2 to 1.6 standard deviations above the mean, a range appropriate for precise measurement in a screening instrument, as screening should accurately assess those with low levels of risk (such as those at one standard deviation below the mean) and those with high risk (such as those more than 1 standard deviation above the mean).

As some items offered more information than others at similar levels of risk, some items could be omitted. Other aspects of item performance need to be considered before such a decision is made, including the degree to which an item exhibits measurement bias, or DIF.

DIF Analysis of the 5-Item Measure

Ethnic differences. Differences in item parameter estimates by ethnic group were investigated using Black versus Other, in which Other included all groups other than Black (i.e., Coloured, White, Indian, and Other). With lordif’s default settings, the program terminated in two iterations. All five items were identified as potentially having DIF. Sparseness due to all items being flagged limited the range of diagnostics possible for detailed DIF analysis. However, it was apparent that the mean slope of the true score functions for all 5 items was substantially lower for Blacks than for Others (1.55 vs. 2.15), indicating non-uniform DIF. The LR χ² test for uniform DIF, comparing Model 1 and Model 2, was significant for items 1, 2, 4, and 5 (p < .001). This was also true for the 2-df test of non-uniform DIF (comparing Models 1 and 3, p < .001) for the same items. The overall 1-df test was significant for items 1 and 5. The non-uniform component of DIF revealed by the LR χ² test can also be observed in the substantial group differences of the slope parameter estimates for item 1 (2.29 vs. 1.50), item 2 (3.90 vs. 2.51), and item 5 (1.07 vs. .76). When weighted by the focal group trait distribution, the expected impact of DIF as reflected by McFadden’s pseudo-R² measures varied across items from .02 to .08 with a mean of .05 for R²₁₃. The impact apparent in R²₂₃ was smaller, varying from .00 to .03 with a mean of .01. The percent change in β1 for all 5 items tended to be small (M = 4%), with a maximum difference of 11% noted for item 2, which is higher than the frequently used criterion of a 10% change in β1 for concluding that DIF exists (Crane et al., 2004).

The mean Monte Carlo probability threshold values associated with the χ² statistic across items were .008, .01, and .01 for χ²₁₂ (testing for uniform DIF), χ²₁₃ (testing for non-uniform DIF), and χ²₂₃ (testing for DIF overall while controlling Type I error), respectively. On average, the empirical threshold values for the probability associated with the χ² statistic were close to the nominal α level. The Monte Carlo simulation results confirmed that the LR χ² test maintains the Type I error rate adequately in this dataset.

Gender and age. Similar analyses were conducted for two other comparison groups, categorized by gender and age. For these analyses, age was categorized as younger (≤ 19 years) or older (≥ 20 years). The program terminated for gender in two iterations, flagging all items. Similarly, the program terminated for age in five iterations, flagging all items.

The mean slope for the true score functions was lower for males than for females (1.54 vs. 1.64) and substantially lower for younger than for older subjects (1.51 vs. 1.82), indicating non-uniform DIF. The LR χ² test for uniform DIF, comparing Model 1 and Model 2 (χ²₁₂), was significant for all items for gender and for items 1, 2, 4, and 5 for age. The 2-df test (χ²₁₃) for non-uniform DIF (comparing Models 1 and 3, p < .001) was significant for all items for both gender and age. The overall 1-df test (χ²₂₃) was significant for items 1, 3, 4, and 5 in the case of gender, and for items 1, 2, and 5 in the case of age. The non-uniform component of DIF revealed by the LR χ² test can also be observed in the difference of the slope parameter estimates; for example, in the case of age for the younger vs. older groups, respectively, for item 2 (2.44 vs. 3.16), item 3 (1.44 vs. 1.97), and item 5 (.76 vs. .82). McFadden’s pseudo-R² measures for R²₁₃ varied across items from .0025 to .0231 with a mean of .01 for gender, and from .0037 to .0195 with a mean of .013 for age. The impact apparent in R²₂₃ was also small, with a mean of .003 for both gender and age. When aggregated over all the items in the test, differences in item characteristic curves may become small due to canceling of differences in opposite directions. However, this does not mean that the impact on trait estimates is not of concern.

The theta values (i.e., levels of risk) with the highest information for the total sample and the various groups are reported in Table 3. Maximum item information was most commonly located in the theta range (.2, 1.6) in all groups for items 1, 2, 3, and 4. Exceptions included lower values (-1.4, .0) for item 1 in the case of younger participants, for item 2 in the total sample, and for item 3 in the case of older participants. Information for item 3 for the ethnic “Other” group was highest in the (1.8, 3.0) range.
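Item information under the Graded Response Model can be approximated directly from the parameter estimates. The sketch below assumes the logistic metric without a 1.7 scaling constant and evaluates the standard information formula (the squared derivative of each option probability divided by that probability, summed over options) on a grid of theta values. It uses the total-sample estimates for item 4 from Table 2; the peak it finds should land near the 0.58 reported for that item in Table 3.

```python
import math


def grm_item_information(theta, a, thresholds):
    """Item information at theta for a graded response model item."""
    def p_star(b):
        return 1.0 / (1.0 + math.exp(-a * (theta - b)))

    cumulative = [1.0] + [p_star(b) for b in thresholds] + [0.0]
    # Derivative of each cumulative curve with respect to theta.
    slopes = [a * p * (1.0 - p) for p in cumulative]
    info = 0.0
    for k in range(len(thresholds) + 1):
        p_k = cumulative[k] - cumulative[k + 1]
        dp_k = slopes[k] - slopes[k + 1]
        if p_k > 1e-12:  # skip options with essentially zero probability
            info += dp_k ** 2 / p_k
    return info


# Item 4, total sample (Table 2): a = 1.43, b1 = 0.31, b2 = 1.76, b3 = 15.77
grid = [i / 100.0 for i in range(-400, 401)]
curve = [grm_item_information(t, 1.43, [0.31, 1.76, 15.77]) for t in grid]
peak_info = max(curve)
peak_theta = grid[curve.index(peak_info)]
print(round(peak_info, 3), peak_theta)
```

Summing such curves over the five items gives the test information function discussed earlier.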

Table 3: Maximum Item Information Estimates and Locations for Total Sample and Groups

Item  Total Sample  Ethnic: Black  Ethnic: Other  Young    Older    Males    Females
1     0.65          0.627          1.174          0.634*   0.653    0.657    0.846
2     1.965*        1.646          3.798          1.551    2.458    1.398    1.988
3     0.663         0.666          0.986**        0.59     1.036*   0.618    0.712
4     0.581         0.59           0.644          0.558    0.703    0.607    0.579

a Theta values increase in steps
* Theta range: -1.4, 0.0
** Theta range: 1.8, 3.0


Total mean and percentile scores (90, 93, 95, and 99) on the 5-item risk scale for eight different groups (age by ethnicity by gender) are presented in Table 4. Scores varied widely between the groups. For example, the mean score for older female subjects of Other ethnicity is 5.10, but for younger females in this ethnic group, the mean score is only 0.41. The 90th percentile scores for Other ethnicity range from 1.00 for younger females to 11.00 for older males and females. For older Blacks, the impact of gender on scale total scores appears small, but for younger Blacks, gender has a significant impact (M = 3.12 for males vs. 2.08 for females; 90th percentile = 8.00 vs. 5.00, respectively).

Table 4: Mean and percentile scores for 5-Item Risk Scale by age, ethnicity and gender.

Age         Ethnicity  Gender  N     Mean  Standard Deviation  Percentile 90  Percentile 93  Percentile 95  Percentile 99
≤ 19 years  Black      Male    1472  3.12  3.28                8.00           9.00           10.00          13.00
                       Female  1745  2.08  2.54                5.00           6.00           7.00           11.00
            Other      Male    204   0.89  2.27                4.00           5.00           6.00           9.00
                       Female  264   0.41  1.24                1.00           3.00           3.00           7.00
≥ 20 years  Black      Male    74    4.58  3.60                9.00           10.00          12.00          15.00
                       Female  96    3.48  3.59                9.00           10.00          11.00          15.00
            Other*     Male    7     3.86  3.98                11.00          11.00          11.00          11.00
                       Female  10    5.10  4.23                11.00          14.00          14.00          14.00

*Due to skewness in distribution and small cell n, percentile scores equivalent to the “Black Older Group” are recommended for use to differentiate high risk individuals.
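As an illustration of how the group-specific norms in Table 4 could be applied in practice, the sketch below looks up the 90th percentile for a respondent's group and flags scores at or above it. Treating "at or above" the percentile as high risk is an assumption, as are the function and key names; the substitution of the older Black thresholds for the older Other group follows the table footnote.

```python
# 90th-percentile scores from Table 4, keyed by (age band, ethnicity, gender).
P90 = {
    ("<=19", "Black", "Male"): 8,
    ("<=19", "Black", "Female"): 5,
    ("<=19", "Other", "Male"): 4,
    ("<=19", "Other", "Female"): 1,
    (">=20", "Black", "Male"): 9,
    (">=20", "Black", "Female"): 9,
}


def is_high_risk(score, age, ethnicity, gender):
    """Flag a 5-item total score at or above the group's 90th percentile.

    The ">=" comparison is an assumption made for this sketch.
    """
    band = "<=19" if age <= 19 else ">=20"
    group = "Black" if ethnicity == "Black" else "Other"
    if band == ">=20" and group == "Other":
        # Table 4 footnote: use the older Black thresholds for this small cell.
        group = "Black"
    return score >= P90[(band, group, gender)]


print(is_high_risk(8, 16, "Black", "Male"))       # True
print(is_high_risk(7, 16, "Black", "Male"))       # False
print(is_high_risk(9, 21, "Coloured", "Female"))  # True
```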

Using the age-, ethnicity-, and gender-specific 90th percentile as a criterion to determine high risk with the 5-item measure, 499 (12.9%) adolescents were identified as falling into the high risk group. This proportion is nearly identical to the 496 (12.8%) identified as high risk using the summative weighted scoring approach for the 7-item measure. However, only 313 (8.1%) cases were identified by both scales as high risk, and the agreement between the two scales was significant, but not high (Cohen’s kappa = 0.57; p < 0.001).
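The reported kappa can be checked from the counts given above. The sketch below builds the 2 x 2 agreement table from the total sample size (3,872), the number flagged by each scale (499 and 496), and the number flagged by both (313); the function name is illustrative.

```python
def cohens_kappa(both, only_a, only_b, neither):
    """Cohen's kappa for two binary classifications of the same cases."""
    n = both + only_a + only_b + neither
    p_observed = (both + neither) / n
    p_a = (both + only_a) / n  # proportion flagged by classification A
    p_b = (both + only_b) / n  # proportion flagged by classification B
    p_expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    return (p_observed - p_expected) / (1 - p_expected)


n_total = 3872
both = 313                # high risk on both scales
only_5item = 499 - both   # high risk on the 5-item scale only
only_7item = 496 - both   # high risk on the 7-item scale only
neither = n_total - both - only_5item - only_7item

kappa = cohens_kappa(both, only_5item, only_7item, neither)
print(round(kappa, 2))  # 0.57
```

Although the two scales flag almost the same number of adolescents, 186 are flagged only by the 5-item scale and 183 only by the 7-item scale, which is what pulls kappa down to a moderate value.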


The 7-item risk measure used at present is unidimensional with acceptable reliability for group administration, but its internal consistency is inadequate for differentiating between individuals (≥.65 is required at the group comparison level and ≥.80 at the individual level; Nunnally & Bernstein, 1994). In addition, the validity of the 7-item scale as measured by the mean corrected item-total correlation (.46) was relatively low; a mean corrected item-total correlation of at least .50 is desired (Hudson, 1982).

In contrast, the 5-item risk measure was unidimensional, valid (mean corrected item-total correlation ≥.50), and reliable enough for use in differentiating individual levels of risk (Cronbach’s alpha = .79). Further, the 5-item scale was 2 items shorter, more valid (.57 vs. .46), and more reliable (.79 vs. .74) than the 7-item instrument. A reduced number of items selected from the 12-item HIV Risk Questionnaire was therefore incorporated into a new instrument with improved psychometric characteristics in comparison to the 7-item measure currently in use.

All 5 items of the newly developed measure discriminated sufficiently between subjects at different levels of risk. For all 5 items, extremely high levels of risk were necessary for a subject to select the highest option on the four-point scale. The extremely high level of risk required for the highest scores to be endorsed is an indication of range compression. An additional response option between “A few times” and “Many times”—for example, “Several times”—may add to the measure’s precision. The range of precision of the measure is appropriate for a screening instrument.

All items in the 5-item risk measure were flagged for measurement bias or DIF. Both uniform and non-uniform DIF were identified as problems. However, the percentage of subjects with salient score changes and the minimal clinically important difference (MCID) for the risk measure have not yet been determined. Age-, ethnicity-, and gender-specific norms are therefore recommended. The mean scores and percentiles between these groups vary substantially (see Table 4). Of note, the actual number of older subjects of Other ethnicity in the sample is very low. Consequently, percentile scores are not an appropriate way to identify high risk individuals in this group, because a few subjects with high scores skew the distribution. Lower thresholds for older adolescents of Other ethnicity, such as those presented for the older Black adolescents, are recommended.


Several study limitations should be considered. A comprehensive literature review is an essential part of the process in scale development (DeVellis, 2012). This study focused on items used in practice to determine risk, and although these questions were supported by some previous studies and the CDC’s surveillance system, a comprehensive literature review was not conducted in scale development or refinement efforts. Also, racial groups were not equally represented in the sample; this is particularly true for older adolescents of Other ethnicity. The most important limitation is the lack of longitudinal data inclusive of HIV status to use as a criterion in determining the predictive validity of the measures analyzed.


These findings, combined with the fact that a less reliable 7-item measure is currently used to measure risk for HIV infection, have immediate implications for practice. A 90th percentile cut-off score for high risk resulted in similar numbers (499 and 496 for the 5-item and 7-item scales, respectively) being identified as high risk, but the agreement between the scales in categorizing adolescents’ risk levels was only moderate. In addition, the extremely high level of risk required for the highest score option on the four-point scale to be endorsed (as reflected by the mean standardized score of 11.07) is alarming. For most domains of behavior, the range of standardized scores is between -3 and 3. In total, 563 participants selected the highest response option on at least one item. This means that in a sample composed primarily of high school students of low socioeconomic status in South Africa, 14.5% indicated that they engaged in behavior much more risky than the behavior of their peers. Although the measurement error on any single item is higher than the measurement error for the total scale, it is interesting that the percentage of high risk individuals using the 90th percentile cut-off (12.9%) is close to the percentage of participants who endorsed the highest response option on at least one item (14.5%).

VCT is not done routinely in schools in South Africa. In light of the relatively high percentage of adolescents that engage in risky behavior for HIV infection, this policy should be revisited. The newly developed 5-item measure offers a way to identify adolescents at high risk, a first step in providing targeted preventive interventions to the high risk adolescent population. Use of the 5-item screener, including age-, ethnicity-, and gender-specific percentile-based norms, may dramatically improve prevention effectiveness, and will enable additional validation studies and further refinement of this and other risk measures. In addition, the behavior dimension included in the 5-item measure is not different from other approaches used to determine risk for HIV infection. Consequently, it is possible that other measures may also include item bias for various groups. Item bias should therefore be investigated for instruments or questions used to determine HIV risk.



Baker, F. B. (1985). The Basics of Item Response Theory. Portsmouth, NH: Heinemann Educational Books.
Centers for Disease Control and Prevention. Surveillance Summaries, MMWR 2010; 59 (No.SS-5).
Crane, P.K., Gibbons, L.E., Jolley, L., Van Belle, G. (2006). Differential Item functioning analysis with ordinal Logistic Regression Techniques: DIF Detect and difwithpar. Medical Care, 44(11 Supp 3), S115-S123.
Crane, P.K., Van Belle, G., Larson, E.B. (2004). Test bias in a cognitive Test: Differential Item Functioning in the CASI. Statistics in Medicine, 23, 241-256.
Choi, S.W., Gibbons, L.E., Crane, P.K. (2011). lordif: An R Package for Detecting differential Item functioning Using Iterative Hybrid Ordinal Logistic Regression/Item Response Theory and Monte Carlo Simulations. Journal of Statistical Software, 39 (8), 1-28.
DeVellis, R. F. (2012). Scale Development – Theory and Applications (3rd ed.). Los Angeles: Sage
Du Randt, R.H., Smith, J.A., Kreiter, S.R., Krowchuk, D.P. (1999). The Relationship Between Early Age of Onset of Initial Substance Use and Engaging in Multiple Health Risk Behaviors Among Young Adolescents. Pediatric Adolescent Medicine, 153: pp286 – 291
Guttmacher, S., Weitsman, B., Kapadia, F., & Weinberg, S. (2002). Classroom-based Surveys of Adolescent Risk Taking Behaviors: Reducing the Bias of Absenteeism. American Journal of Public Health, 92 (2): pp235 -237.
Nunnally, J.C. & Bernstein, I. H. (1994). Psychometric Theory (3rd ed.). New York: McGraw-Hill.
Hudson, W.W. (1982). The Clinical Measurement Package: A Field Manual. Homewood, IL: Dorsey Press.
Reckase, M. D. (1979). Unifactor latent trait models applied to multifactor tests: Results and implications. Journal of Educational Statistics, 4, 207-230.
Reeve, B. B., Hays, R. D., Bjorner, J. B., Cook, K. F., Crane, P. K., Teresi, J. A., et al. (2007). Psychometric evaluation and calibration of Health-Related Quality of Life item banks plans for the Patient-Reported Outcomes Measurement Information System (PROMIS). Medical Care, 45(5 Suppl 1), S22-S31.
Samejima, F. (1969). Calibration of latent ability using a response pattern of graded scores. Psychometrika Monograph Supplement, 17.
Steinberg, L., & Thissen, D. (1996). Uses of item response theory and the testlet concept in the measurement of psychopathology. Psychological Methods, 1, 81-97.
Thissen, D., Chen, W.H., & Bock, R. D. (2003). MULTILOG 7.03 [computer software]. Lincolnwood, IL: Scientific Software International.
Van Zyl, M.A., Barney R., & Pahl, K. (Under Review). VCT and Celebrity Based HIV/AIDS Prevention Education: A Pilot Program Implemented in Cape Town Secondary Schools.
Wainer, H., & Thissen, D. (1996). How is reliability related to the quality of test scores? What is the effect of local dependence on reliability? Educational Measurement: Issues and Practice, 15, 22-29.
Yen, W. (1993). Scaling performance assessments: Strategies for managing local item dependence. Journal of Educational Measurement, 30, 187-213.