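1) "High banks become valleys; deep valleys become hills." (Book of Songs, Minor Odes)
2) "When you draw a bow, draw the strongest; when you use an arrow, use the longest. To shoot a man, first shoot his horse; to capture bandits, first capture their chief." (Du Fu, "Going Out to the Frontier, First Series: Nine Poems")
3) "Grass and trees have no intent of their own; flourishing and withering each come in their season." (Meng Haoran, "Sent from the River to Cui Guofu, Shaofu of Shanyin")
4) "The writings of Li Bai and Du Fu endure, their radiance blazing ten thousand zhang high." (Han Yu, "Teasing Zhang Ji")
5) "Having gathered a hundred flowers and made them into honey, for whom is the toil, and for whom the sweetness?" (Luo Yin, "Bees")
6) "Do not adjust your shoes in a melon field; do not straighten your cap beneath a plum tree." (Han yuefu folk song, "Ballad of the Gentleman")
7) "He who knows others is wise; he who knows himself is enlightened." (Laozi)
8) "One day without seeing you is like three autumns." (Book of Songs, Airs of Wang, "Gathering Kudzu")
9) "When the city favors high topknots, those in the provinces wear them a foot higher." (Han yuefu folk song, "Song of the City")
10) "I write the poem in haste, as if chasing a fugitive; once the scene is lost, it is hard to capture again." (Su Shi, "Visiting the Two Monks Huiqin and Huisi at Gushan on the La Festival")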
Key words: Item response theory; Mixed-type models; Dichotomous items; Polytomous items; Maximum likelihood estimation; Weighted likelihood estimation
CLC number: O 211.2    Document code: A
Many approaches have been proposed to reduce the bias of ability estimates. For instance, Warm (1989) proposed a weighted maximum likelihood (WML) estimator for tests of dichotomous items, which consistently displayed less bias than the maximum likelihood (ML) estimator. Penfield and Bergeron (2005) then generalized the WML estimator to polytomous IRT models (Samejima, 1998; Wang, 2001; Penfield and Bergeron, 2005). Compared with dichotomous items, polytomous items provide more information about the level of the estimated latent trait (Donoghue, 1994; Embretson and Reise, 2000, p. 95; Jodoin, 2003; Penfield and Bergeron, 2005).
The approaches mentioned above focus separately on tests composed of either dichotomous items or polytomous items. In practice, however, mixed-type tests composed of both dichotomous and polytomous items are more commonly used, for instance the National Assessment of Educational Progress (NAEP). It is therefore of interest to develop a WML estimator that applies to mixed-type tests. This is the main work of this article.
The purpose of this article is twofold: (a) to present the derivation of the WML estimator under a mixed-type item response model and (b) to compare the properties of the WML estimator with those of the ML estimator under different test conditions. To this end, the remainder of this article is organized into four sections. First, the two models used in the article are briefly summarized and the WML estimator for the mixed-type test is derived. Second, a simulation study is conducted to evaluate the performance of the proposed WML estimator against the usual ML estimator. Third, a real data set from a large-scale reading assessment is used to demonstrate the difference between the two estimation procedures. Finally, we conclude the article with a discussion.
2.1 Rasch Model and PCM
In this paper, we consider a mixed-type model that combines the Rasch model (RM) and the partial credit model (PCM). To simplify the notation, the examinee subscript is omitted in the following derivations.
The RM is defined as
where P_ij(θ) is the probability of selecting response j of polytomous item i at ability level θ, b_iv denotes the step parameter of category v of item i, and m denotes the number of response categories.
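For reference, the standard textbook forms of the two models can be written as follows. This is a sketch in assumed notation rather than a reproduction of the article's own equations: b_i is taken to be the difficulty of dichotomous item i, and polytomous scores are assumed to run over j = 0, 1, ..., m with step parameters b_i1, ..., b_im, so the indexing may differ from the original.

```latex
% Rasch model: probability of a correct response to dichotomous item i
P_i(\theta) = \frac{\exp(\theta - b_i)}{1 + \exp(\theta - b_i)}

% Partial credit model: probability of score j on polytomous item i,
% with the empty sum for j = 0 defined as zero
P_{ij}(\theta) =
  \frac{\exp\left( \sum_{v=1}^{j} (\theta - b_{iv}) \right)}
       {\sum_{c=0}^{m} \exp\left( \sum_{v=1}^{c} (\theta - b_{iv}) \right)},
  \qquad j = 0, 1, \dots, m
```

Under this convention the Rasch model is the special case of the PCM with a single step parameter (m = 1).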
2.2 The Weighted Maximum Likelihood Estimator
To facilitate the presentation of the estimation method, the relevant technical aspects of IRT ability estimation are described below. Based on the RM and the PCM above, the likelihood of a response pattern can be written as the product of two types of likelihood functions:
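A sketch of the quantities involved, under assumed notation (the symbols u_i, x_ij, n_1 and n_2 are introduced here for illustration): u_i is the 0/1 response to dichotomous item i, x_ij equals 1 if polytomous item i received score j and 0 otherwise, and n_1 and n_2 are the numbers of dichotomous and polytomous items. The estimating equation shown is the general weighted likelihood form of Warm (1989), with the polytomous extension of J(θ) as used by Samejima (1998) and Penfield and Bergeron (2005); it may not match the article's own equation numbering or layout.

```latex
% Likelihood of a mixed response pattern: dichotomous part times polytomous part
L(\theta) =
  \prod_{i=1}^{n_1} P_i(\theta)^{u_i} \, [1 - P_i(\theta)]^{1 - u_i}
  \times
  \prod_{i=1}^{n_2} \prod_{j=0}^{m} P_{ij}(\theta)^{x_{ij}}

% Weighted likelihood estimating equation
\frac{\partial \ln L(\theta)}{\partial \theta} + \frac{J(\theta)}{2 I(\theta)} = 0

% Test information and weighting term, treating each dichotomous item
% as a two-category item; primes denote derivatives with respect to theta
I(\theta) = \sum_{i} \sum_{j} \frac{[P'_{ij}(\theta)]^{2}}{P_{ij}(\theta)},
\qquad
J(\theta) = \sum_{i} \sum_{j} \frac{P'_{ij}(\theta) \, P''_{ij}(\theta)}{P_{ij}(\theta)}
```

For the RM and the PCM, J(θ) equals the derivative of I(θ), so solving the estimating equation is equivalent to maximizing L(θ) multiplied by the square root of I(θ).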
3.1 Design of the Simulation
To evaluate the performance of the proposed WML method, an extensive simulation study was conducted over a range of test conditions, varying the total number of items and the proportions of dichotomous and polytomous items in a mixed-type test.
In addition to the WML method, the ML method was used to estimate the latent trait θ. The two estimators were compared under nine test conditions: three short tests (6 items, of which 4, 3, or 2 were dichotomously scored), three medium tests (12 items, of which 8, 6, or 4 were dichotomously scored), and three long tests (24 items, of which 16, 12, or 8 were dichotomously scored). The difficulty parameters of the dichotomous items were randomly generated from the standard normal distribution N(0, 1). The location parameters of each polytomous item were randomly generated from four normal distributions: b_i1 ~ N(0, 0.2), b_i2 ~ N(-2, 0.2), b_i3 ~ N(0.5, 0.2), and b_i4 ~ N(2, 0.2). Furthermore, 17 equally spaced θ values were considered, ranging from -4.0 to +4.0 in increments of 0.5. For each of the 17 levels of θ, 1000 vectors of responses to the n items were generated. The dichotomous item responses were simulated according to the Rasch model, and the polytomous item responses were simulated according to the PCM. For each vector of responses, the WML and ML estimators were computed. For trials containing response patterns consisting of all zeros or all maximum scores, the Newton-Raphson algorithm cannot converge, and thus the ML and WML estimators could not be obtained. These response patterns were removed from the analysis, and examinee responses were simulated until 1000 admissible response patterns were obtained at each of the 17 levels of θ.
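The data-generation step of this design can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names are hypothetical, the mean of the second step distribution is taken to be -2, the value 0.2 is treated as a standard deviation, and only the short test with 4 dichotomous items is shown, with 100 replications instead of 1000 to keep the run short.

```python
import numpy as np

def pcm_probs(theta, steps):
    """Category probabilities of one PCM item with scores 0..len(steps)."""
    # Cumulative sums of (theta - b_v); score 0 has an empty sum equal to 0.
    z = np.concatenate(([0.0], np.cumsum(theta - np.asarray(steps, dtype=float))))
    p = np.exp(z - z.max())  # subtract the maximum for numerical stability
    return p / p.sum()

def simulate_pattern(theta, b_dich, steps_poly, rng):
    """Draw response patterns until one is neither all zeros nor all maximum scores."""
    max_total = len(b_dich) + sum(len(s) for s in steps_poly)
    while True:
        p_correct = 1.0 / (1.0 + np.exp(-(theta - b_dich)))  # Rasch model
        u = (rng.random(len(b_dich)) < p_correct).astype(int)
        x = np.array([rng.choice(len(s) + 1, p=pcm_probs(theta, s)) for s in steps_poly])
        pattern = np.concatenate((u, x))
        if 0 < pattern.sum() < max_total:
            return pattern

rng = np.random.default_rng(2024)
N_REP = 100                      # the design above uses 1000 per theta level
n_dich, n_poly = 4, 2            # short test: 6 items, 4 dichotomous
b_dich = rng.normal(0.0, 1.0, size=n_dich)
step_means = np.array([0.0, -2.0, 0.5, 2.0])   # assumed means of b_i1..b_i4
steps_poly = rng.normal(step_means, 0.2, size=(n_poly, 4))  # 0.2 assumed to be an SD
max_total = n_dich + 4 * n_poly

thetas = np.arange(-4.0, 4.5, 0.5)             # 17 equally spaced levels
data = {
    float(t): np.array([simulate_pattern(t, b_dich, steps_poly, rng) for _ in range(N_REP)])
    for t in thetas
}
```

Because inadmissible patterns are redrawn, the retained samples at extreme θ are not a random sample from the model, which is relevant to the anomaly reported in the results.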
3.3 Results of Simulation
Impact of Inadmissible Response Patterns. Before describing the results concerning the relative performance of the two estimators, the reader should be aware of an anomaly in the results that was immediately apparent upon inspection of the simulation output. For values of θ less than -2.0 and greater than 2.0, the bias of the ML and WML estimators became grossly and nonsensically inflated in magnitude, such that the values of the ML and WML estimators were pulled in toward zero. Note that this direction of bias is opposite to what is typically observed for the ML estimator (Warm, 1989). This counterintuitive result can be explained by the removal of all response patterns consisting of all zeros or all maximum scores. Because of the nonsensical results obtained for the conditions in which |θ| > 2.0, the following description of the simulation results focuses on comparing the properties of the WML and ML estimators only for the conditions in which |θ| ≤ 2.0.
standard deviation of the sampling distribution of a statistic, we also compare the SD and the SE for the WML. The simulation results show that the WML estimator outperforms the ML estimator in terms of bias, RMSE, and SD.
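The three criteria can be computed from the replicated estimates at a given true θ as in the sketch below. The definitions used here are the usual ones (bias as the mean estimate minus the true value, RMSE as the root mean squared error, SD as the empirical standard deviation of the estimates); they are assumed, since the article's own formulas are not shown in this section, and the function name is hypothetical.

```python
import numpy as np

def summarize_estimates(estimates, true_theta):
    """Return (bias, RMSE, SD) of replicated ability estimates at one true theta."""
    est = np.asarray(estimates, dtype=float)
    bias = est.mean() - true_theta                     # mean signed error
    rmse = np.sqrt(np.mean((est - true_theta) ** 2))   # root mean squared error
    sd = est.std(ddof=0)                               # empirical SD of the estimates
    return bias, rmse, sd

# Toy values for illustration only, not results from the study.
bias, rmse, sd = summarize_estimates([0.8, 1.0, 1.2, 1.4], true_theta=1.0)
```

With the population form of the SD (ddof=0), the three quantities satisfy RMSE² = bias² + SD², so an estimator can lower its RMSE by reducing either component.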
4 Real Data Analysis
level of the 2000 examinees based on the WML and ML procedures are -0.6835 and -0.6123, respectively.
The total absolute difference and the total relative difference between the estimated abilities from the two procedures are, respectively,
The primary limitation of this study is the finding of nonsensical results for the ML and WML estimators at extreme levels of the latent trait (|θ| > 2.0). As described in the "Results of Simulation" section, the nonsensical ML and WML estimates in this situation are attributable to the removal of trials in which the response pattern consisted of all zeros or all maximum scores, which poses a major hurdle to the valid use of the ML and WML estimators at such extreme levels of θ. As a result, extreme caution should be used when estimating θ with tests and scales that do not match the respondent's level of the latent trait.
Several other issues should be explored further. First, the proposed weighting scheme can be generalized to a broad range of applications. For example, it can be applied to computerized adaptive testing (CAT), not only to lower item exposure rates but also to improve ability estimation (e.g., Tao, 2011). Second, although the RM and the PCM are commonly used in practice, there are more general item response models, for instance the three-parameter logistic (3PL) model and the generalized partial credit model (GPCM). It is therefore worth extending the WML estimator to these more complex models. Third, in addition to the ML estimator, Bayesian estimators such as the expected a posteriori (EAP) estimator are frequently used in IRT, so a procedure for reducing the bias of Bayesian estimators should also be investigated.
Details of Weighted Maximum Likelihood Estimation Using the Newton-Raphson Algorithm
The weighted likelihood estimator is the solution of Equation (10) as follows:
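A minimal Newton-Raphson sketch for this kind of estimating equation is given below. It is not the article's Equation (10), which is not reproduced here; it assumes the Warm-type equation "score + J(θ)/(2I(θ)) = 0" and uses the fact that, for the RM and the PCM, the item information and its first two derivatives are the second, third and fourth cumulants of the item score. A Rasch item is passed as a PCM item with a single step. All function names are hypothetical, and the step cap and the fallback to Fisher scoring are added safeguards rather than part of the source.

```python
import numpy as np

def category_probs(theta, steps):
    """PCM category probabilities for scores 0..len(steps)."""
    z = np.concatenate(([0.0], np.cumsum(theta - np.asarray(steps, dtype=float))))
    p = np.exp(z - z.max())
    return p / p.sum()

def score_cumulants(theta, steps):
    """Mean, variance, third and fourth cumulants of one item's score."""
    p = category_probs(theta, steps)
    d = np.arange(len(p)) - np.sum(np.arange(len(p)) * p)
    var = np.sum(d ** 2 * p)
    return np.sum(np.arange(len(p)) * p), var, np.sum(d ** 3 * p), np.sum(d ** 4 * p) - 3 * var ** 2

def estimate_theta(responses, item_steps, weighted=True, theta0=0.0, tol=1e-8, max_iter=100):
    """ML (weighted=False) or WML (weighted=True) ability estimate by Newton-Raphson."""
    theta = float(theta0)
    total = float(np.sum(responses))
    for _ in range(max_iter):
        stats = np.array([score_cumulants(theta, s) for s in item_steps])
        expected, info, d_info, d2_info = stats.sum(axis=0)
        g = total - expected                 # score function of the log-likelihood
        dg = -info                           # its derivative
        if weighted:
            g += d_info / (2.0 * info)       # weighting term J/(2I), with J = I'
            dg += (d2_info * info - d_info ** 2) / (2.0 * info ** 2)
        if dg >= 0:                          # safeguard: fall back to Fisher scoring
            dg = -info
        step = float(np.clip(-g / dg, -1.0, 1.0))  # safeguard: cap the step size
        theta += step
        if abs(step) < tol:
            return theta
    raise RuntimeError("Newton-Raphson did not converge")

# Illustrative mixed test: two Rasch items and one five-category PCM item.
item_steps = [[-0.5], [0.3], [0.0, -2.0, 0.5, 2.0]]
responses = [1, 0, 2]
theta_ml = estimate_theta(responses, item_steps, weighted=False)
theta_wml = estimate_theta(responses, item_steps, weighted=True)
```

As a check on the logic, for n identical Rasch items with difficulty 0 and raw score r the two equations have closed-form solutions: the ML estimate is logit(r/n) and the WML estimate is logit((r + 0.5)/(n + 1)), which lies closer to zero.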
