Sunday, 3 December 2017

Classical Test Theory and Item Response Theory

Classical test theory (CTT) and item response theory (IRT) are widely used statistical measurement frameworks. CTT is approximately 100 years old and remains commonly used because it is appropriate for certain situations. Although CTT has served the measurement community for most of the past century, IRT has seen exponential growth in recent decades and is generally regarded as an improvement over CTT.

Classical Test theory (CTT)

          CTT is a theory about test scores that introduces three concepts - the test score (observed score), the true score, and the error score. A simple linear model linking the three concepts is postulated as the basic formulation, as follows:

            O (observed score)  =  T (true score) + E (random error)

The assumptions of the CTT model are that
            (1) true scores and error scores are uncorrelated,
            (2) the average error score in the population of respondents is zero, and
            (3) error scores on parallel tests are uncorrelated.

            CTT assumes that measurements are not perfect. The observed score for each person may differ from their true ability because the observed score is influenced by some degree of error. All potential sources of variation in the testing process, whether external conditions or internal conditions of the person, are assumed to act as random error. It is also assumed that the random errors in observed scores are normally distributed and uncorrelated with the true scores. According to this equation, minimizing the error score, and thereby reducing the difference between observed and true scores, is desirable because it yields observed scores closer to the true scores.
            CTT models link test scores, rather than item scores, to true scores. Scores obtained from CTT applications are therefore entirely test dependent. In addition, the two classical item statistics (item difficulty and item discrimination) are entirely dependent on the sample of respondents who took the test, and reliability estimates likewise depend on the test scores of the particular sample tested.
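As a simple illustration of how these statistics are computed (and why they are sample dependent), the sketch below uses Python with numpy and a made-up 0/1 response matrix. Item difficulty is taken as the proportion of respondents answering the item correctly, and item discrimination as the corrected item-total correlation; both values are assumptions of this example and would change if a different sample were drawn.

```python
import numpy as np

# Hypothetical 0/1 response matrix: rows = respondents, columns = items
rng = np.random.default_rng(0)
responses = rng.integers(0, 2, size=(50, 5))

# Item difficulty (p-value): proportion of respondents answering each item correctly
difficulty = responses.mean(axis=0)

# Item discrimination: correlation of each item with the total score of the
# remaining items (corrected item-total correlation)
discrimination = []
for j in range(responses.shape[1]):
    rest_total = responses.sum(axis=1) - responses[:, j]
    discrimination.append(np.corrcoef(responses[:, j], rest_total)[0, 1])

print("Item difficulty:", np.round(difficulty, 2))
print("Item discrimination:", np.round(discrimination, 2))
```

Re-running the same calculation on a different sample of respondents would give different values, which is exactly the sample dependence described above.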

            Advantages and implications of CTT
            The main advantage of CTT is its relatively weak theoretical assumptions, which make it easy to apply to real data and modest sample sizes in many testing situations. CTT is useful for assessing the difficulty and discrimination of items, and the precision with which scores are measured by an examination.
            In application, the main purpose of CTT within psychometric testing is to assess and improve the reliability of psychological tests and assessments.
            1) True scores in the population are assumed to be measured at the interval level and to be normally distributed.
            2) Classical tests are built for the average respondent, and do not measure very high or very low respondents well.
            3) Statistics about test items depend on the respondent sample being representative of the population. They can be confidently generalized only to the population from which the sample was drawn, and generalization beyond that setting requires careful consideration.
            4) The longer the test, the higher its reliability.
            5) Researchers should not rely on reliability estimates from previous studies. It is suggested that internal consistency be estimated for every study using the sample obtained, because the estimates are sample dependent.
            Read more about CTT at this link.


Item Response Theory (IRT)

            Item response theory (IRT) refers to a family of mathematical models that establish a link between the properties of items on an instrument, the individuals responding to those items, and the underlying trait being measured. IRT assumes that the latent construct (e.g. stress, knowledge, attitudes) and the items of a measure are organized along an unobservable continuum, and it focuses on establishing an individual's position on that continuum. IRT models can be divided into two families: unidimensional and multidimensional. They also vary in the number of parameters (one-, two-, and three-parameter models), and there are non-parametric models as well (e.g. the Mokken scale).

            IRT Assumptions
            The purpose of IRT is to provide a framework for evaluating how well assessments work, and how well individual items on assessments work. 
            1) Monotonicity – The assumption indicates that as the trait level is increasing, the probability of a correct response also increases.
            2) Unidimensionality – The model assumes that there is one dominant latent trait being measured and that this trait is the driving force for the responses observed for each item in the measure.
            3) Local Independence – Responses given to the separate items in a test are mutually independent given a certain level of ability.
            4) Invariance – Item parameters can be estimated from any position on the item response curve. Accordingly, the parameters of an item can be estimated from any group of subjects who have answered the item.

            Each item on a test has its own characteristic curve that describes the probability of getting each item right or wrong given the ability of the person.
            Item Response Function (IRF)
            IRF is the relation between the respondent differences on a construct and the probability of endorsing an item. The response of a person to an item can be modeled by a mathematical item response function (IRF).

            Item Characteristic Curve (ICC)

            IRFs can be converted into item characteristic curves (ICCs), which are graphs that plot the probability of endorsing the item as a function of respondent ability. Depending on the IRT model used, these curves indicate which items are more difficult and which items are better discriminators of the attribute.

[Figure: item response curve]
            Item Information Function (IIF)
            Each IRF can be transformed into an IIF. The information is an index representing an item's ability to differentiate among individuals.
            Discrimination corresponds to the height of the information curve: tall, narrow IIFs indicate large discrimination, while short, wide IIFs indicate low discrimination.
            Test Information Function
            The item information functions can be summed into a test information function, which lets us judge the test as a whole and see in which part of the trait range it works best.

            The IRT mathematical model is defined by item parameters. Items are characterized by parameters including their difficulty (b), discrimination (a), and a pseudo-guessing parameter (c).
            -Location (b): location on the difficulty range
            "b" is the item difficulty that determines the location of the IRF; it is an index of the respondent level for which the item is most appropriate, and it typically ranges from -3 to +3, with 0 being an average respondent level.
            -Discrimination (a): slope or correlation
            "a" is the item's discrimination that determines the steepness of the IRF; it is an index of how well the item differentiates low from high respondents, and it typically ranges from 0 to 2, where higher is better.
            -Guessing (c)
            "c" is a lower asymptote parameter for the IRF, typically close to 1/k, where k is the number of response options. The inclusion of a "c" parameter suggests that respondents with a low trait level may still have a small probability of endorsing the item.
            -Upper asymptote (d)           
            "d" is an upper asymptote parameter for the IRF. The inclusion of a "d" parameter suggests that respondents very high on the latent trait are not guaranteed to endorse the item.

           
[Figure: item response theory parameters]
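The parameters above can be combined in a single item response function. Below is a minimal Python sketch assuming the common four-parameter logistic form (setting c = 0 and d = 1 recovers the 2PL model, and additionally a = 1 the 1PL), together with the 2PL item information I(theta) = a^2 * P(theta) * (1 - P(theta)); the parameter values are made up for illustration.

```python
import numpy as np

def irf(theta, a=1.0, b=0.0, c=0.0, d=1.0):
    """Four-parameter logistic item response function.
    a: discrimination, b: difficulty (location),
    c: lower asymptote (guessing), d: upper asymptote."""
    return c + (d - c) / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a=1.0, b=0.0):
    """Item information for the 2PL model: I(theta) = a**2 * P * (1 - P)."""
    p = irf(theta, a=a, b=b)
    return a**2 * p * (1.0 - p)

theta = np.linspace(-3, 3, 7)                               # trait levels from -3 to +3
print(np.round(irf(theta, a=1.5, b=0.5, c=0.2), 3))         # 3PL probabilities of endorsement
print(np.round(item_information(theta, a=1.5, b=0.5), 3))   # information peaks near b
```

Plotting irf(theta) against theta gives the ICC described earlier, and summing item_information over the items of a test gives the test information function.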

            Advantages and Disadvantages of IRT
            IRT provides flexibility in situations where different samples or test forms are used. Because the IRT model's unit of analysis is the item, items from different measures can be compared provided that they measure the same latent construct. Moreover, IRT can be used to study differential item functioning, in order to assess whether calibrated items still behave differently across groups. These properties also make IRT findings the foundation for computerized adaptive testing.
IRT models are generally not sample- or test-dependent.

            However, IRT rests on strict assumptions, typically requires large sample sizes (a minimum of around 200, and around 1,000 for complex models), and is more difficult to use than CTT: IRT scoring generally requires relatively complex estimation procedures, suitable computer programs are not always readily available, and the models are complex and harder to understand.

            Read more about IRT at this link.

Monday, 20 November 2017

Factor Analysis: EFA

Factor analysis (FA) is a statistical technique applied to a set of variables to discover which subsets of variables form coherent groups that are relatively independent of one another. Principal components analysis (PCA) is extremely similar to factor analysis and is often used as a preliminary stage to factor analysis. Exploratory factor analysis (EFA) is used to identify the hypothetical constructs in a set of data, while confirmatory factor analysis (CFA) is used to confirm the existence of these hypothetical constructs in a fresh set of data. In this blog, I will focus on exploratory factor analysis (EFA).

EFA is used to analyze the structure of the correlations among a large number of variables by defining sets of variables that are highly interrelated, known as factors (Hair et al., 2010). Thus, EFA is used to reduce a set of items and identify the internal dimensions of the scale.

[Figure: exploratory factor analysis]

There is no definitive rule for how many participants must provide data for factor analysis. However, correlation coefficients tend to be less reliable when estimated from small samples. As a general rule of thumb, a sample of at least 300 is recommended for factor analysis. For instrument development, a ratio of at least 5-10 subjects per item is recommended.

To perform FA, the data should meet certain requirements: 1) the data have been measured on an interval scale, 2) the samples vary in their scores on the variables, 3) the scores on the variables have linear correlations with each other, 4) the scores on the variables are normally distributed, 5) absence of outliers among cases, 6) absence of multicollinearity, and 7) factorability of the data for EFA.

              The factorability indices for EFA, including the inter-item correlations, factor loadings, communalities, Bartlett's test of sphericity, the KMO test, and the MSA, should be examined before conducting EFA.
              - Inter-item correlations: The interpretation of the factorability indices for EFA includes the correlation coefficient (r), which for all pairs of items should range from .30 to .70. Conversely, if items correlate very highly (.90 or more), they are redundant and should be dropped from the analysis.
              - Factor loading: A factor loading is the correlation between a variable and a factor; the larger the factor loading, the more important it is in interpreting the factor matrix. A factor loading greater than .30 is desirable. However, if a variable persists in having cross-loadings, it becomes a candidate for deletion.


[Figure: factor loading]

             - Communality: Communality represents the amount of variance in each variable accounted for by the factor solution. A communality greater than .50 is desirable.


[Figure: communality]

          - Bartlett's test of sphericity: Bartlett's test of sphericity is a method for examining the entire correlation matrix. A statistically significant result (sig. < .05) indicates that sufficient correlations exist among the variables to proceed.
             - The Kaiser-Meyer-Olkin measure (KMO): The KMO is based on the principle that if variables share common factors, then the partial correlations between pairs of variables should be small when the effects of the other variables are controlled. A KMO measure of sampling adequacy of at least .60 is desirable.


[Figure: KMO and Bartlett's test]

            - Measure of Sampling Adequacy (MSA): The MSA examines the degree of intercorrelation among the variables, both for the entire correlation matrix and for each individual variable. An overall MSA value above .50 is desirable before proceeding with the factor analysis. (Bartlett's test and the KMO can be computed directly from the correlation matrix, as shown in the sketch below.)
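The sketch below is a minimal Python (numpy/scipy) implementation of the standard formulas for Bartlett's test and the overall KMO, run on a hypothetical data matrix X (rows = cases, columns = items); with real data you would substitute your own scores.

```python
import numpy as np
from scipy import stats

def bartlett_sphericity(X):
    """Bartlett's test of sphericity: chi2 = -(n - 1 - (2p + 5)/6) * ln|R|."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2.0
    return chi2, df, stats.chi2.sf(chi2, df)

def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.corrcoef(X, rowvar=False)
    inv_R = np.linalg.inv(R)
    # Partial correlations obtained from the inverse correlation matrix
    d = np.sqrt(np.outer(np.diag(inv_R), np.diag(inv_R)))
    partial = -inv_R / d
    np.fill_diagonal(R, 0.0)
    np.fill_diagonal(partial, 0.0)
    return (R**2).sum() / ((R**2).sum() + (partial**2).sum())

X = np.random.default_rng(1).normal(size=(300, 10))   # hypothetical item scores
chi2, df, p_value = bartlett_sphericity(X)
print(f"Bartlett chi2 = {chi2:.1f}, df = {df:.0f}, p = {p_value:.3f}")
print(f"KMO = {kmo(X):.2f}")
```

A significant Bartlett result (p < .05) and a KMO of at least .60, as noted above, suggest that the correlation matrix is factorable.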
       
After the appropriateness of performing FA has been assessed, factor extraction using the PCA method is performed. This procedure condenses the items into their underlying constructs, which explain the pattern of correlations. Examine the results of the PCA to decide how many factors are worth keeping by considering the eigenvalues and the scree test (eigenvalues greater than 1, and the point at which there is an 'elbow' in the scree plot).
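The eigenvalues used for the Kaiser criterion and the scree plot are the eigenvalues of the item correlation matrix. A minimal numpy sketch, again with a hypothetical data matrix:

```python
import numpy as np

X = np.random.default_rng(2).normal(size=(300, 10))   # hypothetical item scores
R = np.corrcoef(X, rowvar=False)

eigenvalues = np.linalg.eigvalsh(R)[::-1]             # sorted from largest to smallest
print("Eigenvalues:", np.round(eigenvalues, 2))
print("Factors with eigenvalue > 1:", int((eigenvalues > 1).sum()))
# Plotting the eigenvalues against their rank gives the scree plot;
# look for the 'elbow' where the curve flattens out.
```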

Carry out FA using the number of factors determined from the PCA. 'Rotation' is needed when extraction techniques produce two or more factors. Carry out FA with an orthogonal rotation (Varimax, Quartimax, or Equamax) to see how clear the outcome is. Then carry out FA again with an oblique rotation (Oblimin or Promax) to produce a clearer outcome. In orthogonal rotation the factors remain uncorrelated with each other, whereas in oblique rotation they are allowed to correlate. Several runs of the analysis are usually executed to explore an appropriate factor solution.
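If the third-party factor_analyzer package is available (an assumption; it is not part of the Python standard library), the rotated solutions described above can be obtained in a few lines. The data matrix, the number of factors, and the rotations chosen here are only illustrative.

```python
import numpy as np
# Assumes the third-party `factor_analyzer` package (pip install factor-analyzer)
from factor_analyzer import FactorAnalyzer

X = np.random.default_rng(3).normal(size=(300, 10))   # hypothetical item scores

# Orthogonal rotation first: factors are kept uncorrelated
fa_varimax = FactorAnalyzer(n_factors=2, rotation="varimax")
fa_varimax.fit(X)
print(np.round(fa_varimax.loadings_, 2))

# Then an oblique rotation: factors are allowed to correlate
fa_promax = FactorAnalyzer(n_factors=2, rotation="promax")
fa_promax.fit(X)
print(np.round(fa_promax.loadings_, 2))
```

Comparing the two loading matrices, and re-running with different numbers of factors, corresponds to the 'several runs of the analysis' mentioned above.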

The criteria for the number of factors to extract consists of: 1) Eigenvalues greater than 1, 2) a Scree test result, and 3) the value of factor loading for each item that is .30 or greater.

[Figure: scree test]

Conclusion: Factor analysis is used to describe things and to attach conceptual ideas to its statistical results. The interpretation and naming of factors depend on the meaning of the combination of variables that load on each factor. A good factor makes sense: the variables in each factor should make theoretical sense and account for the factor parsimoniously.

Wednesday, 8 November 2017

Quantitative analysis: Inferential statistics

Inferential statistics

The 'population' is the entire collection of individuals that we are interested in studying. It is typically impossible or impractical to observe or test every member of the population. So we choose a subset that reflects the characteristics of the larger population, called a 'sample', to study.

Descriptive statistics and Inferential statistics
When it comes to statistical analysis, there are two classifications: descriptive statistics and inferential statistics.


[Figure: descriptive statistics]

Both descriptive and inferential statistics rely on the same set of data.
Descriptive statistics is solely concerned with the properties of the observed data, and does not assume that the data came from a larger population. When descriptive statistics are applied to an entire population, the properties of the population, such as the mean or standard deviation, are called parameters because they represent the whole population. Descriptive statistics are limited in that they only allow you to make summaries about the people or objects that you have actually measured; you cannot use the data you have collected to generalize to other people or objects.

When examination of each member of an entire population is not convenient or possible, inferential statistics are valuable.
Inferential statistics starts with a sample and then generalizes to a population. Inferential statistics use a random sample of data taken from a population to describe and make generalizations about that population. Inferential statistics are based on the assumption that sampling is random. However, sampling naturally incurs sampling error, so a sample is not expected to represent the population perfectly. There are two main areas of inferential statistics:
     1. Estimation of parameters. This means taking a statistic from the sample data and using it to say something about a population parameter. The result is expressed in terms of an interval and the degree of confidence that the parameter lies within that interval.
     2. Testing of significance, or hypothesis testing. This is where you use sample data to answer research questions. There is some uncertainty in this process, which is expressed in terms of a level of significance.
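Both areas can be illustrated with scipy.stats on a hypothetical sample: a confidence interval for the population mean (parameter estimation) and a one-sample t-test (hypothesis testing). The data and the hypothesized value of 5 are made up for the example.

```python
import numpy as np
from scipy import stats

sample = np.random.default_rng(4).normal(loc=5.2, scale=1.0, size=40)  # hypothetical sample

# 1) Parameter estimation: 95% confidence interval for the population mean
mean = sample.mean()
sem = stats.sem(sample)
ci_low, ci_high = stats.t.interval(0.95, df=len(sample) - 1, loc=mean, scale=sem)
print(f"Mean = {mean:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")

# 2) Hypothesis testing: is the population mean different from 5?
t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```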


[Figure: inferential statistics]

Please read more at this link: https://www.thoughtco.com/differences-in-descriptive-and-inferential-statistics-3126224


Tuesday, 7 November 2017

Quantitative Analysis: Descriptive statistics

A research study may generate masses of data. To summarize the data in a simpler and more meaningful way, descriptive statistics are used to manage and present them in summary form. Typically, there are two types of statistics that are used to describe data:

1) Measures of central tendency
The concept of central tendency is to describe the 'average' or 'most typical' value of a distribution. These are ways of producing a figure that best represents the 'middle point' in the data. The three most common measures of central tendency are: the mean, the median, and the mode.

       -The mean
         The arithmetic mean is most people's notion of what an average is. The mean is equal to the sum of all values in the data set divided by the number of values in the data set. It should be calculated for interval/ratio data. The mean is also influenced by outliers that may lie at the extremes of the data set.

       -The median
         The median is simply the middle value in a distribution when the data are ranked in order. The median is the most suitable measure of central tendency for ordinal data. It is also widely used with interval/ratio data. We usually prefer the median over the mean or mode when the data are skewed.

       -The mode
         The mode is the value that occurs most frequently in the distribution. It is appropriate for nominal data.
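A short Python/numpy sketch of the three measures, using a small made-up data set that contains one outlier (21), shows why the median is preferred for skewed data:

```python
import numpy as np

data = np.array([2, 3, 3, 4, 5, 5, 5, 7, 9, 21])   # hypothetical scores; 21 is an outlier

print("Mean:", data.mean())          # 6.4, pulled upward by the outlier
print("Median:", np.median(data))    # 5.0, the middle value, robust to the outlier

values, counts = np.unique(data, return_counts=True)
print("Mode:", values[np.argmax(counts)])   # 5, the most frequent value
```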


[Figure: measures of central tendency]

2) Measures of dispersion
These are ways of summarizing a group of data by describing how spread out the scores are. When dealing with ordinal data we are restricted to the range and interquartile range, while the variance and standard deviation are usually calculated for interval/ratio data. In addition, there are no appropriate measures of dispersion for nominal data.

       -The range
         The range is calculated by subtracting the smallest value from the largest.

       -The interquartile range
         The interquartile range is designed to overcome the main flaw of the range by eliminating the most extreme scores in the distribution. It is obtained by ordering the data from lowest to highest, dividing them into four equal parts (quartiles), and concentrating on the middle 50% of the distribution.


[Figure: interquartile range]


       -The variance
         The variance represents the average squared deviation from the mean. The variance and standard deviation tell us how widely dispersed the values in a distribution are around the mean: if the values are closely concentrated around the mean, the variance will be small.

         The main problem with the variance is that the individual differences from the mean have been squared, so it is not measured in the same units as the original variable. To remove the effect of squaring, we take the square root of the variance, more commonly referred to as 'the standard deviation'.


[Figure: variance]

       -The standard deviation
         The standard deviation is calculated as the square root of the variance. It is the most widely used measure of dispersion. However, it can be distorted by a small number of extreme values.
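The measures of dispersion can be computed for the same made-up data set used above:

```python
import numpy as np

data = np.array([2, 3, 3, 4, 5, 5, 5, 7, 9, 21])    # hypothetical scores

print("Range:", data.max() - data.min())            # largest minus smallest
q1, q3 = np.percentile(data, [25, 75])
print("Interquartile range:", q3 - q1)              # spread of the middle 50%
print("Variance:", data.var(ddof=1))                # sample variance (squared units)
print("Standard deviation:", data.std(ddof=1))      # square root of the variance
```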


[Figure: standard deviation]

Notes: It is advisable to check the data for any unusually high or low values before employing these kinds of statistics.

Friday, 27 October 2017

Quantitative design: Surveys


[Figure: surveys]





Surveys

The major purpose of surveys is to describe the characteristics of a population. All kinds of people in all kinds of professions use surveys to gain information about their target populations. There are two main types of surveys: cross-sectional surveys and longitudinal surveys.

Types of surveys
  • A cross-sectional survey collects information from a sample at one point in time. When an entire population is surveyed, it is called a census. Cross-sectional surveys are useful in assessing the practices, attitudes, knowledge, and beliefs of a population in relation to a particular event.
  • A longitudinal survey collects information at different points in time in order to study changes over time. Three longitudinal designs are commonly employed in survey research: trend studies, cohort studies, and panel studies.
         1) Trend studies
         In a trend study, different samples are drawn from the same population at different points in time. If random selection is used to obtain the samples, they can be considered representative of the population.
         2) Cohort studies
         A cohort study samples a particular population whose membership does not change over the course of the survey. A cohort has experienced some type of event in a selected time period, and its members are studied at intervals through time.
         3) Panel studies
         A panel study is a longitudinal study of the same individuals (a panel) with multiple measurements over time. The various data collections are often called waves. Panel studies with several waves are the best quasi-experimental design for investigating the causes and consequences of change, with high internal validity. Moreover, most big panel studies use population probability samples that permit generalization to the target population and provide external validity. However, big panel studies tend to be very expensive and difficult to conduct.

[Figure: survey methods]

Advantages and disadvantages of surveys

Advantages
1. Convenience of data collection: surveys can be administered to participants in a variety of ways, e.g. mail, e-mail, online, mobile, or face-to-face, and can be administered in remote areas.
2. Little or no observer subjectivity.
3. Participants can take their time to complete the questions. Online and e-mail surveys allow respondents to maintain their anonymity.
4. Because of the large number of people who answer surveys, statistically significant results can be found more easily. Moreover, with survey software, advanced statistical techniques can be utilized.
5. Assuming well-constructed questions and study design, the survey method has the potential to produce reliable results.
6. The representativeness of the respondents makes survey results suitable for generalization.
7. Cost effective, although the cost depends on the survey method.


Disadvantages
1. The questions may not be appropriate for all participants, which leads to differences in understanding and interpretation, so answers may not be precise.
2. Respondents may not feel comfortable providing answers that present themselves in an unfavorable manner. Dishonesty can be an issue.
3. If questions are not required, respondents may skip some of them.
4. Open-ended questions are difficult to analyze because of the wide variety of opinions.
5. Low response rates.

To read more about identifying relevant guidance for survey research and evidence on the quality of survey reporting, please go to this link: http://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.1001069

Tuesday, 17 October 2017

Quantitative design: Experiment and quasi-experiment

Experiment and Quasi-experiment

Experiment
An experiment is a study in which a treatment, procedure, or program is intentionally introduced and a result or outcome is observed (https://ori.hhs.gov/content/module-2-research-design-section-2#experimental-studies). Key characteristics are:
     -Random assignments
     -Control over extraneous variables
     -Manipulation of the treatment conditions
     -Outcome measurements
     -Group comparisons
     -Threats to validity
The most important of these characteristics are manipulation and control. In addition, experiments involve highly controlled and systematic procedures in an effort to minimize error and bias, which also increases our confidence that the manipulation "caused" the outcome.

Quasi-experiment
A quasi-experiment is an empirical study used to estimate the causal impact of an intervention on its target population without random assignment to treatment or control.


[Figure: experimental research characteristics]

Randomized controlled trial (RCT)
Randomized controlled trial is a type of scientific experiment which aims to reduce bias when testing a new treatment. The participants in the trial are randomly assigned to either the group receiving treatment under investigation or to the control group. The control may be a standard practice, a placebo, or no intervention at all. It is the most rigorous way of determining whether a cause-effect relation exists between treatment and outcome. The RCT is considered the gold standard for a clinical trial. 
One of the key features is "randomization" to groups: all participants have the same chance of being assigned to each of the study groups. Importantly, the characteristics of the participants are then likely to be similar across the groups at the start of the comparison. This is intended to ensure that all potential confounding factors are divided equally among the groups that will later be compared.
RCTs are "controlled" so that researchers can reasonably attribute any effects to the treatment or intervention, by observing and comparing effects with a control group that is not given the treatment or intervention.
Bias is avoided not only by randomization but also by blinding. When the participants do not know whether they are in the control group or the experimental group, the study is called "single blind". In a "double blind" study, the researchers also do not know which participants are in the control group and which are in the experimental group.
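Random assignment itself is easy to script. The sketch below is only an illustration (hypothetical participant IDs, a fixed seed for reproducibility); real trials also involve allocation concealment and often blocked or stratified randomization.

```python
import numpy as np

participants = [f"P{i:03d}" for i in range(1, 21)]   # hypothetical participant IDs

rng = np.random.default_rng(seed=42)                 # seed fixed for reproducibility
shuffled = rng.permutation(participants)
treatment, control = shuffled[:10], shuffled[10:]    # equal allocation to two arms

print("Treatment arm:", list(treatment))
print("Control arm:  ", list(control))
```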

Intention to treat analysis
Intention to treat (ITT) is a strategy for the analysis of RCTs that compares patients in the groups to which they were originally randomly assigned. All participants who were enrolled and randomly allocated to treatment are included in the analysis and are analyzed in the groups to which they were randomized. Inclusion occurs regardless of deviations that may happen after randomization, such as protocol violations, non-adherence to the treatment protocol, or dropout/withdrawal from the study. ITT 1) provides a more realistic estimate of average treatment effects in real situations, since it is normal for some participants to drop out of or deviate from the treatment in everyday practice, and 2) helps preserve the integrity of the randomization process. ITT is a good approach for RCTs, but it can be problematic if there is a high dropout rate and poor adherence in the study. Reporting any deviations from random assignment and missing responses is essential to an ITT approach, as emphasized in the CONSORT guidelines on the reporting of RCTs.

Per protocol analysis is a comparison of treatment groups that includes only those participants who completed the treatment originally allocated. The results of a per protocol analysis usually provide a lower level of evidence but better reflect the effects of the treatment itself; it can reduce the under- or overestimation of the true effect that may occur with ITT. If per protocol analysis is done alone, however, the analysis is likely to be biased. Reporting both intention to treat and per protocol analyses is recommended.

[Figure: intention to treat analysis]


Complex interventions
Complex interventions are made up of many components that act both on their own and in conjunction with each other. There is no clear boundary between simple and complex interventions, but the number of components and the range of effects may vary widely. Complex interventions are widely used in the health service, in public health practice, and in areas of social policy that have important health consequences, such as education, transport, and housing. Both the properties of the intervention and the context into which it is placed are important. Complex interventions may work better if tailored to the local context rather than being completely standardized.
In 2000, the Medical Research Council (MRC) of the United Kingdom published guidance to help researchers and research funders recognize and adopt appropriate methods, which has since been updated (see this link). This BMJ paper summarizes the issues that prompted the revision and the key messages of the new guidance. The figure below shows the key elements of the development and evaluation process.

[Figure 1: Key elements of the MRC development and evaluation process]

Pragmatic trials

Clinical trials have been the main tool used to test and evaluate interventions. Trials are either explanatory or pragmatic. Explanatory trials aim to test whether an intervention works under optimal conditions. Pragmatic trials are designed to evaluate the effectiveness of interventions in real-world practice settings.
In pragmatic trials, both internal validity (accuracy of the results) and external validity (generalizability of the results) must be achieved, and trials must be prospectively registered and reported fully according to the pragmatic trials extension of the CONSORT statement.

[Figure: pragmatic trials]

Friday, 13 October 2017

Measurement and prediction II (reliability, validity, sensitivity and specificity)

Reliability

    Reliability refers to the consistency of measurement. If the measurement is done more than once, or by more than one person, on the same phenomenon and produces the same results, the measurement is reliable. There are four aspects of reliability (https://www.socialresearchmethods.net/kb/reltypes.php).

    1) Inter-rater or Inter-observer reliability - This type of reliability is used to assess the agreement between or among observers of the same phenomenon.
    2) Test-retest reliability - This type of reliability will be used when we administer the same test/instrument to the same sample on two different times.
    3) Inter-method reliability - This type of reliability will be used to assess the degree to which test scores are consistent when there is a variation in the methods or instruments used. When two tests constructed in the same way from the same content domain, it may be termed "parallel-forms reliability".
    4) Internal consistency reliability - This type of reliability will be used to assess the consistency of results across items within a test.

For scale development, the reliability of an instrument refers to the degree of consistency or repeatability with which the instrument measures the concept it is supposed to measure (Burns & Groove, 2007). The reliability of an instrument can be assessed in various ways. Three key aspects are internal consistency, stability, and equivalence.
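Internal consistency is commonly summarized with Cronbach's alpha, which can be computed directly from an item score matrix. A minimal numpy sketch with made-up Likert-type data (real scales are usually expected to reach values around .70 or higher):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an item matrix (rows = respondents, columns = items)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)        # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)    # variance of the total score
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

scores = np.random.default_rng(5).integers(1, 6, size=(100, 8))  # hypothetical 1-5 ratings
print(f"Cronbach's alpha = {cronbach_alpha(scores):.2f}")
```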
    
Validity
    
    Validity refers to the credibility of the measurement, that is, the extent to which the measurement measures what it is intended to measure. There are two aspects of validity.
    1) Internal validity.  It refers that the instruments or procedures used in the study measure what they are supposed to measure.
    2) External validity.  It refers that the results of the study can be generalized.

For the scale development, validity is inferred from the manner in which a scale was constructed, its ability to predict specific events, or its relationship to measure of other constructs (DeVellis, 2012). There are three types of validity: content validity, construct validity, and criterion-related validity.


    The relationship between reliability and validity
    If a measurement is valid, it must be reliable; however, a reliable measurement is not necessarily valid. A developed scale is expected to contain evidence to support both its reliability and its validity.


[Figure: reliability]


Sensitivity and Specificity

Sensitivity is the extent to which true positives are correctly identified (so false negatives are few). For example: a sensitive test helps rule out disease. If a person has the disease, how often will the test be positive (true positive rate)?


Specificity is the extent to which positives really represent the condition of interest and not some other condition being mistaken for it (so false positives are few). For example: if a person does not have the disease, how often will the test be negative (true negative rate)?


[Figure: sensitivity and specificity]

For example, consider the results of testing 100 subjects for TB:

[Figure: sensitivity and specificity (TB test results)]
From this simple illustration, the sensitivity and specificity would be 0.83 (25/30) and 0.97 (68/70), respectively.
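The same figures can be reproduced from the 2x2 cell counts implied by those ratios (25 true positives and 5 false negatives among the 30 subjects with TB; 68 true negatives and 2 false positives among the 70 without):

```python
# Cell counts implied by the example above (100 subjects tested for TB)
tp, fn = 25, 5    # of the 30 subjects with TB
tn, fp = 68, 2    # of the 70 subjects without TB

sensitivity = tp / (tp + fn)    # true positive rate
specificity = tn / (tn + fp)    # true negative rate

print(f"Sensitivity = {sensitivity:.2f}")    # 0.83
print(f"Specificity = {specificity:.2f}")    # 0.97
```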


ROC curve

The measures of sensitivity and specificity rely on a single cutpoint to classify a test result as positive or negative. In many diagnostic situations the test result is a continuous value or an ordinal predictor, so there are multiple possible cutpoints. A receiver operating characteristic (ROC) curve is an effective method of evaluating the performance of such diagnostic tests.



The ROC curve is created by plotting the true positive rate (sensitivity) against the false positive rate (1 - specificity) at various cut-off points. The different points on the curve correspond to the different cutpoints used to determine whether the test result is positive. The closer the ROC curve is to the upper left corner, the higher the accuracy of the test.
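With a continuous test result, the whole curve can be traced by sweeping the cutpoint. A minimal sketch assuming scikit-learn is available, with hypothetical test scores for 30 diseased and 70 healthy subjects:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(6)
y_true = np.concatenate([np.ones(30), np.zeros(70)])        # 1 = diseased, 0 = healthy
y_score = np.concatenate([rng.normal(2.0, 1.0, 30),         # hypothetical test values
                          rng.normal(0.0, 1.0, 70)])

fpr, tpr, thresholds = roc_curve(y_true, y_score)           # one point per cutpoint
print(f"AUC = {roc_auc_score(y_true, y_score):.2f}")
# Plotting tpr (sensitivity) against fpr (1 - specificity) gives the ROC curve;
# a curve hugging the upper left corner indicates a highly accurate test.
```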


[Figure: ROC curve]


References:

Burns, N., & Groove, S. K. (2007). Understanding nursing research: building an evidence-based practice (4th ed.). St. Louis, Missouri: Sanders Elsevier.
DeVellis, R. F. (2012). Scale development: theory and applications (3rd ed.). Los Angeles: SAGE Publications.