Monday, 20 November 2017

Factor Analysis: EFA

Factor analysis (FA) is a statistical technique applied to a set of variables to discover which subsets of variables are coherent and relatively independent of one another. Principal components analysis (PCA) is closely related to factor analysis and is often used as a preliminary stage to it. Exploratory factor analysis (EFA) is used to identify the hypothetical constructs in a set of data, while confirmatory factor analysis (CFA) is used to confirm the existence of those hypothetical constructs in a fresh set of data. In this blog, I will focus on exploratory factor analysis (EFA).

EFA is used to analyze the structure of the correlations among a large number of variables by defining sets of variables that are highly interrelated, known as factors (Hair et al., 2010). Thus, EFA is used to reduce a set of items and identify the internal dimensions of the scale.

[Figure: exploratory factor analysis]

There is no definitive rule for how many cases must provide data for factor analysis. However, correlation coefficients tend to be less reliable when estimated from small samples. As a general rule of thumb, there should be at least 300 cases for factor analysis. In instrument development, there should be a ratio of at least 5-10 subjects per item.

To perform FA, the data should meet certain requirements: 1) the data have been measured on an interval scale, 2) the samples vary in their scores on the variables, 3) the scores on the variables have linear correlations with each other, 4) the scores on the variables are normally distributed, 5) absence of outliers among cases, 6) absence of multicollinearity, and 7) factorability of the correlation matrix.
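As an illustration, here is a minimal sketch of two of these checks, normality and multicollinearity, run on simulated data; the array X, the cut-offs, and the item labels are assumptions made up for the example.

```python
# A minimal sketch of two FA assumption checks on simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))          # hypothetical: 300 cases, 10 items

# Normality: Shapiro-Wilk test per variable (p < .05 suggests non-normality).
for j in range(X.shape[1]):
    w_stat, p = stats.shapiro(X[:, j])
    if p < .05:
        print(f"item {j} may be non-normal (W = {w_stat:.3f}, p = {p:.3f})")

# Multicollinearity: flag item pairs that correlate at .90 or above.
R = np.corrcoef(X, rowvar=False)
rows, cols = np.where(np.triu(np.abs(R) >= .90, k=1))
for a, b in zip(rows, cols):
    print(f"items {a} and {b} may be redundant (r = {R[a, b]:.2f})")
```

On well-behaved data both loops print nothing, which is the desired outcome.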

The factorability indices for EFA, including the inter-item correlations, factor loadings, communalities, Bartlett's test of sphericity, the KMO test, and the MSA, should be examined before conducting EFA (a code sketch of two of these checks follows the list below).
     - Inter-item correlations: the correlation coefficient (r) for each pair of items should range from .30 to .70. Conversely, if two items correlate very highly (.90 or more), they are redundant and one of them should be dropped from the analysis.
     - Factor loading: a factor loading is the correlation between a variable and a factor; the larger the loading, the more important that variable is in interpreting the factor matrix. Loadings greater than .30 are desirable. However, a variable that persists in having cross-loadings becomes a candidate for deletion.


[Figure: factor loading]

     - Communality: communality represents the amount of each variable's variance accounted for by the factor solution. Communalities greater than .50 are desirable.


[Figure: communality]

     - Bartlett's test of sphericity: Bartlett's test examines the entire correlation matrix, testing the null hypothesis that it is an identity matrix (i.e., that the variables are uncorrelated). A statistically significant result (sig. < .05) indicates that sufficient correlations exist among the variables to proceed.
     - The Kaiser-Meyer-Olkin measure (KMO): the KMO is based on the principle that if variables share common factors, then partial correlations between pairs of variables should be small when the effects of the other variables are controlled. A KMO value of at least .60 is desirable for factor analysis.


[Figure: KMO and Bartlett's test]

     - Measure of sampling adequacy (MSA): the MSA examines the degree of intercorrelation among the variables, both for the entire correlation matrix and for each individual variable. An overall MSA value above .50 is desirable before proceeding with the factor analysis.
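To make the last two checks concrete, here is a minimal sketch that computes Bartlett's test of sphericity and the overall KMO directly from the correlation matrix; the simulated data are hypothetical, and dedicated packages provide ready-made versions of both tests.

```python
# A from-scratch sketch of Bartlett's test of sphericity and the overall KMO.
import numpy as np
from scipy import stats

def bartlett_sphericity(data):
    """H0: the correlation matrix is an identity matrix (no correlations)."""
    n, p = data.shape
    R = np.corrcoef(data, rowvar=False)
    chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return chi2, stats.chi2.sf(chi2, df)        # statistic, p-value

def kmo_overall(data):
    """KMO is large when the partial correlations are small."""
    R = np.corrcoef(data, rowvar=False)
    inv_R = np.linalg.inv(R)
    # Partial correlations from the inverse of the correlation matrix.
    d = np.sqrt(np.outer(np.diag(inv_R), np.diag(inv_R)))
    partial = -inv_R / d
    off_diag = ~np.eye(R.shape[0], dtype=bool)  # off-diagonal entries only
    r2 = (R[off_diag] ** 2).sum()
    q2 = (partial[off_diag] ** 2).sum()
    return r2 / (r2 + q2)

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))                  # hypothetical item responses
chi2, p_value = bartlett_sphericity(X)
print(f"Bartlett chi2 = {chi2:.1f}, p = {p_value:.3f}, KMO = {kmo_overall(X):.2f}")
```

Because the simulated items are uncorrelated, this example should yield a non-significant Bartlett result and a KMO near .50, i.e., data that are not factorable.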
       
After the appropriateness of performing a FA has been analyzed, factor extraction using the PCA method is performed. This procedure condenses the items into the underlying constructs that explain the pattern of correlations. Examine the results of the PCA to decide how many factors are worth keeping by considering the eigenvalues and the scree test (eigenvalues greater than 1; the point at which there is an 'elbow' in the scree plot).
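For illustration, here is a short sketch of the eigenvalue (Kaiser) criterion on a hypothetical data matrix; a scree plot of the same eigenvalues would show the 'elbow'.

```python
# Eigenvalues of the correlation matrix and the Kaiser criterion.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))                       # hypothetical data

R = np.corrcoef(X, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]   # sorted descending
n_factors = int((eigenvalues > 1).sum())             # eigenvalues > 1 rule
print(eigenvalues.round(2), "-> retain", n_factors, "factors")
```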

Carry out FA using the number of factors determined from the PCA. 'Rotation' is needed when extraction produces two or more factors. First carry out FA with an orthogonal rotation (Varimax, Quartimax, or Equamax) to see how clear the outcome is. Then carry out FA again with an oblique rotation (Oblimin or Promax) to see whether it produces a clearer outcome. In orthogonal rotation the factors remain uncorrelated with each other, whereas in oblique rotation they are allowed to correlate. Several runs of the analysis are usually needed to find an appropriate factor solution.
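As a sketch of how this might look in practice, the example below uses the third-party factor_analyzer package (an assumption for illustration, not something this blog prescribes); the data frame and the choice of three factors are likewise made up.

```python
# Factor extraction with orthogonal and oblique rotations via factor_analyzer.
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(300, 10)))        # hypothetical items

fa = FactorAnalyzer(n_factors=3, rotation="varimax") # orthogonal rotation
fa.fit(df)
print(np.round(fa.loadings_, 2))                     # rotated factor loadings

fa_oblique = FactorAnalyzer(n_factors=3, rotation="promax")  # oblique rotation
fa_oblique.fit(df)
print(np.round(fa_oblique.get_communalities(), 2))   # communalities per item
```

Comparing the loading patterns from the two rotations is one way to judge which solution is cleaner.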

The criteria for the number of factors to extract are: 1) eigenvalues greater than 1, 2) the scree test result, and 3) a factor loading of .30 or greater for each item.

[Figure: a scree test]

Conclusion: Factor analysis is used to describe things and to attach conceptual ideas to statistical results. The interpretation and naming of factors depend on the meaning of the combination of variables that make up each factor. A good factor makes sense: the variables in each factor should form a theoretically sensible and parsimonious account of the data.

Wednesday, 8 November 2017

Quantitative analysis: Inferential statistics

Inferential statistics

The 'population' is the entire collection of individuals that we are interested in studying. It is typically impossible or difficult to observe or test every member of the population. So we choose a subset that shares the characteristics of the larger population, called a 'sample', to study.

Descriptive statistics and Inferential statistics
When it comes to statistical analysis, there are two classifications: descriptive statistics and inferential statistics.


[Figure: descriptive statistics]

Both descriptive and inferential statistics rely on the same set of data.
Descriptive statistics is solely concerned with the properties of the observed data and does not assume that the data came from a larger population. When descriptive statistics are applied to an entire population, its properties, such as the mean or standard deviation, are called parameters, as they represent the whole population. Descriptive statistics are limited in that they only allow you to make summaries about the people or objects that you have actually measured; you cannot use the data you have collected to generalize to other people or objects.

When examination of each member of an entire population is not convenient or possible, inferential statistics are valuable.
Inferential statistics, by contrast, starts with a sample and then generalizes to a population. Inferential statistics uses a random sample of data taken from a population to describe and make generalizations about that population, and is therefore based on the assumption that sampling is random. However, sampling naturally incurs sampling error, so a sample is not expected to represent the population perfectly. There are two main areas of inferential statistics:
     1. Parameter estimation. This means taking a statistic from the sample data and using it to say something about a population parameter. The result is expressed in terms of an interval and a degree of confidence that the parameter lies within that interval.
     2. Testing of significance, or hypothesis testing. This is where you use sample data to answer research questions. There is some uncertainty in this process, which is expressed in terms of a level of significance.
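As a small illustration of both areas, the sketch below uses SciPy on a made-up sample: a 95% confidence interval for the mean (estimation) and a one-sample t-test against a hypothesized mean of 100 (hypothesis testing).

```python
# Parameter estimation and hypothesis testing on a small made-up sample.
import numpy as np
from scipy import stats

sample = np.array([101.2, 98.7, 103.4, 99.1, 102.8, 97.5, 100.9, 104.2])

# 1) Estimation: a 95% confidence interval for the population mean.
mean = sample.mean()
sem = stats.sem(sample)                     # standard error of the mean
low, high = stats.t.interval(0.95, df=len(sample) - 1, loc=mean, scale=sem)
print(f"mean = {mean:.2f}, 95% CI = ({low:.2f}, {high:.2f})")

# 2) Hypothesis testing: does the population mean differ from 100?
t_stat, p_value = stats.ttest_1samp(sample, popmean=100)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")  # compare p to alpha = .05
```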


[Figure: inferential statistics]

For further explanation, see https://www.thoughtco.com/differences-in-descriptive-and-inferential-statistics-3126224


Tuesday, 7 November 2017

Quantitative Analysis: Descriptive statistics

A research study may generate masses of data. To summarize the data in a simpler and more meaningful way, descriptive statistics are needed to manage and present them in summary form. Typically, there are two types of statistics that are used to describe data:

1) Measures of central tendency
The concept of central tendency is to describe the 'average' or 'most typical' value of a distribution. These are ways of producing a figure that best represents the 'middle point' in the data. The three most common measures of central tendency are the mean, the median, and the mode; a short code example follows the three descriptions below.

       -The mean
         The arithmetic mean is most people's notion of what an average is. The mean is equal to the sum of all values in the data set divided by the number of values. It should be calculated for interval/ratio data. Note that the mean is influenced by outliers that may lie at the extremes of the data set.

       -The median
         The median is simply the middle value in a distribution when the data are ranked in order. The median is the most suitable measure of central tendency for ordinal data, and it is also widely used with interval/ratio data. We usually prefer the median over the mean or the mode when the data are skewed.

       -The mode
         The mode is the value that occurs most frequently in the distribution. It is appropriate for nominal data.
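As a quick illustration, all three measures are available in Python's standard library; the small data set below is made up for the example.

```python
# Mean, median, and mode of a small made-up data set.
import statistics

scores = [2, 3, 3, 4, 5, 5, 5, 7, 9]

print(statistics.mean(scores))    # 4.777... (sum of values / count)
print(statistics.median(scores))  # 5 (middle value of the ordered data)
print(statistics.mode(scores))    # 5 (most frequent value)
```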


[Figure: central tendency]

2) Measures of dispersion
These are ways of summarizing a group of data by describing how spread out the scores are. When dealing with ordinal data we are restricted to the range and the interquartile range, while the variance and standard deviation are usually calculated for interval/ratio data (a code example follows the four descriptions below). There are no appropriate measures of dispersion for nominal data.

       -The range
         The range is calculated by subtracting the smallest value from the largest.

       -The interquartile range
         The interquartile range is designed to overcome the main flaw of the range by eliminating the most extreme scores in the distribution. It is obtained by ordering the data from lowest to highest, dividing them into four equal parts (quartiles), and concentrating on the middle 50% of the distribution.


[Figure: interquartile range]


       -The variance
         The variance represents the average squared deviation from the mean. Together with the standard deviation, it tells us how widely the values in a distribution are dispersed around the mean: if the values are closely concentrated around the mean, the variance will be small. The main problem with the variance is that the individual differences from the mean have been squared, so it is not measured in the same units as the original variable. To remove the effect of squaring, we take the square root of the variance, more commonly referred to as 'the standard deviation'.


[Figure: variance]

       -The standard deviation
         The standard deviation is calculated as the square root of the variance. It is the most widely used measure of dispersion. However, it can be distorted by a small number of extreme values.
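As a short illustration, all four dispersion measures can be computed with NumPy; the sample below is made up for the example.

```python
# Range, interquartile range, variance, and standard deviation with NumPy.
import numpy as np

scores = np.array([2, 3, 3, 4, 5, 5, 5, 7, 9])

data_range = scores.max() - scores.min()   # range: largest minus smallest
q1, q3 = np.percentile(scores, [25, 75])   # first and third quartiles
iqr = q3 - q1                              # interquartile range: middle 50%
variance = scores.var(ddof=1)              # sample variance (divides by n - 1)
std_dev = scores.std(ddof=1)               # square root of the variance

print(data_range, iqr, variance, std_dev)
```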


[Figure: standard deviation]

Note: it is advisable to check the data for any unusually high or low values before employing these kinds of statistics.