
Below is a model of the cognitive processes that people engage in when responding to a survey item. Respondents must interpret the question, retrieve relevant information from memory, form a tentative judgment, convert the tentative judgment into one of the response options provided and finally edit their response as necessary.
The following survey statement at first seems straightforward, but it poses several difficulties for respondents.
I don’t drink many soft drinks in a typical day.
First, they must interpret the question. They must decide, for example, whether “soft drinks” include milk and water (as opposed to just artificial drinks) and whether a “typical day” means a typical weekday, a typical weekend day, or both.
Once they have interpreted the question, they must retrieve relevant information from memory to answer it. But what information should they retrieve, and how should they go about retrieving it? They might think vaguely about some recent occasions on which they drank soft drinks, they might carefully try to recall and count the number of soft drinks they consumed last week, or they might retrieve some existing beliefs that they have about themselves (e.g., “I don’t drink many soft drinks”).
Then they must use this information to arrive at a tentative judgment about how many soft drinks they consume in a typical day. For example, this mental calculation might mean dividing the number of soft drinks they consumed last week by seven to come up with an average number per day. Then they must format this tentative answer in terms of the response options actually provided. In this case, the options pose additional problems of interpretation. For example, what does “average” mean, and what would count as “somewhat more” than average?
Finally, they must decide whether they want to report the response they have come up with or whether they want to edit it in some way. For example, if they believe that they drink much more than average, they might not want to report the higher number for fear of looking bad in the eyes of the researcher. From this perspective, what at first appears to be a simple matter of asking people how many soft drinks they consume (and receiving a straightforward answer from them) turns out to be much more complex.
Pulse poses a series of closed-ended statements for consideration by participants and provides a set of response options for them to choose from. For example: “The feedback I receive from my manager is constructive.”
We do this because we are interested in the participants’ level of agreement with each statement. Closed-ended items are used because they are relatively quick and easy for participants to complete and the responses can be easily converted to numbers, ready for use in Pulse’s algorithms.
The survey creation process for Satchel Pulse surveys follows seven steps to ensure that Pulse surveys are efficient, valid, and reliable and that users can have confidence in their results.
The Pulse survey development process begins with a review of the relevant literature, existing surveys, and other data sources on culture, climate, and social and emotional learning.
Satchel researchers conducted interviews and focus groups with educators to explore their thoughts on the topics in Pulse surveys. Discussions explored the usefulness, relevance, and language of the topics in the surveys.
Subscales (called Pillars) are developed based on literature reviews, interviews and expert input. Existing frameworks are examined to determine if a suitable set of Pillars could be adopted. If a suitable set of Pillars does not exist, Pillars are determined by examining the available research and existing frameworks to identify a coherent set of distinct constructs. Because Satchel is well grounded in the education sector, with many educators on staff, we combine Satchel research with industry insights to develop the Pillars.
Satchel follows the BRUSO model when writing survey statements (https://methods.sagepub.com/book/constructing-effective-questionnaires). Pulse survey questions are brief, to the point, and avoid long, overly technical or unnecessary words. This makes them easier for respondents to understand and faster for them to complete. Pulse only presents questions that are relevant to the overall survey. Again, this makes the survey faster to complete, but it also avoids irritating respondents with irrelevant statements. Pulse questions are unambiguous and can be interpreted in only one way. They are also specific, so that it is clear to respondents what their response should be about. A common problem is closed-ended items that are “double-barrelled”: they ask about two conceptually separate issues but allow only one response. Where this may be the case, Satchel splits the item into two separate questions. Finally, Pulse questions are objective and do not drive participants to answer in a particular way.
Questions for each Pillar are weighted by content area experts, to prioritize the relative importance of each question in the Pillar. Higher impact questions are weighted more heavily in the Pillar score.
Experienced subject matter experts and survey methodologists have reviewed Pulse surveys to identify any issues with the wording of questions or administration that could cause measurement error.
Schools in regions throughout the United States pilot the items and subscales as well as the administration methodology. Feedback from pilot testing is used to modify questions and refine the administration methodology.
More frequent surveys mean a greater potential for survey fatigue. Pulse combats this by providing a quick and simple way to collect participants’ responses. Because we are measuring quantitative variables, we use a visual-analog scale for the response options, on which participants make a mark on a horizontal line to indicate the magnitude of their response.
Our rating scale carries three verbal labels: Strongly disagree, Not sure, and Strongly agree. These labels are presented to respondents, and each response is converted into a numerical value between one and ten depending on where it falls on the rating scale. We supplement the verbal labels on the scale with appropriate graphic icons. These icons change as the respondent drags the indicator along the sliding scale, so the respondent can quickly and easily see whether the response they are giving is negative or positive.
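To make the conversion concrete, here is a minimal sketch of how a slider position might be mapped onto the one-to-ten scale and an icon band. The function names, the 0-1 slider fraction, and the icon thresholds are illustrative assumptions, not Pulse's actual implementation.

```python
# Hypothetical sketch: convert a slider position (0.0 = far left, 1.0 = far right)
# into a 1-10 response value and pick an indicative icon band.
def score_from_slider(position: float) -> float:
    """Map a slider fraction in [0, 1] onto the 1-10 response scale."""
    position = min(max(position, 0.0), 1.0)  # clamp out-of-range input
    return round(1 + 9 * position, 1)        # 0.0 -> 1.0, 1.0 -> 10.0

def icon_for(score: float) -> str:
    """Choose a rough icon band so respondents can see negative vs. positive at a glance."""
    if score < 4:
        return "frown"    # toward "Strongly disagree"
    if score < 7:
        return "neutral"  # around "Not sure"
    return "smile"        # toward "Strongly agree"

score = score_from_slider(0.72)
print(score, icon_for(score))  # 7.5 smile
```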
By fitting a statistical profile to the answers provided by a school, Pulse intelligently selects which questions, and how many, are delivered in each survey so as to keep results and analyses accurate and up to date.
By making some simple assumptions about the diversity of opinion within a group of people, Pulse can estimate how many respondents are needed to acquire an accurate result for the whole population (within an appropriate error tolerance). This is called a margin-of-error calculation. Pulse uses this calculation, together with the response rate seen at each school, to estimate how many questions need to be asked in each survey in order to build a complete picture of the school as quickly as possible.
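The sketch below shows a standard finite-population margin-of-error sample-size calculation of the kind described here. The 95% confidence z-value, the worst-case proportion of 0.5, and the 5% tolerance are illustrative defaults, not Satchel's published parameters.

```python
import math

def required_respondents(population: int, margin: float = 0.05,
                         confidence_z: float = 1.96, p: float = 0.5) -> int:
    """Estimate how many respondents are needed for a given margin of error.

    Uses the standard sample-size formula with a finite-population correction;
    the defaults (95% confidence, worst-case p = 0.5, 5% margin) are illustrative.
    """
    n_infinite = (confidence_z ** 2) * p * (1 - p) / margin ** 2
    n_adjusted = n_infinite / (1 + (n_infinite - 1) / population)  # finite-population correction
    return math.ceil(n_adjusted)

print(required_respondents(population=120))  # ~92 respondents for a 120-person staff
```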
Satchel rates the impact of each Pulse question, both in general and for each specific Pillar. This rating guides which questions are picked for each survey: those with the highest impact are asked more frequently, which keeps Pillar scores reflective of opinion at the time. The selection also ensures that enough answers are received for the other questions so that they pass the accuracy thresholds required for inclusion in the overall Pillar results.
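As an illustration of impact-weighted selection, the sketch below samples survey questions so that higher-impact questions are chosen more often. The question texts, impact values, and the particular weighted-sampling scheme are assumptions for illustration only.

```python
import random

def pick_questions(impact_ratings: dict, k: int) -> list:
    """Weighted sampling without replacement, favouring higher-impact questions.

    Uses the Efraimidis-Spirakis key trick: each question gets the key u ** (1 / weight)
    for a uniform random u in [0, 1), and the k largest keys are selected.
    """
    keyed = sorted(impact_ratings,
                   key=lambda q: random.random() ** (1.0 / impact_ratings[q]),
                   reverse=True)
    return keyed[:k]

# Hypothetical impact ratings for questions in one Pillar (values are illustrative).
impact = {
    "The feedback I receive from my manager is constructive": 0.9,
    "My workload is manageable": 0.7,
    "The technology at this school is reliable and well supported": 0.4,
}
print(pick_questions(impact, k=2))
```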
Collecting the responses is step 1 in a Pulse survey; step 2 is delivering the insights needed to create issue-specific actions.
We use our industry insights to consolidate survey statements into eight core pillars for each group (staff, students, and parents) that we believe are necessary for upholding a well-managed school with an engaged school community. Pulse focuses on how much impact each pillar makes on school culture by tracking attitudes and actions related to each of them.
Each pillar has a score from 0 to 10. The pillar score is derived from the consolidated statements and the answers given for them. A low pillar score indicates that the staff, students or parents have issues in this area, whereas a high pillar score indicates that they are happy in this area.
A weighting is applied to each statement to reflect its influence on a particular pillar, and a statement can influence one or more pillars.
Pulse uses a series of calculations to build the final pillar scores from the individual statement responses. This ensures any decisions made for change are based on scores that accurately reflect the feelings of the respondents.
The value of each statement is determined by the average of the answers given for it. For a statement’s value to be calculated for a group, there must be a minimum of 5 responses or responses from 50% of the group, whichever is larger.
The error on each Statement Value is estimated from the spread of the answers given: the larger the spread of answers, the larger the error. Assuming a normal distribution, this approximates the 95% confidence interval for the mean Statement Value.
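A minimal sketch of how a Statement Value and its error might be computed under these rules is shown below. The function name, the z-value of 1.96 for the normal approximation, and the example data are assumptions for illustration rather than Satchel's exact implementation.

```python
import statistics

def statement_value(answers: list, group_size: int):
    """Return (mean, 95% CI half-width) for a statement, or None below the response threshold."""
    required = max(5, 0.5 * group_size)  # at least 5 responses or 50% of the group, whichever is larger
    if len(answers) < required:
        return None
    mean = statistics.mean(answers)
    # A larger spread of answers gives a larger standard error, and therefore a larger reported error.
    std_err = statistics.stdev(answers) / len(answers) ** 0.5
    error = 1.96 * std_err  # ~95% confidence interval half-width under a normal approximation
    return mean, error

print(statement_value([8, 9, 6, 7, 10, 8], group_size=10))  # (8.0, ~1.13)
```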
We combine relevant Statement Values to produce the Pillar Value taking into account the importance of each statement. The Pillar Value is calculated as the sum of the Statement Values multiplied by the associated weight and normalized by the sum of all weights. Statements that do not yet have enough data for a value to be calculated are excluded from this calculation, as are their associated weights.
Errors on individual statements are also propagated through to give an error on the Pillar Value whilst still taking into account the importance of each statement. Questions that have not met the threshold number of participants are removed from the above calculations as are their associated weights.
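The weighted combination and error propagation described above might look something like the following sketch. The statement texts, values, errors, and weights are invented, and the error formula is the standard propagation for a weighted mean of independent estimates, which may differ in detail from Satchel's exact method.

```python
import math

def pillar_value(statements: dict):
    """Combine statement values into a Pillar Value and error.

    `statements` maps statement text -> (value, error, weight); statements without
    enough data should simply be omitted, along with their weights.
    """
    total_weight = sum(w for _, _, w in statements.values())
    value = sum(v * w for v, _, w in statements.values()) / total_weight
    # Propagate each statement's error through the weighted mean, assuming independence.
    error = math.sqrt(sum((w * e) ** 2 for _, e, w in statements.values())) / total_weight
    return round(value, 2), round(error, 2)

example = {
    "I enjoy teaching the students of this school": (8.0, 0.6, 1.0),
    "The technology at this school is reliable and well supported": (6.5, 0.9, 0.4),
}
print(pillar_value(example))  # (7.57, 0.5), weighted toward the higher-impact statement
```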
This method of calculation takes into account how important a statement is to the Pillar Value. If a simple average were used, this information would be lost. For example, under the pillar “I Like My Job”, the statement “The technology at this school is reliable and well supported” would not be given the same influence as “I enjoy teaching the students of this school”.
Pulse surveys are only reported once a sufficient Pillar completeness value has been achieved and a sufficient number of respondents have completed a survey. This ensures that an acceptable amount of data has been collected to establish confidence in the normality and reliability of the data.
According to Neuendorf (2011), Sijtsma (2009) and Tavakol and Dennick (2011), traditional measures of reliability that examine internal consistency, such as Cronbach’s alpha, can be artificially enhanced by adding more questions with redundant content. They are measures of homogeneity, and while some homogeneity is good, too much is an indicator of redundancy, especially when the questions on a survey are intended to measure multifaceted constructs like the Pulse Pillars.
Many surveys, including those developed by Satchel, use scales that are composed of sub-constructs that are not, and should not be, highly correlated. Bollen and Lennox (1991) explain that we would expect a high correlation between indicators of a latent construct when the indicators are all intended to measure the same thing, such as four two-digit multiplication questions in the multiplication section of a math test. However, if a scale, such as the Pulse Pillars, contains four indicators that address four unique and discrete aspects of the scale, one would not expect the questions to correlate highly. In fact, a scale like this with high inter-item correlations may lack the heterogeneity needed to fully address the diverse subdomains within the scale. High inter-item correlations indicate that either the scale construct is too specific or the questions are redundant. These high correlations may even be a detriment to content validity because all aspects of the scale construct are not fully represented by the items in the scale.
Nevertheless, it is important to understand the psychometric relationship between questions on a survey. Because many researchers have concerns about Cronbach’s alpha being too sensitive to the number of questions, Satchel uses the mean inter-item correlation to measure internal consistency in its surveys. Satchel follows the advice of Clark and Watson (1995), who suggest that for broad, higher-order constructs such as the Pillars in Pulse surveys, the inter-item correlation can be as low as .15 to .20 but should never be higher than .50.
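For illustration, the mean inter-item correlation can be computed as the average Pearson correlation over all pairs of items, as in the sketch below. The response data are invented, and `statistics.correlation` requires Python 3.10 or later.

```python
import statistics
from itertools import combinations

def mean_inter_item_correlation(item_responses: list) -> float:
    """Average Pearson correlation across all item pairs.

    `item_responses` is a list of items, each a list of answers from the same respondents.
    The result can be compared against the .15-.50 range discussed above.
    """
    pairs = list(combinations(item_responses, 2))
    correlations = [statistics.correlation(a, b) for a, b in pairs]
    return sum(correlations) / len(correlations)

items = [
    [7, 5, 8, 6, 9, 4],  # item 1 answers (illustrative)
    [6, 7, 5, 8, 6, 7],  # item 2
    [8, 6, 7, 5, 7, 6],  # item 3
]
print(round(mean_inter_item_correlation(items), 2))
```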
Validity of a survey instrument is determined by a variety of factors, including psychometric properties and survey processes. Satchel uses controls during survey design and administration to ensure the validity of Pulse survey results. Face validity is established through interviews with people representing each respondent population. These interviews confirm that questions measure what they are intended to measure and identify items that need to be modified.
As previously mentioned, high internal consistency may work against content validity, the extent to which a scale taps all aspects of a construct. As noted by Clark and Watson (1995), “maximizing internal consistency almost invariably produces a scale that is quite narrow in content; if the scale is narrower than the target construct, its validity is compromised.” Furthermore, according to Kline (1986), maximum validity is obtained where test items do not all correlate with each other but where each correlates positively with the criterion; such a test would have only low internal-consistency reliability. Satchel maximizes content validity by ensuring complete coverage of the subdomains measured within each Pillar. For example, Satchel uses content area experts to examine the questions contained in Pulse surveys to ensure that they cover all relevant aspects of each Pillar.
Bollen, K., & Lennox, R. (1991). Conventional wisdom on measurement: A structural equation perspective. Psychological Bulletin, 110, 305-314.
Clark, L. A., & Watson, D. (1995). Constructing validity: Basic issues in objective scale development. Psychological Assessment, 7, 309-319.
Kline, P. (1986). A handbook of test construction: Introduction to psychometric design. New York: Methuen.
Neuendorf, K. (2011). Internal consistency reliability: Can Cronbach’s alpha be too high? COM 631 - Multivariate analysis.
Sijtsma, K. (2009). On the use, the misuse, and the very limited usefulness of Cronbach’s alpha. Psychometrika, 74(1), 107.
Tavakol, M., & Dennick, R. (2011). Making sense of Cronbach’s alpha. International Journal of Medical Education, 2, 53.