Statistical Quandary and Policy Ambiguity in Performance Appraisal

Posted: December 16, 2010 in Analysis

"The itch of thinking begins from scratch!"

Organizations need valid and reliable instruments to measure their own performance and that of their members. Likewise, the policies that guide an organization in assessing performance must be clear to all its members, or else they will be open to various interpretations that could have adverse repercussions. The right tools for measuring, analyzing information, and drawing conclusions must be used with utmost care to arrive at a reasonable decision.

Performance appraisal serves several functions. First, it helps determine how well every member fulfills his or her role and responsibilities. Second, it offers insight into how stakeholders and clients ‘appreciate’ the services that the organization’s members deliver. Third, with a valid and reliable instrument, an employee can be appraised comprehensively, which avoids the bias of judging performance on isolated observations. Fourth, with the data generated, supervisors and managers can guide their employees in improving their competencies. Lastly, performance evaluation can be used to determine what reinforcement is needed to enhance employees’ self-esteem and motivation toward work.

Given the purposes that evaluation serves for the organization and its members, the choice of evaluation instrument is critical. Both the qualitative and the quantitative aspects of analysis must be considered. When an instrument depends heavily on numerical data, decisions will have to rest on the results of statistical analysis. When the instrument allows for collecting qualitative data, those data must be analyzed appropriately. In both kinds of analysis, generalizations have to be grounded in the data.

Statistically, the analysis can range from descriptive to inferential to more complex techniques. These are essential in making informed decisions. Qualitatively, the responses must be consistent with the quantitative data obtained. Otherwise, the inconsistency may stand in the way of an informed decision. With numerical data, even simple statistical analysis can be valuable, because it gives a grasp of the performance as a whole.

Performance has to be evaluated holistically, or else the evaluation fails. Where performance standards are high and several components are evaluated, specific weights must be apportioned to the components. This reduces bias and so increases the validity and reliability of the evaluation process. Hence, evaluators cannot just set their eyes on a fluke in the data when everything else in the data says otherwise.

There is a joke among researchers and academics: numbers do not lie, but the interpreters of the data can. A statistical quandary can result either from manipulation of the data or from incompetence, that is, a lack of knowledge on the part of the evaluator or researcher. If either happens, the evaluator’s interpretations can be judged a hoax.

Here is one case to illustrate a statistical quandary. Assume that a company uses a valid and reliable instrument that combines quantitative and qualitative measures of constructs; that performance is evaluated several times, with data obtained from various groups; that performance is evaluated in three different components; and that the policy states that an employee must obtain a very satisfactory rating during the period of evaluation to move up in rank. Under those conditions, evaluators should be able to arrive at an informed decision. XYZ Inc. has all of those in place, but surprisingly its board of evaluators could not decide whether to promote an employee to a higher rank because, with one group at one time in that year, the employee had an unsatisfactory rating.

In XYZ Inc., Zhasha, an employee, filed for a promotion. The board reviewed her records but never reached the point of making a decision. Over the past year she was evaluated 16 times, across three terms, by different groups at different time intervals. The case can be treated like a panel study.

She had 97% attendance efficiency. She has been actively involved in her organization through her pro bono services. She has pursued continuing education and professional development. She is even recognized for her participation as a resource person, speaker, and researcher locally and internationally. She has served the company for more than five years and has never received a promotion on her supervisors’ initiative. Yet because she failed one of the 16 evaluations, her request for promotion is still on the table, even though in all the rest of her evaluations she scored very satisfactory to outstanding.

Applying simple descriptive statistics, one should see that a single rating carries little weight against the other 15 of the 16 sets of evaluation data. The data show that one evaluation in 16, or 6.25%, did not approve of her performance. They also show that the remaining 15 of 16, or 93.75%, rated her performance very satisfactory or better. There are actually more groups that gave Zhasha an outstanding rating for her performance.
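The percentages above follow directly from the counts in the case. A minimal sketch of that arithmetic, using only the two figures given in the text (16 evaluations, one of them unsatisfactory); the variable names are mine:

```python
# Counts taken from the case: 16 evaluations, 1 of them unsatisfactory.
total_evaluations = 16
unsatisfactory = 1
favorable = total_evaluations - unsatisfactory  # very satisfactory or better

# Express each count as a percentage of all evaluations.
share_unsatisfactory = unsatisfactory / total_evaluations * 100
share_favorable = favorable / total_evaluations * 100

print(f"Unsatisfactory: {unsatisfactory}/{total_evaluations} = {share_unsatisfactory:.2f}%")
print(f"Very satisfactory or better: {favorable}/{total_evaluations} = {share_favorable:.2f}%")
```

Run as written, this prints 6.25% and 93.75%, the same two figures quoted in the paragraph.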

The presiding officer of the evaluation board argues that, to merit promotion, an employee should have a very satisfactory rating “in all components” during the term of evaluation. To arrive at a qualitative interpretation of the ratings, a statistician takes the general average, or mean, of the scale responses. The general average of all the ratings in the term of evaluation says that Zhasha had a very satisfactory rating, but the evaluators’ eyes were fixed on the single group that gave her an unsatisfactory rating.
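To see how the two readings of the policy can disagree on the same data, here is a small sketch. Everything numeric in it is hypothetical: the 5-point scale, the cutoff of 4 for "very satisfactory", and the split between outstanding and very satisfactory ratings are my assumptions for illustration. Only the shape of the data (16 ratings, 15 favorable, one unsatisfactory) comes from the case.

```python
from statistics import mean

# HYPOTHETICAL scale values; XYZ Inc.'s actual scale is not given in the case.
OUTSTANDING, VERY_SATISFACTORY, UNSATISFACTORY = 5, 4, 2
CUTOFF = VERY_SATISFACTORY  # assumed minimum score that counts as "very satisfactory"

# HYPOTHETICAL split of the 15 favorable ratings; only "15 favorable, 1 not" is from the case.
ratings = [OUTSTANDING] * 8 + [VERY_SATISFACTORY] * 7 + [UNSATISFACTORY] * 1

# Reading 1: every single rating must reach the cutoff (the board's focus).
passes_every_rating = all(r >= CUTOFF for r in ratings)

# Reading 2: the general average of all ratings must reach the cutoff.
general_average = mean(ratings)
passes_on_average = general_average >= CUTOFF

print(f"Every rating at or above cutoff: {passes_every_rating}")
print(f"General average: {general_average} -> passes: {passes_on_average}")
```

With these made-up numbers the strict reading fails and the general-average reading passes, which is the same disagreement the board ran into. Different assumed scale values could change the average, so the sketch shows the mechanism rather than Zhasha's actual scores.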

With that kind of analysis, the other ratings lose their value. The judgment was biased, based on a single isolated case. Evaluating across several cases and at different terms has not served its purpose. One can then ask: was the evaluation used to obtain a comprehensive understanding of one’s performance, or was it a mere instrument to find fault?

That single case should be neither singled out nor emphasized. But the evaluators assume that this is how the system works. If so, there is a problem with the existing system: not with the evaluation instrument, not with its administration, but with the thinking of those who interpret the data and make decisions from it. The said case should call on the employee to reflect and consistently aspire to be more effective in her work. Zhasha’s ratings were consistent throughout the evaluation period, as far as the numbers tell, except in that one case. Even so, the overall rating for the term in which the fluke occurred still says that she was very satisfactory in the three components being evaluated.

Probably, it is the definition of “component” that causes the confusion. The evaluators assume that a component refers to any element of the evaluation period. That broad construct would then include timing, case, number of respondents, performance areas, and others. But it should not exclude the general rating, or average score, and its interpretation. Yet the policy, which is accessible to all employees, does not say “in all components,” and it does not define what “components” are or what they include.

The illustration teaches us that policy ambiguity and evaluators’ inadequate knowledge in examining and interpreting numerical data can result in a dilemma. My analysis rests merely on descriptive statistics. To arrive at a justifiable conclusion to Zhasha’s case, or at least a reasonable decision, I challenge her evaluators to subject the data to various statistical analyses to test the hypothesis that Zhasha’s performance during the period of evaluation is not satisfactory.
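One simple way to take up that challenge is an exact binomial (sign) test on the 15 favorable results out of 16. This is only a sketch under stated assumptions, not the analysis XYZ Inc. would be bound to use. It treats the 16 evaluations as independent, which a panel design with repeated groups may not satisfy, and it stands in for "not satisfactory" with a null hypothesis that a favorable rating is no more likely than an unfavorable one (p = 0.5). Both choices are mine.

```python
from math import comb

n_evaluations = 16   # from the case
n_favorable = 15     # from the case: all but one evaluation
p_null = 0.5         # ASSUMPTION: under "not satisfactory", a favorable rating is a coin flip


def upper_tail_probability(n: int, k: int, p: float) -> float:
    """Exact binomial probability of k or more successes in n trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


p_value = upper_tail_probability(n_evaluations, n_favorable, p_null)
print(f"P(at least {n_favorable} of {n_evaluations} favorable | p = {p_null}) = {p_value:.6f}")
```

Under these assumptions the probability is 17/65536, roughly 0.00026, so a result of 15 favorable evaluations out of 16 would be very hard to reconcile with the hypothesis that her performance is not satisfactory. A stricter null value for p would give a different figure, which is why the assumption needs to be stated before the test is run.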

I do see the merit of setting the highest benchmark, because that will reinforce productivity. Still, XYZ Inc. may be too idealistic in assuming that an employee can get a 100% very satisfactory rating in all components. As for Zhasha, in the three panel surveys conducted she did receive a 100% very satisfactory rating. However, that is not how the evaluators see the case, because their ignorance tells them that the very satisfactory performance is not statistically significant and not consistent; they are looking at a tree and assuming that it is the forest.

This case reveals that a company can strive to keep the ‘strictest’ measure without applying even the least reasonable ‘tool’ to reach decisions over elementary statistical figures. It is saddening that, while the company intends to maintain high standards, the standard of thinking of its administration is to abide by the “pre-existing system” without actually interrogating how the system really works for organizational productivity and the motivation of its employees.

