How to determine the average in statistics. Determining the mean, variance and shape of the distribution

The average value is a statistical indicator that gives a generalized characteristic of a homogeneous population.

The average value gives a generalizing quantitative characteristic of the entire population and characterizes it in relation to a given characteristic.

For example, the average gives a generalizing quantitative characteristic of the state of remuneration for the population of workers under consideration. In addition, averages make it possible to compare different populations. For example, you can compare different organizations by level of labor productivity and by other indicators.

The essence of the average lies in the fact that it cancels out random deviations in the values of a characteristic and reflects the changes caused by the main factor.

Statistical processing by the method of averages consists of replacing the individual values of a varying characteristic with a single equalized average value.

For example, the individual output of 5 operators (tellers) of a commercial bank per day amounted to 136, 140, 154, 158 and 162 operations. To get the average number of operations per day performed by one operator, you need to add up these individual indicators and divide the resulting sum by the number of operators:

$$\bar{x} = \frac{136 + 140 + 154 + 158 + 162}{5} = \frac{750}{5} = 150 \text{ operations.}$$

As can be seen from the above example, the average number of operations does not coincide with any of the individual ones, since not a single operator performed 150 operations. But if we imagine that each operator performed 150 operations, then their total sum will not change, but will also be equal to 750. Thus, we have arrived at the main property of average values: the sum of individual values ​​of a characteristic is equal to the sum of average values.
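The main property can be checked with a few lines of Python. This is only a sketch: it takes the five outputs as 136, 140, 154, 158 and 162, where the value 158 is an assumption, chosen as the only fifth value consistent with the stated total of 750. The variable names are illustrative.

```python
# Daily outputs of the five operators (158 is an assumed value, see the note above).
outputs = [136, 140, 154, 158, 162]

total = sum(outputs)
mean = total / len(outputs)

# Replace every individual value with the mean: the total does not change.
replaced_total = mean * len(outputs)

print(mean)            # 150.0
print(replaced_total)  # 750.0
```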

This property once again emphasizes that the average value is a generalizing characteristic of the entire statistical population.

Average values ​​are widely used in various fields of knowledge. They play a particularly important role in economics and statistics: in analysis, planning, forecasting, in calculating standards and in assessing the achieved level. The average is always a named quantity and has the same dimension as an individual unit of the population.

The most important conditions (principles) for the correct calculation and use of averages are the following:

  1. In each specific case, it is necessary to proceed from the qualitative content of the characteristic being averaged, to take into account the relationship of the characteristics being studied and the data available for calculation.
  2. The individual values ​​from which the average is calculated must relate to a homogeneous population, and their number must be significant.

Types of averages

Average values are divided into two large classes: power means and structural means.

Power averages: arithmetic, harmonic, geometric, quadratic, cubic. Structural averages: mode, median.

The choice of the form of the average depends on the initial basis for calculating the average and on the available economic information for its calculation.

The initial basis for calculation and the guideline for the correct choice of the form of the average value are economic relationships that express the meaning of average values ​​and the relationship between indicators.

Calculation of some averages:

  • Average salary of 1 employee = Payroll / Number of employees
  • Average price of 1 product = Cost of production / Number of units of production
  • Average cost of 1 product = Production cost / Number of units of production
  • Average yield = Gross yield / Sown area
  • Average labor productivity = Volume of products, works, services / Time worked
  • Average labor intensity = Time worked / Volume of products, works, services
  • Average capital intensity = Average cost of fixed assets / Volume of products, works and services
  • Average capital productivity = Volume of products, works and services / Average cost of fixed assets
  • Average capital-labor ratio = Average value of fixed production assets / Average number of production personnel
  • Average percentage of defects = (cost of defective products / Cost of all products produced) * 100%

Power averages

Power averages, depending on the presentation of the source data, can be simple or weighted.
If each option occurs once, the calculation uses the simple average (for example, a salary of 3 thousand rubles occurs for only one worker); if the options are repeated an unequal number of times, that is, they have different frequencies, the calculation uses the weighted average.

Lecture 5. Average values

The concept of average in statistics

Arithmetic mean and its properties

Other types of power averages

Mode and median

Quartiles and deciles

Average values are widely used in statistics. They characterize the qualitative indicators of commercial activity: distribution costs, profit, profitability, etc.

The average is one of the common methods of generalization. A correct understanding of the essence of the average determines its special significance in a market economy, where the average, through the individual and random, makes it possible to identify the general and essential and to reveal the tendencies and patterns of economic development.

Average values are general indicators that express the effect of general conditions and the patterns of the phenomenon being studied.

The average value (in statistics) is a general indicator characterizing the typical size or level of social phenomena per unit of the population, all other things being equal.

The method of averages is used to solve the following main problems:

1. Characteristics of the level of development of phenomena.

2. Comparison of two or more levels.

3. Study of the interrelations of socio-economic phenomena.

4. Analysis of the location of socio-economic phenomena in space.

Statistical averages are calculated on the basis of mass data from properly organized mass observation (complete or sample). The statistical average will be objective and typical only if it is calculated from mass data for a qualitatively homogeneous population (mass phenomena). For example, if you calculate the average wage across cooperatives and state-owned enterprises together and extend the result to the entire population, the average is fictitious, since it is calculated for a heterogeneous population, and such an average loses all meaning.

With the help of the average, differences in the value of a characteristic that arise for one reason or another in individual units of observation are smoothed out. For example, the average output of a salesperson depends on many reasons: qualifications, length of service, age, form of service, health, etc.

The essence of the average lies in the fact that it cancels out the deviations of the characteristic values ​​of individual units of the population caused by the action of random factors, and takes into account changes caused by the action of basic factors. This allows the average to reflect the typical level of the trait and abstract from the individual characteristics inherent in individual units.

The average value is a reflection of the values ​​of the characteristic being studied, therefore, it is measured in the same dimension as the given characteristic.

Each average value characterizes the population under study according to a single characteristic. To get a complete and comprehensive picture of the population across a number of essential features, it is important to have a system of average values that describes the phenomenon from different angles.

There are different averages:

Arithmetic mean;

Geometric mean;

Harmonic mean;

Mean square;

Chronological mean.


The most common form of statistical indicator used in economic research is the average value, which is a generalized quantitative characteristic of a characteristic in a statistical population under specific conditions of place and time.

The most important property of the average value lies in the fact that it reflects what is common to all units of the population under study, because the attribute values ​​of individual units of the population fluctuate in one direction or another under the influence of many factors, including random ones.

Here are examples of economic indicators that are based on calculating an average value and reveal its essence:

  • the average salary of an enterprise's employees is calculated by dividing the total wage fund by the number of employees;
  • the average size of a bank deposit is found by dividing the total amount of deposits in monetary terms by the number of deposits;
  • to determine the average daily output of one employee, it is necessary to divide the volume of work (number of parts) performed by the employee over a certain period by the number of days in this period.

Types of averages used in statistics

Let us consider the main types of average values ​​used in solving socio-economic and analytical problems.

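The simple arithmetic mean is calculated by the formula:

$$\bar{x} = \frac{\sum x_i}{n}$$

where $x_i$ are the individual values of the characteristic and $n$ is their number.

When calculating average values, individual values of the characteristic being averaged can be repeated, that is, occur several times. In such cases, the average is calculated from grouped data or variation series using the weighted arithmetic mean, where $f_i$ is the frequency of the value $x_i$:

$$\bar{x} = \frac{\sum x_i f_i}{\sum f_i}$$

An example of using the weighted arithmetic mean formula is presented in Problem 2.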

Harmonic means are used when the numerator of the initial ratio is known from the economic content, but the denominator must first be determined.

Simple harmonic mean:

$$\bar{x} = \frac{n}{\sum \frac{1}{x_i}}$$

Weighted harmonic mean:

$$\bar{x} = \frac{\sum W_i}{\sum \frac{W_i}{x_i}}$$

The weighted formula is used to calculate average indicators not only statically, but also dynamically, when the individual values of the attribute $x_i$ and the weights $W_i$ are known for a number of time intervals. An example of applying the weighted harmonic mean formula is presented in Problem 3.

The simple (unweighted) geometric mean is determined by the formula:

$$\bar{x} = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}$$

This type of average is most widely used in the analysis of dynamics to determine the average growth rate.
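As an illustration of the average growth rate, the sketch below uses three hypothetical chain growth coefficients (they are not taken from the text) and the standard-library function `statistics.geometric_mean`. It also checks the defining property: compounding the average coefficient over all periods reproduces the overall growth.

```python
import math
from statistics import geometric_mean

# Hypothetical chain growth coefficients for three consecutive periods.
growth = [1.10, 1.20, 0.95]

avg_growth = geometric_mean(growth)

# Overall growth over the whole span is the product of the coefficients.
overall = math.prod(growth)

# Compounding the average coefficient n times gives the same overall growth.
same_total = math.isclose(avg_growth ** len(growth), overall)

print(round(avg_growth, 4))
print(same_total)  # True
```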

The simple (unweighted) mean square is determined by the formula:

$$\bar{x} = \sqrt{\frac{\sum x_i^2}{n}}$$

The mean square underlies the calculation of a number of summary indicators.

The structural averages most often used in economic practice are the mode and the median. The mode (Mo) is the value of the characteristic being studied that is repeated with the greatest frequency. The median (Me) is the value of the attribute that falls in the middle of the ranked (ordered) population. An example of determining the median and mode for a discrete series of numbers is presented in Problem 1.

The main property of the median lies in the fact that the sum of absolute deviations of attribute values ​​from the median is less than from any other value.
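This property can be illustrated numerically. The sketch below uses the series 12, 13, 14, 15, 15 and compares the sum of absolute deviations from the median with the same sum for a grid of other candidate values; the helper name `total_abs_deviation` and the grid are illustrative choices, and a grid check is a demonstration, not a proof.

```python
from statistics import median

data = [12, 13, 14, 15, 15]

def total_abs_deviation(values, center):
    """Sum of absolute deviations of the values from a given center."""
    return sum(abs(v - center) for v in values)

me = median(data)
at_median = total_abs_deviation(data, me)

# Candidate centers 10.0, 10.1, ..., 20.0.
candidates = [c / 10 for c in range(100, 201)]
best = min(total_abs_deviation(data, c) for c in candidates)

print(me, at_median)      # 14 5
print(at_median <= best)  # True
```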

For an interval series, the mode is calculated by the formula:

$$Mo = x_0 + i \cdot \frac{f_{Mo} - f_{Mo-1}}{(f_{Mo} - f_{Mo-1}) + (f_{Mo} - f_{Mo+1})}$$

where $x_0$ is the lower limit of the modal interval (the interval with the highest frequency is called modal); $i$ is the width of the modal interval; $f_{Mo}$ is the frequency of the modal interval; $f_{Mo-1}$ is the frequency of the interval preceding the modal one; $f_{Mo+1}$ is the frequency of the interval following the modal one.

For an interval series, the median is calculated by the formula:

$$Me = x_0 + i \cdot \frac{\frac{1}{2}\sum f - S_{Me-1}}{f_{Me}}$$

where $x_0$ is the lower limit of the median interval (the median interval is the first interval whose accumulated frequency exceeds half of the total sum of frequencies); $i$ is the width of the median interval; $S_{Me-1}$ is the accumulated frequency of the interval preceding the median one; $f_{Me}$ is the frequency of the median interval.
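A minimal sketch of both calculations follows. It assumes the standard textbook formulas for grouped data, Mo = x0 + i·(fMo − fMo−1) / ((fMo − fMo−1) + (fMo − fMo+1)) and Me = x0 + i·(Σf/2 − SMe−1) / fMe, and a hypothetical interval series with equal-width intervals; the function names and the data are not from the text.

```python
def interval_mode(lower, width, f_prev, f_mo, f_next):
    """Mode of an interval series from the modal interval and its neighbours."""
    return lower + width * (f_mo - f_prev) / ((f_mo - f_prev) + (f_mo - f_next))

def interval_median(lower, width, total, cum_prev, f_me):
    """Median of an interval series from the median interval."""
    return lower + width * (total / 2 - cum_prev) / f_me

# Hypothetical series: intervals 10-20, 20-30, 30-40, 40-50.
lowers = [10, 20, 30, 40]
width = 10
freqs = [5, 15, 20, 10]

# Modal interval: the one with the highest frequency.
m = freqs.index(max(freqs))
f_prev = freqs[m - 1] if m > 0 else 0
f_next = freqs[m + 1] if m + 1 < len(freqs) else 0
mo = interval_mode(lowers[m], width, f_prev, freqs[m], f_next)

# Median interval: the first whose accumulated frequency exceeds half the total.
total = sum(freqs)
cum = 0
me = None
for j, f in enumerate(freqs):
    if cum + f > total / 2:
        me = interval_median(lowers[j], width, total, cum, f)
        break
    cum += f

print(round(mo, 2))  # 33.33
print(me)            # 32.5
```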

Examples of solving problems on the topic “Average values in statistics”

Problem 1. Given a series of numbers: 15; 15; 12; 14; 13. Find the range, arithmetic mean, median and mode of this series.

Solution

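1) The range of a series of numbers is the difference between the largest and smallest of these numbers. In this case, the range is R = 15 - 12 = 3.

2) The arithmetic mean is the sum of the numbers divided by their count: (15 + 15 + 12 + 14 + 13) / 5 = 69 / 5 = 13.8.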

3) To determine the median, it is necessary to rank the series, that is, arrange the numbers in order, for example ascending: 12; 13; 14; 15; 15.
The median of an odd number of numbers in a discrete series is the number written in the middle. The median of an even number of numbers is the arithmetic mean of the two numbers in the middle.
Since in our case the number of numbers in the series is odd, then Me = 14.

4) The mode of a discrete series of numbers is the number that occurs in a given series more often than others. Since the number 15 appears in our series more often than others, Mo = 15.
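The same answers can be obtained with Python's standard `statistics` module; this is simply a cross-check of the hand calculation.

```python
from statistics import mean, median, mode

series = [15, 15, 12, 14, 13]

value_range = max(series) - min(series)

print(value_range)     # 3
print(mean(series))    # 13.8
print(median(series))  # 14
print(mode(series))    # 15
```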

Problem 2. There is information on the number of students at universities in the city and the share (%) of students studying on a commercial basis:

University | Number of students, thousand | Share studying on a commercial basis, %
1          | 15                           | 15
2          | 3                            | 10
3          | 7                            | 20

Determine: 1) the average share of university students studying on a commercial basis; 2) the number of these students.

Solution

To solve this, let's expand the proposed table:

University | Number of students, thousand | Share studying on a commercial basis, % | Number of students on a commercial basis
1          | 15                           | 15                                      | 2,250
2          | 3                            | 10                                      | 300
3          | 7                            | 20                                      | 1,400
Total      | 25                           |                                         | 3,950

The average share of university students studying on a commercial basis is determined by the weighted arithmetic mean formula:

$$\bar{x} = \frac{15 \cdot 15 + 3 \cdot 10 + 7 \cdot 20}{15 + 3 + 7} = \frac{395}{25} = 15.8\%$$

Answer . The average share of university students studying on a commercial basis is 15.8%, the number of these students is 3,950 people.
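A short sketch of the same calculation is given below. The numbers are the ones that appear in the weighted-mean formula of the solution; reading the student counts as thousands is an assumption inferred from the stated answer of 3,950 people.

```python
# Number of students (thousand, assumed unit) and share on a commercial basis (%).
students_thousand = [15, 3, 7]
share_percent = [15, 10, 20]

# Weighted arithmetic mean: shares weighted by the number of students.
weighted_sum = sum(n * s for n, s in zip(students_thousand, share_percent))
avg_share = weighted_sum / sum(students_thousand)

# Number of commercial students: each share applied to its own university.
commercial = sum(n * 1000 * s / 100 for n, s in zip(students_thousand, share_percent))

print(avg_share)   # 15.8
print(commercial)  # 3950.0
```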

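Problem 3. The amount of unpaid loan debt as of July 1 amounted to 92.4 million monetary units. It was distributed across individual industries of the economy as follows:

Industry | Amount of unpaid debt, million monetary units | Share of debt not paid on time, %
1        | 32.0                                          | 20
2        | 14.0                                          | 28
3        | 46.4                                          | 16

Determine the average percentage of debt not paid on time. Justify the choice of the form of the average.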

Solution

The amount of unpaid debt and its share differ from industry to industry, and only the numerator of the initial ratio (the unpaid debt) is known, while the denominator (the total loan debt) is not, so we apply the weighted harmonic mean formula:

$$\bar{x} = \frac{\sum W}{\sum \frac{W}{x}} = \frac{32 + 14 + 46.4}{\frac{32}{20} + \frac{14}{28} + \frac{46.4}{16}} = \frac{92.4}{5} = 18.48\%$$

Answer. The average percentage of debt not paid on time is 18.48%.
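The calculation can be reproduced in a few lines. The sketch uses the amounts and percentages from the formula in the solution; the variable names are illustrative.

```python
# W: unpaid debt by industry (million monetary units); x: share not paid on time (%).
debt = [32, 14, 46.4]
overdue_share = [20, 28, 16]

# Weighted harmonic mean: sum of W divided by the sum of W / x.
avg_share = sum(debt) / sum(w / x for w, x in zip(debt, overdue_share))

print(round(avg_share, 2))  # 18.48
```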

At the stage of statistical processing, a variety of research problems can be set, and the appropriate average must be selected for each of them. The guiding rule is that the quantities in the numerator and denominator of the average must be logically related to each other. Averages fall into two classes:

  • power averages;
  • structural averages.

Let us introduce the following notation:

$x_i$: the quantities for which the average is calculated;

$\bar{x}$: the average, where the bar above indicates that individual values are averaged;

$f_i$: the frequency (number of repetitions of an individual value of the characteristic).

Various averages are derived from the general power average formula:

$$\bar{x} = \sqrt[k]{\frac{\sum x_i^k}{n}}$$

when k = 1 it gives the arithmetic mean; k = -1, the harmonic mean; k = 0 (in the limit), the geometric mean; k = 2, the root mean square.
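The general formula can be sketched as a single function. This is an illustrative implementation, not a library API: `power_mean` is a hypothetical helper, the case k = 0 is handled separately as the geometric mean (the limit of the power mean as k tends to 0), k = 2 gives the root mean square, and the data are made up.

```python
import math

def power_mean(values, k):
    """Simple power mean of order k for positive values; k = 0 is the geometric mean."""
    n = len(values)
    if k == 0:
        # Limit case: exponent of the mean logarithm.
        return math.exp(sum(math.log(v) for v in values) / n)
    return (sum(v ** k for v in values) / n) ** (1 / k)

data = [2.0, 8.0]  # hypothetical positive values

harmonic = power_mean(data, -1)   # about 3.2
geometric = power_mean(data, 0)   # about 4.0
arithmetic = power_mean(data, 1)  # 5.0
quadratic = power_mean(data, 2)   # about 5.83

print(harmonic, geometric, arithmetic, quadratic)
```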

Average values can be simple or weighted.

Weighted averages are averages that take into account that some variants of the attribute values occur different numbers of times, so each variant has to be multiplied by that number. In other words, the weights are the numbers of population units in the different groups, i.e. each variant is “weighted” by its frequency. The frequency f is called the statistical weight, or the weight of the average.

For example, it is known that transactions were carried out over 5 days (5 transactions), and the number of shares sold and the sale price were distributed as follows:

1 - 800 shares - 1010 rub.

2 - 650 shares - 990 rub.

3 - 700 shares - 1015 rub.

4 - 550 shares - 900 rub.

5 - 850 shares - 1150 rub.

The initial ratio for determining the average share price is the ratio of the total amount of transactions (OSS) to the number of shares sold (KPA):

OSS = 1010 × 800 + 990 × 650 + 1015 × 700 + 900 × 550 + 1150 × 850 = 3,634,500;

KPA = 800 + 650 + 700 + 550 + 850 = 3550.

In this case, the average share price was equal to:

$$\bar{x} = \frac{OSS}{KPA} = \frac{3\,634\,500}{3550} \approx 1023.8 \text{ rub.}$$
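The sketch below reproduces this weighted average from the five transactions and, for comparison, computes the simple average of the five prices, which ignores how many shares were sold at each price. Variable names are illustrative.

```python
# (shares sold, price in rubles) for the five transactions
deals = [(800, 1010), (650, 990), (700, 1015), (550, 900), (850, 1150)]

total_amount = sum(qty * price for qty, price in deals)  # total amount of transactions
total_shares = sum(qty for qty, _ in deals)              # number of shares sold

weighted_avg_price = total_amount / total_shares
simple_avg_price = sum(price for _, price in deals) / len(deals)

print(total_amount, total_shares)    # 3634500 3550
print(round(weighted_avg_price, 2))  # 1023.8
print(simple_avg_price)              # 1013.0
```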

It is necessary to know the properties of the arithmetic average, which is very important both for its use and for its calculation. We can distinguish three main properties that most determined the widespread use of the arithmetic average in statistical and economic calculations.

Property one (zero): the sum of positive deviations of individual values ​​of a characteristic from its average value is equal to the sum of negative deviations. This is a very important property, since it shows that any deviations (both + and -) caused by random reasons will be mutually canceled out.

Proof:

$$\sum (x_i - \bar{x}) = \sum x_i - n\bar{x} = \sum x_i - \sum x_i = 0$$

Property two (minimum): the sum of squared deviations of individual values of a characteristic from the arithmetic mean is less than from any other number a, i.e. it is a minimum.

Proof.

Let's write the sum of squared deviations from a variable a:

$$S(a) = \sum (x_i - a)^2$$

To find the extremum of this function, it is necessary to equate its derivative with respect to a to zero:

$$\frac{dS}{da} = -2 \sum (x_i - a) = 0$$

From here we get:

$$\sum x_i - na = 0, \qquad a = \frac{\sum x_i}{n} = \bar{x}$$

Consequently, the extremum of the sum of squared deviations is achieved at $a = \bar{x}$. This extremum is a minimum, since a sum of squares grows without bound as a moves away from the data and therefore cannot have a maximum.

Property three: the arithmetic mean of a constant value is equal to this constant: $\bar{a} = a$ for a = const.

Besides these three most important properties of the arithmetic mean, there are so-called computational properties, which are gradually losing their significance due to the use of computers:

  • if the individual value of the attribute of each unit is multiplied or divided by a constant number, then the arithmetic mean will increase or decrease by the same factor;
  • the arithmetic mean will not change if the weight (frequency) of each attribute value is divided by a constant number;
  • if the individual values of the attribute of each unit are reduced or increased by the same amount, then the arithmetic mean will decrease or increase by the same amount.
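The zero property and the three computational properties above can be checked numerically. The sketch uses hypothetical values and frequencies and a small helper `wmean`, all of which are illustrative and not taken from the text.

```python
import math

x = [5, 7, 4, 10, 12]  # hypothetical attribute values
f = [2, 1, 3, 1, 3]    # hypothetical frequencies (weights)

def wmean(values, weights):
    """Weighted arithmetic mean."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

m = wmean(x, f)  # 7.5

# Zero property: weighted deviations from the mean sum to zero.
zero_sum = sum((v - m) * w for v, w in zip(x, f))

# Multiplying every value by a constant multiplies the mean by it.
scaled = wmean([v * 3 for v in x], f)

# Dividing every weight by a constant leaves the mean unchanged.
rescaled = wmean(x, [w / 2 for w in f])

# Adding a constant to every value shifts the mean by that constant.
shifted = wmean([v + 10 for v in x], f)

print(m, zero_sum, scaled, rescaled, shifted)
```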

Harmonic mean. This average is called the inverse of the arithmetic mean because it corresponds to k = -1: it is the reciprocal of the arithmetic mean of the reciprocal values.

The simple harmonic mean is used when the weights of the attribute values are the same. Its formula can be derived from the general power average formula by substituting k = -1:

$$\bar{x} = \frac{n}{\sum \frac{1}{x_i}}$$

For example, we need to calculate the average speed of two cars that covered the same path, but at different speeds: the first at a speed of 100 km/h, the second at 90 km/h.

Using the harmonic mean, we calculate the average speed:

$$\bar{x} = \frac{2}{\frac{1}{100} + \frac{1}{90}} \approx 94.7 \text{ km/h}$$
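The result can be cross-checked in Python. The sketch uses `statistics.harmonic_mean` and then confirms it by total distance over total time for a hypothetical route length of 180 km (any length gives the same answer, since both cars cover the same path).

```python
from statistics import harmonic_mean

speeds = [100, 90]  # km/h, both cars cover the same distance

avg_speed = harmonic_mean(speeds)

# Cross-check: total distance divided by total time, for an assumed 180 km route.
distance = 180
total_time = sum(distance / v for v in speeds)
check = len(speeds) * distance / total_time

arithmetic = sum(speeds) / len(speeds)  # 95.0, overstates the true average speed

print(round(avg_speed, 2))  # 94.74
print(arithmetic)           # 95.0
```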

In statistical practice, the weighted harmonic mean is used more often; its formula has the form:

$$\bar{x} = \frac{\sum W_i}{\sum \frac{W_i}{x_i}}$$

This formula is used when the weights (or volumes of phenomena) for each attribute value are not equal: in the initial ratio for calculating the average, the numerator is known, but the denominator is unknown.

For example, when calculating the average price, we must use the ratio of the sales amount to the number of units sold. We do not know the number of units sold (we are talking about different goods), but we know the sales amounts of these different goods.

Let's say we need to find the average price of goods sold:

We get

If you use the arithmetic average formula here, you can get an average price that will be unrealistic:

Geometric mean. Most often, the geometric mean is applied in determining average growth rates (average growth coefficients), when individual values of a characteristic are presented as relative values. It is also used when it is necessary to find the average between the minimum and maximum values of a characteristic (for example, between 100 and 1,000,000). There are formulas for the simple and weighted geometric mean.

For the simple geometric mean:

$$\bar{x} = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}$$

For the weighted geometric mean:

$$\bar{x} = \sqrt[\sum f_i]{x_1^{f_1} \cdot x_2^{f_2} \cdots x_n^{f_n}}$$

Root mean square value. The main area of its application is measuring the variation of a characteristic in a population (calculation of the standard deviation).

Simple mean square formula:

$$\bar{x} = \sqrt{\frac{\sum x_i^2}{n}}$$

Weighted mean square formula:

$$\bar{x} = \sqrt{\frac{\sum x_i^2 f_i}{\sum f_i}}$$

In summary, the successful solution of statistical research problems depends on the right choice of the type of average in each specific case.

Choosing the average involves the following sequence:

a) establishing a general indicator of the population;

b) determination of a mathematical relationship of quantities for a given general indicator;

c) replacing individual values ​​with average values;

d) calculation of the average using the appropriate equation.

Average values ​​refer to general statistical indicators that provide a summary (final) characteristic of mass social phenomena, since they are built on the basis of a large number of individual values ​​of a varying characteristic. To clarify the essence of the average value, it is necessary to consider the peculiarities of the formation of the values ​​of the signs of those phenomena, according to the data of which the average value is calculated.

It is known that units of each mass phenomenon have numerous characteristics. Whichever of these characteristics we take, its values ​​will be different for individual units; they change, or, as they say in statistics, vary from one unit to another. For example, an employee’s salary is determined by his qualifications, nature of work, length of service and a number of other factors, and therefore varies within very wide limits. The combined influence of all factors determines the amount of earnings of each employee, however, we can talk about the average monthly salary of workers in different sectors of the economy. Here we operate with a typical, characteristic value of a varying characteristic, assigned to a unit of a large population.

The average value reflects what is general and typical for all units of the population being studied. At the same time, it balances the influence of all factors acting on the value of the characteristic of individual units of the population, as if mutually extinguishing them. The level (or size) of any social phenomenon is determined by the action of two groups of factors. Some of them are general and main, constantly operating, closely related to the nature of the phenomenon or process being studied; they form what is typical for all units of the population, which is reflected in the average value. Others are individual; their effect is less pronounced and is episodic and random. They act in the opposite direction, causing differences between the quantitative characteristics of individual units of the population and tending to change the constant value of the characteristics being studied. The effect of individual factors is extinguished in the average value. The combined influence of typical and individual factors, balanced and mutually canceled out in generalizing characteristics, is a general manifestation of the fundamental law of large numbers known from mathematical statistics.

In the aggregate, the individual values of the characteristics merge into a common mass and, as it were, dissolve. Hence the average value acts as an “impersonal” quantity, which can deviate from the individual values of characteristics without coinciding quantitatively with any of them. The average value reflects what is general, characteristic and typical for the entire population due to the mutual cancellation in it of random, atypical differences between the characteristics of its individual units, since its value is determined as if by the common resultant of all causes.

However, in order for the average value to reflect the most typical value of a characteristic, it should not be determined for any population, but only for populations consisting of qualitatively homogeneous units. This requirement is the main condition for the scientifically based use of averages and implies a close connection between the method of averages and the method of groupings in the analysis of socio-economic phenomena. Consequently, the average value is a general indicator characterizing the typical level of a varying characteristic per unit of a homogeneous population under specific conditions of place and time.

In thus defining the essence of average values, it is necessary to emphasize that the correct calculation of any average value presupposes the fulfillment of the following requirements:

  • the qualitative homogeneity of the population from which the average value is calculated. This means that the calculation of average values ​​should be based on the grouping method, which ensures the identification of homogeneous, similar phenomena;
  • excluding the influence of random, purely individual causes and factors on the calculation of the average value. This is achieved in the case when the calculation of the average is based on sufficiently massive material in which the action of the law of large numbers is manifested, and all randomness cancels out;
  • establishing, when calculating the average value, the purpose of its calculation and the so-called defining indicator (property) to which it should be oriented.

The defining indicator can be the sum of the values of the characteristic being averaged, the sum of its inverse values, the product of its values, etc. The relationship between the defining indicator and the average value is expressed as follows: if all values of the characteristic being averaged are replaced by the average value, then their sum or product, and with it the defining indicator, will not change. Based on this connection between the defining indicator and the average value, an initial quantitative relationship is constructed for direct calculation of the average value. The ability of average values to preserve the properties of statistical populations is called the defining property.

The average value calculated for the population as a whole is called the general average; average values calculated for each group are called group averages. The general average reflects the general features of the phenomenon being studied, while the group average characterizes the phenomenon as it develops under the specific conditions of a given group.

Calculation methods may be different, therefore in statistics there are several types of average values, the main ones being the arithmetic mean, the harmonic mean and the geometric mean.

In economic analysis, the use of average values is the main tool for assessing the results of scientific and technological progress and social measures, and for searching for reserves for economic development. At the same time, it should be remembered that excessive reliance on average indicators can lead to biased conclusions in economic and statistical analysis. This is because average values, being general indicators, extinguish and ignore those differences in the quantitative characteristics of individual units of the population that actually exist and may be of independent interest.

Types of averages

In statistics, various types of averages are used, which are divided into two large classes:

  • power means (harmonic mean, geometric mean, arithmetic mean, quadratic mean, cubic mean);
  • structural means (mode, median).

To calculate power averages, it is necessary to use all available values of the characteristic. The mode and median are determined only by the structure of the distribution, therefore they are called structural, or positional, averages. The median and mode are often used as an average characteristic in those populations where calculating the power mean is impossible or impractical.

The most common type of average is the arithmetic mean. The arithmetic mean is understood as the value of a characteristic that each unit of the population would have if the total sum of all values of the characteristic were distributed evenly among all units of the population. The calculation of this value comes down to summing all the values of the varying characteristic and dividing the resulting sum by the total number of units in the population. For example, five workers fulfilled an order for the production of parts: the first produced 5 parts, the second 7, the third 4, the fourth 10, the fifth 12. Since in the source data each option occurred only once, the simple arithmetic mean formula should be applied to determine the average output of one worker:

$$\bar{x} = \frac{\sum x_i}{n}$$

i.e. in our example, the average output of one worker is equal to

$$\bar{x} = \frac{5 + 7 + 4 + 10 + 12}{5} = \frac{38}{5} = 7.6 \text{ parts.}$$

Along with the simple arithmetic mean, the weighted arithmetic mean is also studied. For example, let's calculate the average age of students in a group of 20 people, whose ages range from 18 to 22 years, where $x_i$ are the variants of the characteristic being averaged and $f_i$ is the frequency, which shows how many times the i-th value occurs in the population (Table 5.1).

Table 5.1

Average age of students

Applying the weighted arithmetic mean formula, we get:


To choose the weighted arithmetic mean, there is a certain rule: if there is a series of data on two indicators, for one of which it is necessary to calculate the average value, and the numerical values of the denominator of its logical formula are known, while the values of the numerator are unknown but can be found as the product of these indicators, then the average value should be calculated using the weighted arithmetic mean formula.

In some cases, the nature of the initial statistical data is such that calculating the arithmetic mean loses its meaning, and the only generalizing indicator can be another type of average, the harmonic mean. Currently, the computational properties of the arithmetic mean have lost their relevance in the calculation of general statistical indicators due to the widespread introduction of electronic computing technology. The harmonic mean, which can also be simple or weighted, has acquired great practical importance. If the numerical values of the numerator of a logical formula are known, and the values of the denominator are unknown but can be found as the quotient of one indicator divided by another, then the average value is calculated using the weighted harmonic mean formula.

For example, let it be known that a car covered the first 210 km at a speed of 70 km/h, and the remaining 150 km at a speed of 75 km/h. It is impossible to determine the average speed of the car over the entire journey of 360 km using the arithmetic mean formula. Since the options are the speeds on the individual sections, $x_1$ = 70 km/h and $x_2$ = 75 km/h, and the weights ($f_i$) are the corresponding sections of the path, the products of the options and the weights have neither physical nor economic meaning. What does have meaning here are the quotients of the sections of the path divided by the corresponding speeds (options $x_i$), i.e. the time spent on passing the individual sections of the path ($f_i / x_i$). If the sections of the path are denoted by $f_i$, then the entire path is expressed as $\sum f_i$, and the time spent on the entire path as $\sum f_i / x_i$. Then the average speed can be found as the quotient of the entire path divided by the total time spent:

$$\bar{x} = \frac{\sum f_i}{\sum \frac{f_i}{x_i}}$$

In our example we get:

$$\bar{x} = \frac{210 + 150}{\frac{210}{70} + \frac{150}{75}} = \frac{360}{5} = 72 \text{ km/h}$$
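The same reasoning (entire path divided by total time) can be written as a short sketch; the variable names are illustrative.

```python
# (section length in km, speed in km/h) for the two sections of the trip
sections = [(210, 70), (150, 75)]

total_distance = sum(d for d, _ in sections)  # 360 km
total_time = sum(d / v for d, v in sections)  # 3 h + 2 h

avg_speed = total_distance / total_time

print(avg_speed)  # 72.0
```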

If, when using the harmonic mean, the weights of all options (f) are equal, then instead of the weighted one you can use the simple (unweighted) harmonic mean:

$$\bar{x} = \frac{n}{\sum \frac{1}{x_i}}$$

where $x_i$ are the individual options and n is the number of variants of the averaged characteristic. In the speed example, the simple harmonic mean could be applied if the path segments traveled at different speeds were equal.

Any average value must be calculated so that when it replaces each variant of the averaged characteristic, the value of some final, general indicator that is associated with the averaged indicator does not change. Thus, when replacing actual speeds on individual sections of the route with their average value (average speed), the total distance should not change.

The form (formula) of the average value is determined by the nature (mechanism) of the relationship of this final indicator with the averaged one, therefore the final indicator, the value of which should not change when replacing options with their average value, is called defining indicator. To derive the average formula, you need to create and solve an equation using the relationship between the averaged indicator and the defining indicator. This equation is constructed by replacing the variants of the averaged characteristic (indicator) with their average value.

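As a sketch of this procedure for the speed example: assume the segment lengths $f_i$ are fixed and the defining indicator is the total travel time (an assumption made here for illustration; it is equivalent to requiring that the same total distance be covered in the same total time). Replacing each speed $x_i$ with the average $\bar{x}$ and solving the resulting equation gives the weighted harmonic mean:

```latex
\sum \frac{f_i}{x_i} = \sum \frac{f_i}{\bar{x}} = \frac{1}{\bar{x}} \sum f_i
\quad\Longrightarrow\quad
\bar{x} = \frac{\sum f_i}{\sum \dfrac{f_i}{x_i}}
```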
In addition to the arithmetic mean and harmonic mean, other types (forms) of the mean are used in statistics. They are all special cases of the power average. If we calculate all types of power averages for the same data, their values will generally not be the same; the rule of majorance of averages applies here: as the exponent of the average increases, the average value itself increases. The calculation formulas for the various types of power averages most often used in practical research are presented in Table 5.2.

Table 5.2


The geometric mean is used when there are $n$ growth coefficients; the individual values of the characteristic are then, as a rule, relative dynamics values constructed as chain values, i.e. as the ratio of each level of the dynamics series to the previous level. The average thus characterizes the average growth rate. The simple geometric mean is calculated by the formula

$$\bar{x} = \sqrt[n]{x_1 \cdot x_2 \cdots x_n} = \sqrt[n]{\prod x_i}$$

The formula of the weighted geometric mean has the following form:

$$\bar{x} = \sqrt[\sum f_i]{\prod x_i^{f_i}}$$

The above formulas are identical in form, but one is applied to current coefficients or growth rates, and the second to absolute values of the series levels.

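A minimal sketch of the average growth rate, assuming a hypothetical dynamics series with levels 100, 120, 150 and 180 (these figures are not from the text). The chain coefficients are the ratios of each level to the previous one, and their geometric mean equals the root of the ratio of the last level to the first.

```python
import math

levels = [100, 120, 150, 180]  # hypothetical levels of a dynamics series

# Chain growth coefficients: each level divided by the previous level.
coefficients = [curr / prev for prev, curr in zip(levels, levels[1:])]


def geometric_mean(values):
    """Return the n-th root of the product of the values."""
    return math.prod(values) ** (1 / len(values))


average_growth = geometric_mean(coefficients)

# The same quantity computed from the absolute levels of the series.
from_levels = (levels[-1] / levels[0]) ** (1 / (len(levels) - 1))

print(coefficients, round(average_growth, 4), round(from_levels, 4))
```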
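The mean square (quadratic mean) is used in calculations with values of quadratic functions and to measure the degree of fluctuation of individual values of a characteristic around the arithmetic mean in a distribution series. It is calculated by the formula

$$\bar{x} = \sqrt{\frac{\sum x_i^2}{n}}$$

The weighted mean square is calculated using another formula:

$$\bar{x} = \sqrt{\frac{\sum x_i^2 f_i}{\sum f_i}}$$

The cubic mean is used in calculations with values of cubic functions and is calculated using the formula

$$\bar{x} = \sqrt[3]{\frac{\sum x_i^3}{n}}$$

The weighted cubic mean:

$$\bar{x} = \sqrt[3]{\frac{\sum x_i^3 f_i}{\sum f_i}}$$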

All the average values discussed above can be presented as a general formula:

$$\bar{x} = \sqrt[k]{\frac{\sum x_i^k}{n}}$$

where $\bar{x}$ is the average value; $x_i$ is an individual value; $n$ is the number of units of the population being studied; $k$ is the exponent that determines the type of average.

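When the same source data are used, the larger the exponent $k$ in the general power average formula, the larger the average value. It follows that there is a natural relationship between the values of the power averages (for positive data, with the geometric mean corresponding to the limit $k \to 0$):

$$\bar{x}_{\text{harm}} \le \bar{x}_{\text{geom}} \le \bar{x}_{\text{arith}} \le \bar{x}_{\text{sq}} \le \bar{x}_{\text{cub}}$$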

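The ordering of the power averages can be checked numerically. The sketch below assumes the unweighted general formula (the k-th root of the mean of the k-th powers), treats the geometric mean as the limiting case k = 0, and uses hypothetical positive data; the function name `power_mean` is an assumption for this illustration.

```python
import math


def power_mean(values, k):
    """Unweighted power mean of positive values with exponent k.

    k = -1 harmonic, k = 0 geometric (limit case), k = 1 arithmetic,
    k = 2 quadratic, k = 3 cubic.
    """
    n = len(values)
    if k == 0:
        # Geometric mean computed through logarithms.
        return math.exp(sum(math.log(x) for x in values) / n)
    return (sum(x ** k for x in values) / n) ** (1 / k)


data = [2, 4, 8]  # hypothetical positive values
exponents = [-1, 0, 1, 2, 3]
means = [power_mean(data, k) for k in exponents]

for k, m in zip(exponents, means):
    print(f"k = {k:>2}: {m:.4f}")
```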
The average values described above give a generalized idea of the population being studied, and from this point of view their theoretical, applied and educational significance is indisputable. But the average value may not coincide with any of the actually existing variants. Therefore, in addition to the averages considered, statistical analysis also uses the values of specific variants that occupy a well-defined position in the ordered (ranked) series of attribute values. Among these quantities, the most commonly used are the structural, or descriptive, averages: the mode (Mo) and the median (Me).

The mode (Mo) is the value of a characteristic that occurs most often in a given population. In a variation series, the mode is the most frequently occurring value of the ranked series, i.e. the variant with the highest frequency. The mode can be used to determine, for example, which stores are visited most often or the most common price for a product. It shows the size of the characteristic typical of a significant part of the population. For an interval series it is determined by the formula

$$Mo = x_0 + h \cdot \frac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})}$$

where $x_0$ is the lower limit of the modal interval; $h$ is the interval size; $f_m$ is the frequency of the modal interval; $f_{m-1}$ is the frequency of the previous interval; $f_{m+1}$ is the frequency of the next interval.

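A minimal sketch of the mode of an interval series, assuming equal-width intervals, a single interval with the highest frequency, and hypothetical grouped data. The function name `interval_mode` and the convention of treating a missing neighbouring interval as having frequency 0 are assumptions of this illustration.

```python
def interval_mode(lower_bounds, width, freqs):
    """Mode of an interval series with equal-width intervals.

    Mo = x0 + h * (fm - f_prev) / ((fm - f_prev) + (fm - f_next))
    """
    m = max(range(len(freqs)), key=freqs.__getitem__)  # index of modal interval
    f_m = freqs[m]
    f_prev = freqs[m - 1] if m > 0 else 0               # assumed 0 if no previous interval
    f_next = freqs[m + 1] if m + 1 < len(freqs) else 0  # assumed 0 if no next interval
    return lower_bounds[m] + width * (f_m - f_prev) / ((f_m - f_prev) + (f_m - f_next))


# Hypothetical series: intervals 10-20, 20-30, 30-40, 40-50.
lower_bounds = [10, 20, 30, 40]
freqs = [5, 12, 8, 3]

mode_value = interval_mode(lower_bounds, 10, freqs)
print(round(mode_value, 2))
```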
The median is the variant located in the center of the ranked series. It divides the series into two equal parts, with the same number of population units on either side: one half of the units has a value of the varying characteristic less than the median, and the other half has a value greater than it. In other words, the median is the element whose value is greater than or equal to one half of the elements of the distribution series and, at the same time, less than or equal to the other half. The median gives a general idea of where the attribute values are concentrated, that is, where their center is.

The descriptive nature of the median is manifested in the fact that it characterizes the quantitative limit of the values of a varying characteristic that half of the units in the population possess. The problem of finding the median for a discrete variation series is easily solved. If all units of the series are given serial numbers, then the serial number of the median variant is (n + 1) / 2 when the number of members n is odd. If the number of members of the series is even, the median is the average of the two variants with serial numbers n / 2 and n / 2 + 1.

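The serial-number rule for a discrete series can be sketched as follows. The function name and the sample data are assumptions made for illustration; `statistics.median` from the standard library is used only as a cross-check.

```python
import statistics


def median_discrete(values):
    """Median of a discrete series using the serial-number rule."""
    ranked = sorted(values)          # rank the series
    n = len(ranked)
    if n % 2 == 1:
        # Odd n: the variant with serial number (n + 1) / 2 (1-based).
        return ranked[(n + 1) // 2 - 1]
    # Even n: average of the variants numbered n / 2 and n / 2 + 1.
    return (ranked[n // 2 - 1] + ranked[n // 2]) / 2


odd_series = [7, 3, 5]       # hypothetical data
even_series = [7, 3, 5, 9]   # hypothetical data

print(median_discrete(odd_series), median_discrete(even_series))
```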
When determining the median in an interval variation series, first determine the interval in which it is located (the median interval). This interval is characterized by the fact that its accumulated sum of frequencies is equal to or exceeds half the sum of all frequencies of the series. The median of an interval variation series is calculated using the formula

$$Me = x_0 + h \cdot \frac{\tfrac{1}{2}\sum f - S_{m-1}}{f_m}$$

where $x_0$ is the lower limit of the median interval; $h$ is the interval size; $f_m$ is the frequency of the median interval; $\sum f$ is the number of members of the series (the sum of all frequencies); $S_{m-1}$ is the accumulated sum of frequencies of the intervals preceding the median interval.

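A minimal sketch of the median of an interval series, assuming equal-width intervals and hypothetical grouped data. The function name `interval_median` is an assumption of this illustration; the median interval is found as the first interval whose accumulated frequency reaches half the total.

```python
def interval_median(lower_bounds, width, freqs):
    """Median of an interval series with equal-width intervals.

    Me = x0 + h * (sum(f) / 2 - S_prev) / fm
    """
    total = sum(freqs)
    if total <= 0:
        raise ValueError("frequencies must sum to a positive number")
    half = total / 2
    accumulated = 0  # S_prev: accumulated frequency before the current interval
    for x0, f_m in zip(lower_bounds, freqs):
        if accumulated + f_m >= half:
            return x0 + width * (half - accumulated) / f_m
        accumulated += f_m
    raise ValueError("median interval not found")


# Hypothetical series: intervals 10-20, 20-30, 30-40, 40-50.
lower_bounds = [10, 20, 30, 40]
freqs = [5, 12, 8, 3]

median_value = interval_median(lower_bounds, 10, freqs)
print(median_value)
```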
Along with the median, other values of variants that occupy a well-defined position in the ranked series are used for a fuller characterization of the structure of the population under study. These include quartiles and deciles. Quartiles divide the series, by the sum of frequencies, into 4 equal parts, and deciles into 10 equal parts. There are three quartiles and nine deciles.

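For ungrouped data, quartiles and deciles can be sketched with the standard-library function `statistics.quantiles`. Note the assumptions: the data are hypothetical, and the function applies its own interpolation convention (the default `method='exclusive'`), which may differ slightly from other textbook conventions.

```python
import statistics

data = list(range(1, 101))  # hypothetical ranked series 1, 2, ..., 100

quartiles = statistics.quantiles(data, n=4)   # three cut points
deciles = statistics.quantiles(data, n=10)    # nine cut points

print("quartiles:", quartiles)
print("deciles:  ", deciles)

# The second quartile and the fifth decile both coincide with the median.
print("median:   ", statistics.median(data))
```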
The median and mode, unlike the arithmetic mean, do not eliminate individual differences in the values ​​of a variable characteristic and therefore are additional and very important characteristics of the statistical population. In practice, they are often used instead of the average or along with it. It is especially advisable to calculate the median and mode in cases where the population under study contains a certain number of units with a very large or very small value of the varying characteristic. These values ​​of the options, which are not very characteristic of the population, while influencing the value of the arithmetic mean, do not affect the values ​​of the median and mode, which makes the latter very valuable indicators for economic and statistical analysis.

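The sensitivity of the arithmetic mean to atypical values, compared with the median and mode, can be illustrated with a small sketch. The data are hypothetical. In this example the mode is unchanged and the median moves only slightly when one very large value is added, while the mean shifts substantially.

```python
import statistics

values = [20, 22, 22, 25, 27]          # hypothetical attribute values
values_with_outlier = values + [400]   # one atypically large unit added

summary = {}
for label, series in (("original", values), ("with outlier", values_with_outlier)):
    summary[label] = {
        "mean": statistics.mean(series),
        "median": statistics.median(series),
        "mode": statistics.mode(series),
    }
    print(label, summary[label])
```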
Variation indicators

The purpose of statistical research is to identify the basic properties and patterns of the statistical population being studied. In the process of summary processing of statistical observation data, distribution series are constructed. There are two types of distribution series, attributive and variational, depending on whether the characteristic taken as the basis for the grouping is qualitative or quantitative.

Variational series are distribution series constructed on a quantitative characteristic. The values of quantitative characteristics in individual units of the population are not constant; they differ more or less from each other. This difference in the value of a characteristic is called variation. The individual numerical values of a characteristic found in the population being studied are called variants. The presence of variation in individual units of the population is due to the influence of a large number of factors on the formation of the level of the characteristic. The study of the nature and degree of variation of characteristics in individual units of the population is the most important issue of any statistical research. Indicators of variation are used to describe the measure of variability of a characteristic.

Another important task of statistical research is to determine the role of individual factors or their groups in the variation of certain characteristics of the population. To solve this problem, statistics uses special methods for studying variation, based on a system of indicators with which variation is measured. In practice, the researcher is faced with a fairly large number of variants of attribute values, which in itself gives no idea of the distribution of units by attribute value in the population. To obtain one, all variants of the characteristic values are arranged in ascending or descending order. This process is called ranking the series. The ranked series immediately gives a general idea of the values that the characteristic takes in the population.

The insufficiency of the average value for an exhaustive description of the population forces us to supplement the average values ​​with indicators that allow us to assess the typicality of these averages by measuring the variability (variation) of the characteristic being studied. The use of these indicators of variation makes it possible to make statistical analysis more complete and meaningful and thereby gain a deeper understanding of the essence of the social phenomena being studied.

The simplest indicators of variation are the minimum and maximum, i.e. the smallest and largest values of the characteristic in the population. The number of repetitions of an individual variant of the characteristic values is called its frequency. Let us denote the frequency of the attribute value by $f_i$; the sum of the frequencies, equal to the volume of the population being studied, is then:

$$\sum_{i=1}^{k} f_i = n$$

where $k$ is the number of variants of the attribute values. It is convenient to replace frequencies with relative frequencies $w_i$. A relative frequency can be expressed as a fraction of one or as a percentage, and allows you to compare variation series with different numbers of observations. Formally we have:

$$w_i = \frac{f_i}{\sum_{i=1}^{k} f_i}, \qquad \sum_{i=1}^{k} w_i = 1$$

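Frequencies and relative frequencies can be tabulated with `collections.Counter` from the standard library. The observations below are hypothetical; the sketch shows that the frequencies sum to the population size and the relative frequencies sum to one.

```python
from collections import Counter

observations = [3, 4, 4, 5, 5, 5, 6]  # hypothetical attribute values

frequencies = Counter(observations)   # frequency of each variant
n = sum(frequencies.values())         # population size = sum of frequencies

# Relative frequency of each variant: frequency divided by population size.
relative = {x: f / n for x, f in sorted(frequencies.items())}

for x, w in relative.items():
    print(f"x = {x}: f = {frequencies[x]}, w = {w:.3f} ({w:.1%})")
```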
Various absolute and relative indicators are used to measure variation. Absolute indicators of variation include the range of variation, the average linear deviation, the variance (dispersion), and the standard deviation.

The range of variation (R) is the difference between the maximum and minimum values of the attribute in the population being studied: R = Xmax - Xmin. This indicator gives only the most general idea of the variability of the characteristic being studied, since it shows the difference only between the extreme values of the variants. It is completely unrelated to the frequencies in the variation series, i.e. to the nature of the distribution, and its dependence only on the extreme values of the characteristic can give it an unstable, random character. The range of variation provides no information about the internal structure of the populations under study and does not allow us to assess how typical the obtained average values are. The scope of application of this indicator is limited to fairly homogeneous populations. The variation of a characteristic is described more precisely by an indicator that takes into account the variability of all values of the characteristic.

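A short sketch of why the range alone says little about the distribution: the two hypothetical series below have the same range although their values are spread quite differently between the extremes. The function name `variation_range` is an assumption of this illustration.

```python
def variation_range(values):
    """Range of variation: R = Xmax - Xmin."""
    return max(values) - min(values)


series_a = [10, 50, 50, 50, 90]  # hypothetical: values concentrated at 50
series_b = [10, 30, 50, 70, 90]  # hypothetical: values evenly spread

range_a = variation_range(series_a)
range_b = variation_range(series_b)
print(range_a, range_b)
```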
To characterize the variation of a characteristic, it is necessary to generalize the deviations of all values from some value typical of the population being studied. Indicators of variation such as the average linear deviation, the variance and the standard deviation are based on the deviations of the characteristic values of individual units of the population from the arithmetic mean.

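The average linear deviation is the arithmetic mean of the absolute values of the deviations of the individual variants from their arithmetic mean:

$$\bar{d} = \frac{\sum |x_i - \bar{x}|}{n}, \qquad \bar{d} = \frac{\sum |x_i - \bar{x}| \, f_i}{\sum f_i}$$

where $|x_i - \bar{x}|$ is the absolute value (modulus) of the deviation of the variant from the arithmetic mean and $f_i$ is the frequency.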

The first formula is applied if each of the options occurs in the aggregate only once, and the second - in series with unequal frequencies.

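Both cases of the average linear deviation can be sketched with one function, assuming that the case where each variant occurs once is the weighted case with all frequencies equal to 1. The function name and the data are hypothetical.

```python
def mean_linear_deviation(values, freqs=None):
    """Average of |x_i - mean|, optionally weighted by frequencies f_i."""
    if freqs is None:
        freqs = [1] * len(values)  # each variant occurs exactly once
    total = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / total
    return sum(abs(x - mean) * f for x, f in zip(values, freqs)) / total


# Each variant occurs once (hypothetical data): mean 5, deviations 3, 1, 1, 3.
simple_d = mean_linear_deviation([2, 4, 6, 8])

# Series with unequal frequencies (hypothetical data): mean 4.
weighted_d = mean_linear_deviation([2, 4, 6], freqs=[1, 2, 1])

print(simple_d, weighted_d)
```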
There is another way of averaging the deviations of options from the arithmetic mean. This very common method in statistics comes down to calculating the squared deviations of the options from the average value with their subsequent averaging. In this case, we obtain a new indicator of variation - dispersion.

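The dispersion, or variance ($\sigma^2$), is the average of the squared deviations of the variants of the attribute value from their average value:

$$\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n}, \qquad \sigma^2 = \frac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i}$$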

The second formula is applied if the options have their own weights (or frequencies of the variation series).

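In economic and statistical analysis, the variation of a characteristic is most often evaluated using the standard deviation. The standard deviation ($\sigma$) is the square root of the variance:

$$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}, \qquad \sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i}}$$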

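A minimal sketch of the variance and standard deviation in their simple and frequency-weighted forms, dividing by the population size rather than by n - 1. The function names and data are hypothetical; `statistics.pvariance` is used only as a cross-check of the unweighted case.

```python
import math
import statistics


def variance(values, freqs=None):
    """Average squared deviation from the mean, optionally frequency-weighted."""
    if freqs is None:
        freqs = [1] * len(values)  # each variant occurs exactly once
    total = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / total
    return sum((x - mean) ** 2 * f for x, f in zip(values, freqs)) / total


def standard_deviation(values, freqs=None):
    """Square root of the variance."""
    return math.sqrt(variance(values, freqs))


simple_var = variance([2, 4, 6, 8])                   # hypothetical data
weighted_var = variance([2, 4, 6], freqs=[1, 2, 1])   # hypothetical data
simple_sd = standard_deviation([2, 4, 6, 8])

print(simple_var, weighted_var, round(simple_sd, 4))
print(statistics.pvariance([2, 4, 6, 8]))  # cross-check of the simple case
```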
Average linear and standard deviations show how much the value of a characteristic fluctuates on average among units of the population under study, and are expressed in the same units of measurement as the options.

In statistical practice there is often a need to compare the variation of different characteristics. For example, it is of great interest to compare variations in the age of personnel and their qualifications, length of service and wages, etc. For such comparisons, indicators of the absolute variability of characteristics (the average linear deviation and the standard deviation) are not suitable. It is, in fact, impossible to compare the fluctuation of length of service, expressed in years, with the fluctuation of wages, expressed in rubles and kopecks.

When comparing the variability of different characteristics, it is convenient to use relative measures of variation. These indicators are calculated as the ratio of an absolute indicator to the arithmetic mean (or median). Using the range of variation, the average linear deviation, and the standard deviation as the absolute indicator of variation, we obtain the relative indicators of variability, namely the coefficient of oscillation, the linear coefficient of variation, and the coefficient of variation:

$$V_R = \frac{R}{\bar{x}} \cdot 100\%, \qquad V_{\bar{d}} = \frac{\bar{d}}{\bar{x}} \cdot 100\%, \qquad V_\sigma = \frac{\sigma}{\bar{x}} \cdot 100\%$$

The coefficient of variation is the most commonly used indicator of relative variability; it characterizes the homogeneity of the population. The population is considered homogeneous if the coefficient of variation does not exceed 33% for distributions close to normal.
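The relative indicators can be sketched as ratios of the range, the average linear deviation and the standard deviation to the arithmetic mean, expressed in percent. The dictionary keys, the function names and the data are assumptions of this illustration; the 33% threshold is the one stated in the text.

```python
import math


def relative_variation(values):
    """Return the three relative indicators of variation, in percent."""
    n = len(values)
    mean = sum(values) / n
    value_range = max(values) - min(values)
    linear_dev = sum(abs(x - mean) for x in values) / n
    std_dev = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return {
        "oscillation": 100 * value_range / mean,  # range / mean
        "linear": 100 * linear_dev / mean,        # average linear deviation / mean
        "variation": 100 * std_dev / mean,        # standard deviation / mean
    }


def is_homogeneous(values, threshold=33.0):
    """Apply the 33% rule to the coefficient of variation."""
    return relative_variation(values)["variation"] <= threshold


tight = [48, 50, 52]   # hypothetical, values close to the mean
spread = [2, 4, 6, 8]  # hypothetical, values widely spread

print(relative_variation(tight), is_homogeneous(tight))
print(relative_variation(spread), is_homogeneous(spread))
```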