Title: | Gini Indices, Variances and Confidence Intervals for Finite and Infinite Populations |
---|---|
Description: | Estimates the Gini index and computes variances and confidence intervals for finite and infinite populations, using different methods; also computes the Gini index for continuous probability distributions and draws samples from continuous probability distributions with Gini indices set by the user; uses 'Rcpp'. References: Muñoz et al. (2023) <doi:10.1177/00491241231176847>. Álvarez et al. (2021) <doi:10.3390/math9243252>. Giorgi and Gigliarano (2017) <doi:10.1111/joes.12185>. Langel and Tillé (2013) <doi:10.1111/j.1467-985X.2012.01048.x>. |
Authors: | Juan Francisco Muñoz [aut, cre] |
Maintainer: | Juan Francisco Muñoz |
License: | GPL |
Version: | 0.0.1-3 |
Built: | 2025-02-03 03:55:39 UTC |
Source: | https://github.com/cran/giniVarCI |
Compares variance estimates and confidence intervals for the Gini index in finite populations.
fcompareCI( y, w, Pi = NULL, Pij = NULL, PiU, alpha = 0.05, B = 1000L, digitsgini = 2L, digitsvar = 4L, na.rm = TRUE, plotCI = TRUE, line.types = c(1L, 2L, 4L), colors = c("red", "green", "blue"), shapes = c(8L, 4L, 3L), save.plot = FALSE, large.sample = FALSE)
y |
A vector with the non-negative real numbers to be used for estimating the Gini index. |
w |
A numeric vector with the survey weights to be used for estimating the Gini index, the variance estimation and the confidence interval. This argument can be missing if argument Pi is provided. |
Pi |
A numeric vector with the (sample) first inclusion probabilities to be used for estimating the Gini index, the variance estimation and the confidence interval. This argument can be NULL if argument w is provided. |
Pij |
A numeric square matrix with the (sample) second (joint) inclusion probabilities to be used for the variance estimation and the confidence interval. The Hajek approximation is used when Pij = NULL (the default). |
PiU |
A numeric vector with the (population) first inclusion probabilities. The Hartley-Rao (HR) variance estimator is additionally computed when this argument is provided. |
alpha |
A single numeric value between 0 and 1 specifying the confidence level 1 - alpha. The default value is alpha = 0.05. |
B |
A single integer specifying the number of bootstrap replicates. The default value is B = 1000. |
digitsgini |
A single integer specifying the number of decimals used in the estimation of the Gini index and confidence intervals. The default value is digitsgini = 2. |
digitsvar |
A single integer specifying the number of decimals used in the variance estimation of the Gini index. The default value is digitsvar = 4. |
na.rm |
A 'TRUE/FALSE' logical value indicating whether missing values (NA) should be removed before the computation proceeds. The default value is na.rm = TRUE. |
plotCI |
A 'TRUE/FALSE' logical value indicating whether confidence intervals are compared using a plot. The default value is plotCI = TRUE. |
line.types |
A numeric vector of length 3 specifying the line types. See the function |
colors |
A vector of length 3 specifying the colors for the lines of the plot. The default value is c("red", "green", "blue"). |
shapes |
A numeric vector specifying the point shapes for the limits of intervals. If |
save.plot |
A 'TRUE/FALSE' logical value indicating whether the ggplot object of the plot comparing the confidence intervals should be saved in the output. The default value is save.plot = FALSE. |
large.sample |
A 'TRUE/FALSE' logical value indicating whether the sample is large enough to use a faster algorithm for sorting the sample values when computing the Gini index. The default value is large.sample = FALSE. |
For a sample $s$, with size $n$ and inclusion probabilities $\pi_i$ (argument Pi), derived from a finite population $U$, with size $N$, different formulations of the Gini index have been proposed in the literature. This function estimates the Gini index, variances and confidence intervals using various formulations. The different methods for estimating the Gini index are (see also Muñoz et al., 2023):
Gini Index formulae.
Method 1
(Langel and Tillé, 2013)
where $w_i$ ($i \in s$) are the survey weights; for example, the survey weights can be $w_i = \pi_i^{-1}$. Either w or Pi must be provided. When both w and Pi are provided, it is required that $w_i = \pi_i^{-1}$ for $i \in s$.
Method 2
(Alfons and Templ, 2012; Langel and Tillé, 2013)
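where $y_{(1)} \le y_{(2)} \le \cdots \le y_{(n)}$ are the values $y_i$ sorted in increasing order and $w_{(i)}$ are the weights $w_i$ sorted according to the increasing order of the values $y_i$. Langel and Tillé (2013) show that Methods 1 and 2 give the same estimate, so only one of them is reported in the results.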
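A minimal R sketch of two standard ways of writing the weighted Gini estimator: a pairwise (mean-difference) form and a sorted form based on cumulative weights, similar in spirit to the one used in the laeken package. The helper names gini_pairwise() and gini_sorted() are hypothetical, and both expressions are assumed forms for illustration, not code taken from giniVarCI. On the 10-observation sample used in the examples the two forms agree, while the sorted form avoids the O(n^2) pairwise sum.

```r
# Hypothetical helpers: assumed forms of the weighted Gini estimator.

# Pairwise (mean-difference) form: sum_i sum_j w_i w_j |y_i - y_j| / (2 N_hat Y_hat)
gini_pairwise <- function(y, w) {
  N_hat <- sum(w)        # estimated population size
  Y_hat <- sum(w * y)    # estimated population total
  sum(outer(w, w) * abs(outer(y, y, "-"))) / (2 * N_hat * Y_hat)
}

# Sorted form using cumulative weights: O(n log n) because of the sort.
gini_sorted <- function(y, w) {
  o <- order(y)
  y <- y[o]
  w <- w[o]
  cw <- cumsum(w)        # cumulative weights of the sorted sample
  (2 * sum(w * y * cw) - sum(w^2 * y)) / (sum(w) * sum(w * y)) - 1
}

y <- c(30428.83, 14976.54, 18094.09, 29476.79, 20381.93,
       6876.17, 10360.96, 8239.82, 29476.79, 32230.71)
w <- c(357.86, 480.99, 480.99, 476.01, 498.58,
       498.58, 476, 498.58, 476.01, 476.01)
gini_pairwise(y, w)
gini_sorted(y, w)      # same value as gini_pairwise(), up to rounding error
```

With unit weights and y = 1:4 both forms return 0.25, the familiar unweighted Gini index of that sequence.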
Method 3
(Berger, 2008)
where
is the smooth (mid-point) distribution function, and is the indicator variable that takes the value 1 when its argument is true, and 0 otherwise. It can be seen that
, so the computation of
is omitted from the results.
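The mid-point distribution function counts observations equal to its argument with half their weight. A minimal R sketch, assuming the estimator is written as $2\sum_{i \in s} w_i y_i \tilde F(y_i) / \sum_{i \in s} w_i y_i - 1$ with $\tilde F(t) = \hat N^{-1}\sum_{j \in s} w_j [\mathbb{1}(y_j < t) + \tfrac12 \mathbb{1}(y_j = t)]$ and $\hat N = \sum_{j \in s} w_j$. This form and the helper names midpoint_cdf() and gini_midpoint() are assumptions for illustration, not the package's code.

```r
# Hypothetical helpers; the mid-point form is an assumed illustration.
# F_mid(t) = (1 / N_hat) * sum_j w_j * [ 1(y_j < t) + 0.5 * 1(y_j == t) ]
midpoint_cdf <- function(y, w) {
  N_hat <- sum(w)
  vapply(y, function(t) sum(w * ((y < t) + 0.5 * (y == t))) / N_hat,
         numeric(1))
}

# Gini estimator written with the mid-point distribution function.
gini_midpoint <- function(y, w) {
  F_mid <- midpoint_cdf(y, w)
  2 * sum(w * y * F_mid) / sum(w * y) - 1
}

gini_midpoint(1:4, rep(1, 4))            # 0.25
gini_midpoint(c(1, 1, 2), c(1, 2, 1))    # 0.15, tied values handled by the mid-point rule
```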
Method 4
(Berger and Gedik-Balay, 2020)
where and
Method 5
(Lerman and Yitzhaki, 1989)
where
and .
Variances and confidence intervals.
For a given estimator and variable, the Horvitz-Thompson type variance estimator (Horvitz and Thompson, 1952) is given by
where $\pi_{ij}$ is the second (joint) inclusion probability of the individuals $i$ and $j$, i.e., $\pi_{ij} = P(i \in s \text{ and } j \in s)$ (argument Pij).
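A minimal R sketch of the Horvitz-Thompson type variance estimator for the total of a variable $z$, assuming the standard form $\hat V_{HT} = \sum_{i \in s}\sum_{j \in s} \frac{\pi_{ij} - \pi_i\pi_j}{\pi_{ij}} \frac{z_i}{\pi_i}\frac{z_j}{\pi_j}$ with $\pi_{ii} = \pi_i$. The helper var_ht() and the toy data are hypothetical. Under simple random sampling without replacement the estimator should reproduce the textbook value $N^2(1 - n/N)s_z^2/n$, which the example checks.

```r
# Hypothetical helper: z = variable, pik = first-order, pikl = joint probabilities.
var_ht <- function(z, pik, pikl) {
  diag(pikl) <- pik                 # pi_ii = pi_i by definition
  zc <- z / pik                     # expanded values z_i / pi_i
  sum((pikl - outer(pik, pik)) / pikl * outer(zc, zc))
}

# Check under simple random sampling without replacement (SRSWOR).
N <- 100; n <- 5
z <- c(3, 7, 1, 9, 4)
pik <- rep(n / N, n)
pikl <- matrix(n * (n - 1) / (N * (N - 1)), n, n)
var_ht(z, pik, pikl)                # 19380 up to rounding
N^2 * (1 - n / N) * var(z) / n      # 19380
```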
The Sen-Yates-Grundy type variance estimator (Sen, 1953; Yates and Grundy, 1953) is defined as
.
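A matching R sketch of the Sen-Yates-Grundy form, assuming $\hat V_{SYG} = \frac12\sum_{i \in s}\sum_{j \in s, j \ne i}\frac{\pi_i\pi_j - \pi_{ij}}{\pi_{ij}}\left(\frac{z_i}{\pi_i} - \frac{z_j}{\pi_j}\right)^2$. The helper var_syg() is hypothetical. For fixed-size designs with $\pi_i\pi_j \ge \pi_{ij}$ every term is non-negative, and under simple random sampling without replacement it gives the same value as the Horvitz-Thompson form.

```r
# Hypothetical helper implementing the assumed Sen-Yates-Grundy form.
var_syg <- function(z, pik, pikl) {
  zc <- z / pik
  w_ij <- (outer(pik, pik) - pikl) / pikl
  diag(w_ij) <- 0                   # i = j terms contribute nothing
  0.5 * sum(w_ij * outer(zc, zc, "-")^2)
}

# Same SRSWOR check: the result equals N^2 (1 - n/N) s_z^2 / n.
N <- 100; n <- 5
z <- c(3, 7, 1, 9, 4)
pik <- rep(n / N, n)
pikl <- matrix(n * (n - 1) / (N * (N - 1)), n, n)
var_syg(z, pik, pikl)               # 19380 up to rounding
```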
The Hartley-Rao type variance estimator (Hartley and Rao, 1962) is given by
Note that the Horvitz-Thompson variance estimator can give negative values. Both the Horvitz-Thompson and the Sen-Yates-Grundy variance estimators depend on the second (joint) inclusion probabilities (argument Pij). The Hajek (1964) approximation is used when the second (joint) inclusion probabilities are not available (Pij = NULL). The Hajek approximation is recommended for large-entropy sampling designs, large samples and large populations (see Tillé, 2006; Berger and Tillé, 2009; Haziza et al., 2008; Berger, 2011). In particular, this approximation is not recommended for highly stratified samples (Berger, 2005). The Hartley-Rao variance estimator requires the first inclusion probabilities at the population level (argument PiU).
zjackknife computes the confidence interval based on the jackknife technique, with critical values based on the Normal approximation. zalinearization and zblinearization compute the confidence intervals based on the linearization technique applied to two different estimators of the Gini index (see Langel and Tillé, 2013, and Berger, 2008, respectively); critical values are also based on the Normal approximation. pbootstrap computes the variance using the rescaled bootstrap, and the confidence interval is constructed using the percentile method. The vignette vignette("GiniVarInterval") contains a detailed description of the various methods for variance estimation and confidence intervals for the Gini index.
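A minimal R sketch of the Hartley-Rao type estimator, assuming the commonly quoted form $\hat V_{HR} = \frac{1}{n-1}\sum_{i<j}\left(1 - \pi_i - \pi_j + \frac1n\sum_{k \in U}\pi_k^2\right)\left(\frac{z_i}{\pi_i}-\frac{z_j}{\pi_j}\right)^2$. The sum over all population units $k \in U$ shows why the population-level probabilities (argument PiU) are needed. The helper var_hr() is hypothetical; with $\pi_k = n/N$ for every unit it reduces to the simple random sampling value.

```r
# Hypothetical helper: z, pik refer to the sample; piU holds pi_k for all k in U.
var_hr <- function(z, pik, piU) {
  n <- length(z)
  zc <- z / pik
  c_ij <- 1 - outer(pik, pik, "+") + sum(piU^2) / n
  sq <- outer(zc, zc, "-")^2
  keep <- upper.tri(sq)             # pairs i < j
  sum(c_ij[keep] * sq[keep]) / (n - 1)
}

N <- 100; n <- 5
z <- c(3, 7, 1, 9, 4)
piU <- rep(n / N, N)                # SRSWOR: pi_k = n/N for every unit in U
pik <- rep(n / N, n)
var_hr(z, pik, piU)                 # 19380 up to rounding
```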
The following table summarises the various types of variances and confidence intervals that the function fcompareCI
computes.
Interval | Variance | Critical values | References |
_______________ | ______________ | _________________ | _________________________ |
zjackknife | Jackknife | Normal | Berger (2008) |
zalinearization | Linearization | Normal | Langel and Tillé (2013) |
zblinearization | Linearization | Normal | Berger (2008) |
pbootstrap | Rescaled bootstrap | Percentile bootstrap | Berger and Gedik-Balay (2020) |
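The jackknife row of the table can be illustrated with a delete-one jackknife and a Normal critical value. This is a simplified sketch under stated assumptions: it uses the classical variance formula $\frac{n-1}{n}\sum_k(\hat G_{(-k)} - \bar G)^2$ and ignores the sampling design, whereas the design-based jackknife described above relies on the HT, SYG or HR formulas (argument varformula). The helper names gini_sorted() and jackknife_ci() are hypothetical.

```r
# Hypothetical helper: weighted Gini estimator in its sorted cumulative-weight form.
gini_sorted <- function(y, w) {
  o <- order(y)
  y <- y[o]
  w <- w[o]
  (2 * sum(w * y * cumsum(w)) - sum(w^2 * y)) / (sum(w) * sum(w * y)) - 1
}

# Classical delete-one jackknife variance and Normal-based interval (simplified).
jackknife_ci <- function(y, w, alpha = 0.05) {
  n <- length(y)
  g <- gini_sorted(y, w)
  # Leave-one-out estimates G_(-k), k = 1, ..., n
  g_del <- vapply(seq_len(n), function(k) gini_sorted(y[-k], w[-k]), numeric(1))
  v <- (n - 1) / n * sum((g_del - mean(g_del))^2)
  crit <- qnorm(1 - alpha / 2)      # Normal critical value
  list(gini = g, var = v, ci = g + c(-1, 1) * crit * sqrt(v))
}

y <- c(30428.83, 14976.54, 18094.09, 29476.79, 20381.93,
       6876.17, 10360.96, 8239.82, 29476.79, 32230.71)
w <- c(357.86, 480.99, 480.99, 476.01, 498.58,
       498.58, 476, 498.58, 476.01, 476.01)
res <- jackknife_ci(y, w)
res
```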
If save.plot = FALSE, a data frame with columns:
interval. The method used to construct the confidence interval.
method. The method used to estimate the Gini index.
varformula. The type of formula for the variance estimator. Possible values are HT and SYG if argument PiU is missing, and HT, SYG and HR if argument PiU is provided.
gini. The estimate of the Gini index.
lowerlimit. The lower limit of the confidence interval.
upperlimit. The upper limit of the confidence interval.
var.gini. The variance estimate for the estimator of the Gini index.
If save.plot = TRUE, a list with two components: (i) 'base.CI', a data frame with the seven columns just described, and (ii) 'plot', a (ggplot) description of the plot, which is a list with components that contain the plot itself, the data, information about the scales, panels, etc. As a side effect, a plot that compares the various methods for constructing confidence intervals for the Gini index is displayed. **ggplot2** must be installed for this option to work.
If plotCI = TRUE, as a side effect, a plot that compares the various methods for constructing confidence intervals for the Gini index is displayed. **ggplot2** must be installed for this option to work.
Juan F Munoz
Jose M Pavia
Encarnacion Alvarez
Alfons, A., and Templ, M. (2012). Estimation of social exclusion indicators from complex surveys: The R package laeken. KU Leuven, Faculty of Business and Economics Working Paper.
Berger, Y. G. (2005). Variance estimation with highly stratified sampling designs with unequal probabilities. Australian & New Zealand Journal of Statistics, 47, 365–373.
Berger, Y. G. (2008). A note on the asymptotic equivalence of jackknife and linearization variance estimation for the Gini Coefficient. Journal of Official Statistics, 24(4), 541-555.
Berger, Y. G. (2011). Asymptotic consistency under large entropy sampling designs with unequal probabilities. Pakistan Journal of Statistics, 27, 407–426.
Berger, Y., and Gedik-Balay, İ. (2020). Confidence intervals of Gini coefficient under unequal probability sampling. Journal of Official Statistics, 36(2), 237-249.
Berger, Y. G. and Tillé, Y. (2009). Sampling with unequal probabilities. In Sample Surveys: Design, Methods and Applications (eds. D. Pfeffermann and C. R. Rao), 39–54. Elsevier, Amsterdam.
Hajek, J. (1964). Asymptotic theory of rejective sampling with varying probabilities from a finite population. The Annals of Mathematical Statistics, 35(4), 1491–1523.
Hartley, H. O., and Rao, J. N. K. (1962). Sampling with unequal probabilities and without replacement. The Annals of Mathematical Statistics, 350-374.
Haziza, D., Mecatti, F. and Rao, J. N. K. (2008). Evaluation of some approximate variance estimators under the Rao-Sampford unequal probability sampling design. Metron, LXVI, 91–108.
Horvitz, D. G. and Thompson, D. J. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47, 663–685.
Langel, M., and Tillé, Y. (2013). Variance estimation of the Gini index: revisiting a result several times published. Journal of the Royal Statistical Society: Series A (Statistics in Society), 176(2), 521-540.
Lerman, R. I., and Yitzhaki, S. (1989). Improving the accuracy of estimates of Gini coefficients. Journal of Econometrics, 42(1), 43-47.
Muñoz, J. F., Moya-Fernández, P. J., and Álvarez-Verdejo, E. (2023). Exploring and Correcting the Bias in the Estimation of the Gini Measure of Inequality. Sociological Methods & Research. https://doi.org/10.1177/00491241231176847
Sen, A. R. (1953). On the estimate of the variance in sampling with varying probabilities. Journal of the Indian Society of Agricultural Statistics, 5, 119–127.
Tillé, Y. (2006). Sampling Algorithms. Springer, New York.
Yates, F., and Grundy, P. M. (1953). Selection without replacement from within strata with probability proportional to size. Journal of the Royal Statistical Society B, 15, 253–261.
library(giniVarCI)
# Income and weights (region 'Burgenland') from the 2006 Austrian EU-SILC (Package 'laeken').
data(eusilc, package = "laeken")
y <- eusilc$eqIncome[eusilc$db040 == "Burgenland"]
w <- eusilc$rb050[eusilc$db040 == "Burgenland"]
# Estimation of the Gini index and confidence intervals using different methods.
fcompareCI(y, w)
# A small sample of 10 incomes and survey weights.
y <- c(30428.83, 14976.54, 18094.09, 29476.79, 20381.93, 6876.17, 10360.96, 8239.82, 29476.79, 32230.71)
w <- c(357.86, 480.99, 480.99, 476.01, 498.58, 498.58, 476, 498.58, 476.01, 476.01)
fcompareCI(y, w, plotCI = FALSE)
Estimates the Gini index and computes variances and confidence intervals in finite populations.
fgini( y, w, method = 2L, interval = NULL, Pi = NULL, Pij = NULL, PiU, alpha = 0.05, B = 1000L, na.rm = TRUE, varformula = "SYG", large.sample = FALSE )
y |
A vector with the non-negative real numbers to be used for estimating the Gini index. |
w |
A numeric vector with the survey weights to be used for estimating the Gini index, the variance and the confidence interval. This argument can be missing if argument Pi is provided. |
method |
An integer between 1 and 5 selecting one of the 5 methods detailed below for estimating the Gini index in finite populations. The default method is method = 2. |
interval |
A character string specifying the type of variance estimation and confidence interval to be used. Possible values are "zjackknife", "zalinearization", "zblinearization" and "pbootstrap". When interval = NULL (the default), only the point estimate of the Gini index is computed. |
Pi |
A numeric vector with the (sample) first inclusion probabilities to be used for estimating the Gini index, the variance and the confidence interval. This argument can be NULL if argument w is provided. |
Pij |
A numeric square matrix with the (sample) second (joint) inclusion probabilities to be used for the variance estimation and the confidence interval. The Hajek approximation is used when Pij = NULL (the default). |
PiU |
A numeric vector with the (population) first inclusion probabilities. This argument is only required when the Hartley-Rao expression for the variance estimation is selected (varformula = "HR"). |
alpha |
A single numeric value between 0 and 1. If |
B |
A single integer specifying the number of bootstrap replicates. This argument is required when interval = "pbootstrap". The default value is B = 1000. |
na.rm |
A 'TRUE/FALSE' logical value indicating whether missing values (NA) should be removed before the computation proceeds. The default value is na.rm = TRUE. |
varformula |
A character string specifying the type of formula to be used for the variance estimator when interval is "zjackknife", "zalinearization" or "zblinearization". Possible values are "HT" (Horvitz-Thompson), "SYG" (Sen-Yates-Grundy) and "HR" (Hartley-Rao). The default value is varformula = "SYG". |
large.sample |
A 'TRUE/FALSE' logical value indicating whether the sample is large enough to use a faster algorithm for sorting the sample values when computing the Gini index. The default value is large.sample = FALSE. |
For a sample $s$, with size $n$ and inclusion probabilities $\pi_i$ (argument Pi), derived from a finite population $U$, with size $N$, different formulations of the Gini index have been proposed in the literature. This function estimates the Gini index, variances and confidence intervals using various formulations. The different methods for estimating the Gini index are (see also Muñoz et al., 2023):
Gini Index formulae.
method = 1
(Langel and Tillé, 2013)
where $w_i$ ($i \in s$) are the survey weights; for example, the survey weights can be $w_i = \pi_i^{-1}$. Either w or Pi must be provided. When both w and Pi are provided, it is required that $w_i = \pi_i^{-1}$ for $i \in s$.
method = 2
(Alfons and Templ, 2012; Langel and Tillé, 2013)
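where $y_{(1)} \le y_{(2)} \le \cdots \le y_{(n)}$ are the values $y_i$ sorted in increasing order and $w_{(i)}$ are the weights $w_i$ sorted according to the increasing order of the values $y_i$. Langel and Tillé (2013) show that method = 1 and method = 2 give the same estimate.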
method = 3
(Berger, 2008)
where
is the smooth (mid-point) distribution function, and is the indicator variable that takes the value 1 when its argument is true, and the value 0 otherwise. It can be seen that
.
method = 4
(Berger and Gedik-Balay, 2020)
where and
method = 5
(Lerman and Yitzhaki, 1989)
where
and .
Variances and confidence intervals.
For a given estimator and variable, the Horvitz-Thompson type variance estimator (Horvitz and Thompson, 1952) is computed when varformula = "HT", where $\pi_{ij}$ is the second (joint) inclusion probability of the individuals $i$ and $j$, i.e., $\pi_{ij} = P(i \in s \text{ and } j \in s)$ (argument Pij).
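When Pij is not supplied, the joint probabilities have to be approximated. A minimal R sketch of one common form of the Hajek (1964) approximation, $\pi_{ij} \approx \pi_i\pi_j\left[1 - \frac{(1-\pi_i)(1-\pi_j)}{d}\right]$ with $d = \sum_{k \in s}(1-\pi_k)$. Whether giniVarCI uses exactly this estimate of $d$ is an assumption, and hajek_pikl() is a hypothetical helper; its output can be passed as the joint-probability matrix to any of the variance formulas above.

```r
# Hypothetical helper: approximate joint inclusion probabilities from pi_i only.
hajek_pikl <- function(pik) {
  d <- sum(1 - pik)                 # sample-based estimate of the Hajek constant
  pikl <- outer(pik, pik) * (1 - outer(1 - pik, 1 - pik) / d)
  diag(pikl) <- pik                 # pi_ii = pi_i
  pikl
}

pik <- c(0.10, 0.20, 0.25, 0.40, 0.05)
P <- hajek_pikl(pik)
isSymmetric(P)                                          # TRUE
all(P[upper.tri(P)] < outer(pik, pik)[upper.tri(P)])    # TRUE: pi_ij < pi_i pi_j
```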
The Sen-Yates-Grundy type variance estimator (Sen, 1953; Yates and Grundy, 1953) is computed when varformula = "SYG", and the Hartley-Rao type variance estimator (Hartley and Rao, 1962) is computed when varformula = "HR". Note that the Horvitz-Thompson variance estimator can give negative values. Both the Horvitz-Thompson and the Sen-Yates-Grundy variance estimators depend on the second (joint) inclusion probabilities (argument Pij). The Hajek (1964) approximation is used when the second (joint) inclusion probabilities are not available (Pij = NULL). The Hajek approximation is recommended for large-entropy sampling designs, large samples and large populations (see Tillé, 2006; Berger and Tillé, 2009; Haziza et al., 2008; Berger, 2011). In particular, this approximation is not recommended for highly stratified samples (Berger, 2005). The Hartley-Rao variance estimator requires the first inclusion probabilities at the population level (argument PiU).
zjackknife computes the confidence interval based on the jackknife technique, with critical values based on the Normal approximation. zalinearization and zblinearization compute the confidence intervals based on the linearization technique applied to two different estimators of the Gini index (see Langel and Tillé, 2013, and Berger, 2008, respectively); critical values are also based on the Normal approximation. pbootstrap computes the variance using the rescaled bootstrap, and the confidence interval is constructed using the percentile method. The vignette vignette("GiniVarInterval") contains a detailed description of the various methods for variance estimation and confidence intervals for the Gini index.
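A minimal R sketch of a rescaled bootstrap with a percentile interval, assuming the Rao-Wu rescaling for a single-stage, unstratified sample in which $n - 1$ units are resampled with replacement, so that the bootstrap weights are $w_i^* = w_i \frac{n}{n-1} r_i$, where $r_i$ counts how often unit $i$ is drawn. The resampling scheme actually used by the package (Berger and Gedik-Balay, 2020) may differ, and the helper names are hypothetical.

```r
# Hypothetical helper: weighted Gini estimator in its sorted cumulative-weight form.
gini_sorted <- function(y, w) {
  o <- order(y)
  y <- y[o]
  w <- w[o]
  (2 * sum(w * y * cumsum(w)) - sum(w^2 * y)) / (sum(w) * sum(w * y)) - 1
}

# Rao-Wu rescaled bootstrap (m = n - 1) with a percentile confidence interval.
rescaled_bootstrap_ci <- function(y, w, B = 1000L, alpha = 0.05) {
  n <- length(y)
  g_boot <- replicate(B, {
    # r_i: number of times unit i appears in n - 1 draws with replacement
    r <- tabulate(sample.int(n, n - 1L, replace = TRUE), nbins = n)
    # The factor n / (n - 1) cancels in the Gini index but is kept to show
    # the rescaled weights w_i * n / (n - 1) * r_i.
    gini_sorted(y, w * r * n / (n - 1))
  })
  list(var = var(g_boot),
       ci = unname(quantile(g_boot, c(alpha / 2, 1 - alpha / 2))))
}

y <- c(30428.83, 14976.54, 18094.09, 29476.79, 20381.93,
       6876.17, 10360.96, 8239.82, 29476.79, 32230.71)
w <- c(357.86, 480.99, 480.99, 476.01, 498.58,
       498.58, 476, 498.58, 476.01, 476.01)
set.seed(2025)
res <- rescaled_bootstrap_ci(y, w, B = 1000L)
res
```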
The following table summarises the various types of variances and confidence intervals that the function fgini
computes. The argument varformula only applies to the jackknife and linearization techniques (see Berger, 2008; Langel and Tillé, 2013).
Interval | Variance | Critical values | References |
_______________ | ______________ | _________________ | _________________________ |
zjackknife | Jackknife | Normal | Berger (2008) |
zalinearization | Linearization | Normal | Langel and Tillé (2013) |
zblinearization | Linearization | Normal | Berger (2008) |
pbootstrap | Rescaled bootstrap | Percentile bootstrap | Berger and Gedik-Balay (2020) |
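The linearization technique replaces the Gini estimator by a linearized variable $z_k$ and then applies the HT, SYG or HR formula to it. A minimal R sketch, assuming the sorted cumulative-weight form of the estimator and taking $z_k$ as the partial derivative of that estimator with respect to the weight $w_k$. This gives $z_k = [2\hat N_k(y_k - \hat{\bar Y}_k) + \hat Y - \hat N y_k - \hat G(\hat Y + y_k\hat N)]/(\hat N\hat Y)$, with $\hat N_k = \sum_{l \in s} w_l\mathbb{1}(y_l \le y_k)$ and $\hat{\bar Y}_k$ the weighted mean of the values $y_l \le y_k$. Helper names are hypothetical, the sketch assumes no tied values, and a finite-difference check confirms the derivative numerically.

```r
# Hypothetical helper: weighted Gini estimator in its sorted cumulative-weight form.
gini_sorted <- function(y, w) {
  o <- order(y)
  y <- y[o]
  w <- w[o]
  (2 * sum(w * y * cumsum(w)) - sum(w^2 * y)) / (sum(w) * sum(w * y)) - 1
}

# Linearized variable z_k = dG/dw_k (assumes distinct values of y).
gini_linearized <- function(y, w) {
  N_hat <- sum(w)
  Y_hat <- sum(w * y)
  G <- gini_sorted(y, w)
  # Weighted count and weighted total of the observations with y_l <= y_k
  N_k <- vapply(y, function(t) sum(w[y <= t]), numeric(1))
  Y_k <- vapply(y, function(t) sum(w[y <= t] * y[y <= t]), numeric(1))
  (2 * (N_k * y - Y_k) + Y_hat - N_hat * y - G * (Y_hat + y * N_hat)) /
    (N_hat * Y_hat)
}

y <- c(3, 7, 1, 9, 4)
w <- c(10, 20, 15, 5, 12)
z <- gini_linearized(y, w)

# Finite-difference check of dG/dw_k.
h <- 1e-6
z_fd <- vapply(seq_along(w), function(k) {
  e <- replace(numeric(length(w)), k, h)
  (gini_sorted(y, w + e) - gini_sorted(y, w - e)) / (2 * h)
}, numeric(1))
max(abs(z - z_fd))   # close to zero
sum(w * z)           # zero up to rounding: G is unchanged when all weights are rescaled
```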
When interval = NULL, the function returns a single numeric value between 0 and 1 with the estimate of the Gini index. When interval is not NULL, the function returns a list with 3 components: a single numeric value with the estimate of the Gini index; a single numeric value with the variance estimate of the Gini index; and a vector of length two containing the lower and upper limits of the confidence interval for the Gini index.
Juan F Munoz
Jose M Pavia
Encarnacion Alvarez
Alfons, A., and Templ, M. (2012). Estimation of social exclusion indicators from complex surveys: The R package laeken. KU Leuven, Faculty of Business and Economics Working Paper.
Berger, Y. G. (2005). Variance estimation with highly stratified sampling designs with unequal probabilities. Australian & New Zealand Journal of Statistics, 47, 365–373.
Berger, Y. G. (2008). A note on the asymptotic equivalence of jackknife and linearization variance estimation for the Gini Coefficient. Journal of Official Statistics, 24(4), 541-555.
Berger, Y. G. (2011). Asymptotic consistency under large entropy sampling designs with unequal probabilities. Pakistan Journal of Statistics, 27, 407–426.
Berger, Y. G. and Tillé, Y. (2009). Sampling with unequal probabilities. In Sample Surveys: Design, Methods and Applications (eds. D. Pfeffermann and C. R. Rao), 39–54. Elsevier, Amsterdam.
Berger, Y., and Gedik-Balay, İ. (2020). Confidence intervals of Gini coefficient under unequal probability sampling. Journal of Official Statistics, 36(2), 237-249.
Hajek, J. (1964). Asymptotic theory of rejective sampling with varying probabilities from a finite population. The Annals of Mathematical Statistics, 35, 4, 1491–1523.
Hartley, H. O., and Rao, J. N. K. (1962). Sampling with unequal probabilities and without replacement. The Annals of Mathematical Statistics, 350-374.
Haziza, D., Mecatti, F. and Rao, J. N. K. (2008). Evaluation of some approximate variance estimators under the Rao-Sampford unequal probability sampling design. Metron, LXVI, 91–108.
Horvitz, D. G. and Thompson, D. J. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47, 663–685.
Langel, M., and Tillé, Y. (2013). Variance estimation of the Gini index: revisiting a result several times published. Journal of the Royal Statistical Society: Series A (Statistics in Society), 176(2), 521-540.
Lerman, R. I., and Yitzhaki, S. (1989). Improving the accuracy of estimates of Gini coefficients. Journal of Econometrics, 42(1), 43-47.
Muñoz, J. F., Moya-Fernández, P. J., and Álvarez-Verdejo, E. (2023). Exploring and Correcting the Bias in the Estimation of the Gini Measure of Inequality. Sociological Methods & Research. https://doi.org/10.1177/00491241231176847
Sen, A. R. (1953). On the estimate of the variance in sampling with varying probabilities. Journal of the Indian Society of Agricultural Statistics, 5, 119–127.
Tillé, Y. (2006). Sampling Algorithms. Springer, New York.
Yates, F., and Grundy, P. M. (1953). Selection without replacement from within strata with probability proportional to size. Journal of the Royal Statistical Society B, 15, 253–261.
library(giniVarCI)
# Income and weights (region 'Burgenland') from the 2006 Austrian EU-SILC (Package 'laeken').
data(eusilc, package = "laeken")
y <- eusilc$eqIncome[eusilc$db040 == "Burgenland"]
w <- eusilc$rb050[eusilc$db040 == "Burgenland"]
# Estimation of the Gini index using 'method = 2'.
fgini(y, w)
y <- c(30428.83, 14976.54, 18094.09, 29476.79, 20381.93, 6876.17, 10360.96, 8239.82, 29476.79, 32230.71)
w <- c(357.86, 480.99, 480.99, 476.01, 498.58, 498.58, 476, 498.58, 476.01, 476.01)
# Gini index estimation and confidence interval using:
## a: The method 2 for point estimation.
## b: The method 'zjackknife' for variance estimation.
## c: The Sen-Yates-Grundy type variance estimator.
## d: The Hajek approximation for the joint inclusion probabilities.
fgini(y, w, interval = "zjackknife")
# Gini index estimation and confidence interval using:
## a: The method 2 for point estimation.
## b: The method 'zalinearization' for variance estimation.
## c: The Sen-Yates-Grundy type variance estimator.
## d: The Hajek approximation for the joint inclusion probabilities.
fgini(y, w, interval = "zalinearization")
# Gini index estimation and confidence interval using:
## a: The method 3 for point estimation.
## b: The method 'zblinearization' for variance estimation.
## c: The Sen-Yates-Grundy type variance estimator.
## d: The Hajek approximation for the joint inclusion probabilities.
fgini(y, w, method = 3L, interval = "zblinearization")
# Gini index estimation and confidence interval using:
## a: The method 2 for point estimation.
## b: The method 'pbootstrap' for variance estimation.
## c: The percentile bootstrap method for the confidence interval.
fgini(y, w, interval = "pbootstrap")
Estimates the Gini index in finite populations, using different methods.
fginindex( y, w, method = 2L, Pi = NULL, na.rm = TRUE, useRcpp = TRUE )
y |
A vector with the non-negative real numbers to be used for estimating the Gini index. |
w |
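A numeric vector with the survey weights to be used for estimating the Gini index. This argument can be missing if argument Pi is provided. |
method |
An integer between 1 and 5 selecting one of the 5 methods detailed below for estimating the Gini index in finite populations. The default method is method = 2. |
Pi |
A numeric vector with the (sample) first inclusion probabilities to be used for estimating the Gini index. This argument can be NULL if argument w is provided. |
na.rm |
A 'TRUE/FALSE' logical value indicating whether missing values (NA) should be removed before the computation proceeds. The default value is na.rm = TRUE. |
useRcpp |
A 'TRUE/FALSE' logical value indicating whether the C++ implementation (via 'Rcpp') is used (TRUE) or the R implementation (FALSE). The default value is useRcpp = TRUE. |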
For a sample $s$, with size $n$ and inclusion probabilities $\pi_i$ (argument Pi), derived from a finite population $U$, with size $N$, different formulations of the Gini index have been proposed in the literature. This function estimates the Gini index using various formulations, with both R and C++ implementations; this can be useful for research purposes, for example to compare computation times. The different methods for estimating the Gini index are (see also Muñoz et al., 2023):
method = 1
(Langel and Tillé, 2013)
where $w_i$ ($i \in s$) are the survey weights; for example, the survey weights can be $w_i = \pi_i^{-1}$. Either w or Pi must be provided. When both w and Pi are provided, it is required that $w_i = \pi_i^{-1}$ for $i \in s$.
method = 2
(Alfons and Templ, 2012; Langel and Tillé, 2013)
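where $y_{(1)} \le y_{(2)} \le \cdots \le y_{(n)}$ are the values $y_i$ sorted in increasing order and $w_{(i)}$ are the weights $w_i$ sorted according to the increasing order of the values $y_i$. Langel and Tillé (2013) show that method = 1 and method = 2 give the same estimate.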
method = 3
(Berger, 2008)
where
is the smooth (mid-point) distribution function, and is the indicator variable that takes the value 1 when its argument is true, and the value 0 otherwise. It can be seen that
.
method = 4
(Berger and Gedik-Balay, 2020)
where and
method = 5
(Lerman and Yitzhaki, 1989)
where
and .
A single numeric value between 0 and 1. The estimation of the Gini index.
Juan F Munoz
Jose M Pavia
Encarnacion Alvarez
Alfons, A., and Templ, M. (2012). Estimation of social exclusion indicators from complex surveys: The R package laeken. KU Leuven, Faculty of Business and Economics Working Paper.
Berger, Y. G. (2008). A note on the asymptotic equivalence of jackknife and linearization variance estimation for the Gini Coefficient. Journal of Official Statistics, 24(4), 541-555.
Berger, Y. G., and Gedik-Balay, İ. (2020). Confidence intervals of Gini coefficient under unequal probability sampling. Journal of Official Statistics, 36(2), 237-249.
Langel, M., and Tillé, Y. (2013). Variance estimation of the Gini index: revisiting a result several times published. Journal of the Royal Statistical Society: Series A (Statistics in Society), 176(2), 521-540.
Lerman, R. I., and Yitzhaki, S. (1989). Improving the accuracy of estimates of Gini coefficients. Journal of Econometrics, 42(1), 43-47.
Muñoz, J. F., Moya-Fernández, P. J., and Álvarez-Verdejo, E. (2023). Exploring and Correcting the Bias in the Estimation of the Gini Measure of Inequality. Sociological Methods & Research. https://doi.org/10.1177/00491241231176847
library(giniVarCI)
# Income and weights (region "Burgenland") from the 2006 Austrian EU-SILC (Package 'laeken').
data(eusilc, package = "laeken")
y <- eusilc$eqIncome[eusilc$db040 == "Burgenland"]
w <- eusilc$rb050[eusilc$db040 == "Burgenland"]
# Comparing the computation time for the various estimation methods, using R.
microbenchmark::microbenchmark(
  fginindex(y, w, method = 1L, useRcpp = FALSE),
  fginindex(y, w, method = 2L, useRcpp = FALSE),
  fginindex(y, w, method = 3L, useRcpp = FALSE),
  fginindex(y, w, method = 4L, useRcpp = FALSE),
  fginindex(y, w, method = 5L, useRcpp = FALSE)
)
# Comparing the computation time for the various estimation methods, using Rcpp.
microbenchmark::microbenchmark(
  fginindex(y, w, method = 1L),
  fginindex(y, w, method = 2L),
  fginindex(y, w, method = 3L),
  fginindex(y, w, method = 4L),
  fginindex(y, w, method = 5L)
)
# Estimation of the Gini index using 'method = 4'.
y <- c(30428.83, 14976.54, 18094.09, 29476.79, 20381.93, 6876.17, 10360.96, 8239.82, 29476.79, 32230.71)
w <- c(357.86, 480.99, 480.99, 476.01, 498.58, 498.58, 476, 498.58, 476.01, 476.01)
fginindex(y, w, method = 4L)
Calculates the Gini index for the Beta distribution with shape parameters $a$ (shape1) and $b$ (shape2).
gbeta(shape1, shape2)
shape1 |
A positive real number specifying the shape1 parameter |
shape2 |
A positive real number specifying the shape2 parameter |
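The Beta distribution with shape parameters $a$ (argument shape1) and $b$ (argument shape2), denoted as $\mathrm{Beta}(a, b)$, where $a > 0$ and $b > 0$, has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995; Yee, 2022)
$$f(x) = \frac{x^{a-1}(1-x)^{b-1}}{B(a, b)},$$
and a cumulative distribution function given by
$$F(x) = \frac{B_x(a, b)}{B(a, b)},$$
where $0 < x < 1$, $B(a, b) = \Gamma(a)\Gamma(b)/\Gamma(a+b)$ is the beta function, $\Gamma(\cdot)$ is the gamma function, and $B_x(a, b) = \int_0^x t^{a-1}(1-t)^{b-1}\,dt$ is the incomplete beta function.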
The Gini index can be computed as
A numeric value with the Gini index. An NA is returned when a shape parameter is non-numeric or non-positive.
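A minimal R sketch of the Gini index of $\mathrm{Beta}(a, b)$, assuming the closed form $G = \frac{2B(2a, 2b)}{a\,B(a, b)^2}$ and cross-checking it against the general identity $G = \mu^{-1}\int_0^1 F(x)\{1 - F(x)\}\,dx$ with $\mu = a/(a+b)$. The helper names are hypothetical and the closed form is an assumption here, not necessarily the expression gbeta() evaluates; for the shape pairs (2, 1) and (1, 2) it gives 0.2 and 0.4, the values that also follow from direct integration.

```r
# Hypothetical helper: closed-form Gini index of Beta(a, b).
gini_beta <- function(a, b) {
  # Mirror the documented behaviour: NA for non-numeric or non-positive shapes.
  if (!is.numeric(a) || !is.numeric(b) || a <= 0 || b <= 0) return(NA_real_)
  2 * beta(2 * a, 2 * b) / (a * beta(a, b)^2)
}

# Numerical check: G = (1 / mu) * integral over (0, 1) of F(x) (1 - F(x)) dx.
gini_beta_numeric <- function(a, b) {
  mu <- a / (a + b)
  integrand <- function(x) pbeta(x, a, b) * (1 - pbeta(x, a, b))
  integrate(integrand, 0, 1, rel.tol = 1e-8)$value / mu
}

gini_beta(2, 1)            # 0.2
gini_beta(1, 2)            # 0.4
gini_beta_numeric(2, 1)    # numerically close to 0.2
gini_beta(-1, 2)           # NA
```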
Juan F Munoz
Jose M Pavia
Encarnacion Alvarez
Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.
Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995). Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.
Yee, T. W. (2022). VGAM: Vector Generalized Linear and Additive Models. R package version 1.1-7, https://CRAN.R-project.org/package=VGAM.
gf, gunif, gweibull, ggamma, gchisq
library(giniVarCI)
# Gini index for the Beta distribution with shape parameters 'a = 2' and 'b = 1'.
gbeta(shape1 = 2, shape2 = 1)
# Gini index for the Beta distribution with shape parameters 'a = 1' and 'b = 2'.
gbeta(shape1 = 1, shape2 = 2)
Calculates the Gini index for the Burr Type XII (Singh-Maddala) distribution with scale parameter b (scale) and shape parameters g (shape.g) and s (shape.s).
gburr( scale = 1, shape.g = 1, shape.s = 1 )
scale |
A positive real number specifying the scale parameter |
shape.g |
A positive real number specifying the shape parameter |
shape.s |
A positive real number specifying the shape parameter |
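The Burr Type XII (Singh-Maddala) distribution with scale parameter $b$ and shape parameters $g$ (argument shape.g) and $s$ (argument shape.s), denoted as $Burr(b,g,s)$, where $b>0$, $g>0$ and $s>0$, has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995; Rodriguez, 1977; Yee, 2022)
$$f(x) = \frac{g\,s\,x^{g-1}}{b^{g}\left[1 + (x/b)^{g}\right]^{s+1}},$$
and a cumulative distribution function given by
$$F(x) = 1 - \left[1 + (x/b)^{g}\right]^{-s},$$
where $x > 0$.
The Gini index can be computed as
$$G = \frac{2}{\mu}\int_0^1 p\,F^{-1}(p)\,dp - 1,$$
where $F^{-1}(p) = b\left[(1-p)^{-1/s} - 1\right]^{1/g}$ is the quantile function of the Burr Type XII (Singh-Maddala) distribution, and $\mu$ is the expectation of the distribution. The Burr Type XII (Singh-Maddala) distribution is related to the Pareto (IV) distribution: $Burr(b,g,s) = ParetoIV(0,b,1/g,s)$.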
A numeric value with the Gini index. A NA
is returned when any of the parameters is non-numeric or non-positive.
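For the parametrization with cdf $F(x) = 1 - [1 + (x/b)^g]^{-s}$, the Gini index also has the standard closed form $G = 1 - \frac{\Gamma(s)\Gamma(2s - 1/g)}{\Gamma(s - 1/g)\Gamma(2s)}$, finite when $g\,s > 1$. The base-R sketch below (hypothetical helper names, not part of giniVarCI) evaluates it and cross-checks it against the quantile-based integral $G = \frac{2}{\mu}\int_0^1 pF^{-1}(p)\,dp - 1$.

```r
# Closed-form Burr XII (Singh-Maddala) Gini index; requires g * s > 1.
gini_burr_closed <- function(g, s) {
  1 - exp(lgamma(s) + lgamma(2 * s - 1 / g) - lgamma(s - 1 / g) - lgamma(2 * s))
}

# Quantile-based version with F^{-1}(p) = b * ((1 - p)^(-1/s) - 1)^(1/g).
gini_burr_numeric <- function(b, g, s) {
  qburr <- function(p) b * ((1 - p)^(-1 / s) - 1)^(1 / g)
  mu <- integrate(qburr, 0, 1)$value                     # expectation
  2 * integrate(function(p) p * qburr(p), 0, 1)$value / mu - 1
}

gini_burr_closed(g = 2, s = 1)          # 0.5; with s = 1 the result is 1 / g
gini_burr_closed(g = 5, s = 3)
gini_burr_numeric(b = 1, g = 5, s = 3)  # should agree with the line above
```

The special case $s = 1$ is the Fisk distribution (Gini $1/g$) and $g = 1$ gives the Lomax value $s/(2s-1)$, which are handy checks for the formula.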
Juan F Munoz [email protected]
Jose M Pavia [email protected]
Encarnacion Alvarez [email protected]
Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.
Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995). Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.
Rodriguez, R. N. (1977). A guide to the Burr type XII distributions. Biometrika, 64(1), 129-134.
Yee, T. W. (2022). VGAM: Vector Generalized Linear and Additive Models. R package version 1.1-7, https://CRAN.R-project.org/package=VGAM.
gparetoIV
, gpareto
, gparetoI
, gparetoII
, gparetoIII
, gfisk
# Gini index for the Burr Type XII distribution with 'scale = 1', 'shape.g = 2', 'shape.s = 1'.
gburr(scale = 1, shape.g = 2, shape.s = 1)
# Gini index for the Burr Type XII distribution with 'scale = 1', 'shape.g = 5', 'shape.s = 3'.
gburr(scale = 1, shape.g = 5, shape.s = 3)
Calculates Gini indices for the Chi-Squared distribution with degrees of freedom given by df.
gchisq(df)
df |
A vector of positive real numbers specifying degrees of freedom of the Chi-Squared distribution. |
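The Chi-Squared distribution with $\nu$ degrees of freedom (argument df), denoted as $\chi^2_\nu$, where $\nu>0$, has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995)
$$f(x) = \frac{x^{\nu/2-1}e^{-x/2}}{2^{\nu/2}\,\Gamma(\nu/2)},$$
and a cumulative distribution function given by
$$F(x) = \frac{\gamma(\nu/2,\,x/2)}{\Gamma(\nu/2)},$$
where $x>0$, the gamma function is defined by
$$\Gamma(z) = \int_0^\infty t^{z-1}e^{-t}\,dt,$$
and the lower incomplete gamma function is given by
$$\gamma(z,x) = \int_0^x t^{z-1}e^{-t}\,dt.$$
The Gini index can be computed as
$$G = \frac{\Gamma\left((\nu+1)/2\right)}{\sqrt{\pi}\,\Gamma(\nu/2+1)}.$$
The Chi-Squared distribution is related to the Gamma distribution: $\chi^2_\nu = Gamma(\nu/2, 2)$, that is, a Gamma distribution with shape $\nu/2$ and scale 2.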
A numeric vector with the Gini indices. A NA
is returned when degrees of freedom are non-numeric or non-positive.
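Because a Chi-Squared variable with $\nu$ degrees of freedom is a Gamma variable with shape $\nu/2$ and scale 2, and the Gamma Gini index $\Gamma(s+1/2)/\{\sqrt{\pi}\,\Gamma(s+1)\}$ does not depend on the scale, the Chi-Squared Gini index is a simple ratio of gamma functions. A minimal base-R sketch (hypothetical helper names), using `lgamma()` to avoid overflow for large `df`, with a numerical cross-check based on `pchisq()`:

```r
# Gini index of the Chi-Squared distribution via the Gamma relation (shape = df / 2).
gini_chisq_closed <- function(df) {
  s <- df / 2
  exp(lgamma(s + 0.5) - lgamma(s + 1)) / sqrt(pi)
}

# Numerical check: G = (1 / mu) * integral of F(x) * (1 - F(x)) dx, with mu = df.
gini_chisq_numeric <- function(df) {
  integrate(function(x) pchisq(x, df) * (1 - pchisq(x, df)), 0, Inf)$value / df
}

gini_chisq_closed(2)     # 0.5: with df = 2 the distribution is exponential
gini_chisq_closed(5:10)  # decreases as df grows
gini_chisq_numeric(5)    # agrees with gini_chisq_closed(5)
```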
Juan F Munoz [email protected]
Jose M Pavia [email protected]
Encarnacion Alvarez [email protected]
Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.
Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995). Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.
# Gini index for the Chi-Squared distribution with degrees of freedom equal to 2.
gchisq(df = 2)
# Gini indices for the Chi-Squared distribution and different degrees of freedom.
gchisq(df = 5:10)
Calculates the Gini index for the Dagum distribution with shape parameters a (shape1.a) and p (shape2.p).
gdagum(shape1.a, shape2.p)
shape1.a |
A positive real number specifying the shape1 parameter |
shape2.p |
A positive real number specifying the shape parameter |
The Dagum distribution with scale parameter $b$ and shape parameters $a$ (argument shape1.a) and $p$ (argument shape2.p), denoted as $Dagum(b,a,p)$, where $b>0$, $a>0$ and $p>0$, has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995; Rodriguez, 1977; Yee, 2022)
$$f(x) = \frac{a\,p\,x^{ap-1}}{b^{ap}\left[1 + (x/b)^{a}\right]^{p+1}},$$
and a cumulative distribution function given by
$$F(x) = \left[1 + (x/b)^{-a}\right]^{-p},$$
where $x > 0$.
The Gini index can be computed as
$$G = \frac{\Gamma(p)\,\Gamma(2p + 1/a)}{\Gamma(2p)\,\Gamma(p + 1/a)} - 1,$$
where the gamma function is defined as $\Gamma(z) = \int_0^\infty t^{z-1}e^{-t}\,dt$.
The Dagum distribution is also known as the Burr III, inverse Burr, beta-K, or 3-parameter kappa distribution. The Dagum distribution is related to the Fisk (Log Logistic) distribution: $Dagum(b,a,1) = Fisk(b,a)$. The Dagum distribution is also related to the inverse Lomax distribution and the inverse paralogistic distribution (see Kleiber and Kotz, 2003; Johnson et al., 1995; Yee, 2022).
A numeric value with the Gini index. A NA
is returned when a shape parameter is non-numeric or non-positive.
The Gini index of the Dagum distribution does not depend on its scale parameter.
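The closed form can be evaluated directly in base R. The sketch below (hypothetical helper names) uses `lgamma()` for numerical stability, checks that $p = 1$ reproduces the Fisk value $1/a$, and compares the result with the quantile-based integral using $F^{-1}(u) = b(u^{-1/p} - 1)^{-1/a}$ with $b = 1$, since the Gini index does not depend on the scale. A finite mean, and hence a finite Gini index, requires $a > 1$.

```r
# Closed-form Gini index of the Dagum distribution (finite mean requires a > 1).
gini_dagum_closed <- function(a, p) {
  exp(lgamma(p) + lgamma(2 * p + 1 / a) - lgamma(2 * p) - lgamma(p + 1 / a)) - 1
}

# Quantile-based check with scale b = 1; expm1(-log(u) / p) equals u^(-1/p) - 1
# but stays accurate when u is very close to 1.
gini_dagum_numeric <- function(a, p) {
  qdagum <- function(u) expm1(-log(u) / p)^(-1 / a)
  mu <- integrate(qdagum, 0, 1)$value
  2 * integrate(function(u) u * qdagum(u), 0, 1)$value / mu - 1
}

gini_dagum_closed(a = 2, p = 20)
gini_dagum_closed(a = 2, p = 1)   # 0.5, the Fisk value 1 / a
gini_dagum_numeric(a = 3, p = 2)  # should be close to gini_dagum_closed(3, 2)
```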
Juan F Munoz [email protected]
Jose M Pavia [email protected]
Encarnacion Alvarez [email protected]
Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.
Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995). Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.
Yee, T. W. (2022). VGAM: Vector Generalized Linear and Additive Models. R package version 1.1-7, https://CRAN.R-project.org/package=VGAM.
gburr
, gpareto
, gfisk
, ggompertz
, gfrechet
# Gini index for the Dagum distribution with shape parameters 'a = 2' and 'p = 20'.
gdagum(shape1.a = 2, shape2.p = 20)
Calculates the Gini index for the F distribution with df1 and df2 degrees of freedom.
gf(df1, df2)
df1 |
A positive real number specifying the degrees of freedom |
df2 |
A positive real number greater than or equal to two specifying the degrees of freedom |
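The F distribution with $d_1$ (argument df1) and $d_2$ (argument df2) degrees of freedom, denoted as $F(d_1,d_2)$, where $d_1>0$ and $d_2>0$, has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995)
$$f(x) = \frac{1}{B(d_1/2,\,d_2/2)}\left(\frac{d_1}{d_2}\right)^{d_1/2} x^{d_1/2-1}\left(1 + \frac{d_1}{d_2}x\right)^{-(d_1+d_2)/2},$$
and a cumulative distribution function given by
$$F(x) = I_{d_1x/(d_1x+d_2)}\left(\frac{d_1}{2},\frac{d_2}{2}\right),$$
where $x>0$, $\Gamma(\cdot)$ is the gamma function, $I_z(\alpha,\beta) = B(z;\alpha,\beta)/B(\alpha,\beta)$ is the regularized incomplete beta function, $B(\alpha,\beta) = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)$ is the beta function, and $B(z;\alpha,\beta) = \int_0^z t^{\alpha-1}(1-t)^{\beta-1}\,dt$ is the incomplete beta function.
The Gini index, for $d_2 > 2$ (so that the expectation $\mu = d_2/(d_2-2)$ is finite), can be computed as
$$G = \frac{2}{\mu}\int_0^1 p\,F^{-1}(p)\,dp - 1,$$
where $F^{-1}(\cdot)$ is the quantile function of the F distribution.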
A numeric value with the Gini index. A NA
is returned when the degrees of freedom are non-numeric or outside their admissible ranges.
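The quantile-based representation $G = \frac{2}{\mu}\int_0^1 pF^{-1}(p)\,dp - 1$ can be evaluated in base R with `qf()` and `integrate()`. In the sketch below (hypothetical helper names, not part of giniVarCI), the expectation uses the standard result $\mu = d_2/(d_2 - 2)$, valid for $d_2 > 2$, and the value is cross-checked against $G = \mu^{-1}\int_0^\infty F(x)\{1 - F(x)\}\,dx$ computed with `pf()`.

```r
# Quantile-based Gini index of the F distribution; the mean is finite only for df2 > 2.
gini_f_numeric <- function(df1, df2) {
  stopifnot(df1 > 0, df2 > 2)
  mu <- df2 / (df2 - 2)  # expectation of F(df1, df2)
  2 * integrate(function(p) p * qf(p, df1, df2), 0, 1)$value / mu - 1
}

# Cross-check based on the cdf: G = (1 / mu) * integral of F(x) * (1 - F(x)) dx.
gini_f_cdf <- function(df1, df2) {
  mu <- df2 / (df2 - 2)
  integrate(function(x) pf(x, df1, df2) * (1 - pf(x, df1, df2)), 0, Inf)$value / mu
}

gini_f_numeric(df1 = 10, df2 = 20)
gini_f_cdf(df1 = 10, df2 = 20)  # should agree with the line above
```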
Juan F Munoz [email protected]
Jose M Pavia [email protected]
Encarnacion Alvarez [email protected]
Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.
Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995). Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.
gchisq
, ggamma
, ggompertz
, glnorm
# Gini index for the F distribution with 'df1 = 10' and 'df2 = 20' degrees of freedom.
gf(df1 = 10, df2 = 20)
Calculates the Gini indices for the Fisk (Log Logistic) distribution with shape parameters a (shape1.a).
gfisk(shape1.a)
shape1.a |
A vector of positive real numbers specifying shape parameters |
The Fisk (Log Logistic) distribution with scale parameter $b$ and shape parameter $a$ (argument shape1.a), denoted as $Fisk(b,a)$, where $b>0$ and $a>0$, has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995; Yee, 2022)
$$f(x) = \frac{a\,x^{a-1}}{b^{a}\left[1 + (x/b)^{a}\right]^{2}},$$
and a cumulative distribution function given by
$$F(x) = \left[1 + (x/b)^{-a}\right]^{-1},$$
where $x > 0$.
The Gini index can be computed as
$$G = \frac{1}{a}.$$
The Fisk (Log Logistic) distribution is related to the Dagum distribution: $Fisk(b,a) = Dagum(b,a,1)$.
A numeric vector with the Gini indices. A NA
is returned when a shape parameter is non-numeric or non-positive.
The Gini index of the Fisk (Log Logistic) distribution does not depend on its scale parameter.
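The result $G = 1/a$ can be checked by simulation. The base-R sketch below (hypothetical helper names; this is not the package's `gsample()`) draws a Fisk sample by inverse transform, $x = b(1/u - 1)^{-1/a}$ with $u$ uniform on $(0,1)$, and compares the usual sample Gini index with $1/a$. The value $a = 4$ is chosen so that the variance is finite and the sample Gini index is stable; for heavier tails ($a$ close to 1) convergence is much slower.

```r
# Usual sample Gini index: sum((2i - n - 1) * x_(i)) / (n * sum(x)).
sample_gini <- function(x) {
  x <- sort(x)
  n <- length(x)
  sum((2 * seq_len(n) - n - 1) * x) / (n * sum(x))
}

# Inverse-transform sampling from Fisk(b, a): F^{-1}(u) = b * (1 / u - 1)^(-1 / a).
rfisk_sketch <- function(n, a, b = 1) {
  u <- runif(n)
  b * (1 / u - 1)^(-1 / a)
}

set.seed(123)
a <- 4
x <- rfisk_sketch(1e5, a)
sample_gini(x)  # close to 1 / a = 0.25
```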
Juan F Munoz [email protected]
Jose M Pavia [email protected]
Encarnacion Alvarez [email protected]
Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.
Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995). Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.
Yee, T. W. (2022). VGAM: Vector Generalized Linear and Additive Models. R package version 1.1-7, https://CRAN.R-project.org/package=VGAM.
gdagum
, gburr
, gpareto
, ggompertz
# Gini index for the Fisk distribution with a shape parameter 'a = 2'.
gfisk(shape1.a = 2)
# Gini indices for the Fisk distribution and different shape parameters.
gfisk(shape1.a = 1:10)
Calculates the Gini indices for the Frechet distribution with shape parameters s (shape).
gfrechet(shape)
shape |
A vector of real numbers greater than or equal to 1 specifying shape parameters |
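The Frechet distribution with location parameter $a$, scale parameter $b$ and shape parameter $s$ (argument shape), denoted as $Frechet(a,b,s)$, where $a \in \mathbb{R}$, $b>0$ and $s>0$, has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995)
$$f(x) = \frac{s}{b}\left(\frac{x-a}{b}\right)^{-1-s}\exp\left\{-\left(\frac{x-a}{b}\right)^{-s}\right\},$$
and a cumulative distribution function given by
$$F(x) = \exp\left\{-\left(\frac{x-a}{b}\right)^{-s}\right\},$$
where $x > a$.
The Gini index, for $a = 0$ and $s \ge 1$, can be computed as
$$G = 2^{1/s} - 1.$$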
A numeric vector with the Gini indices. A NA
is returned when a shape parameter is non-numeric or smaller than 1.
The Gini index of the Frechet distribution does not depend on its scale parameter and is only defined when its shape parameter is at least 1. The values returned correspond to a location parameter equal to 0; a change of location changes the Gini index.
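For location 0, the closed form $G = 2^{1/s} - 1$ follows from max-stability: the maximum of two independent Frechet variables is again Frechet with scale multiplied by $2^{1/s}$, and $E|X - Y| = 2E[\max(X,Y)] - 2\mu$. The base-R sketch below (hypothetical helper names) compares the closed form with $G = \mu^{-1}\int F(x)\{1-F(x)\}\,dx$, using $\mu = \Gamma(1 - 1/s)$ for location 0 and scale 1, which is finite only for $s > 1$.

```r
gini_frechet_closed <- function(s) 2^(1 / s) - 1

# Numerical check for location 0 and scale 1; requires s > 1 for a finite mean.
gini_frechet_numeric <- function(s) {
  stopifnot(s > 1)
  mu <- gamma(1 - 1 / s)        # mean of the standard Frechet distribution
  integrand <- function(x) {
    z <- x^(-s)
    exp(-z) * (-expm1(-z))      # F(x) * (1 - F(x)), with 1 - F computed accurately
  }
  integrate(integrand, 0, Inf)$value / mu
}

gini_frechet_closed(1:10)
gini_frechet_numeric(3)  # close to gini_frechet_closed(3)
```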
Juan F Munoz [email protected]
Jose M Pavia [email protected]
Encarnacion Alvarez [email protected]
Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.
Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995). Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.
gdagum
, gburr
, gfisk
, gpareto
, ggompertz
# Gini index for the Frechet distribution with a shape parameter 's = 1'.
gfrechet(shape = 1)
# Gini indices for the Frechet distribution and different shape parameters.
gfrechet(shape = 1:10)
Calculates the Gini indices for the Gamma distribution with shape parameters s (shape).
ggamma(shape)
shape |
A vector of positive real numbers specifying the shape parameters |
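The Gamma distribution with shape parameter $s$ and scale parameter $b$, denoted as $Gamma(s,b)$, where $s>0$ and $b>0$, has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995)
$$f(x) = \frac{x^{s-1}e^{-x/b}}{b^{s}\,\Gamma(s)},$$
and a cumulative distribution function given by
$$F(x) = \frac{\gamma(s,\,x/b)}{\Gamma(s)},$$
where $x>0$, the gamma function is defined by
$$\Gamma(z) = \int_0^\infty t^{z-1}e^{-t}\,dt,$$
and the lower incomplete gamma function is given by
$$\gamma(z,x) = \int_0^x t^{z-1}e^{-t}\,dt.$$
The Gini index can be computed as
$$G = \frac{\Gamma(s+1/2)}{\sqrt{\pi}\,\Gamma(s+1)}.$$
The Gamma distribution is related to the Chi-Squared distribution: $Gamma(\nu/2, 2) = \chi^2_\nu$.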
A numeric vector with the Gini indices. A NA
is returned when a shape parameter is non-numeric or non-positive.
The Gini index of the Gamma distribution does not depend on its scale parameter.
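Both the closed form $G = \Gamma(s+1/2)/\{\sqrt{\pi}\,\Gamma(s+1)\}$ and its independence of the scale can be verified numerically. The base-R sketch below (hypothetical helper names) uses the identity $G = \mu^{-1}\int_0^\infty F(x)\{1-F(x)\}\,dx$ with `pgamma()`, where $\mu = s\,b$, and evaluates it for two scale values.

```r
gini_gamma_closed <- function(shape) {
  exp(lgamma(shape + 0.5) - lgamma(shape + 1)) / sqrt(pi)
}

# Numerical Gini index for Gamma(shape, scale) based on the cdf.
gini_gamma_numeric <- function(shape, scale = 1) {
  mu <- shape * scale
  cdf <- function(x) pgamma(x, shape = shape, scale = scale)
  integrate(function(x) cdf(x) * (1 - cdf(x)), 0, Inf)$value / mu
}

gini_gamma_closed(1)                # 0.5, the exponential case
gini_gamma_closed(3)                # 0.3125
gini_gamma_numeric(3, scale = 1)
gini_gamma_numeric(3, scale = 10)   # same value: the scale does not matter
```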
Juan F Munoz [email protected]
Jose M Pavia [email protected]
Encarnacion Alvarez [email protected]
Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.
Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995). Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.
gchisq
, gf
, gbeta
, gweibull
, glnorm
# Gini index for the Gamma distribution with 'shape = 1'.
ggamma(shape = 1)
# Gini indices for the Gamma distribution and different shape parameters.
ggamma(shape = 1:10)
Calculates the Gini index for the Gompertz distribution with the given scale and shape parameters (arguments scale and shape).
ggompertz( scale = 1, shape )
scale |
A positive real number specifying the scale parameter |
shape |
A positive real number specifying the shape parameter |
The Gompertz distribution with scale
parameter ,
shape
parameter and denoted as
, where
and
, has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995; Rodriguez, 1977; Yee, 2022)
and a cumulative distribution function given by
where .
The Gini index can be computed as
where is the quantile function of the Gompertz distribution, and
is the expectation of the distribution. If
scale
is not specified it assumes the default value of 1.
A numeric value with the Gini index. A NA
is returned when a parameter is non-numeric or non-positive.
Juan F Munoz [email protected]
Jose M Pavia [email protected]
Encarnacion Alvarez [email protected]
Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.
Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995). Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.
Yee, T. W. (2022). VGAM: Vector Generalized Linear and Additive Models. R package version 1.1-7, https://CRAN.R-project.org/package=VGAM.
ggamma
, gbeta
, gchisq
, gpareto
# Gini index for the Gompertz distribution with 'scale = 1' and 'shape = 3'.
ggompertz(scale = 1, shape = 3)
Calculates the Gini indices for the Log Normal distribution with standard deviations on the log scale given by sdlog.
glnorm(sdlog)
sdlog |
A vector of positive real numbers specifying standard deviations |
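The Log Normal distribution with mean $\mu$ and standard deviation $\sigma$ on the log scale (argument sdlog), denoted as $LN(\mu,\sigma)$, has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995)
$$f(x) = \frac{1}{x\,\sigma\sqrt{2\pi}}\exp\left\{-\frac{(\log x - \mu)^2}{2\sigma^2}\right\},$$
and a cumulative distribution function given by
$$F(x) = \Phi\left(\frac{\log x - \mu}{\sigma}\right),$$
where $x>0$, $\sigma>0$, and $\Phi(\cdot)$ is the cumulative distribution function of a standard Normal distribution.
The Gini index can be computed as
$$G = 2\Phi\left(\frac{\sigma}{\sqrt{2}}\right) - 1.$$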
A numeric vector with the Gini indices. A NA
is returned when a standard deviation is non-numeric or non-positive.
The Gini index of the Log Normal distribution does not depend on its mean parameter (on the log scale).
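The closed form $G = 2\Phi(\sigma/\sqrt{2}) - 1$ can also be obtained from the Lorenz curve of the Log Normal distribution, $L(p) = \Phi(\Phi^{-1}(p) - \sigma)$, via $G = 1 - 2\int_0^1 L(p)\,dp$. Neither expression involves the log-scale mean, consistent with the note above. A base-R sketch (hypothetical helper names) comparing the two:

```r
gini_lnorm_closed <- function(sdlog) 2 * pnorm(sdlog / sqrt(2)) - 1

# Gini index from the Lorenz curve L(p) = pnorm(qnorm(p) - sdlog).
gini_lnorm_lorenz <- function(sdlog) {
  lorenz <- function(p) pnorm(qnorm(p) - sdlog)
  1 - 2 * integrate(lorenz, 0, 1)$value
}

gini_lnorm_closed(c(0.2, 0.5, 1:3))
gini_lnorm_lorenz(2)  # should match gini_lnorm_closed(2)
```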
Juan F Munoz [email protected]
Jose M Pavia [email protected]
Encarnacion Alvarez [email protected]
Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.
Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995). Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.
ggamma
, gpareto
, gchisq
, gweibull
# Gini index for the Log Normal distribution with standard deviation 'sdlog = 2'.
glnorm(sdlog = 2)
# Gini indices for the Log Normal distribution with different standard deviations.
glnorm(sdlog = c(0.2, 0.5, 1:3))
Calculates the Gini indices for the Pareto distribution with shape parameters s (shape).
gpareto(shape)
shape |
A vector of positive real numbers specifying shape parameters |
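The Pareto distribution with scale parameter $b$ and shape parameter $s$ (argument shape), denoted as $Pareto(b,s)$, where $b>0$ and $s>0$, has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995; Yee, 2022)
$$f(x) = \frac{s\,b^{s}}{x^{s+1}},$$
and a cumulative distribution function given by
$$F(x) = 1 - \left(\frac{b}{x}\right)^{s},$$
where $x \ge b$.
The Gini index can be computed as
$$G = \frac{1}{2s - 1}.$$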
A numeric vector with the Gini indices. A NA
is returned when a shape parameter is non-numeric or non-positive.
The Gini index of the Pareto distribution does not depend on its scale parameter.
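The expression $G = 1/(2s - 1)$ can be recovered from the Lorenz curve of the Pareto distribution, $L(p) = 1 - (1 - p)^{1 - 1/s}$ for $s > 1$, through $G = 1 - 2\int_0^1 L(p)\,dp$. The Lorenz curve does not involve the scale parameter, which is why the Gini index does not depend on it. A base-R sketch (hypothetical helper names):

```r
gini_pareto_closed <- function(shape) 1 / (2 * shape - 1)

# Lorenz curve of Pareto(b, s), valid for s > 1; it does not involve the scale b.
gini_pareto_lorenz <- function(shape) {
  lorenz <- function(p) 1 - (1 - p)^(1 - 1 / shape)
  1 - 2 * integrate(lorenz, 0, 1)$value
}

gini_pareto_closed(1:5)
gini_pareto_lorenz(2)  # 1 / 3, as given by gini_pareto_closed(2)
```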
Juan F Munoz [email protected]
Jose M Pavia [email protected]
Encarnacion Alvarez [email protected]
Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.
Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995). Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.
Yee, T. W. (2022). VGAM: Vector Generalized Linear and Additive Models. R package version 1.1-7, https://CRAN.R-project.org/package=VGAM.
gparetoI
, gparetoII
, gparetoIII
, gparetoIV
, gdagum
, gburr
, gfisk
# Gini index for the Pareto distribution with 'shape = 2'.
gpareto(shape = 2)
# Gini indices for the Pareto distribution and different shape parameters.
gpareto(shape = 1:5)
Calculates the Gini index for the Pareto (I) distribution with scale parameter b (scale) and shape parameter s (shape).
gparetoI( scale = 1, shape = 1 )
scale |
A positive real number specifying the scale parameter |
shape |
A positive real number specifying the shape parameter |
The Pareto (I) distribution with scale parameter $b$ (argument scale) and shape parameter $s$ (argument shape), denoted as $ParetoI(b,s)$, where $b>0$ and $s>0$, has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995; Yee, 2022)
$$f(x) = \frac{s\,b^{s}}{x^{s+1}},$$
and a cumulative distribution function given by
$$F(x) = 1 - \left(\frac{b}{x}\right)^{s},$$
where $x \ge b$.
The Gini index can be computed as
$$G = \frac{2}{\mu}\int_0^1 p\,F^{-1}(p)\,dp - 1,$$
where $F^{-1}(p) = b(1-p)^{-1/s}$ is the quantile function of the Pareto (I) distribution, and $\mu$ is the expectation of the distribution. If scale or shape are not specified, they take the default value of 1. The Pareto (I) distribution is related to the Pareto (IV) distribution: $ParetoI(b,s) = ParetoIV(b,b,1,s)$.
A numeric value with the Gini index. A NA
is returned when a parameter is non-numeric or non-positive.
Juan F Munoz [email protected]
Jose M Pavia [email protected]
Encarnacion Alvarez [email protected]
Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.
Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995). Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.
Yee, T. W. (2022). VGAM: Vector Generalized Linear and Additive Models. R package version 1.1-7, https://CRAN.R-project.org/package=VGAM.
gpareto
, gparetoII
, gparetoIII
, gparetoIV
, gdagum
, gburr
, gfisk
# Gini index for the Pareto (I) distribution with scale 'b = 1' and shape 's = 3'.
gparetoI(scale = 1, shape = 3)
Calculates the Gini index for the Pareto (II) distribution with location parameter a (location), scale parameter b (scale) and shape parameter s (shape).
gparetoII( location = 0, scale = 1, shape = 1 )
location |
A non-negative real number specifying the location parameter |
scale |
A positive real number specifying the scale parameter |
shape |
A positive real number specifying the shape parameter |
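The Pareto (II) distribution with location parameter $a$ (argument location), scale parameter $b$ (argument scale) and shape parameter $s$ (argument shape), denoted as $ParetoII(a,b,s)$, where $a \ge 0$, $b>0$ and $s>0$, has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995; Yee, 2022)
$$f(x) = \frac{s}{b}\left[1 + \frac{x-a}{b}\right]^{-(s+1)},$$
and a cumulative distribution function given by
$$F(x) = 1 - \left[1 + \frac{x-a}{b}\right]^{-s},$$
where $x \ge a$.
The Gini index can be computed as
$$G = \frac{2}{\mu}\int_0^1 p\,F^{-1}(p)\,dp - 1,$$
where $F^{-1}(p) = a + b\left[(1-p)^{-1/s} - 1\right]$ is the quantile function of the Pareto (II) distribution, and $\mu$ is the expectation of the distribution. If location is not specified, it takes the default value of 0, and scale and shape take the default value of 1. The Pareto (II) distribution is related to the Pareto (IV) distribution: $ParetoII(a,b,s) = ParetoIV(a,b,1,s)$.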
A numeric value with the Gini index. A NA
is returned when a parameter is non-numeric or non-positive, except for the location parameter, which can be equal to 0.
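Because a location shift leaves $E|X - Y|$ unchanged but increases the mean, the Gini index of the Pareto (II) distribution depends on the location parameter. Writing $X = a + Y$, where $Y$ has a Lomax distribution with Gini index $s/(2s-1)$ and mean $b/(s-1)$, gives, for $s > 1$, $G = \frac{b\,s}{(2s-1)\{a(s-1) + b\}}$. This closed form is a derivation added here for illustration, not taken from the package. The base-R sketch below (hypothetical helper names) compares it with the quantile-based integral using $F^{-1}(p) = a + b\{(1-p)^{-1/s} - 1\}$.

```r
# Derived closed form for Pareto (II) with location a, scale b and shape s > 1.
gini_paretoII_closed <- function(a, b, s) {
  b * s / ((2 * s - 1) * (a * (s - 1) + b))
}

# Quantile-based check: G = (2 / mu) * integral of p * F^{-1}(p) dp - 1.
gini_paretoII_numeric <- function(a, b, s) {
  q <- function(p) a + b * ((1 - p)^(-1 / s) - 1)
  mu <- integrate(q, 0, 1)$value
  2 * integrate(function(p) p * q(p), 0, 1)$value / mu - 1
}

gini_paretoII_closed(a = 1, b = 1, s = 3)   # 0.2
gini_paretoII_numeric(a = 1, b = 1, s = 3)
gini_paretoII_closed(a = 0, b = 1, s = 3)   # 0.6, the Lomax value s / (2s - 1)
```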
Juan F Munoz [email protected]
Jose M Pavia [email protected]
Encarnacion Alvarez [email protected]
Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.
Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995). Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.
Yee, T. W. (2022). VGAM: Vector Generalized Linear and Additive Models. R package version 1.1-7, https://CRAN.R-project.org/package=VGAM.
gpareto
, gparetoI
, gparetoIII
, gparetoIV
, gdagum
, gburr
, gfisk
# Gini index for the Pareto (II) distribution with parameters 'a = 1', 'b = 1' and 's = 3'.
gparetoII(location = 1, scale = 1, shape = 3)
Calculates the Gini indices for the Pareto (III) distribution with inequality parameters g (inequality).
gparetoIII( inequality = 1 )
inequality |
A vector of positive numbers in the |
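The Pareto (III) distribution with location parameter $a$, scale parameter $b$ and inequality parameter $g$ (argument inequality), denoted as $ParetoIII(a,b,g)$, where $a \ge 0$, $b>0$ and $g>0$, has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995; Yee, 2022)
$$f(x) = \frac{1}{g\,b}\left(\frac{x-a}{b}\right)^{1/g-1}\left[1 + \left(\frac{x-a}{b}\right)^{1/g}\right]^{-2},$$
and a cumulative distribution function given by
$$F(x) = 1 - \left[1 + \left(\frac{x-a}{b}\right)^{1/g}\right]^{-1},$$
where $x > a$.
The Gini index, for $a = 0$, is
$$G = g.$$
If inequality is not specified, it takes the default value of 1. The Pareto (III) distribution is related to the Pareto (IV) distribution: $ParetoIII(a,b,g) = ParetoIV(a,b,g,1)$.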
A numeric vector with the Gini indices. A NA
is returned when an inequality parameter is non-numeric or lies outside its admissible interval.
The Gini index of the Pareto (III) distribution does not depend on its scale parameter. The values returned correspond to a location parameter equal to 0; a change of location changes the Gini index.
Juan F Munoz [email protected]
Jose M Pavia [email protected]
Encarnacion Alvarez [email protected]
Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.
Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995). Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.
Yee, T. W. (2022). VGAM: Vector Generalized Linear and Additive Models. R package version 1.1-7, https://CRAN.R-project.org/package=VGAM.
gpareto
, gparetoI
, gparetoII
, gparetoIV
, gdagum
, gburr
, gfisk
# Gini index for the Pareto (III) distribution with inequality parameter 'g = 0.3'.
gparetoIII(inequality = 0.3)
# Gini indices for the Pareto (III) distribution with different inequality parameters.
gparetoIII(inequality = seq(0.1, 0.9, by = 0.1))
Calculates the Gini index for the Pareto (IV) distribution with location parameter a (location), scale parameter b (scale), inequality parameter g (inequality) and shape parameter s (shape).
gparetoIV( location = 0, scale = 1, inequality = 1, shape = 1 )
location |
A non-negative real number specifying the location parameter |
scale |
A positive real number specifying the scale parameter |
inequality |
A positive real number specifying the inequality parameter |
shape |
A positive real number specifying the shape parameter |
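The Pareto (IV) distribution with location parameter $a$ (argument location), scale parameter $b$ (argument scale), inequality parameter $g$ (argument inequality) and shape parameter $s$ (argument shape), denoted as $ParetoIV(a,b,g,s)$, where $a \ge 0$, $b>0$, $g>0$ and $s>0$, has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995; Yee, 2022)
$$f(x) = \frac{s}{g\,b}\left(\frac{x-a}{b}\right)^{1/g-1}\left[1 + \left(\frac{x-a}{b}\right)^{1/g}\right]^{-(s+1)},$$
and a cumulative distribution function given by
$$F(x) = 1 - \left[1 + \left(\frac{x-a}{b}\right)^{1/g}\right]^{-s},$$
where $x > a$.
The Gini index can be computed as
$$G = \frac{2}{\mu}\int_0^1 p\,F^{-1}(p)\,dp - 1,$$
where $F^{-1}(p) = a + b\left[(1-p)^{-1/s} - 1\right]^{g}$ is the quantile function of the Pareto (IV) distribution, and $\mu$ is the expectation of the distribution. If location is not specified, it takes the default value of 0, and the remaining parameters take the default value of 1. The Pareto (IV) distribution is related to:
1. The Burr distribution: $ParetoIV(0,b,g,s) = Burr(b,1/g,s)$.
2. The Pareto (I) distribution: $ParetoIV(b,b,1,s) = ParetoI(b,s)$.
3. The Pareto (II) distribution: $ParetoIV(a,b,1,s) = ParetoII(a,b,s)$.
4. The Pareto (III) distribution: $ParetoIV(a,b,g,1) = ParetoIII(a,b,g)$.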
A numeric value with the Gini index. A NA
is returned when a parameter is non-numeric or non-positive, except for the location parameter, which can be equal to 0.
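Since the Pareto (IV) quantile function is $F^{-1}(p) = a + b\{(1-p)^{-1/s} - 1\}^{g}$, the quantile-based formula can be evaluated numerically with `integrate()`; the mean is finite when $s > g$. The base-R sketch below (hypothetical helper name, not part of giniVarCI) also checks two special cases: with $a = 0$ and $g = 1$ it should give the Lomax value $s/(2s-1)$, and with $a = 0$ and $s = 1$ the Pareto (III) value $g$.

```r
# Quantile-based Gini index of ParetoIV(a, b, g, s); the mean is finite when s > g.
gini_paretoIV_numeric <- function(location = 0, scale = 1, inequality = 1, shape = 1) {
  q <- function(p) {
    location + scale * ((1 - p)^(-1 / shape) - 1)^inequality
  }
  mu <- integrate(q, 0, 1)$value
  2 * integrate(function(p) p * q(p), 0, 1)$value / mu - 1
}

gini_paretoIV_numeric(inequality = 1, shape = 3)    # close to 3 / 5 (Lomax case)
gini_paretoIV_numeric(inequality = 0.5, shape = 1)  # close to 0.5 (Pareto III case)
gini_paretoIV_numeric(location = 1, scale = 1, inequality = 2, shape = 3)
```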
Juan F Munoz [email protected]
Jose M Pavia [email protected]
Encarnacion Alvarez [email protected]
Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.
Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995). Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.
Yee, T. W. (2022). VGAM: Vector Generalized Linear and Additive Models. R package version 1.1-7, https://CRAN.R-project.org/package=VGAM.
gpareto
, gparetoI
, gparetoII
, gparetoIII
, gdagum
, gburr
, gfisk
# Gini index for the Pareto (IV) distribution with 'a = 1', 'b = 1', 'g = 0.5', 's = 1'.
gparetoIV(location = 1, scale = 1, inequality = 0.5, shape = 1)
# Gini index for the Pareto (IV) distribution with 'a = 1', 'b = 1', 'g = 2', 's = 3'.
gparetoIV(location = 1, scale = 1, inequality = 2, shape = 3)
Draws samples from a continuous probability distribution with Gini indices set by the user.
gsample( n, gini, distribution = c("pareto", "dagum", "lognormal", "fisk", "weibull", "gamma", "chisq", "frechet"), scale = 1, meanlog = 0, shape2.p = 1, location = 0 )
n |
An integer specifying the sample(s) size. |
gini |
A numeric vector of values between 0 and 1, indicating the Gini indices for the continuous distribution from which samples are generated. |
distribution |
A character string specifying the continuous probability distribution to be used to generate the sample. Possible values are |
scale |
The scale parameter for the Pareto, Dagum, Fisk, Weibull, Gamma and Frechet distributions. The default value is |
meanlog |
The mean for the logNormal distribution on the log scale. The default value is |
shape2.p |
The shape parameter |
location |
The location parameter for the Frechet distribution. The default value is |
For each continuous probability distribution, the parameters involved in the theoretical expression of the Gini index $G$ are selected such that $G$ takes the values set in the argument gini. Additional parameters required by the distribution can be set by the user, and default values are provided: scale is the scale parameter for the Pareto, Dagum, Fisk, Weibull, Gamma and Frechet distributions, meanlog is the mean of the Lognormal distribution on the log scale, shape2.p is the shape parameter p of the Dagum distribution, and location is the location parameter of the Frechet distribution. Additional information on the continuous probability distributions used by this function can be found in Kleiber and Kotz (2003), Johnson et al. (1995) and Yee (2022).
A numeric vector (or a matrix with n rows and length(gini) columns) containing, by columns, the samples drawn from the continuous probability distribution stated in distribution, with the Gini indices given by the vector gini.
Underestimation problems may appear for heavy-tailed distributions (Pareto, Dagum, Lognormal, Fisk and Frechet) and large values of gini. A larger sample size may mitigate this problem.
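One simple way to draw samples with a prescribed Gini index is to invert a closed-form Gini expression for a shape parameter and then sample with R's random generators. The base-R sketch below illustrates the idea; the helper names are hypothetical and this is not claimed to be how `gsample()` is implemented. It inverts $G = 1 - 2^{-1/s}$ (Weibull), $G = 1/(2s-1)$ (Pareto) and $G = 2\Phi(\sigma/\sqrt{2}) - 1$ (Log Normal), and then samples from the Weibull distribution, whose light tail avoids the underestimation issue mentioned above.

```r
# Hypothetical helpers: invert closed-form Gini expressions for a shape parameter.
weibull_shape_for_gini <- function(gini) -log(2) / log(1 - gini)         # G = 1 - 2^(-1/s)
pareto_shape_for_gini  <- function(gini) (1 + 1 / gini) / 2              # G = 1 / (2s - 1)
lnorm_sdlog_for_gini   <- function(gini) sqrt(2) * qnorm((1 + gini) / 2) # G = 2 Phi(sigma / sqrt(2)) - 1

# Usual sample Gini index.
sample_gini <- function(x) {
  x <- sort(x)
  n <- length(x)
  sum((2 * seq_len(n) - n - 1) * x) / (n * sum(x))
}

set.seed(1)
target <- c(0.2, 0.5)
samples <- sapply(target, function(g) {
  rweibull(1e5, shape = weibull_shape_for_gini(g), scale = 1)
})
dim(samples)                    # 100000 rows, one column per target Gini index
apply(samples, 2, sample_gini)  # close to 0.2 and 0.5
```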
Juan F Munoz [email protected]
Jose M Pavia [email protected]
Encarnacion Alvarez [email protected]
Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.
Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995). Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.
Yee, T. W. (2022). VGAM: Vector Generalized Linear and Additive Models. R package version 1.1-7, https://CRAN.R-project.org/package=VGAM.
gpareto
, gdagum
, glnorm
, gfisk
, gweibull
, ggamma
, gchisq
, gfrechet
# Sample from the Pareto distribution with the parameter selected such that the Gini index is 0.3.
gsample(n = 10, gini = 0.3, "pareto")
# Samples from the Pareto distribution with Gini indices 0.2 and 0.5.
gsample(n = 10, gini = c(0.2, 0.5), "par", scale = 2)
# Samples from the Lognormal distribution with Gini indices 0.2 and 0.5.
gsample(n = 10, gini = c(0.2, 0.5), "lognormal", meanlog = 5)
# Samples from the Dagum distribution with Gini indices 0.2 and 0.5.
gsample(n = 10, gini = c(0.2, 0.5), "dagum")
# Samples from the Fisk (Log-logistic) distribution with Gini indices 0.3 and 0.6.
gsample(n = 10, gini = c(0.3, 0.6), "fisk")
# Sample from the Weibull distribution with the parameter selected such that the Gini index is 0.2.
gsample(n = 10, gini = 0.2, "weibull")
# Sample from the Gamma distribution with the parameter selected such that the Gini index is 0.2.
gsample(n = 10, gini = 0.2, "gamma")
# Samples from the Chi-Squared distribution with Gini indices 0.3 and 0.6.
gsample(n = 10, gini = c(0.3, 0.6), "chi")
# Samples from the Frechet distribution with Gini indices 0.3 and 0.6.
gsample(n = 10, gini = c(0.3, 0.6), "fre")
Calculates the Gini index for the Uniform distribution with lower limit min and upper limit max.
gunif( min = 0, max = 1 )
min |
A non-negative real number specifying the lower limit of the Uniform distribution. The default value is |
max |
A positive real number higher than |
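The Uniform distribution with lower and upper limits $a$ (argument min) and $b$ (argument max), denoted as $U(a,b)$, where $a \ge 0$, $b > a$, and both must be finite, has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995; Yee, 2022)
$$f(x) = \frac{1}{b - a},$$
where $a \le x \le b$. The cumulative distribution function is given by
$$F(x) = \frac{x - a}{b - a}.$$
The Gini index can be computed as
$$G = \frac{b - a}{3(a + b)}.$$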
If min
or max
are not specified they assume the default values of 0 and 1, respectively.
A numeric value with the Gini index. A NA
value is returned when a limit is non-numeric, when min is negative, or when max is not greater than min.
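For $X, Y$ independent and uniform on $(a, b)$, the mean absolute difference is $E|X - Y| = (b - a)/3$ and the mean is $(a + b)/2$, so $G = E|X - Y|/(2\mu) = (b - a)/\{3(a + b)\}$. A minimal base-R sketch (the helper name is hypothetical) with a Monte Carlo check of this identity:

```r
gini_unif_closed <- function(min = 0, max = 1) {
  stopifnot(min >= 0, max > min)
  (max - min) / (3 * (max + min))
}

gini_unif_closed()                     # 1/3
gini_unif_closed(min = 10, max = 190)  # 0.3

# Monte Carlo check of E|X - Y| / (2 * mu) for min = 10 and max = 190.
set.seed(42)
x <- runif(1e6, 10, 190)
y <- runif(1e6, 10, 190)
mean(abs(x - y)) / (2 * mean(x))       # close to 0.3
```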
Juan F Munoz [email protected]
Jose M Pavia [email protected]
Encarnacion Alvarez [email protected]
Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.
Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995). Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.
Yee, T. W. (2022). VGAM: Vector Generalized Linear and Additive Models. R package version 1.1-7, https://CRAN.R-project.org/package=VGAM.
# Gini index for the Uniform distribution with lower limit 0 and upper limit 1.
gunif()
# Gini index for the Uniform distribution with lower limit 10 and upper limit 190.
gunif(min = 10, max = 190)
Calculates the Gini indices of the Weibull distribution for the shape parameters $k$.
gweibull(shape)
shape |
A vector of positive real numbers specifying shape parameters |
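The Weibull distribution with scale parameter $\lambda$ and shape parameter $k$, denoted as $W(\lambda, k)$, where $\lambda > 0$ and $k > 0$, has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995; Yee, 2022)

$$f(y) = \frac{k}{\lambda}\left(\frac{y}{\lambda}\right)^{k-1}\exp\left\{-\left(\frac{y}{\lambda}\right)^{k}\right\},$$

and a cumulative distribution function given by

$$F(y) = 1 - \exp\left\{-\left(\frac{y}{\lambda}\right)^{k}\right\},$$

where $y \geq 0$. The Gini index can be computed as

$$G = 1 - 2^{-1/k}.$$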
A numeric vector with the Gini indices. An NA value is returned for any shape parameter that is non-numeric or non-positive.
The Gini index of the Weibull distribution does not depend on its scale parameter.
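The scale invariance can be seen numerically: the sketch below evaluates G = 1 - (1 / mu) * integral over [0, Inf) of (1 - F(y))^2 dy for two different scales and compares the result with 1 - 2^(-1/k). It is a base R illustration with hypothetical function names, not the implementation of gweibull().

```r
# Closed-form Gini index of the Weibull distribution with shape k.
gini_weibull_closed <- function(shape) 1 - 2^(-1 / shape)

# Numerical check: G = 1 - (1 / mu) * integral_0^Inf (1 - F(y))^2 dy.
gini_weibull_numeric <- function(shape, scale = 1) {
  mu <- scale * gamma(1 + 1 / shape)  # Weibull mean
  tail_sq <- integrate(function(y) pweibull(y, shape, scale, lower.tail = FALSE)^2,
                       lower = 0, upper = Inf)$value
  1 - tail_sq / mu
}

gini_weibull_closed(1)               # 0.5 (the Exponential case)
gini_weibull_numeric(2, scale = 1)
gini_weibull_numeric(2, scale = 50)  # same value: the scale cancels out
```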
Juan F Munoz
Jose M Pavia
Encarnacion Alvarez
Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.
Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.
# Gini index for the Weibull distribution with 'shape = 1'.
gweibull(shape = 1)
# Gini indices for the Weibull distribution and different shape parameters.
gweibull(shape = 1:10)
Compares variance estimates and confidence intervals for the Gini index in infinite populations.
icompareCI( y, B = 1000L, alpha = 0.05, plotCI = TRUE, digitsgini = 2L, digitsvar = 4L, cum.sums = NULL, na.rm = TRUE, precisionEL = 1e-4, maxiterEL = 100L, line.types = c(1L, 2L), colors = c("red", "green"), save.plot = FALSE )
y |
A vector with the non-negative real numbers to be used for estimating the Gini index. This argument can be missing if argument cum.sums is provided. |
B |
A single integer specifying the number of bootstrap replicates. The default value is 1000L. |
alpha |
A single numeric value between 0 and 1 specifying the confidence level 1-alpha. The default value is 0.05. |
plotCI |
A 'TRUE/FALSE' logical value indicating whether the confidence intervals are compared in a plot. The default value is TRUE. |
digitsgini |
A single integer specifying the number of decimals used in the estimate of the Gini index and the confidence intervals. The default value is 2L. |
digitsvar |
A single integer specifying the number of decimals used in the variance estimate of the Gini index. The default value is 4L. |
cum.sums |
A numeric vector of non-negative real numbers specifying the cumulative sums of the variable used to estimate the Gini index. This argument can be NULL if argument y is provided. The default value is NULL. |
na.rm |
A 'TRUE/FALSE' logical value indicating whether NA values should be removed before the computation proceeds. The default value is TRUE. |
precisionEL |
A single numeric value specifying the precision for the confidence interval based on the empirical likelihood method. The default value is 1e-4. |
maxiterEL |
A single integer specifying the maximum number of iterations allowed for the convergence of the empirical likelihood method. The default value is 100L. |
line.types |
A numeric vector of length 2 specifying the line types. See the function |
colors |
A character vector of length 2 specifying the colors of the lines in the plot. The default value is c("red", "green"). |
save.plot |
A 'TRUE/FALSE' logical value indicating whether the ggplot object of the plot comparing the confidence intervals should be saved in the output. The default value is FALSE. |
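For a sample $s = \{y_1, \dots, y_n\}$ of size $n$, drawn from an infinite population, the Gini index is estimated in two different versions (see Muñoz et al., 2023, for more details):

$$\widehat{G} = \frac{2\sum_{i=1}^{n} i\,y_{(i)}}{n\sum_{i=1}^{n} y_{i}} - \frac{n+1}{n}, \qquad \widehat{G}^{bc} = \frac{n}{n-1}\,\widehat{G},$$

where $y_{(1)} \leq y_{(2)} \leq \dots \leq y_{(n)}$ are the sample values sorted in non-decreasing order and the label $bc$ indicates that the bias correction is applied. The table below summarises the types of variance estimation and confidence intervals computed by this function.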
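The two estimators can be sketched in a few lines of base R. This assumes that the bias correction is the factor n / (n - 1) applied to the plain estimator; gini_hat() is a hypothetical helper whose bias.correction argument only mirrors the package's naming, and it is not the package's code.

```r
# Plain and bias-corrected Gini estimators for an i.i.d. sample y (sketch).
# Assumption: the bias correction multiplies the plain estimator by n / (n - 1).
gini_hat <- function(y, bias.correction = TRUE) {
  y <- sort(y)                         # order statistics y_(1) <= ... <= y_(n)
  n <- length(y)
  g <- 2 * sum(seq_len(n) * y) / (n * sum(y)) - (n + 1) / n
  if (bias.correction) g * n / (n - 1) else g
}

y <- c(1, 2, 3, 4)
gini_hat(y, bias.correction = FALSE)   # 0.25
gini_hat(y)                            # 1/3
```

For this small sample the correction raises the estimate by one third, which shows why the choice matters most when n is small.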
Methods based on the jackknife technique use the fast algorithm suggested by Ogwang (2000). The linearization technique for variance estimation (Deville, 1999) has been applied to two estimators of the Gini index (Berger, 2008; Langel and Tillé, 2013): zalinearization linearizes the estimator considered by Langel and Tillé (2013), and zblinearization the one considered by Berger (2008). pbootstrap is the percentile bootstrap confidence interval (see Qin et al., 2010). BCa is the bias-corrected and accelerated bootstrap confidence interval (Efron and Tibshirani, 1993). ELchisq and ELboot are confidence intervals based on the empirical likelihood method, calibrated with chi-squared and bootstrap critical values, respectively.
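The jackknife variance behind an interval such as zjackknife can be illustrated with a naive leave-one-out sketch. It assumes the standard jackknife variance formula, (n - 1) / n times the sum of squared deviations of the leave-one-out estimates, combined with a normal critical value and centred on the plain estimator. It does not use the faster algorithm of Ogwang (2000), and gini_plain(), jackknife_var() and z_interval() are hypothetical names rather than package functions.

```r
# Plain Gini estimator (no bias correction) from the order statistics.
gini_plain <- function(y) {
  y <- sort(y)
  n <- length(y)
  2 * sum(seq_len(n) * y) / (n * sum(y)) - (n + 1) / n
}

# Naive O(n^2) jackknife: recompute the estimator leaving out each unit in turn.
jackknife_var <- function(y, estimator = gini_plain) {
  n <- length(y)
  loo <- vapply(seq_len(n), function(i) estimator(y[-i]), numeric(1))
  (n - 1) / n * sum((loo - mean(loo))^2)
}

# Normal-based interval: estimate +/- z_{1 - alpha/2} * jackknife standard error.
z_interval <- function(y, alpha = 0.05, estimator = gini_plain) {
  g <- estimator(y)
  se <- sqrt(jackknife_var(y, estimator))
  z <- qnorm(1 - alpha / 2)
  c(lower = g - z * se, upper = g + z * se)
}

set.seed(123)
y <- rlnorm(50, meanlog = 5, sdlog = 1)
jackknife_var(y)
z_interval(y)
```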
The vignette vignette("GiniVarInterval") contains a detailed description of the various methods for variance estimation and confidence intervals for the Gini index.
Interval | Variance | Critical values | References |
---|---|---|---|
zjackknife | Jackknife | Normal | Berger (2008) |
tjackknife | Jackknife | Studentized bootstrap | Biewen (2002); Berger (2008) |
zalinearization | Linearization | Normal | Langel and Tillé (2013) |
zblinearization | Linearization | Normal | Berger (2008) |
talinearization | Linearization | Studentized bootstrap | Langel and Tillé (2013) |
tblinearization | Linearization | Studentized bootstrap | Biewen (2002); Berger (2008) |
pBootstrap | Bootstrap | Percentile bootstrap | Qin et al. (2010) |
BCa | Bootstrap | BCa bootstrap | Davison and Hinkley (1997) |
ELchisq | Linearization | Chi-squared | Qin et al. (2010) |
ELboot | Bootstrap | Percentile bootstrap | Qin et al. (2010) |
If save.plot = FALSE, a data frame with columns:
interval: The method used to construct the confidence interval.
bc: A 'TRUE/FALSE' logical value indicating whether the bias correction is applied.
gini: The estimate of the Gini index.
lowerlimit: The lower limit of the confidence interval.
upperlimit: The upper limit of the confidence interval.
var.gini: The variance estimate for the estimator of the Gini index.
If save.plot = TRUE, a list with two components: (i) 'base.CI', a data frame with the six columns just described, and (ii) 'plot', a (ggplot) description of the plot, which is a list with components that contain the plot itself, the data, and information about the scales, panels, etc. As a side effect, a plot comparing the various methods for constructing confidence intervals for the Gini index is displayed. The **ggplot2** package must be installed for this option to work.
If plotCI = TRUE, as a side effect, a plot comparing the various methods for constructing confidence intervals for the Gini index is displayed. The **ggplot2** package must be installed for this option to work.
Juan F Munoz
Jose M Pavia
Encarnacion Alvarez
Berger, Y. G. (2008). A note on the asymptotic equivalence of jackknife and linearization variance estimation for the Gini Coefficient. Journal of Official Statistics, 24(4), 541-555.
Biewen, M. (2002). Bootstrap inference for inequality, mobility and poverty measurement. Journal of Econometrics, 108(2), 317-342.
Davison, A. C., and Hinkley, D. V. (1997). Bootstrap Methods and Their Application. Cambridge Series in Statistical and Probabilistic Mathematics, No. 1. Cambridge University Press.
Deville, J.C. (1999). Variance Estimation for Complex Statistics and Estimators: Linearization and Residual Techniques. Survey Methodology, 25, 193–203.
Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap. Chapman and Hall, New York, London.
Langel, M., and Tillé, Y. (2013). Variance estimation of the Gini index: revisiting a result several times published. Journal of the Royal Statistical Society: Series A (Statistics in Society), 176(2), 521-540.
Muñoz, J. F., Moya-Fernández, P. J., and Álvarez-Verdejo, E. (2023). Exploring and Correcting the Bias in the Estimation of the Gini Measure of Inequality. Sociological Methods & Research. https://doi.org/10.1177/00491241231176847
Ogwang, T. (2000). A convenient method of computing the Gini index and its standard error. Oxford Bulletin of Economics and Statistics, 62(1), 123-129.
Qin, Y., Rao, J. N. K., and Wu, C. (2010). Empirical likelihood confidence intervals for the Gini measure of income inequality. Economic Modelling, 27(6), 1429-1435.
# Sample, with size 50, from a Lognormal distribution. The true Gini index is 0.5.
set.seed(123)
y <- gsample(n = 50, gini = 0.5, distribution = "lognormal")
# Estimation of the Gini index and confidence intervals using different methods.
icompareCI(y)
Estimation of the Gini index and computation of variances and confidence intervals for infinite populations.
igini( y, bias.correction = TRUE, interval = NULL, B = 1000L, alpha = 0.05, cum.sums = NULL, na.rm = TRUE, precisionEL = 1e-04, maxiterEL = 100L, large.sample = FALSE )
y |
A vector with the non-negative real numbers to be used for estimating the Gini index. This argument can be missing if argument cum.sums is provided. |
bias.correction |
A 'TRUE/FALSE' logical value indicating whether the bias correction should be applied to the estimation of the Gini index. The default value is TRUE. |
interval |
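A character string specifying the type of variance estimation and confidence interval to be used (see the table below for the available options), or NULL (the default), in which case only the Gini index is estimated. |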
B |
A single integer specifying the number of bootstrap replicates. The default value is 1000L. |
alpha |
A single numeric value between 0 and 1. If interval is not NULL, the confidence level of the interval is 1-alpha. The default value is 0.05. |
cum.sums |
A vector with the non-negative real numbers specifying the cumulative sums of the variable used to estimate the Gini index. This argument can be NULL if argument y is provided. The default value is NULL. |
na.rm |
A 'TRUE/FALSE' logical value indicating whether NA values should be removed before the computation proceeds. The default value is TRUE. |
precisionEL |
A single numeric value specifying the precision for the confidence interval based on the empirical likelihood method. The default value is 1e-04. |
maxiterEL |
A single integer specifying the maximum number of iterations allowed for the convergence of the empirical likelihood method. The default value is 100L. |
large.sample |
A 'TRUE/FALSE' logical value indicating whether the sample is large, in which case a faster algorithm is used to sort the sample values. The default value is FALSE. |
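For a sample $s = \{y_1, \dots, y_n\}$ of size $n$, drawn from an infinite population, the Gini index is estimated by

$$\widehat{G} = \frac{2\sum_{i=1}^{n} i\,y_{(i)}}{n\sum_{i=1}^{n} y_{i}} - \frac{n+1}{n}$$

when bias.correction = FALSE, and by

$$\widehat{G}^{bc} = \frac{n}{n-1}\,\widehat{G}$$

when bias.correction = TRUE, where $y_{(1)} \leq y_{(2)} \leq \dots \leq y_{(n)}$ are the sample values sorted in non-decreasing order. For more details, see Muñoz et al. (2023). The table below summarises the types of variance estimation and confidence intervals computed by this function.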
Methods based on the jackknife technique use the fast algorithm suggested by Ogwang (2000). The linearization technique for variance estimation (Deville, 1999) has been applied to two estimators of the Gini index (Berger, 2008; Langel and Tillé, 2013): zalinearization linearizes the estimator considered by Langel and Tillé (2013), and zblinearization the one considered by Berger (2008). pbootstrap is the percentile bootstrap confidence interval (see Qin et al., 2010). BCa is the bias-corrected and accelerated bootstrap confidence interval (Efron and Tibshirani, 1993). ELchisq and ELboot are confidence intervals based on the empirical likelihood method, calibrated with chi-squared and bootstrap critical values, respectively.
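The percentile bootstrap interval can be sketched in base R: resample the data with replacement B times, recompute the estimator on each resample, and take the alpha/2 and 1 - alpha/2 quantiles of the replicates. The sketch applies this to the plain estimator and uses hypothetical function names; whether the package bootstraps the plain or the bias-corrected estimator, and which quantile definition it uses, is not shown here.

```r
# Plain Gini estimator (no bias correction) from the order statistics.
gini_plain <- function(y) {
  y <- sort(y)
  n <- length(y)
  2 * sum(seq_len(n) * y) / (n * sum(y)) - (n + 1) / n
}

# Percentile bootstrap: resample with replacement, recompute, take empirical quantiles.
percentile_boot_ci <- function(y, B = 1000L, alpha = 0.05, estimator = gini_plain) {
  n <- length(y)
  reps <- replicate(B, estimator(y[sample.int(n, n, replace = TRUE)]))
  quantile(reps, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

set.seed(123)
y <- rlnorm(50, meanlog = 5, sdlog = 1)
percentile_boot_ci(y)
```

Indexing with sample.int() rather than calling sample(y) avoids R's special case in which a length-one numeric vector is treated as a range.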
The vignette vignette("GiniVarInterval") contains a detailed description of the various methods for variance estimation and confidence intervals for the Gini index.
Interval | Variance | Critical values | References |
---|---|---|---|
zjackknife | Jackknife | Normal | Berger (2008) |
tjackknife | Jackknife | Studentized bootstrap | Biewen (2002); Berger (2008) |
zalinearization | Linearization | Normal | Langel and Tillé (2013) |
zblinearization | Linearization | Normal | Berger (2008) |
talinearization | Linearization | Studentized bootstrap | Langel and Tillé (2013) |
tblinearization | Linearization | Studentized bootstrap | Biewen (2002); Berger (2008) |
pBootstrap | Bootstrap | Percentile bootstrap | Qin et al. (2010) |
BCa | Bootstrap | BCa bootstrap | Davison and Hinkley (1997) |
ELchisq | Linearization | Chi-squared | Qin et al. (2010) |
ELboot | Bootstrap | Percentile bootstrap | Qin et al. (2010) |
When interval = NULL, a single numeric value between 0 and 1 containing the estimate of the Gini index based on the vector y or the vector cum.sums.
When interval is not NULL, a list of 3 components: a single numeric value with the estimate of the Gini index; a single numeric value with the variance estimate of the Gini index; and a numeric matrix with 1 row and 2 columns containing the lower and upper limits of the confidence interval for the Gini index.
Juan F Munoz
Jose M Pavia
Encarnacion Alvarez
Berger, Y. G. (2008). A note on the asymptotic equivalence of jackknife and linearization variance estimation for the Gini Coefficient. Journal of Official Statistics, 24(4), 541-555.
Biewen, M. (2002). Bootstrap inference for inequality, mobility and poverty measurement. Journal of Econometrics, 108(2), 317-342.
Davison, A. C., and Hinkley, D. V. (1997). Bootstrap Methods and Their Application. Cambridge Series in Statistical and Probabilistic Mathematics, No. 1. Cambridge University Press.
Deville, J.C. (1999). Variance Estimation for Complex Statistics and Estimators: Linearization and Residual Techniques. Survey Methodology, 25, 193–203.
Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap. Chapman and Hall, New York, London.
Langel, M., and Tillé, Y. (2013). Variance estimation of the Gini index: revisiting a result several times published. Journal of the Royal Statistical Society: Series A (Statistics in Society), 176(2), 521-540.
Muñoz, J. F., Moya-Fernández, P. J., and Álvarez-Verdejo, E. (2023). Exploring and Correcting the Bias in the Estimation of the Gini Measure of Inequality. Sociological Methods & Research. https://doi.org/10.1177/00491241231176847
Ogwang, T. (2000). A convenient method of computing the Gini index and its standard error. Oxford Bulletin of Economics and Statistics, 62(1), 123-129.
Qin, Y., Rao, J. N. K., and Wu, C. (2010). Empirical likelihood confidence intervals for the Gini measure of income inequality. Economic Modelling, 27(6), 1429-1435.
# Sample, with size 50, from a Lognormal distribution. The true Gini index is 0.5.
set.seed(123)
y <- gsample(n = 50, gini = 0.5, distribution = "lognormal")
# Bias-corrected estimate of the Gini index.
igini(y)
# Estimate of the Gini index and confidence interval based on the jackknife and the studentized bootstrap.
igini(y, interval = "tjackknife")
Estimates the Gini index in infinite populations, using different methods.
iginindex( y, method = 5L, bias.correction = TRUE, cum.sums = NULL, na.rm = TRUE, useRcpp = TRUE )
y |
A vector with the non-negative real numbers to be used for estimating the Gini index. This argument can be missing if argument cum.sums is provided. |
method |
An integer between 1 and 10 selecting one of the 10 methods detailed below for estimating the Gini index in infinite populations. The default method is 5. |
bias.correction |
A 'TRUE/FALSE' logical value indicating whether the bias correction should be applied to the estimation of the Gini index. The default value is TRUE. |
cum.sums |
A vector with the non-negative real numbers specifying the cumulative sums of the variable used to estimate the Gini index. This argument can be NULL if argument y is provided. The default value is NULL. |
na.rm |
A 'TRUE/FALSE' logical value indicating whether NA values should be removed before the computation proceeds. The default value is TRUE. |
useRcpp |
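A 'TRUE/FALSE' logical value indicating whether the C++ implementation (via Rcpp) should be used; if FALSE, the R implementation is used. The default value is TRUE. |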
For a sample $s = \{y_1, \dots, y_n\}$ of size $n$, drawn from an infinite population, several formulations of the Gini index have been proposed in the literature, but they yield only two different outputs.
This function estimates the Gini index using these various formulations, each implemented in both R and C++. This can be useful for research purposes, for instance to compare computation times. The argument cum.sums does not require the cumulative sums to be based on the non-decreasing order of the variable y.
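Because cum.sums need not follow the sorted order, one way to honour that requirement (an assumption about the approach, not the package's code) is to difference the cumulative sums back to the raw values, sort them, and recompute $S_i = y_{(1)} + \dots + y_{(i)}$. Rearranging the plain estimator then gives G = (n + 1) / n - 2 * sum(S_i) / (n * S_n). The helper gini_from_cumsums() below is hypothetical.

```r
# Hypothetical helper: plain (non-bias-corrected) Gini estimate from cumulative sums.
# The input cumulative sums may come from the values in any order.
gini_from_cumsums <- function(cum.sums) {
  y <- sort(diff(c(0, cum.sums)))   # recover the values, then sort them
  S <- cumsum(y)                    # cumulative sums of the ordered values
  n <- length(S)
  (n + 1) / n - 2 * sum(S) / (n * S[n])
}

y <- c(3, 1, 4, 2)
gini_from_cumsums(cumsum(y))        # 0.25
gini_from_cumsums(cumsum(sort(y)))  # 0.25, the same value
```

Multiplying the result by n / (n - 1) would give the bias-corrected version, under the same assumption about the correction used elsewhere in these sketches.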
The different methods for estimating the Gini index are (see Wang et al., 2016; Giorgi and Gigliarano, 2017; Mukhopadhyay and Sengupta, 2021; Muñoz et al., 2023):
method = 1
where $\bar{y}$ is the sample mean and the label $bc$ indicates that the bias correction is applied to the estimation of the Gini index.
method = 2
where $S_i$, with $i = 1, \dots, n$, are the cumulative sums $S_i = \sum_{j=1}^{i} y_{(j)}$ of the ordered values $y_{(1)} \leq y_{(2)} \leq \dots \leq y_{(n)}$ (in non-decreasing order) of the variable of interest $y$.
method = 3
method = 4
where
method = 5
method = 6
method = 7
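where $\widetilde{F}(y) = \{\widehat{F}(y) + \widehat{F}(y^{-})\}/2$ is the smooth (mid-point) distribution function, $\widehat{F}$ being the empirical distribution function.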
method = 8
method = 9
method = 10
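The statement that the formulations yield only two distinct outputs can be checked numerically for two of them. The sketch compares a covariance formulation built on the mid-point distribution function, which equals (i - 0.5) / n at the i-th order statistic when there are no ties, with the pairwise mean-absolute-difference formulation. Both are assumed forms written for illustration; they are not necessarily the exact expressions behind method = 7 or method = 1, and the function names are hypothetical.

```r
# Covariance formulation with the mid-point distribution function (no ties assumed).
gini_midpoint <- function(y) {
  y <- sort(y)
  n <- length(y)
  Fmid <- (seq_len(n) - 0.5) / n
  # covariance with divisor n, so the result equals the plain estimator
  2 * mean((y - mean(y)) * (Fmid - mean(Fmid))) / mean(y)
}

# Pairwise (mean absolute difference) formulation of the plain estimator.
gini_pairwise <- function(y) sum(abs(outer(y, y, "-"))) / (2 * length(y)^2 * mean(y))

set.seed(1)
y <- rexp(20)
all.equal(gini_midpoint(y), gini_pairwise(y))  # TRUE
```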
A single numeric value between 0 and 1 containing the estimate of the Gini index based on the vector y or the vector cum.sums.
Juan F Munoz
Jose M Pavia
Encarnacion Alvarez
Giorgi, G. M., and Gigliarano, C. (2017). The Gini concentration index: a review of the inference literature. Journal of Economic Surveys, 31(4), 1130-1148.
Mukhopadhyay, N., and Sengupta, P. P. (Eds.). (2021). Gini inequality index: Methods and applications. CRC press.
Muñoz, J. F., Moya-Fernández, P. J., and Álvarez-Verdejo, E. (2023). Exploring and Correcting the Bias in the Estimation of the Gini Measure of Inequality. Sociological Methods & Research. https://doi.org/10.1177/00491241231176847
Wang, D., Zhao, Y., and Gilmore, D. W. (2016). Jackknife empirical likelihood confidence interval for the Gini index. Statistics & Probability Letters, 110, 289-295.
# Sample, with size 50, from a Lognormal distribution. The true Gini index is 0.5.
set.seed(123)
y <- gsample(n = 50, gini = 0.5, distribution = "lognormal", meanlog = 5)
# Estimate of the Gini index using method = 5, bias correction, and Rcpp.
iginindex(y)
# Estimate of the Gini index using method = 5, bias correction, and R.
iginindex(y, useRcpp = FALSE)
# Comparing the computation time of the various estimation methods using R.
microbenchmark::microbenchmark(
  iginindex(y, method = 1, useRcpp = FALSE),
  iginindex(y, method = 2, useRcpp = FALSE),
  iginindex(y, method = 3, useRcpp = FALSE),
  iginindex(y, method = 4, useRcpp = FALSE),
  iginindex(y, method = 5, useRcpp = FALSE),
  iginindex(y, method = 6, useRcpp = FALSE),
  iginindex(y, method = 7, useRcpp = FALSE),
  iginindex(y, method = 8, useRcpp = FALSE),
  iginindex(y, method = 9, useRcpp = FALSE),
  iginindex(y, method = 10, useRcpp = FALSE)
)
# Comparing the computation time of the various estimation methods using Rcpp.
microbenchmark::microbenchmark(
  iginindex(y, method = 1),
  iginindex(y, method = 2),
  iginindex(y, method = 3),
  iginindex(y, method = 4),
  iginindex(y, method = 5),
  iginindex(y, method = 6),
  iginindex(y, method = 7),
  iginindex(y, method = 8),
  iginindex(y, method = 9),
  iginindex(y, method = 10)
)