Package 'giniVarCI' reference manual

Title:	Gini Indices, Variances and Confidence Intervals for Finite and Infinite Populations
Description:	Estimates the Gini index and computes variances and confidence intervals for finite and infinite populations, using different methods; also computes Gini index for continuous probability distributions, draws samples from continuous probability distributions with Gini indices set by the user; uses 'Rcpp'. References: Muñoz et al. (2023) <doi:10.1177/00491241231176847>. Álvarez et al. (2021) <doi:10.3390/math9243252>. Giorgi and Gigliarano (2017) <doi:10.1111/joes.12185>. Langel and Tillé (2013) <doi:10.1111/j.1467-985X.2012.01048.x>.
Authors:	Juan Francisco Muñoz [aut, cre], Jose M. Pavía [aut], Encarnación Álvarez Verdejo [aut], MCIN-AEI and ERDF. Reference PID2022-136235NB-I00 [fnd]
Maintainer:	Juan Francisco Muñoz
License:	GPL
Version:	0.0.1-3
Built:	2025-03-05 04:03:40 UTC
Source:	https://github.com/cran/giniVarCI

Comparisons of variance estimates and confidence intervals for the Gini index in finite populations

Description

Compares variance estimates and confidence intervals for the Gini index in finite populations.

Usage

fcompareCI(
  y,
  w,
  Pi = NULL,
  Pij = NULL,
  PiU,
  alpha = 0.05,
  B = 1000L,
  digitsgini = 2L,
  digitsvar = 4L,
  na.rm = TRUE,
  plotCI = TRUE,
  line.types = c(1L, 2L, 4L),
  colors = c("red", "green", "blue"),
  shapes = c(8L, 4L, 3L),
  save.plot = FALSE,
  large.sample = FALSE)

Arguments

`y`	A vector with the non-negative real numbers to be used for estimating the Gini index.
`w`	A numeric vector with the survey weights to be used for estimating the Gini index, the variance estimation and the confidence interval. This argument can be missing if argument `Pi` is provided.
`Pi`	A numeric vector with the (sample) first inclusion probabilities to be used for estimating the Gini index, the variance estimation and the confidence interval. This argument can be `NULL` if argument `w` is provided. The default value is `Pi = NULL`.
`Pij`	A numeric square matrix with the (sample) second (joint) inclusion probabilities to be used for the variance estimation and the confidence interval. The Hajek approximation is used when `Pij = NULL`. This argument is used by the intervals `"zjackknife"`, `"zalinearization"` and `"zblinearization"`. The default value is `Pij = NULL`.
`PiU`	A numeric vector with the (population) first inclusion probabilities. The Hartley-Rao (`HR`) expression for the variance estimation is also computed if this argument is provided.
`alpha`	A single numeric value between 0 and 1 specifying the confidence level 1-`alpha` to be used for computing the confidence interval for the Gini index. Some authors call `alpha` the significance level. The default value is `alpha = 0.05`.
`B`	A single integer specifying the number of bootstrap replicates. The default value is `B = 1000L`.
`digitsgini`	A single integer specifying the number of decimals used in the estimation of the Gini index and confidence intervals. The default value is `digitsgini = 2L`.
`digitsvar`	A single integer specifying the number of decimals used in the variance estimation of the Gini index. The default value is `digitsvar = 4L`.
`na.rm`	A 'TRUE/FALSE' logical value indicating whether `NA` values should be removed before the computation proceeds. The default value is `na.rm = TRUE`.
`plotCI`	A 'TRUE/FALSE' logical value indicating whether confidence intervals are compared using a plot. The default value is `plotCI = TRUE`.
`line.types`	A numeric vector of length 3 specifying the line types. See the function `plot` for the different line types. The default value is `line.types = c(1L, 2L, 4L)`.
`colors`	A vector of length 3 specifying the colors for lines of the plot. The default value is `colors = c("red", "green", "blue")`.
`shapes`	A numeric vector specifying the point shapes for the limits of intervals. If `PiU` is missing, the function uses the first two components of `shapes`, i.e., it must have at least length 2. If `PiU` is provided, `shapes` must have at least length 3. See the function `plot` for the different point shapes. The default value is `shapes = c(8L, 4L, 3L)`.
`save.plot`	A 'TRUE/FALSE' logical value indicating whether the ggplot object of the plot comparing the confidence intervals should be saved in the output. The default value is `save.plot = FALSE`.
`large.sample`	A 'TRUE/FALSE' logical value indicating whether the sample is large to apply a faster algorithm to sort the sample values in the computation of the Gini index. The default value is `large.sample = FALSE`.

Details

For a sample $S$ , with size $n$ and inclusion probabilities $\pi_i=P(i\in S)$ (argument Pi), derived from a finite population $U$ , with size $N$ , different formulations of the Gini index have been proposed in the literature. This function estimates the Gini index, variances and confidence intervals using various formulations. The different methods for estimating the Gini index are (see also Muñoz et al., 2023):

Gini index formulae.

Method 1 (Langel and Tillé, 2013)

$\widehat{G}_{w1}= \displaystyle \frac{1}{2\widehat{N}^{2}\overline{y}_{w}}\sum_{i \in S}\sum_{j \in S}w_{i}w_{j}|y_{i}-y_{j}|,$

where $\widehat{N}=\sum_{i \in S}w_i$, $\overline{y}_{w}=\widehat{N}^{-1}\sum_{i \in S}w_{i}y_{i}$, and $w_i$ are the survey weights. For example, the survey weights can be $w_i=\pi_{i}^{-1}$. At least one of w and Pi must be provided; supplying both is not necessary. When both w and Pi are provided, it is required that $w_i = \pi_i^{-1}$ for $i \in S$.
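
As an illustration only, the double sum of Method 1 can be evaluated directly in base R. The helper name `gini_w1` is hypothetical and is not part of the package; the sketch assumes that the weights are taken as `1 / Pi` when only inclusion probabilities are supplied.

```r
# Hypothetical helper (not part of giniVarCI): direct O(n^2) evaluation of Method 1.
gini_w1 <- function(y, w = NULL, Pi = NULL) {
  if (is.null(w)) {
    if (is.null(Pi)) stop("provide either 'w' or 'Pi'")
    w <- 1 / Pi                        # assumption: weights are inverse inclusion probabilities
  }
  N_hat    <- sum(w)                   # estimated population size
  y_bar    <- sum(w * y) / N_hat       # weighted mean
  abs_diff <- abs(outer(y, y, "-"))    # |y_i - y_j| for all pairs
  sum(outer(w, w) * abs_diff) / (2 * N_hat^2 * y_bar)
}

gini_w1(c(1, 2, 3, 4), w = rep(1, 4))       # 0.25
gini_w1(c(1, 2, 3, 4), Pi = rep(0.5, 4))    # 0.25: a common rescaling of the weights has no effect
```

The `outer()` calls build full n x n matrices, so this direct form is only practical for small samples.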

Method 2 (Alfons and Templ, 2012; Langel and Tillé, 2013)

$\widehat{G}_{w2} =\displaystyle \frac{2\sum_{i \in S}w_{(i)}^{+}\widehat{N}_{(i)}y_{(i)} - \sum_{i \in S}w_{i}^{2}y_{i} }{\widehat{N}^{2}\overline{y}_{w}}-1,$

where $y_{(i)}$ are the values $y_i$ sorted in increasing order, $w_{(i)}^{+}$ are the values $w_i$ sorted according to the increasing order of the values $y_i$, and $\widehat{N}_{(i)}=\sum_{j=1}^{i}w_{(j)}^{+}$. Langel and Tillé (2013) show that $\widehat{G}_{w1} = \widehat{G}_{w2}$, so the computation of $\widehat{G}_{w1}$ is omitted in the results.
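
The sorted form of Method 2 can be sketched in base R as follows. The helper `gini_w2` is hypothetical (not a package function); the data are the ten incomes and weights used in the Examples section, and the final comparison evaluates the double sum of Method 1 inline to illustrate the equality stated above.

```r
# Hypothetical helper (not part of giniVarCI): Method 2 via sorting and cumulative weights.
gini_w2 <- function(y, w) {
  ord   <- order(y)                  # increasing order of y
  y_s   <- y[ord]
  w_s   <- w[ord]                    # weights re-ordered along with y
  N_cum <- cumsum(w_s)               # cumulative weights N_hat_(i)
  N_hat <- sum(w)
  y_bar <- sum(w * y) / N_hat
  (2 * sum(w_s * N_cum * y_s) - sum(w^2 * y)) / (N_hat^2 * y_bar) - 1
}

# Data taken from the Examples section of this page.
y <- c(30428.83, 14976.54, 18094.09, 29476.79, 20381.93, 6876.17,
       10360.96, 8239.82, 29476.79, 32230.71)
w <- c(357.86, 480.99, 480.99, 476.01, 498.58, 498.58, 476, 498.58, 476.01, 476.01)

# Double sum of Method 1, written inline: 2 * N_hat^2 * y_bar = 2 * N_hat * sum(w * y).
g1 <- sum(outer(w, w) * abs(outer(y, y, "-"))) / (2 * sum(w) * sum(w * y))

all.equal(gini_w2(y, w), g1)         # TRUE
```

Sorting once and using `cumsum()` avoids the n x n matrices needed by the double sum.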

Method 3 (Berger, 2008)

$\widehat{G}_{w3} = \displaystyle \frac{2}{\widehat{N}\overline{y}_{w}}\sum_{i \in S}w_{i}y_{i}\widehat{F}_{w}^{\ast}(y_{i})-1,$

where

$\widehat{F}_{w}^{\ast}(t) = \displaystyle \frac{1}{\widehat{N}}\sum_{i \in S}w_{i}[\delta(y_i < t) + 0.5\delta(y_i = t)]$

is the smooth (mid-point) distribution function, and $\delta(\cdot)$ is the indicator variable that takes the value 1 when its argument is true, and 0 otherwise. It can be seen that $\widehat{G}_{w2} = \widehat{G}_{w3}$, so the computation of $\widehat{G}_{w3}$ is omitted in the results.
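
A minimal sketch of the mid-point distribution function and of Method 3, under the definitions above. The names `F_star` and `gini_w3` are hypothetical and do not belong to the package.

```r
# Hypothetical helpers (not part of giniVarCI).
# Mid-point distribution function evaluated at every sample value.
F_star <- function(y, w) {
  lower <- outer(y, y, ">")          # lower[i, j] is TRUE when y_j < y_i
  ties  <- outer(y, y, "==")         # ties[i, j] is TRUE when y_j == y_i (includes j = i)
  as.vector((lower + 0.5 * ties) %*% w) / sum(w)
}

gini_w3 <- function(y, w) {
  # 2 / (N_hat * y_bar) equals 2 / sum(w * y)
  2 * sum(w * y * F_star(y, w)) / sum(w * y) - 1
}

F_star(c(1, 2, 3, 4), rep(1, 4))     # 0.125 0.375 0.625 0.875
gini_w3(c(1, 2, 3, 4), rep(1, 4))    # 0.25

# A small sample with tied values and unequal weights.
gini_w3(c(1, 2, 2, 3), c(1, 2, 1, 1))   # 0.16
```

The half weight given to tied observations is what makes the result independent of the order in which ties are sorted.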

Method 4 (Berger and Gedik-Balay, 2020)

$\widehat{G}_{w4} = 1 - \displaystyle \frac{\overline{v}_{w}}{\overline{y}_{w}},$

where $\overline{v}_{w}=\widehat{N}^{-1}\sum_{i \in S}w_{i}v_{i}$ and

$v_{i} = \displaystyle \frac{1}{\widehat{N} - w_{i}}\sum_{ \substack{j \in S\\ j\neq i}}\min(y_{i},y_{j}).$
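
The following base R sketch illustrates Method 4. The helper `gini_w4` is hypothetical, and it rests on an assumption that should be read carefully: each term $\min(y_i, y_j)$ is multiplied by $w_j$, which is not shown in the displayed expression for $v_i$ but makes the result invariant to a common rescaling of the weights.

```r
# Hypothetical helper (not part of giniVarCI).
gini_w4 <- function(y, w) {
  N_hat <- sum(w)
  y_bar <- sum(w * y) / N_hat
  m     <- outer(y, y, pmin)         # min(y_i, y_j) for all pairs
  # Assumption: terms are weighted by w_j; the j = i term (w_i * y_i) is removed explicitly.
  v     <- (as.vector(m %*% w) - w * y) / (N_hat - w)
  v_bar <- sum(w * v) / N_hat
  1 - v_bar / y_bar
}

gini_w4(c(1, 2, 3, 4), rep(1, 4))      # 0.3333333
gini_w4(c(1, 2, 3, 4), rep(2.5, 4))    # 0.3333333: same value after rescaling the weights
```

For this equally weighted toy sample the value is 4/3 times the 0.25 obtained when the same sample is plugged into Method 1.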

Method 5 (Lerman and Yitzhaki, 1989)

$\widehat{G}_{w5} = \displaystyle \frac{2}{\widehat{N}\overline{y}_{w}} \sum_{i \in S} w_{(i)}^{+}[y_{(i)} - \overline{y}_{w}]\left[ \widehat{F}_{w}^{LY}(y_{(i)}) - \overline{F}_{w}^{LY} \right],$

where

$\widehat{F}_{w}^{LY}(y_{(i)}) = \displaystyle \frac{1}{\widehat{N}}\left(\widehat{N}_{(i-1)} + \frac{w_{(i)}^{+}}{2} \right)$

and $\overline{F}_{w}^{LY}=\widehat{N}^{-1}\sum_{i \in S}w_{(i)}^{+}\widehat{F}_{w}^{LY}(y_{(i)})$ .
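
Method 5 can be sketched in the same way. The helper `gini_w5` is hypothetical (not a package function) and assumes $\widehat{N}_{(0)} = 0$ for the first sorted observation.

```r
# Hypothetical helper (not part of giniVarCI): covariance-type form of Method 5.
gini_w5 <- function(y, w) {
  ord    <- order(y)
  y_s    <- y[ord]
  w_s    <- w[ord]
  N_hat  <- sum(w)
  y_bar  <- sum(w * y) / N_hat
  N_prev <- cumsum(w_s) - w_s                  # N_hat_(i-1), with N_hat_(0) = 0
  F_ly   <- (N_prev + w_s / 2) / N_hat         # F_w^LY at each sorted value
  F_bar  <- sum(w_s * F_ly) / N_hat            # weighted mean of F_w^LY
  2 * sum(w_s * (y_s - y_bar) * (F_ly - F_bar)) / (N_hat * y_bar)
}

gini_w5(c(1, 2, 3, 4), rep(1, 4))    # 0.25
```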

Variances and confidence intervals.

For a given estimator $\widehat{G}_{w}$ and variable $z$, the Horvitz-Thompson type variance estimator (Horvitz and Thompson, 1952) is given by

$\widehat{V}_{HT}(\widehat{G}_{w}) = \displaystyle \sum_{i\in S}\sum_{j\in S}\breve{\Delta}_{ij}w_{i}w_{j}z_{i}z_{j},$

where

$\breve{\Delta}_{ij}=\displaystyle \frac{\pi_{ij}-\pi_{i}\pi_{j}}{\pi_{ij}}$

and $\pi_{ij}$ is the second (joint) inclusion probability of the individuals $i$ and $j$, i.e., $\pi_{ij}=P\{(i,j)\in S\}$ (argument Pij).

The Sen-Yates-Grundy type variance estimator (Sen, 1953; Yates and Grundy, 1953) is defined as

$\widehat{V}_{SYG}(\widehat{G}_{w}) = - \displaystyle \frac{1}{2}\sum_{i\in S}\sum_{j\in S}\breve{\Delta}_{ij}(w_{i}z_i-w_{j}z_{j})^{2}$

The Hartley-Rao type variance estimator (Hartley and Rao, 1962) is given by

$\widehat{V}_{HR}(\widehat{G}_{w}) = \displaystyle \frac{1}{n-1}\sum_{i\in S}\sum_{\substack{j \in S\\ j < i}}\left(1-\pi_i-\pi_j + \frac{1}{n}\sum_{k\in U}\pi_{k}^{2} \right)(w_{i}z_i-w_{j}z_{j})^{2}.$
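
The three expressions can be compared numerically with a short base R sketch. All helper names are hypothetical, `z` is treated as a user-supplied numeric vector, and the diagonal of `Pij` is assumed to hold $\pi_{ii} = \pi_i$. Under simple random sampling without replacement the three formulae should reduce to the textbook value $N^2(1-n/N)s_z^2/n$, which gives a convenient check.

```r
# Hypothetical helpers (not part of giniVarCI).
delta_breve <- function(Pi, Pij) (Pij - outer(Pi, Pi)) / Pij

var_ht <- function(z, Pi, Pij) {
  u <- z / Pi                                   # w_i * z_i with w_i = 1 / pi_i
  sum(delta_breve(Pi, Pij) * outer(u, u))
}

var_syg <- function(z, Pi, Pij) {
  u <- z / Pi
  -0.5 * sum(delta_breve(Pi, Pij) * outer(u, u, "-")^2)
}

var_hr <- function(z, Pi, PiU) {
  n    <- length(z)
  u    <- z / Pi
  coef <- 1 - outer(Pi, Pi, "+") + sum(PiU^2) / n
  sq   <- outer(u, u, "-")^2
  keep <- lower.tri(sq)                         # pairs with j < i
  sum(coef[keep] * sq[keep]) / (n - 1)
}

# Simple random sampling without replacement: n = 4 units from N = 10.
N <- 10; n <- 4
z   <- c(1, 2, 3, 4)
Pi  <- rep(n / N, n)
Pij <- matrix(n * (n - 1) / (N * (N - 1)), n, n)
diag(Pij) <- Pi                                 # assumption: pi_ii = pi_i
PiU <- rep(n / N, N)

c(HT  = var_ht(z, Pi, Pij),
  SYG = var_syg(z, Pi, Pij),
  HR  = var_hr(z, Pi, PiU),
  SRS = N^2 * (1 - n / N) * var(z) / n)         # all four values are 25
```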

Note that the Horvitz-Thompson variance estimator can give negative values. Both the Horvitz-Thompson and Sen-Yates-Grundy variance estimators depend on the second (joint) inclusion probabilities (argument Pij). The Hajek (1964) approximation

$\pi_{ij}\cong \pi_{i}\pi_{j}\left[1- \displaystyle \frac{(1-\pi_{i})(1-\pi_{j})}{\sum_{i \in S}(1-\pi_{i})} \right]$

is used when the second (joint) inclusion probabilities are not available (Pij = NULL). Note that the Hajek approximation is suggested for large-entropy sampling designs, large samples, and large populations (see Tillé, 2006; Berger and Tillé, 2009; Haziza et al., 2008; Berger, 2011). For instance, this approximation is not recommended for highly-stratified samples (Berger, 2005). The Hartley-Rao variance estimator requires the first inclusion probabilities at the population level (argument PiU). zjackknife computes the confidence interval based on the jackknife technique with critical values based on the Normal approximation. zalinearization and zblinearization compute the confidence intervals based on the linearization technique applied to the estimators

$\widehat{G}_{w}^{a} = \widehat{G}_{w1}$

and

$\widehat{G}_{w}^{b} = \displaystyle \frac{2}{\widehat{N}\overline{y}_{w}}\sum_{i \in S}w_{i}y_{i}\widehat{F}_{w}(y_{i})-1,$

respectively, where

$\widehat{F}_{w}(t)=\frac{1}{\widehat{N}}\sum_{i \in S}w_i\delta(y_i \leq t).$

Critical values are also based on the Normal approximation. pbootstrap computes the variance using the rescaled bootstrap, and the confidence interval is constructed using the percentile method. The vignette vignette("GiniVarInterval") contains a detailed description of the various methods for variance estimation and confidence intervals for the Gini index.
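
Two of the ingredients described above, the Hajek approximation and an interval with Normal critical values, can be sketched in base R. The helper names are hypothetical, the diagonal of the approximated matrix is set to $\pi_i$ by assumption, and the interval is the usual estimate plus or minus a Normal quantile times the standard error, which may differ in detail from the package's internal computation.

```r
# Hypothetical helpers (not part of giniVarCI).
hajek_pij <- function(Pi) {
  d   <- sum(1 - Pi)                                   # sum over the sample of (1 - pi_i)
  Pij <- outer(Pi, Pi) * (1 - outer(1 - Pi, 1 - Pi) / d)
  diag(Pij) <- Pi                                      # assumption: pi_ii = pi_i
  Pij
}

normal_ci <- function(estimate, variance, alpha = 0.05) {
  half <- qnorm(1 - alpha / 2) * sqrt(variance)        # Normal critical value times standard error
  c(lower = estimate - half, upper = estimate + half)
}

Pij_approx <- hajek_pij(c(0.2, 0.5, 0.4))
Pij_approx[1, 2]                 # 0.2 * 0.5 * (1 - 0.8 * 0.5 / 1.9)

normal_ci(0.25, 0.0004)          # 95% interval for an estimate 0.25 with variance 0.0004
```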

The following table summarises the various types of variances and confidence intervals that the function fcompareCI computes.

Interval	Variance	Critical values	References
_______________	______________	_________________	_________________________
`zjackknife`	Jackknife	Normal	Berger (2008)
`zalinearization`	Linearization	Normal	Langel and Tillé (2013)
`zblinearization`	Linearization	Normal	Berger (2008)
`pbootstrap`	Rescaled bootstrap	Percentile bootstrap	Berger and Gedik-Balay (2020)

Value

If save.plot = FALSE, a data frame with columns:

interval. The method used to construct the confidence interval.
method. The method used to estimate the Gini index.
varformula. The type of formula for the variance estimator. Possible values are HT and SYG if argument PiU is missing, and HT, SYG and HR if argument PiU is provided.
gini. The estimation of the Gini index.
lowerlimit. The lower limit of the confidence interval.
upperlimit. The upper limit of the confidence interval.
var.gini. The variance estimation for the estimator of the Gini index.

If save.plot = TRUE, a list with two components: (i) 'base.CI', a data frame with the seven columns just described, and (ii) 'plot', a (ggplot) description of the plot, which is a list with components that contain the plot itself, the data, information about the scales, panels, etc. As a side-effect, a plot that compares the various methods for constructing confidence intervals for the Gini index is displayed. The **ggplot2** package must be installed for this option to work.

If plotCI = TRUE, as a side-effect, a plot that compares the various methods for constructing confidence intervals for the Gini index is displayed. The **ggplot2** package must be installed for this option to work.

Author(s)

Juan F Munoz

Jose M Pavia

Encarnacion Alvarez

References

Alfons, A., and Templ, M. (2012). Estimation of social exclusion indicators from complex surveys: The R package laeken. KU Leuven, Faculty of Business and Economics Working Paper.

Berger, Y. G. (2005). Variance estimation with highly stratified sampling designs with unequal probabilities. Australian & New Zealand Journal of Statistics, 47, 365–373.

Berger, Y. G. (2008). A note on the asymptotic equivalence of jackknife and linearization variance estimation for the Gini Coefficient. Journal of Official Statistics, 24(4), 541-555.

Berger, Y. G. (2011). Asymptotic consistency under large entropy sampling designs with unequal probabilities. Pakistan Journal of Statistics, 27, 407–426.

Berger, Y., and Gedik-Balay, İ. (2020). Confidence intervals of Gini coefficient under unequal probability sampling. Journal of Official Statistics, 36(2), 237-249.

Berger, Y. G. and Tillé, Y. (2009). Sampling with unequal probabilities. In Sample Surveys: Design, Methods and Applications (eds. D. Pfeffermann and C. R. Rao), 39–54. Elsevier, Amsterdam.

Hajek, J. (1964). Asymptotic theory of rejective sampling with varying probabilities from a finite population. The Annals of Mathematical Statistics, 35, 4, 1491–1523.

Hartley, H. O., and Rao, J. N. K. (1962). Sampling with unequal probabilities and without replacement. The Annals of Mathematical Statistics, 350-374.

Haziza, D., Mecatti, F. and Rao, J. N. K. (2008). Evaluation of some approximate variance estimators under the Rao-Sampford unequal probability sampling design. Metron, LXVI, 91–108.

Horvitz, D. G. and Thompson, D. J. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47, 663–685.

Langel, M., and Tillé, Y. (2013). Variance estimation of the Gini index: revisiting a result several times published. Journal of the Royal Statistical Society: Series A (Statistics in Society), 176(2), 521-540.

Lerman, R. I., and Yitzhaki, S. (1989). Improving the accuracy of estimates of Gini coefficients. Journal of econometrics, 42(1), 43-47.

Muñoz, J. F., Moya-Fernández, P. J., and Álvarez-Verdejo, E. (2023). Exploring and Correcting the Bias in the Estimation of the Gini Measure of Inequality. Sociological Methods & Research. https://doi.org/10.1177/00491241231176847

Sen, A. R. (1953). On the estimate of the variance in sampling with varying probabilities. Journal of the Indian Society of Agricultural Statistics, 5, 119–127.

Tillé, Y. (2006). Sampling Algorithms. Springer, New York.

Yates, F., and Grundy, P. M. (1953). Selection without replacement from within strata with probability proportional to size. Journal of the Royal Statistical Society B, 15, 253–261.

Examples

# Income and weights (region 'Burgenland') from the 2006 Austrian EU-SILC (Package 'laeken').
data(eusilc, package="laeken")
y <- eusilc$eqIncome[eusilc$db040 == "Burgenland"]
w <- eusilc$rb050[eusilc$db040 == "Burgenland"]

# Estimation of the Gini index and confidence intervals using different methods.
fcompareCI(y, w)

y <- c(30428.83, 14976.54, 18094.09, 29476.79, 20381.93, 6876.17,
       10360.96, 8239.82, 29476.79, 32230.71)
w <- c(357.86, 480.99, 480.99, 476.01, 498.58, 498.58, 476, 498.58, 476.01, 476.01)
fcompareCI(y, w, plotCI = FALSE)

Gini index, variances and confidence intervals in finite populations

Description

Estimates the Gini index and computes variances and confidence intervals in finite populations.

Usage

fgini(
  y,
  w,
  method = 2L,
  interval = NULL,
  Pi = NULL,
  Pij = NULL,
  PiU,
  alpha = 0.05,
  B = 1000L,
  na.rm = TRUE,
  varformula = "SYG",
  large.sample = FALSE
)

Arguments

`y`	A vector with the non-negative real numbers to be used for estimating the Gini index.
`w`	A numeric vector with the survey weights to be used for estimating the Gini index, the variance and the confidence interval. This argument can be missing if argument `Pi` is provided.
`method`	An integer between 1 and 5 selecting one of the 5 methods detailed below for estimating the Gini index in finite populations. The default method is `method = 2L`.
`interval`	A character string specifying the type of variance estimation and confidence interval to be used. Possible values are `"zjackknife"`, `"zalinearization"`, `"zblinearization"` and `"pbootstrap"`. `interval = NULL` omits the computation of both variance and confidence interval. The default value is `interval = NULL`.
`Pi`	A numeric vector with the (sample) first inclusion probabilities to be used for estimating the Gini index, the variance and the confidence interval. This argument can be `NULL` if argument `w` is provided. The default value is `Pi = NULL`.
`Pij`	A numeric square matrix with the (sample) second (joint) inclusion probabilities to be used for the variance estimation and the confidence interval. The Hajek approximation is used when `Pij = NULL`. This argument is used when `interval={"zjackknife", "zalinearization", "zblinearization"}`. The default value is `Pij = NULL`.
`PiU`	A numeric vector with the (population) first inclusion probabilities. This argument is only required when the Hartley-Rao expression for the variance estimation is selected (`varformula = "HR"`).
`alpha`	A single numeric value between 0 and 1. If `interval` is not `NULL`, the confidence level to be used for computing the confidence interval for the Gini is `1-alpha`. Some authors call `alpha` the significance level. The default value is `alpha = 0.05`.
`B`	A single integer specifying the number of bootstrap replicates. This argument is required when `interval = "pbootstrap"`. The default value is `B = 1000L`.
`na.rm`	A 'TRUE/FALSE' logical value indicating whether `NA`'s should be removed before the computation proceeds. The default value is `na.rm = TRUE`.
`varformula`	A character string specifying the type of formula to be used for the variance estimator when `interval = {"zjackknife", "zalinearization", "zblinearization"}`. Possible values are `"HT"` (Horvitz-Thompson), `"SYG"` (Sen-Yates-Grundy) and `"HR"` (Hartley-Rao). The default value is `varformula = "SYG"`.
`large.sample`	A 'TRUE/FALSE' logical value indicating whether the sample is large to apply a faster algorithm to sort the sample values in the computation of the Gini index. The default value is `large.sample = FALSE`.

Details

For a sample $S$, with size $n$ and inclusion probabilities $\pi_i=P(i\in S)$ (argument Pi), derived from a finite population $U$, with size $N$, different formulations of the Gini index have been proposed in the literature. This function estimates the Gini index, variances and confidence intervals using various formulations. The different methods for estimating the Gini index are (see also Muñoz et al., 2023):

Gini index formulae.

method = 1 (Langel and Tillé, 2013)

$\widehat{G}_{w1}= \displaystyle \frac{1}{2\widehat{N}^{2}\overline{y}_{w}}\sum_{i \in S}\sum_{j \in S}w_{i}w_{j}|y_{i}-y_{j}|,$

method = 2 (Alfons and Templ, 2012; Langel and Tillé, 2013)

$\widehat{G}_{w2} =\displaystyle \frac{2\sum_{i \in S}w_{(i)}^{+}\widehat{N}_{(i)}y_{(i)} - \sum_{i \in S}w_{i}^{2}y_{i} }{\widehat{N}^{2}\overline{y}_{w}}-1,$

method = 3 (Berger, 2008)

$\widehat{G}_{w3} = \displaystyle \frac{2}{\widehat{N}\overline{y}_{w}}\sum_{i \in S}w_{i}y_{i}\widehat{F}_{w}^{\ast}(y_{i})-1,$

where

$\widehat{F}_{w}^{\ast}(t) = \displaystyle \frac{1}{\widehat{N}}\sum_{i \in S}w_{i}[\delta(y_i < t) + 0.5\delta(y_i = t)]$

is the smooth (mid-point) distribution function, and $\delta(\cdot)$ is the indicator variable that takes the value 1 when its argument is true, and the value 0 otherwise. It can be seen that $\widehat{G}_{w2} = \widehat{G}_{w3}$ .

method = 4 (Berger and Gedik-Balay, 2020)

$\widehat{G}_{w4} = 1 - \displaystyle \frac{\overline{v}_{w}}{\overline{y}_{w}},$

where $\overline{v}_{w}=\widehat{N}^{-1}\sum_{i \in S}w_{i}v_{i}$ and

$v_{i} = \displaystyle \frac{1}{\widehat{N} - w_{i}}\sum_{ \substack{j \in S\\ j\neq i}}\min(y_{i},y_{j}).$

method = 5 (Lerman and Yitzhaki, 1989)

$\widehat{G}_{w5} = \displaystyle \frac{2}{\widehat{N}\overline{y}_{w}} \sum_{i \in S} w_{(i)}^{+}[y_{(i)} - \overline{y}_{w}]\left[ \widehat{F}_{w}^{LY}(y_{(i)}) - \overline{F}_{w}^{LY} \right],$

where

$\widehat{F}_{w}^{LY}(y_{(i)}) = \displaystyle \frac{1}{\widehat{N}}\left(\widehat{N}_{(i-1)} + \frac{w_{(i)}^{+}}{2} \right)$

and $\overline{F}_{w}^{LY}=\widehat{N}^{-1}\sum_{i \in S}w_{(i)}^{+}\widehat{F}_{w}^{LY}(y_{(i)})$ .

Variances and confidence intervals.

For a given estimator $\widehat{G}_{w}$ and variable $z$, the Horvitz-Thompson type variance estimator (Horvitz and Thompson, 1952)

$\widehat{V}_{HT}(\widehat{G}_{w}) = \displaystyle \sum_{i\in S}\sum_{j\in S}\breve{\Delta}_{ij}w_{i}w_{j}z_{i}z_{j}$

is computed when varformula = "HT", where

$\breve{\Delta}_{ij}=\displaystyle \frac{\pi_{ij}-\pi_{i}\pi_{j}}{\pi_{ij}}$

and $\pi_{ij}$ is the second (joint) inclusion probability of the individuals $i$ and $j$, i.e., $\pi_{ij}=P\{(i,j)\in S\}$ (argument Pij).

The Sen-Yates-Grundy type variance estimator (Sen, 1953; Yates and Grundy, 1953)

$\widehat{V}_{SYG}(\widehat{G}_{w}) = - \displaystyle \frac{1}{2}\sum_{i\in S}\sum_{j\in S}\breve{\Delta}_{ij}(w_{i}z_i-w_{j}z_{j})^{2}$

is computed when varformula = "SYG", and the Hartley-Rao type variance estimator (Hartley and Rao, 1962)

$\widehat{V}_{HR}(\widehat{G}_{w}) = \displaystyle \frac{1}{n-1}\sum_{i\in S}\sum_{\substack{j \in S\\ j < i}}\left(1-\pi_i-\pi_j + \frac{1}{n}\sum_{k\in U}\pi_{k}^{2} \right)(w_{i}z_i-w_{j}z_{j})^{2}$

is computed when varformula = "HR". Note that the Horvitz-Thompson variance estimator can give negative values. Both the Horvitz-Thompson and Sen-Yates-Grundy variance estimators depend on the second (joint) inclusion probabilities (argument Pij). The Hajek (1964) approximation

$\pi_{ij}\cong \pi_{i}\pi_{j}\left[1- \displaystyle \frac{(1-\pi_{i})(1-\pi_{j})}{\sum_{i \in S}(1-\pi_{i})} \right]$

is used when the second (joint) inclusion probabilities are not available (Pij = NULL). Note that the Hajek approximation is suggested for large-entropy sampling designs, large samples, and large populations (see Tillé, 2006; Berger and Tillé, 2009; Haziza et al., 2008; Berger, 2011). For instance, this approximation is not recommended for highly-stratified samples (Berger, 2005). The Hartley-Rao variance estimator requires the first inclusion probabilities at the population level (argument PiU). zjackknife computes the confidence interval based on the jackknife technique with critical values based on the Normal approximation. zalinearization and zblinearization compute the confidence intervals based on the linearization technique applied to the estimators

$\widehat{G}_{w}^{a} = \widehat{G}_{w1}$

and

$\widehat{G}_{w}^{b} = \displaystyle \frac{2}{\widehat{N}\overline{y}_{w}}\sum_{i \in S}w_{i}y_{i}\widehat{F}_{w}(y_{i})-1,$

respectively, where

$\widehat{F}_{w}(t)=\frac{1}{\widehat{N}}\sum_{i \in S}w_i\delta(y_i \leq t).$
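
To make the difference between the two linearized estimators concrete, the sketch below evaluates $\widehat{G}_{w}^{b}$ with the ordinary weighted distribution function $\widehat{F}_{w}$ (indicator with $\leq$). The helper `gini_wb` is hypothetical and is not exported by the package.

```r
# Hypothetical helper (not part of giniVarCI).
gini_wb <- function(y, w) {
  N_hat <- sum(w)
  # F_w(y_i): weighted share of sample values not exceeding y_i
  F_w <- sapply(y, function(t) sum(w[y <= t])) / N_hat
  2 * sum(w * y * F_w) / sum(w * y) - 1
}

gini_wb(c(1, 2, 3, 4), rep(1, 4))    # 0.5
```

In this four-unit toy sample with equal weights the result is 0.5, whereas the double-sum expression defining $\widehat{G}_{w}^{a}$ gives 0.25 for the same data, so the two estimators can differ noticeably in very small samples.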

The following table summarises the various types of variances and confidence intervals that the function fgini computes. The argument varformula only applies for the jackknife and linearization techniques (see Berger, 2008; Langel and Tillé, 2013).

Interval	Variance	Critical values	References
_______________	______________	_________________	_________________________
`zjackknife`	Jackknife	Normal	Berger (2008)
`zalinearization`	Linearization	Normal	Langel and Tillé (2013)
`zblinearization`	Linearization	Normal	Berger (2008)
`pbootstrap`	Rescaled bootstrap	Percentile bootstrap	Berger and Gedik-Balay (2020)

Value

When interval = NULL, the function returns a single numeric value between 0 and 1 informing about the estimation of the Gini index. When interval is not NULL, the function returns a list with 3 components: a single numeric value with the estimation of the Gini index; a single numeric value with the variance estimation of the Gini index; and a vector of length two containing the lower and upper limits of the confidence interval for the Gini index.
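
A hedged usage sketch of the two return shapes follows. It requires the giniVarCI package and is skipped when the package is not installed. The list components are accessed by position, following the order stated above, because their names are not documented on this page and none are assumed.

```r
# Runs only when giniVarCI is available.
if (requireNamespace("giniVarCI", quietly = TRUE)) {
  # Data taken from the Examples section of this page.
  y <- c(30428.83, 14976.54, 18094.09, 29476.79, 20381.93, 6876.17,
         10360.96, 8239.82, 29476.79, 32230.71)
  w <- c(357.86, 480.99, 480.99, 476.01, 498.58, 498.58, 476, 498.58, 476.01, 476.01)

  # interval = NULL (default): a single number is returned.
  point <- giniVarCI::fgini(y, w)

  # interval not NULL: a list with three components is returned.
  res      <- giniVarCI::fgini(y, w, interval = "zjackknife")
  gini_hat <- res[[1]]     # estimate of the Gini index
  var_hat  <- res[[2]]     # variance estimate
  ci       <- res[[3]]     # lower and upper limits of the confidence interval

  print(c(estimate = gini_hat, variance = var_hat, lower = ci[1], upper = ci[2]))
}
```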

Author(s)

Juan F Munoz

Jose M Pavia

Encarnacion Alvarez

References

Alfons, A., and Templ, M. (2012). Estimation of social exclusion indicators from complex surveys: The R package laeken. KU Leuven, Faculty of Business and Economics Working Paper.

Berger, Y. G. (2005). Variance estimation with highly stratified sampling designs with unequal probabilities. Australian & New Zealand Journal of Statistics, 47, 365–373.

Berger, Y. G. (2008). A note on the asymptotic equivalence of jackknife and linearization variance estimation for the Gini Coefficient. Journal of Official Statistics, 24(4), 541-555.

Berger, Y. G. (2011). Asymptotic consistency under large entropy sampling designs with unequal probabilities. Pakistan Journal of Statistics, 27, 407–426.

Berger, Y. G. and Tillé, Y. (2009). Sampling with unequal probabilities. In Sample Surveys: Design, Methods and Applications (eds. D. Pfeffermann and C. R. Rao), 39–54. Elsevier, Amsterdam.

Berger, Y., and Gedik-Balay, İ. (2020). Confidence intervals of Gini coefficient under unequal probability sampling. Journal of Official Statistics, 36(2), 237-249.

Hajek, J. (1964). Asymptotic theory of rejective sampling with varying probabilities from a finite population. The Annals of Mathematical Statistics, 35, 4, 1491–1523.

Hartley, H. O., and Rao, J. N. K. (1962). Sampling with unequal probabilities and without replacement. The Annals of Mathematical Statistics, 350-374.

Haziza, D., Mecatti, F. and Rao, J. N. K. (2008). Evaluation of some approximate variance estimators under the Rao-Sampford unequal probability sampling design. Metron, LXVI, 91–108.

Horvitz, D. G. and Thompson, D. J. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47, 663–685.

Langel, M., and Tillé, Y. (2013). Variance estimation of the Gini index: revisiting a result several times published. Journal of the Royal Statistical Society: Series A (Statistics in Society), 176(2), 521-540.

Lerman, R. I., and Yitzhaki, S. (1989). Improving the accuracy of estimates of Gini coefficients. Journal of econometrics, 42(1), 43-47.

Sen, A. R. (1953). On the estimate of the variance in sampling with varying probabilities. Journal of the Indian Society of Agricultural Statistics, 5, 119–127.

Tillé, Y. (2006). Sampling Algorithms. Springer, New York.

Yates, F., and Grundy, P. M. (1953). Selection without replacement from within strata with probability proportional to size. Journal of the Royal Statistical Society B, 15, 253–261.

Examples

# Income and weights (region 'Burgenland') from the 2006 Austrian EU-SILC (Package 'laeken').
data(eusilc, package="laeken")
y <- eusilc$eqIncome[eusilc$db040 == "Burgenland"]
w <- eusilc$rb050[eusilc$db040 == "Burgenland"]

# Estimation of the Gini index using 'method = 2' .
fgini(y, w)


y <- c(30428.83, 14976.54, 18094.09, 29476.79, 20381.93, 6876.17,
       10360.96, 8239.82, 29476.79, 32230.71)
w <- c(357.86, 480.99, 480.99, 476.01, 498.58, 498.58, 476, 498.58, 476.01, 476.01)

# Gini index estimation and confidence interval using:
 ## a: The method 2 for point estimation.
 ## b: The method 'zjackknife' for variance estimation.
 ## c: The Sen-Yates-Grundy type variance estimator.
 ## d: The Hajek approximation for the joint inclusion probabilities.
fgini(y, w, interval = "zjackknife")

# Gini index estimation and confidence interval using:
 ## a: The method 2 for point estimation.
 ## b: The method 'zalinearization' for variance estimation.
 ## c: The Sen-Yates-Grundy type variance estimator.
 ## d: The Hajek approximation for the joint inclusion probabilities.
fgini(y, w, interval = "zalinearization")

# Gini index estimation and confidence interval using:
 ## a: The method 3 for point estimation.
 ## b: The method 'zblinearization' for variance estimation.
 ## c: The Sen-Yates-Grundy type variance estimator.
 ## d: The Hajek approximation for the joint inclusion probabilities.
fgini(y, w, method = 3L, interval = "zblinearization")

# Gini index estimation and confidence interval using:
 ## a: The method 2 for point estimation.
 ## b: The method 'pbootstrap' for variance estimation.
 ## c: The percentile bootstrap method for the confidence interval.
fgini(y, w, interval = "pbootstrap")

Gini index for finite populations and different estimation methods.

Description

Estimates the Gini index in finite populations, using different methods.

Usage

fginindex(
 y,
 w,
 method = 2L,
 Pi = NULL,
 na.rm = TRUE,
 useRcpp = TRUE
)

Arguments

`y`	A vector with the non-negative real numbers to be used for estimating the Gini index.
`w`	A numeric vector with the survey weights to be used for estimating the Gini index. This argument can be missing if argument `Pi` is provided.
`method`	An integer between 1 and 5 selecting one of the 5 methods detailed below for estimating the Gini index in finite populations. The default method is `method = 2L`.
`Pi`	A numeric vector with the (sample) first inclusion probabilites to be used for estimating the Gini index. This argument can be `NULL` if argument `w` is provided. The default value is `Pi = NULL`.
`na.rm`	A 'TRUE/FALSE' logical value indicating whether `NA`'s should be removed before the computation proceeds. The default value is `na.rm = TRUE`.
`useRcpp`	A 'TRUE/FALSE' logical value indicating whether `Rcpp` (`useRcpp = TRUE`), or `R` (`useRcpp = FALSE`), is used for computation. The default value is `useRcpp = TRUE`.

Details

For a sample $S$ of size $n$ , with inclusion probabilities $\pi_i=P(i\in S)$ (argument `Pi`), drawn from a finite population $U$ of size $N$ , different formulations of the Gini index have been proposed in the literature. This function estimates the Gini index using several of these formulations, each implemented in both R and C++, which is useful for research purposes and allows speed comparisons. Throughout, $w_i$ denotes the survey weights (argument `w`), $\widehat{N}=\sum_{i \in S}w_{i}$ and $\overline{y}_{w}=\widehat{N}^{-1}\sum_{i \in S}w_{i}y_{i}$ . The different methods for estimating the Gini index are (see also Muñoz et al., 2023):

method = 1 (Langel and Tillé, 2013)

$\widehat{G}_{w1}= \displaystyle \frac{1}{2\widehat{N}^{2}\overline{y}_{w}}\sum_{i \in S}\sum_{j \in S}w_{i}w_{j}|y_{i}-y_{j}|,$

method = 2 (Alfons and Templ, 2012; Langel and Tillé, 2013)

$\widehat{G}_{w2} =\displaystyle \frac{2\sum_{i \in S}w_{(i)}^{*}\widehat{N}_{(i)}y_{(i)} - \sum_{i \in S}w_{i}^{2}y_{i} }{\widehat{N}^{2}\overline{y}_{w}}-1,$

where $y_{(i)}$ are the values $y_i$ sorted in increasing order, $w_{(i)}^{*}$ are the values $w_i$ sorted according to the increasing order of the values $y_i$ , and $\widehat{N}_{(i)}=\sum_{j=1}^{i}w_{(j)}^{*}$ . Langel and Tillé (2013) show that $\widehat{G}_{w1} = \widehat{G}_{w2}$ .

method = 3 (Berger, 2008)

$\widehat{G}_{w3} = \displaystyle \frac{2}{\widehat{N}\overline{y}_{w}}\sum_{i \in S}w_{i}y_{i}\widehat{F}_{w}^{\ast}(y_{i})-1,$

where

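$\widehat{F}_{w}^{\ast}(t) = \displaystyle \frac{1}{\widehat{N}}\sum_{i \in S}w_{i}[\delta(y_i < t) + 0.5\delta(y_i = t)]$

and $\delta(\cdot)$ is the indicator function, which equals 1 when its argument is true and 0 otherwise.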

method = 4 (Berger and Gedik-Balay, 2020)

$\widehat{G}_{w4} = 1 - \displaystyle \frac{\overline{z}_{w}}{\overline{y}_{w}},$

where $\overline{z}_{w}=\widehat{N}^{-1}\sum_{i \in S}w_{i}z_{i}$ and

$z_{i} = \displaystyle \frac{1}{\widehat{N} - w_{i}}\sum_{ \substack{j \in S\\ j\neq i}}w_{j}\min(y_{i},y_{j}).$

method = 5 (Lerman and Yitzhaki, 1989)

$\widehat{G}_{w5} = \displaystyle \frac{2}{\widehat{N}\overline{y}_{w}} \sum_{i \in S} w_{i}[y_{i} - \overline{y}_{w}]\left[ \widehat{F}_{w}^{LY}(y_{i}) - \overline{F}_{w}^{LY} \right],$

where

$\widehat{F}_{w}^{LY}(y_{i}) = \displaystyle \frac{1}{\widehat{N}}\left(\widehat{N}_{(i-1)} + \frac{w_{(i)}^{\ast}}{2} \right)$

and $\overline{F}_{w}^{LY}=\widehat{N}^{-1}\sum_{i \in S}w_{i}\widehat{F}_{w}^{LY}(y_{i})$ .
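As a rough illustration of the first two formulations, the following base R sketch evaluates $\widehat{G}_{w1}$ and $\widehat{G}_{w2}$ directly from their definitions. The helper names `gini_w1` and `gini_w2` are hypothetical and are not part of the package. The sketch assumes that $\widehat{N}$ is the sum of the weights and that $\overline{y}_{w}$ is the weighted mean of $y$ , and it does no input checking or `NA` handling.

```r
# Hypothetical helpers (not part of giniVarCI); no NA handling or input checks.

# Method 1: weighted absolute differences over all pairs (needs O(n^2) memory).
gini_w1 <- function(y, w) {
  N_hat <- sum(w)                      # assumed: N-hat is the sum of the weights
  y_bar <- sum(w * y) / N_hat          # assumed: weighted mean of y
  pair_w <- outer(w, w)                # w_i * w_j
  pair_d <- abs(outer(y, y, "-"))      # |y_i - y_j|
  sum(pair_w * pair_d) / (2 * N_hat^2 * y_bar)
}

# Method 2: the same quantity from the sorted values and cumulative weights.
gini_w2 <- function(y, w) {
  ord   <- order(y)
  y_s   <- y[ord]                      # y_(i)
  w_s   <- w[ord]                      # w*_(i)
  N_hat <- sum(w)
  y_bar <- sum(w * y) / N_hat
  N_cum <- cumsum(w_s)                 # N-hat_(i)
  (2 * sum(w_s * N_cum * y_s) - sum(w^2 * y)) / (N_hat^2 * y_bar) - 1
}

y <- c(30428.83, 14976.54, 18094.09, 29476.79, 20381.93, 6876.17,
       10360.96, 8239.82, 29476.79, 32230.71)
w <- c(357.86, 480.99, 480.99, 476.01, 498.58, 498.58, 476, 498.58, 476.01, 476.01)

# The two formulations should agree up to floating-point error.
all.equal(gini_w1(y, w), gini_w2(y, w))
```

The pairwise version is the more transparent of the two, while the sorted version only needs one sort and one cumulative sum, which is why it scales better with the sample size.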

Value

A single numeric value between 0 and 1 with the estimate of the Gini index.

Author(s)

Juan F Munoz

Jose M Pavia

Encarnacion Alvarez

References

Alfons, A., and Templ, M. (2012). Estimation of social exclusion indicators from complex surveys: The R package laeken. KU Leuven, Faculty of Business and Economics Working Paper.

Berger, Y. G. (2008). A note on the asymptotic equivalence of jackknife and linearization variance estimation for the Gini Coefficient. Journal of Official Statistics, 24(4), 541-555.

Berger, Y. G., and Gedik-Balay, İ. (2020). Confidence intervals of Gini coefficient under unequal probability sampling. Journal of Official Statistics, 36(2), 237-249.

Lerman, R. I., and Yitzhaki, S. (1989). Improving the accuracy of estimates of Gini coefficients. Journal of Econometrics, 42(1), 43-47.

Examples

# Income and weights (region "Burgenland") from the 2006 Austrian EU-SILC (Package 'laeken').
data(eusilc, package="laeken")
y <- eusilc$eqIncome[eusilc$db040 == "Burgenland"]
w <- eusilc$rb050[eusilc$db040 == "Burgenland"]

# Comparing the computation time for the various estimation methods and using R
microbenchmark::microbenchmark(
fginindex(y, w, method = 1L,  useRcpp = FALSE),
fginindex(y, w, method = 2L,  useRcpp = FALSE),
fginindex(y, w, method = 3L,  useRcpp = FALSE),
fginindex(y, w, method = 4L,  useRcpp = FALSE),
fginindex(y, w, method = 5L,  useRcpp = FALSE)
)

# Comparing the computation time for the various estimation methods and using Rcpp
microbenchmark::microbenchmark(
fginindex(y, w, method = 1L),
fginindex(y, w, method = 2L),
fginindex(y, w, method = 3L),
fginindex(y, w, method = 4L),
fginindex(y, w, method = 5L)
)



# Estimation of the Gini index using 'method = 4'.
y <- c(30428.83, 14976.54, 18094.09, 29476.79, 20381.93, 6876.17,
       10360.96, 8239.82, 29476.79, 32230.71)
w <- c(357.86, 480.99, 480.99, 476.01, 498.58, 498.58, 476, 498.58, 476.01, 476.01)
fginindex(y, w, method = 4L)

Gini index for the Beta distribution with user-defined shape parameters

Description

Calculates the Gini index for the Beta distribution with shape parameters $a$ (shape1) and $b$ (shape2).

Usage

gbeta(shape1, shape2)

Arguments

`shape1`	A positive real number specifying the shape1 parameter $a$ of the Beta distribution.
`shape2`	A positive real number specifying the shape2 parameter $b$ of the Beta distribution.

Details

The Beta distribution with shape parameters $a$ (argument shape1) and $b$ (argument shape2) and denoted as $Beta(a,b)$ , where $a>0$ and $b>0$ , has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995; Yee, 2022)

$f(y) = \displaystyle \frac{1}{B(a,b)}y^{a-1}(1-y)^{b-1},$

and a cumulative distribution function given by

$F(y)= \displaystyle \frac{B(y;a,b)}{B(a,b)}$

where $0 \leq y \leq 1$ ,

$B(a,b) = \displaystyle \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}$

is the beta function,

$\Gamma(\alpha) = \int_{0}^{\infty}t^{\alpha-1}e^{-t}dt$

is the gamma function, and

$B(y;a,b) = \displaystyle \int_{0}^{y}t^{a-1}(1-t)^{b-1}dt$

is the incomplete beta function.

The Gini index can be computed as

$G = \displaystyle \frac{2}{a}\frac{B(a+b,a+b)}{B(a,a)B(b,b)}.$
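
The closed form above can be evaluated with the base R function `beta()`. The sketch below is a minimal illustration, not the package's implementation: the names `gini_beta` and `gini_beta_num` are hypothetical, and the numerical check relies on the standard identity $G = E[y]^{-1}\int F(y)[1-F(y)]dy$ , which is an assumption brought in here for verification.

```r
# Hypothetical helper: closed-form Gini index of Beta(a, b).
gini_beta <- function(a, b) {
  (2 / a) * beta(a + b, a + b) / (beta(a, a) * beta(b, b))
}

# Independent numerical check: G = (1 / E[y]) * integral of F(y) * (1 - F(y)).
gini_beta_num <- function(a, b) {
  mean_y <- a / (a + b)                          # E[y] of Beta(a, b)
  integrand <- function(y) pbeta(y, a, b) * (1 - pbeta(y, a, b))
  integrate(integrand, 0, 1)$value / mean_y
}

gini_beta(2, 1)       # 0.2
gini_beta(1, 2)       # 0.4
gini_beta_num(2, 1)   # numerically close to gini_beta(2, 1)
```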

Value

A numeric value with the Gini index. A NA is returned when a shape parameter is non-numeric or non-positive.

Author(s)

Juan F Munoz

Jose M Pavia

Encarnacion Alvarez

References

Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995). Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.

Yee, T. W. (2022). VGAM: Vector Generalized Linear and Additive Models. R package version 1.1-7, https://CRAN.R-project.org/package=VGAM.

Examples

# Gini index for the Beta distribution with shape parameters 'a = 2' and 'b = 1'.
gbeta(shape1 = 2, shape2 = 1)

# Gini index for the Beta distribution with shape parameters 'a = 1' and 'b = 2'.
gbeta(shape1 = 1, shape2 = 2)

Gini index for the Burr Type XII (Singh-Maddala) distribution with user-defined scale and shape parameters

Description

Calculates the Gini index for the Burr Type XII (Singh-Maddala) distribution with scale parameter $b$ and shape parameters $g$ (shape.g) and $s$ (shape.s).

Usage

gburr(
 scale = 1,
 shape.g = 1,
 shape.s = 1
)

Arguments

`scale`	A positive real number specifying the scale parameter $b$ of the Burr Type XII (Singh-Maddala) distribution. The default value is `scale = 1`.
`shape.g`	A positive real number specifying the shape parameter $g$ of the Burr Type XII (Singh-Maddala) distribution. The default value is `shape.g = 1`.
`shape.s`	A positive real number specifying the shape parameter $s$ of the Burr Type XII (Singh-Maddala) distribution. The default value is `shape.s = 1`.

Details

The Burr Type XII (Singh-Maddala) distribution with scale parameter $b$ , shape parameters $g$ (argument shape.g) and $s$ (argument shape.s) and denoted as $BurrXII(b,g,s)$ , where $b>0$ , $g>0$ and $s>0$ , has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995; Rodriguez, 1977; Yee, 2022)

$f(y) = \displaystyle \frac{gs}{b}\left(\frac{y}{b}\right)^{g-1}\left[1 + \left(\frac{y}{b}\right)^{g}\right]^{-(s+1)},$

and a cumulative distribution function given by

$F(y)=1-\left[1 + \displaystyle \left( \frac{y}{b}\right)^{g} \right]^{-s},$

where $y>0$ .

The Gini index can be computed as

$G = 2\left(0.5 - \displaystyle \frac{1}{E[y]}\int_{0}^{1}\int_{0}^{Q(p)}yf(y)\,dy\,dp\right),$

where $Q(p)$ is the quantile function of the Burr Type XII (Singh-Maddala) distribution evaluated at the probability $p$ , and $E[y]$ is the expectation of the distribution. The Burr Type XII (Singh-Maddala) distribution is related to the Pareto (IV) distribution: $BurrXII(b,g,s) = ParetoIV(0,b,1/g,s)$ .
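
The double integral above is the area under the Lorenz curve multiplied by $E[y]$ , and it can be approximated with nested calls to `integrate()`. The sketch below is only one way to do this and is not necessarily how the package computes it. The function name `gini_burr_num` is hypothetical, the quantile function is obtained by inverting the cumulative distribution function given above, and the sketch assumes $gs > 1$ so that $E[y]$ is finite.

```r
# Hypothetical helper: numerical Gini index of the Burr Type XII distribution.
gini_burr_num <- function(scale = 1, shape.g = 1, shape.s = 1) {
  b <- scale; g <- shape.g; s <- shape.s
  # Density and quantile function of BurrXII(b, g, s).
  dens  <- function(y) (g * s / b) * (y / b)^(g - 1) * (1 + (y / b)^g)^(-(s + 1))
  quant <- function(p) b * ((1 - p)^(-1 / s) - 1)^(1 / g)
  # E[y]; finite only when g * s > 1 (assumed here).
  mean_y <- integrate(function(y) y * dens(y), 0, Inf)$value
  # Inner integral: partial first moment up to the quantile of order p.
  partial_moment <- function(p) {
    vapply(p, function(pp) {
      integrate(function(y) y * dens(y), 0, quant(pp))$value
    }, numeric(1))
  }
  # Outer integral over p gives the area under the Lorenz curve times E[y].
  lorenz_area <- integrate(partial_moment, 0, 1)$value / mean_y
  2 * (0.5 - lorenz_area)
}

gini_burr_num(scale = 1, shape.g = 5, shape.s = 3)
# Changing the scale should leave the result essentially unchanged.
gini_burr_num(scale = 2, shape.g = 5, shape.s = 3)
```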

Value

A numeric value with the Gini index. A NA is returned when any of the parameters is non-numeric or non-positive.

Author(s)

Juan F Munoz

Jose M Pavia

Encarnacion Alvarez

References

Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995). Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.

Rodriguez, R. N. (1977). A guide to the Burr type XII distributions. Biometrika, 64(1), 129-134.

Yee, T. W. (2022). VGAM: Vector Generalized Linear and Additive Models. R package version 1.1-7, https://CRAN.R-project.org/package=VGAM.

Examples

# Gini index for the Burr Type XII distribution with 'scale = 1', 'shape.g = 2', 'shape.s = 1'.
gburr(scale = 1, shape.g = 2, shape.s = 1)

# Gini index for the Burr Type XII distribution with 'scale = 1', 'shape.g = 5', 'shape.s = 3'.
gburr(scale = 1, shape.g = 5, shape.s = 3)

Gini index for the Chi-Squared distribution with user-defined degrees of freedom

Description

Calculates Gini indices for the Chi-Squared distribution with degrees of freedom $n$ (df).

Usage

gchisq(df)

Arguments

`df`	A vector of positive real numbers specifying degrees of freedom of the Chi-Squared distribution.

Details

The Chi-Squared distribution with degrees of freedom $n$ (argument df) and denoted as $\chi_{n}^2$ , where $n>0$ , has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995)

$f(y)= \displaystyle \frac{1}{2^{n/2}\Gamma\left(\frac{n}{2}\right)}y^{n/2-1}e^{-y/2},$

and a cumulative distribution function given by

$F(y) = \frac{\gamma\left(\frac{n}{2}, \frac{y}{2}\right)}{\Gamma\left(\frac{n}{2}\right)},$

where $y \geq 0$ , the gamma function is defined by

$\Gamma(\alpha) = \int_{0}^{\infty}t^{\alpha-1}e^{-t}dt,$

and the lower incomplete gamma function is given by

$\gamma(\alpha,y) = \int_{0}^{y}t^{\alpha-1}e^{-t}dt.$

The Gini index can be computed as

$G=\displaystyle \frac{2\Gamma\left( \frac{1+n}{2}\right)}{n\Gamma\left(\frac{n}{2}\right)\sqrt{\pi}}.$

The Chi-Squared distribution is related to the Gamma distribution: $\chi_{n}^2 = Gamma(n/2, 2)$ .
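
The closed form above is easy to evaluate for a whole vector of degrees of freedom. The following sketch uses hypothetical helper names (`gini_chisq`, `gini_gamma`) and works on the log scale with `lgamma()`, a choice made here to avoid overflow for large arguments. It also checks the stated relation with the Gamma distribution, using the Gamma Gini formula given in the `ggamma` help page of this manual.

```r
# Hypothetical helper: Gini index of the Chi-Squared distribution (vectorized).
gini_chisq <- function(df) {
  2 * exp(lgamma((1 + df) / 2) - lgamma(df / 2)) / (df * sqrt(pi))
}

# Hypothetical helper: Gini index of the Gamma distribution with a given shape.
gini_gamma <- function(shape) {
  exp(lgamma((2 * shape + 1) / 2) - lgamma(shape)) / (shape * sqrt(pi))
}

gini_chisq(2)   # 0.5, the exponential case

# Chi-Squared with n degrees of freedom corresponds to Gamma(n / 2, 2).
n <- 5:10
all.equal(gini_chisq(n), gini_gamma(n / 2))
```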

Value

A numeric vector with the Gini indices. A NA is returned when degrees of freedom are non-numeric or non-positive.

Author(s)

Juan F Munoz

Jose M Pavia

Encarnacion Alvarez

References

Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.

Examples

# Gini index for the Chi-Squared distribution with degrees of freedom equal to 2.
gchisq(df = 2)

# Gini indices for the Chi-Squared distribution and different degrees of freedom.
gchisq(df = 5:10)

Gini index for the Dagum distribution with user-defined shape parameters

Description

Calculates the Gini index for the Dagum distribution with shape parameters $a$ (shape1.a) and $p$ (shape2.p).

Usage

gdagum(shape1.a, shape2.p)

Arguments

`shape1.a`	A positive real number specifying the shape1 parameter $a$ of the Dagum distribution.
`shape2.p`	A positive real number specifying the shape2 parameter $p$ of the Dagum distribution.

Details

The Dagum distribution with scale parameter $b$ , shape parameters $a$ (argument shape1.a) and $p$ (argument shape2.p) and denoted as $Dagum(b,a,p)$ , where $b>0$ , $a>0$ and $p>0$ , has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995; Rodriguez, 1977; Yee, 2022)

$f(y) = \displaystyle \frac{ap}{y}\frac{\left(\frac{y}{b}\right)^{ap}}{ \left[\left(\frac{y}{b} \right)^{a} + 1 \right]^{p+1} },$

and a cumulative distribution function given by

$F(y)= \left[1 + \displaystyle \left( \frac{y}{b}\right)^{-a} \right]^{-p},$

where $y > 0$ .

The Gini index can be computed as

$G = \displaystyle \frac{\Gamma(p)\Gamma(2p+1/a)}{\Gamma(2p)\Gamma(p+1/a)}-1,$

where the gamma function is defined as

$\Gamma(\alpha) = \int_{0}^{\infty}t^{\alpha-1}e^{-t}dt.$

The Dagum distribution is also known as the Burr III, inverse Burr, beta-K, or 3-parameter kappa distribution. The Dagum distribution is related to the Fisk (Log Logistic) distribution: $Dagum(b,a,1) = Fisk(b,a)$ . The Dagum distribution is also related to the inverse Lomax distribution and the inverse paralogistic distribution (see Kleiber and Kotz, 2003; Johnson et al., 1995; Yee, 2022).
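
The Gini formula above and its link to the Fisk distribution can be checked with a few lines of base R. The helper name `gini_dagum` is hypothetical, and the log-scale evaluation with `lgamma()` is a choice made in this sketch rather than something the source prescribes. The check uses the Fisk result $G = 1/a$ for $a \geq 1$ from the `gfisk` help page of this manual.

```r
# Hypothetical helper: Gini index of the Dagum distribution, on the log scale.
gini_dagum <- function(a, p) {
  exp(lgamma(p) + lgamma(2 * p + 1 / a) - lgamma(2 * p) - lgamma(p + 1 / a)) - 1
}

gini_dagum(a = 2, p = 20)

# With p = 1 the Dagum distribution reduces to Fisk(b, a), so G should be 1 / a.
a <- 2:6
all.equal(gini_dagum(a = a, p = 1), 1 / a)
```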

Value

A numeric value with the Gini index. A NA is returned when a shape parameter is non-numeric or non-positive.

Note

The Gini index of the Dagum distribution does not depend on its scale parameter.

Author(s)

Juan F Munoz

Jose M Pavia

Encarnacion Alvarez

References

Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.

Yee, T. W. (2022). VGAM: Vector Generalized Linear and Additive Models. R package version 1.1-7, https://CRAN.R-project.org/package=VGAM.

Examples

# Gini index for the Dagum distribution with shape parameters 'a = 2' and 'p = 20'.
gdagum(shape1.a = 2, shape2.p = 20)

Gini index for the F distribution with user-defined degrees of freedom

Description

Calculates the Gini index for the F distribution with degrees of freedom $\nu_1$ (df1) and $\nu_2$ (df2).

Usage

gf(df1, df2)

Arguments

`df1`	A positive real number specifying the degrees of freedom $\nu_1$ of the F distribution.
`df2`	A positive real number greater than or equal to two specifying the degrees of freedom $\nu_2$ of the F distribution.

Details

The F distribution with $\nu_1$ (argument df1) and $\nu_2$ (argument df2) degrees of freedom and denoted as $F_{\nu_1,\nu_2}$ , where $\nu_1>0$ and $\nu_2 > 0$ , has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995)

$f(y) = \displaystyle \frac{\Gamma\left(\frac{\nu_{1}}{2} + \frac{\nu_{2}}{2}\right)}{\Gamma\left(\frac{\nu_{1}}{2}\right)\Gamma\left(\frac{\nu_{2}}{2}\right)}\left( \frac{\nu_{1}}{\nu_{2}}\right)^{\nu_{1}/2}y^{\nu_{1}/2-1}\left(1 + \frac{\nu_{1}y}{\nu_{2}}\right)^{-(\nu_{1}+\nu_{2})/2},$

and a cumulative distribution function given by

$F(y)= \displaystyle I_{\nu_{1}y/(\nu_{1}y + \nu_{2})}\left( \frac{\nu_{1}}{2}, \frac{\nu_{2}}{2} \right),$

where $y \geq 0$ ,

$\Gamma(\alpha) = \int_{0}^{\infty}t^{\alpha-1}e^{-t}dt$

is the gamma function,

$I_{y}(a,b)=\displaystyle \frac{B(y;a,b)}{B(a,b)}$

is the regularized incomplete beta function,

$B(a,b) = \displaystyle \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}$

is the beta function, and

$B(y;a,b) = \displaystyle \int_{0}^{y}t^{a-1}(1-t)^{b-1}dt$

is the incomplete beta function.

The Gini index, for $\nu_2 \geq 2$ , can be computed as

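$G = 2\left(0.5 - \displaystyle \frac{\nu_{2} - 2}{ \nu_{2}}\int_{0}^{1}\int_{0}^{Q(p)}yf(y)\,dy\,dp\right),$

where $Q(p)$ is the quantile function of the F distribution evaluated at the probability $p$ . The factor $(\nu_{2} - 2)/\nu_{2}$ is the reciprocal of the expectation of the distribution.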
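
Swapping the order of integration turns the double integral above into a single integral, either over probabilities, $\int_{0}^{1}(1-p)Q(p)dp$ , or over values, $\int_{0}^{\infty}yf(y)[1-F(y)]dy$ . The sketch below evaluates both with the base R functions `qf()`, `df()` and `pf()`. The helper names are hypothetical, the single-integral forms are a derivation made here rather than taken from the source, and the sketch assumes $\nu_2 > 2$ so that the expectation $\nu_2/(\nu_2-2)$ is finite.

```r
# Hypothetical helper: single integral over probabilities, using the quantile function.
gini_f_quantile <- function(df1, df2) {
  mean_y <- df2 / (df2 - 2)                       # E[y]; requires df2 > 2
  area <- integrate(function(p) (1 - p) * qf(p, df1, df2), 0, 1)$value
  2 * (0.5 - area / mean_y)
}

# Hypothetical helper: single integral over values, using density and survival function.
gini_f_density <- function(df1, df2) {
  mean_y <- df2 / (df2 - 2)
  integrand <- function(y) {
    y * df(y, df1, df2) * pf(y, df1, df2, lower.tail = FALSE)
  }
  area <- integrate(integrand, 0, Inf)$value
  2 * (0.5 - area / mean_y)
}

# Both forms should give numerically close results.
gini_f_quantile(df1 = 10, df2 = 20)
gini_f_density(df1 = 10, df2 = 20)
```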

Value

A numeric value with the Gini index. A NA is returned when degrees of freedom are non-numeric or $df1 \leq 0$ or $df2 < 2$ .

Author(s)

Juan F Munoz

Jose M Pavia

Encarnacion Alvarez

References

Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.

Examples

# Gini index for the F distribution with 'df1 = 10' and 'df2 = 20' degrees of freedom.
gf(df1 = 10, df2 = 20)

Gini index for the Fisk (Log Logistic) distribution with user-defined shape parameters

Description

Calculates the Gini indices for the Fisk (Log Logistic) distribution with shape parameters $a$ (shape1.a).

Usage

gfisk(shape1.a)

Arguments

`shape1.a`	A vector of positive real numbers specifying the shape parameters $a$ of the Fisk (Log Logistic) distribution.

Details

The Fisk (Log Logistic) distribution with scale parameter $b$ , shape parameter $a$ (argument shape1.a) and denoted as $Fisk(b,a)$ , where $b>0$ and $a>0$ , has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995; Yee, 2022)

$f(y) = \displaystyle \frac{a}{y}\frac{\left(\frac{y}{b}\right)^{a}}{ \left[\left(\frac{y}{b} \right)^{a} + 1 \right]^{2} },$

and a cumulative distribution function given by

$F(y)=1-\left[1 + \displaystyle \left( \frac{y}{b}\right)^{a} \right]^{-1},$

where $y \geq 0$ .

The Gini index can be computed as

$G = \left\{ \begin{array}{cl} 1 , & 0< a <1; \\ \displaystyle \frac{1}{a}, & a \geq 1. \end{array} \right.$

The Fisk (Log Logistic) distribution is related to the Dagum distribution: $Fisk(b,a) = Dagum(b,a,1)$ .

Value

A numeric vector with the Gini indices. A NA is returned when a shape parameter is non-numeric or non-positive.

Note

The Gini index of the Fisk (Log Logistic) distribution does not depend on its scale parameter.

Author(s)

Juan F Munoz

Jose M Pavia

Encarnacion Alvarez

References

Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.

Yee, T. W. (2022). VGAM: Vector Generalized Linear and Additive Models. R package version 1.1-7, https://CRAN.R-project.org/package=VGAM.

Examples

# Gini index for the Fisk distribution with a shape parameter 'a = 2'.
gfisk(shape1.a = 2)

# Gini indices for the Fisk distribution and different shape parameters.
gfisk(shape1.a = 1:10)

Gini index for the Frechet distribution with user-defined shape parameters

Description

Calculates the Gini indices for the Frechet distribution with shape parameters $s$ .

Usage

gfrechet(shape)

Arguments

`shape`	A vector of positive real numbers greater than or equal to 1 specifying the shape parameters $s$ of the Frechet distribution.

Details

The Frechet distribution with location parameter $a$ , scale parameter $b$ , shape parameter $s$ and denoted as $Frechet(a,b,s)$ , where $a>0$ , $b>0$ and $s>0$ , has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995)

$f(y) = \displaystyle \frac{sb}{(y-a)^{2}} \left(\frac{b}{y-a}\right)^{s-1} \exp\left[- \displaystyle \left(\frac{b}{y-a}\right)^{s} \right],$

and a cumulative distribution function given by

$F(y)= \displaystyle \exp\left[- \displaystyle \left(\frac{b}{y-a}\right)^{s} \right],$

where $y > a$ .

The Gini index, for $s \geq 1$ , can be computed as

$G = 2^{1/s} -1.$
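
A minimal base R sketch of this formula follows. The helper name `gini_frechet` is hypothetical. The numerical check is an assumption-laden illustration: it takes the location as 0 and the scale as 1 to keep the cumulative distribution function simple, uses the expectation $\Gamma(1-1/s)$ that holds in that case for $s>1$ , and relies on the identity $G = E[y]^{-1}\int F(y)[1-F(y)]dy$ .

```r
# Hypothetical helper: closed form, with NA for shape parameters below 1.
gini_frechet <- function(shape) {
  ifelse(shape >= 1, 2^(1 / shape) - 1, NA_real_)
}

gini_frechet(c(0.5, 1, 2))

# Numerical check for s = 2, assuming location 0 and scale 1.
s <- 2
cdf <- function(y) exp(-y^(-s))
mean_y <- gamma(1 - 1 / s)                       # E[y] for location 0, scale 1, s > 1
g_num <- integrate(function(y) cdf(y) * (1 - cdf(y)), 0, Inf)$value / mean_y
c(closed_form = gini_frechet(s), numerical = g_num)
```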

Value

A numeric vector with the Gini indices. A NA is returned when a shape parameter is non-numeric or smaller than 1.

Note

The Gini index of the Frechet distribution does not depend on its location and scale parameters and is only defined when its shape parameter is at least 1.

Author(s)

Juan F Munoz

Jose M Pavia

Encarnacion Alvarez

References

Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.

Examples

# Gini index for the Frechet distribution with a shape parameter 's = 1'.
gfrechet(shape = 1)

# Gini indices for the Frechet distribution and different shape parameters.
gfrechet(shape = 1:10)

Gini index for the Gamma distribution with user-defined shape parameter

Description

Calculates the Gini indices for the Gamma distribution with shape parameters $\alpha$ .

Usage

ggamma(shape)

Arguments

`shape`	A vector of positive real numbers specifying the shape parameters $\alpha$ of the Gamma distribution.

Details

The Gamma distribution with shape parameter $\alpha$ , scale parameter $\sigma$ and denoted as $Gamma(\alpha, \sigma)$ , where $\alpha>0$ and $\sigma>0$ , has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995)

$f(y) = \displaystyle \frac{1}{\sigma^{\alpha}\Gamma(\alpha)}y^{\alpha-1}e^{-y/\sigma},$

and a cumulative distribution function given by

$F(y) = \frac{\gamma\left(\alpha, \frac{y}{\sigma}\right)}{\Gamma(\alpha)},$

where $y \geq 0$ , the gamma function is defined by

$\Gamma(\alpha) = \int_{0}^{\infty}t^{\alpha-1}e^{-t}dt,$

and the lower incomplete gamma function is given by

$\gamma(\alpha,y) = \int_{0}^{y}t^{\alpha-1}e^{-t}dt.$

The Gini index can be computed as

$G = \displaystyle \frac{\Gamma\left(\frac{2\alpha+1}{2}\right)}{\alpha\Gamma(\alpha)\sqrt{\pi}}.$

The Gamma distribution is related to the Chi-squared distribution: $Gamma(n/2, 2) = \chi_{n}^2$ .

Value

A numeric vector with the Gini indices. A NA is returned when a shape parameter is non-numeric or non-positive.

Note

The Gini index of the Gamma distribution does not depend on its scale parameter.

Author(s)

Juan F Munoz

Jose M Pavia

Encarnacion Alvarez

References

Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.

Examples

# Gini index for the Gamma distribution with 'shape = 1'.
ggamma(shape = 1)

# Gini indices for the Gamma distribution and different shape parameters.
ggamma(shape = 1:10)

Gini index for the Gompertz distribution with user-defined scale and shape parameters

Description

Calculates the Gini index for the Gompertz distribution with scale parameter $\beta$ and shape parameter $\alpha$ .

Usage

ggompertz(
 scale = 1,
 shape
)

Arguments

`scale`	A positive real number specifying the scale parameter $\beta$ of the Gompertz distribution. The default value is `scale = 1`.
`shape`	A positive real number specifying the shape parameter $\alpha$ of the Gompertz distribution.

Details

The Gompertz distribution with scale parameter $\beta$ , shape parameter $\alpha$ and denoted as $Gompertz(\beta, \alpha)$ , where $\beta>0$ and $\alpha>0$ , has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995; Rodriguez, 1977; Yee, 2022)

$f(y)= \alpha e^{\beta y} \exp\left[ - \displaystyle \frac{\alpha}{\beta}\left(e^{\beta y} - 1 \right) \right],$

and a cumulative distribution function given by

$F(y)= 1 -\exp\left[ - \displaystyle \frac{\alpha}{\beta}\left(e^{\beta y} - 1 \right) \right],$

where $y \geq 0$ .

The Gini index can be computed as

$G = 2\left(0.5 - \displaystyle \frac{1}{E[y]}\int_{0}^{1}\int_{0}^{Q(p)}yf(y)\,dy\,dp\right),$

where $Q(p)$ is the quantile function of the Gompertz distribution evaluated at the probability $p$ , and $E[y]$ is the expectation of the distribution. If `scale` is not specified, it takes the default value of 1.

Value

A numeric value with the Gini index. A NA is returned when a parameter is non-numeric or non-positive.

Author(s)

Juan F Munoz

Jose M Pavia

Encarnacion Alvarez

References

Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.

Yee, T. W. (2022). VGAM: Vector Generalized Linear and Additive Models. R package version 1.1-7, https://CRAN.R-project.org/package=VGAM.

Examples

# Gini index for the Gompertz distribution with 'scale = 1' and 'shape = 3'.
ggompertz(scale = 1, shape = 3)

Gini index for the Log Normal distribution with user-defined standard deviations

Description

Calculates the Gini indices for the Log Normal distribution with standard deviations $\sigma$ (sdlog).

Usage

glnorm(sdlog)

Arguments

`sdlog`	A vector of positive real numbers specifying the standard deviations $\sigma$ of the Log Normal distribution.

Details

The Log Normal distribution with mean $\mu$ , standard deviation $\sigma$ on the log scale (argument sdlog) and denoted as $logNormal(\mu, \sigma)$ , has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995)

$f(y)=\displaystyle \frac{1}{\sqrt{2\pi}\sigma y}\exp\left[- \frac{(\ln(y) - \mu)^2}{2\sigma^2} \right],$

and a cumulative distribution function given by

$F(y)=\displaystyle \Phi\left(\frac{\ln(y) - \mu}{\sigma}\right),$

where $y > 0$ and

$\Phi(y) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{y} e^{-t^{2}/2}dt$

is the cumulative distribution function of a standard Normal distribution.

The Gini index can be computed as

$G = 2\Phi\left( \displaystyle \frac{\sigma}{\sqrt{2}}\right) - 1.$
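
The closed form only needs `pnorm()`. The sketch below, with hypothetical helper names, also compares it with a numerical value obtained from $G = 1 - 2E[y]^{-1}\int_{0}^{\infty}yf(y)[1-F(y)]dy$ , an identity assumed here for the check, and repeats the check for two values of $\mu$ to illustrate that the result does not change with the mean parameter.

```r
# Hypothetical helper: closed-form Gini index of the Log Normal distribution.
gini_lnorm <- function(sdlog) 2 * pnorm(sdlog / sqrt(2)) - 1

# Hypothetical helper: numerical value from the density and survival function.
gini_lnorm_num <- function(meanlog, sdlog) {
  mean_y <- exp(meanlog + sdlog^2 / 2)           # E[y] of the Log Normal
  integrand <- function(y) {
    y * dlnorm(y, meanlog, sdlog) * plnorm(y, meanlog, sdlog, lower.tail = FALSE)
  }
  1 - 2 * integrate(integrand, 0, Inf)$value / mean_y
}

gini_lnorm(c(0.2, 0.5, 1:3))

# Same sdlog, different meanlog: both should be close to gini_lnorm(1).
gini_lnorm_num(meanlog = 0, sdlog = 1)
gini_lnorm_num(meanlog = 1, sdlog = 1)
```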

Value

A numeric vector with the Gini indices. A NA is returned when a standard deviation is non-numeric or non-positive.

Note

The Gini index of the logNormal distribution does not depend on the mean parameter.

Author(s)

Juan F Munoz

Jose M Pavia

Encarnacion Alvarez

References

Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.

Examples

# Gini index for the Log Normal distribution with standard deviation 'sdlog = 2'.
glnorm(sdlog = 2)

# Gini indices for the Log Normal distribution with different standard deviations.
glnorm(sdlog = c(0.2, 0.5, 1:3))

Gini index for the Pareto distribution with user-defined shape parameters

Description

Calculates the Gini indices for the Pareto distribution with shape parameters $\alpha$ .

Usage

gpareto(shape)

Arguments

`shape`	A vector of positive real numbers specifying the shape parameters $\alpha$ of the Pareto distribution.

Details

The Pareto distribution with scale parameter $k$ , shape parameter $\alpha$ and denoted as $Pareto(k, \alpha)$ , where $k>0$ and $\alpha>0$ , has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995; Yee, 2022)

$f(y)=\displaystyle \frac{\alpha k^{\alpha}}{y^{\alpha +1}},$

and a cumulative distribution function given by

$F(y) = \displaystyle 1 - \left(\frac{k}{y}\right)^{\alpha},$

where $y \geq k$ .

The Gini index can be computed as

$G = \left\{ \begin{array}{cl} 1 , & 0<\alpha <1; \\ \displaystyle \frac{1}{2\alpha-1}, & \alpha \geq 1. \end{array} \right.$

Value

A numeric vector with the Gini indices. A NA is returned when a shape parameter is non-numeric or non-positive.

Note

The Gini index of the Pareto distribution does not depend on the scale parameter.

Author(s)

Juan F Munoz

Jose M Pavia

Encarnacion Alvarez

References

Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.

Yee, T. W. (2022). VGAM: Vector Generalized Linear and Additive Models. R package version 1.1-7, https://CRAN.R-project.org/package=VGAM.

Examples

# Gini index for the Pareto distribution with 'shape = 2'.
gpareto(shape = 2)

# Gini indices for the Pareto distribution and different shape parameters.
gpareto(shape = 1:5)

Gini index for the Pareto (I) distribution with user-defined scale and shape parameters

Description

Calculates the Gini index for the Pareto (I) distribution with scale parameter $b$ and shape parameter $s$ .

Usage

gparetoI(
 scale = 1,
 shape = 1
)

Arguments

`scale`	A positive real number specifying the scale parameter $b$ of the Pareto (I) distribution. The default value is `scale = 1`.
`shape`	A positive real number specifying the shape parameter $s$ of the Pareto (I) distribution. The default value is `shape = 1`.

Details

The Pareto (I) distribution with scale parameter $b$ , shape parameter $s$ and denoted as $ParetoI(b,s)$ , where $b>0$ and $s>0$ , has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995; Yee, 2022)

$f(y)= \displaystyle \frac{s}{b} \left(\frac{y}{b}\right)^{-(s+1)},$

and a cumulative distribution function given by

$F(y)=1 - \displaystyle \left(\frac{y}{b}\right)^{-s},$

where $y>b$ .

The Gini index can be computed as

$G = 2\left(0.5 - \displaystyle \frac{1}{E[y]}\int_{0}^{1}\int_{0}^{Q(p)}yf(y)\,dy\,dp\right),$

where $Q(p)$ is the quantile function of the Pareto (I) distribution evaluated at the probability $p$ , and $E[y]$ is the expectation of the distribution. If scale or shape is not specified, it takes the default value of 1. The Pareto (I) distribution is related to the Pareto (IV) distribution: $ParetoI(b,s) = ParetoIV(b,b,1,s)$ .
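
The double integral in the Gini formula above is the area under the Lorenz curve, so it can be approximated with nested calls to `integrate()`. The sketch below is only an illustration under stated assumptions: `paretoI_gini_numeric` is a hypothetical name, the code requires $s>1$ so that $E[y] = sb/(s-1)$ is finite, and it is not presented as the method used inside `gparetoI`.

```r
# Hypothetical numerical check (not the package's implementation).
# Assumes shape > 1, so that the expectation E[y] = shape * scale / (shape - 1) exists.
paretoI_gini_numeric <- function(scale = 1, shape = 3) {
  dens  <- function(y) (shape / scale) * (y / scale)^(-(shape + 1))  # f(y)
  quant <- function(p) scale * (1 - p)^(-1 / shape)                  # Q(p)
  mu    <- shape * scale / (shape - 1)                               # E[y]

  # Lorenz curve: L(p) = (1 / E[y]) * integral of y * f(y) up to Q(p).
  # The density is zero below 'scale', so the inner integral starts there.
  lorenz <- function(p) {
    sapply(p, function(pp) {
      integrate(function(y) y * dens(y), lower = scale, upper = quant(pp),
                rel.tol = 1e-8)$value / mu
    })
  }

  # G = 2 * (0.5 - area under the Lorenz curve).
  2 * (0.5 - integrate(lorenz, lower = 0, upper = 1)$value)
}

paretoI_gini_numeric(scale = 1, shape = 3)  # should be close to 1 / (2 * 3 - 1) = 0.2
```

The comparison value $1/(2s-1)$ is the closed form of the Gini index for a Pareto distribution with shape $s \geq 1$.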

Value

A numeric value with the Gini index. A NA is returned when a parameter is non-numeric or non-positive.

Author(s)

Juan F Munoz [email protected]

Jose M Pavia [email protected]

Encarnacion Alvarez [email protected]

References

Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.

Yee, T. W. (2022). VGAM: Vector Generalized Linear and Additive Models. R package version 1.1-7, https://CRAN.R-project.org/package=VGAM.

Examples

# Gini index for the Pareto (I) distribution with scale 'b = 1' and shape 's = 3'.
gparetoI(scale = 1, shape = 3)

Gini index for the Pareto (II) distribution with user-defined location, scale and shape parameters

Description

Calculates the Gini index for the Pareto (II) distribution with location parameter $a$ , scale parameter $b$ and shape parameter $s$ .

Usage

gparetoII(
 location = 0,
 scale = 1,
 shape = 1
)

Arguments

`location`	A non-negative real number specifying the location parameter $a$ of the Pareto (II) distribution. The default value is `location = 0`.
`scale`	A positive real number specifying the scale parameter $b$ of the Pareto (II) distribution. The default value is `scale = 1`.
`shape`	A positive real number specifying the shape parameter $s$ of the Pareto (II) distribution. The default value is `shape = 1`.

Details

The Pareto (II) distribution with location parameter $a$ , scale parameter $b$ , shape parameter $s$ and denoted as $ParetoII(a,b,s)$ , where $a \geq 0$ , $b>0$ and $s>0$ , has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995; Yee, 2022)

$f(y)= \displaystyle \frac{s}{b} \left[1 + \left( \frac{y-a}{b}\right)\right]^{-(s+1)},$

and a cumulative distribution function given by

$F(y)=1-\left(1 + \displaystyle \frac{y-a}{b} \right)^{-s},$

where $y>a$ .

The Gini index can be computed as

$G = 2\left(0.5 - \displaystyle \frac{1}{E[y]}\int_{0}^{1}\int_{0}^{Q(p)}yf(y)\,dy\,dp\right),$

where $Q(p)$ is the quantile function of the Pareto (II) distribution evaluated at the probability $p$ , and $E[y]$ is the expectation of the distribution. If location is not specified it assumes the default value of 0, and scale and shape assume the default value of 1. The Pareto (II) distribution is related to the Pareto (IV) distribution: $ParetoII(a,b,s) = ParetoIV(a,b,1,s)$ .
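
Unlike the scale-free cases, the Gini index of the Pareto (II) distribution changes with the location parameter. The sketch below illustrates this with the equivalent single-integral form $G = \frac{2}{E[y]}\int_{0}^{1} pQ(p)\,dp - 1$ . The names `gini_from_quantile` and `paretoII_quantile` are hypothetical, the quantile function is obtained by inverting the cumulative distribution function given above, and $s>1$ is assumed so that the mean exists.

```r
# Hypothetical helpers (not giniVarCI functions).
# Gini index from a quantile function Q on (0, 1):
#   G = (2 / E[y]) * integral_0^1 p * Q(p) dp - 1,  with  E[y] = integral_0^1 Q(p) dp.
gini_from_quantile <- function(Q) {
  mu <- integrate(Q, lower = 0, upper = 1)$value
  2 * integrate(function(p) p * Q(p), lower = 0, upper = 1)$value / mu - 1
}

# Quantile function of ParetoII(a, b, s), from F(y) = 1 - (1 + (y - a) / b)^(-s).
paretoII_quantile <- function(a, b, s) {
  function(p) a + b * ((1 - p)^(-1 / s) - 1)
}

# Same scale and shape, two different locations (s = 3 > 1).
gini_from_quantile(paretoII_quantile(a = 0, b = 1, s = 3))  # close to 0.6
gini_from_quantile(paretoII_quantile(a = 1, b = 1, s = 3))  # close to 0.2
```

With `a = 1`, `b = 1` the quantile function reduces to $(1-p)^{-1/s}$ , the Pareto (I) case, which is consistent with the relationship $ParetoI(b,s) = ParetoIV(b,b,1,s)$ .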

Value

A numeric value with the Gini index. An NA is returned when a parameter is non-numeric or non-positive, except for the location parameter, which can be equal to 0.

Author(s)

Juan F Munoz [email protected]

Jose M Pavia [email protected]

Encarnacion Alvarez [email protected]

References

Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.

Yee, T. W. (2022). VGAM: Vector Generalized Linear and Additive Models. R package version 1.1-7, https://CRAN.R-project.org/package=VGAM.

Examples

# Gini index for the Pareto (II) distribution with parameters 'a = 1', 'b = 1' and 's = 3'.
gparetoII(location = 1, scale = 1, shape = 3)

Gini index for the Pareto (III) distribution with user-defined inequality parameters

Description

Calculates the Gini indices for the Pareto (III) distribution with inequality parameters $g$ .

Usage

gparetoIII(
 inequality = 1
)

Arguments

`inequality`	A vector of numbers in the $[0,1]$ interval specifying the inequality parameters $g$ of the Pareto (III) distribution. The default value is `inequality = 1`.

Details

The Pareto (III) distribution with location parameter $a$ , scale parameter $b$ , inequality parameter $g$ and denoted as $ParetoIII(a,b,g)$ , where $a>0$ , $b>0$ , and $g \in [0,1]$ , has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995; Yee, 2022)

$f(y)= \displaystyle \frac{1}{bg} \left( \frac{y-a}{b}\right)^{1/g-1} \left[1 + \left( \frac{y-a}{b}\right)^{1/g} \right]^{-2},$

and a cumulative distribution function given by

$F(y)=1-\left[1 + \displaystyle \left( \frac{y-a}{b}\right)^{1/g} \right]^{-1},$

where $y>a$ .

The Gini index is $G = g.$

If inequality is not specified it assumes the default value of 1. The Pareto (III) distribution is related to the Pareto (IV) distribution: $ParetoIII(a,b,g) = ParetoIV(a,b,g,1)$ .

Value

A numeric vector with the Gini indices. An NA is returned when an inequality parameter is non-numeric or lies outside the interval $[0,1]$ .

Note

The Gini index of the Pareto (III) distribution does not depend on its location and scale parameters.

Author(s)

Juan F Munoz [email protected]

Jose M Pavia [email protected]

Encarnacion Alvarez [email protected]

References

Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.

Yee, T. W. (2022). VGAM: Vector Generalized Linear and Additive Models. R package version 1.1-7, https://CRAN.R-project.org/package=VGAM.

Examples

# Gini index for the Pareto (III) distribution with inequality parameter 'g = 0.3'.
gparetoIII(inequality = 0.3)

# Gini indices for the Pareto (III) distribution with different inequality parameters.
gparetoIII(inequality = seq(0.1, 0.9, by=0.1))

Gini index for the Pareto (IV) distribution with user-defined location, scale, inequality and shape parameters

Description

Calculates the Gini index for the Pareto (IV) distribution with location parameter $a$ , scale parameter $b$ , inequality parameter $g$ and shape parameter $s$ .

Usage

gparetoIV(
 location = 0,
 scale = 1,
 inequality = 1,
 shape = 1
)

Arguments

`location`	A non-negative real number specifying the location parameter $a$ of the Pareto (IV) distribution. The default value is `location = 0`.
`scale`	A positive real number specifying the scale parameter $b$ of the Pareto (IV) distribution. The default value is `scale = 1`.
`inequality`	A positive real number specifying the inequality parameter $g$ of the Pareto (IV) distribution. The default value is `inequality = 1`.
`shape`	A positive real number specifying the shape parameter $s$ of the Pareto (IV) distribution. The default value is `shape = 1`.

Details

The Pareto (IV) distribution with location parameter $a$ , scale parameter $b$ , inequality parameter $g$ , shape parameter $s$ and denoted as $ParetoIV(a,b,g,s)$ , where $a \geq 0$ , $b>0$ , $g>0$ and $s>0$ , has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995; Yee, 2022)

$f(y)= \displaystyle \frac{s}{bg} \left( \frac{y-a}{b}\right)^{1/g-1} \left[1 + \left( \frac{y-a}{b}\right)^{1/g} \right]^{-(s+1)},$

and a cumulative distribution function given by

$F(y)=1- \left[1 + \displaystyle \left( \frac{y-a}{b}\right)^{1/g} \right]^{-s},$

where $y>a$ .

The Gini index can be computed as

$G = 2\left(0.5 - \displaystyle \frac{1}{E[y]}\int_{0}^{1}\int_{0}^{Q(p)}yf(y)\,dy\,dp\right),$

where $Q(p)$ is the quantile function of the Pareto (IV) distribution evaluated at the probability $p$ , and $E[y]$ is the expectation of the distribution. If location is not specified it assumes the default value of 0, and the remaining parameters assume the default value of 1. The Pareto (IV) distribution is related to:

1. The Burr distribution: $ParetoIV(0,b,g,s) = BurrXII(b,1/g,s)$ .

2. The Pareto (I) distribution: $ParetoIV(b,b,1,s) = ParetoI(b,s)$ .

3. The Pareto (II) distribution: $ParetoIV(a,b,1,s) = ParetoII(a,b,s)$ .

4. The Pareto (III) distribution: $ParetoIV(a,b,g,1) = ParetoIII(a,b,g)$ .
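
The relationships with the Pareto (I), (II) and (III) distributions listed above can be checked numerically by comparing cumulative distribution functions. The sketch below codes the four $F(y)$ expressions given in this manual; the `cdf_*` function names are hypothetical and the parameter values are arbitrary. The Burr relationship is left out because it depends on the Burr parameterization, which is not spelled out here.

```r
# Hypothetical helpers (not giniVarCI functions): the CDFs as written in this manual.
cdf_paretoIV  <- function(y, a, b, g, s) 1 - (1 + ((y - a) / b)^(1 / g))^(-s)
cdf_paretoI   <- function(y, b, s)       1 - (y / b)^(-s)
cdf_paretoII  <- function(y, a, b, s)    1 - (1 + (y - a) / b)^(-s)
cdf_paretoIII <- function(y, a, b, g)    1 - (1 + ((y - a) / b)^(1 / g))^(-1)

# Evaluation points above every location parameter used below.
y <- seq(2.5, 20, by = 0.5)

# ParetoIV(b, b, 1, s) = ParetoI(b, s)
all.equal(cdf_paretoIV(y, a = 2, b = 2, g = 1, s = 3), cdf_paretoI(y, b = 2, s = 3))

# ParetoIV(a, b, 1, s) = ParetoII(a, b, s)
all.equal(cdf_paretoIV(y, a = 1, b = 2, g = 1, s = 3), cdf_paretoII(y, a = 1, b = 2, s = 3))

# ParetoIV(a, b, g, 1) = ParetoIII(a, b, g)
all.equal(cdf_paretoIV(y, a = 1, b = 2, g = 0.5, s = 1), cdf_paretoIII(y, a = 1, b = 2, g = 0.5))
```

Each comparison returns `TRUE`, because the Pareto (IV) expression reduces algebraically to the corresponding special case.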

Value

A numeric value with the Gini index. An NA is returned when a parameter is non-numeric or non-positive, except for the location parameter, which can be equal to 0.

Author(s)

Juan F Munoz [email protected]

Jose M Pavia [email protected]

Encarnacion Alvarez [email protected]

References

Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.

Yee, T. W. (2022). VGAM: Vector Generalized Linear and Additive Models. R package version 1.1-7, https://CRAN.R-project.org/package=VGAM.

Examples

# Gini index for the Pareto (IV) distribution with 'a = 1', 'b = 1', 'g = 0.5', 's = 1'.
gparetoIV(location = 1, scale = 1, inequality = 0.5, shape = 1)

# Gini index for the Pareto (IV) distribution with 'a = 1', 'b = 1', 'g = 2', 's = 3'.
gparetoIV(location = 1, scale = 1, inequality = 2, shape = 3)

Samples from a set of continuous probability distributions with user-defined Gini indices

Description

Draws samples from a continuous probability distribution with Gini indices set by the user.

Usage

gsample(
  n,
  gini,
  distribution = c("pareto", "dagum", "lognormal", "fisk", "weibull", "gamma",
  "chisq", "frechet"),
  scale = 1,
  meanlog = 0,
  shape2.p = 1,
  location = 0
)

Arguments

`n`	An integer specifying the sample(s) size.
`gini`	A numeric vector of values between 0 and 1, indicating the Gini indices for the continuous distribution from which samples are generated.
`distribution`	A character string specifying the continuous probability distribution to be used to generate the sample. Possible values are `"pareto"`, `"dagum"`, `"lognormal"`, `"fisk"`, `"weibull"`, `"gamma"`, `"chisq"` and `"frechet"` for the Pareto, Dagum, logNormal, Fisk (Log-logistic), Weibull, Gamma, Chi-Squared and Frechet distributions, respectively.
`scale`	The scale parameter for the Pareto, Dagum, Fisk, Weibull, Gamma and Frechet distributions. The default value is `scale = 1`.
`meanlog`	The mean for the logNormal distribution on the log scale. The default value is `meanlog = 0`.
`shape2.p`	The shape parameter `p` for the Dagum distribution. The default value is `shape2.p = 1`.
`location`	The location parameter for the Frechet distribution. The default value is `location = 0`.

Details

For each continuous probability distribution, the parameters involved in the theoretical formulation of the Gini index ( $G$ ) are chosen so that $G$ takes the values set in the argument gini. Additional parameters required by the distribution can be set by the user, and default values are provided: scale is the scale parameter for the Pareto, Dagum, Fisk, Weibull, Gamma and Frechet distributions; meanlog is the mean of the Lognormal distribution on the log scale; shape2.p is the shape parameter p of the Dagum distribution; and location is the location parameter of the Frechet distribution. Additional information on the continuous probability distributions used by this function can be found in Kleiber and Kotz (2003), Johnson et al. (1995) and Yee (2022).
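
The idea of choosing parameters from a target Gini index can be sketched in base R by inverting closed-form Gini expressions. The code below is an illustration only and is not presented as the internals of `gsample`: the helper names are hypothetical, and the inversions assume $G = 1/(2\alpha - 1)$ for the Pareto, $G = 1 - 2^{-1/a}$ for the Weibull and $G = 2\Phi(\sigma/\sqrt{2}) - 1$ for the Lognormal distribution.

```r
# Hypothetical helpers (not giniVarCI functions): parameter giving a target Gini G in (0, 1).
shape_for_pareto    <- function(G) (1 + G) / (2 * G)             # from G = 1 / (2 * shape - 1)
shape_for_weibull   <- function(G) -log(2) / log(1 - G)          # from G = 1 - 2^(-1 / shape)
sdlog_for_lognormal <- function(G) sqrt(2) * qnorm((1 + G) / 2)  # from G = 2 * pnorm(sdlog / sqrt(2)) - 1

# Plain (not bias-corrected) sample Gini index, used only to check the draws.
sample_gini <- function(y) {
  y <- sort(y)
  n <- length(y)
  2 * sum(seq_len(n) * y) / (n * sum(y)) - (n + 1) / n
}

set.seed(1)
G <- 0.3
n <- 1e5
draws <- list(
  pareto    = runif(n)^(-1 / shape_for_pareto(G)),                   # inverse-CDF draw, scale k = 1
  weibull   = rweibull(n, shape = shape_for_weibull(G), scale = 1),
  lognormal = rlnorm(n, meanlog = 0, sdlog = sdlog_for_lognormal(G))
)

# Sample Gini indices; each should be near the target 0.3,
# with the heavy-tailed Pareto typically the least precise.
sapply(draws, sample_gini)
```

Parameters that do not enter the Gini formula (the scale, or meanlog for the Lognormal) can be fixed freely, which mirrors the role of the additional arguments described above.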

Value

A numeric vector (or a matrix of order $n \times$ length(gini)) with the samples, arranged by columns, drawn from the continuous probability distribution stated in distribution with the Gini indices given in the vector gini.

Note

The Gini index of the generated samples may underestimate the target value for heavy-tailed distributions (Pareto, Dagum, Lognormal, Fisk and Frechet) and large values of gini. A larger sample size may solve or reduce this problem.

Author(s)

Juan F Munoz [email protected]

Jose M Pavia [email protected]

Encarnacion Alvarez [email protected]

References

Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.

Yee, T. W. (2022). VGAM: Vector Generalized Linear and Additive Models. R package version 1.1-7, https://CRAN.R-project.org/package=VGAM.

Examples

# Sample from the Pareto distribution and parameter selected such that the Gini index is 0.3.
gsample(n = 10, gini = 0.3, "pareto")

# Samples from the Pareto distribution and gini indices 0.2 and 0.5.
gsample(n = 10, gini = c(0.2,0.5), "par", scale = 2)

# Samples from the Lognormal distribution and gini indices 0.2 and 0.5.
gsample(n = 10, gini = c(0.2,0.5), "lognormal", meanlog = 5)

# Samples from the Dagum distribution and gini indices 0.2 and 0.5.
gsample(n = 10, gini = c(0.2,0.5), "dagum")

# Samples from the Fisk (Log-logistic) distribution and gini indices 0.3 and 0.6.
gsample(n = 10, gini = c(0.3,0.6), "fisk")

# Sample from the Weibull distribution and parameter selected such that the Gini index is 0.2.
gsample(n = 10, gini = 0.2, "weibull")

# Sample from the Gamma distribution and parameter selected such that the Gini index is 0.2.
gsample(n = 10, gini = 0.2, "gamma")

# Samples from the Chi-Squared distribution and gini indices 0.3 and 0.6.
gsample(n = 10, gini = c(0.3,0.6), "chi")

# Samples from the Frechet distribution and gini indices 0.3 and 0.6.
gsample(n = 10, gini = c(0.3,0.6), "fre")

Gini index for the Uniform distribution with user-defined lower and upper limits

Description

Calculates the Gini index for the Uniform distribution with lower limit min and upper limit max.

Usage

gunif(
 min = 0,
 max = 1
)

Arguments

`min`	A non-negative real number specifying the lower limit of the Uniform distribution. The default value is `min = 0`.
`max`	A positive real number higher than `min` specifying the upper limit of the Uniform distribution. The default value is `max = 1`.

Details

The Uniform distribution with lower and upper limits $min$ and $max$ , and denoted as $U(min,max)$ , where $\min \geq 0$ , $\max >0$ , $\min < \max$ and both must be finite, has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995; Yee, 2022)

$f(y)= \displaystyle \frac{1}{\max - \min},$

where $y \in [\min, \max]$ . The cumulative distribution function is given by

$F(y) = \left\{ \begin{array}{cl} 0 , & y < \min; \\ \displaystyle \frac{y-\min}{\max - \min}, & y \in [\min, \max]; \\ 1 , & y > \max. \end{array} \right.$

The Gini index can be computed as

$G = \displaystyle \frac{\max - \min}{3(\min + \max)}.$

If min or max are not specified they assume the default values of 0 and 1, respectively.
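
One way to see where this closed form comes from, assuming the standard definition $G = \Delta/(2\mu)$ , where $\mu = E[y]$ and $\Delta = E|y_1 - y_2|$ is the mean absolute difference of two independent draws from $U(\min,\max)$ :

$\mu = \displaystyle \frac{\min + \max}{2}, \qquad \Delta = \frac{\max - \min}{3}, \qquad G = \frac{\Delta}{2\mu} = \frac{\max - \min}{3(\min + \max)}.$

For instance, $U(0,1)$ gives $G = 1/3$ , and $U(10,190)$ gives $G = 180/(3 \cdot 200) = 0.3$ .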

Value

A numeric value with the Gini index. An NA value is returned when a limit is non-numeric or negative, or when $\min \geq \max$ .

Author(s)

Juan F Munoz [email protected]

Jose M Pavia [email protected]

Encarnacion Alvarez [email protected]

References

Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.

Yee, T. W. (2022). VGAM: Vector Generalized Linear and Additive Models. R package version 1.1-7, https://CRAN.R-project.org/package=VGAM.

Examples

# Gini index for the Uniform distribution with lower limit 0 and upper limit 1.
gunif()

# Gini index for the Uniform distribution with lower limit 10 and upper limit 190.
gunif(min = 10, max = 190)


Gini index for the Weibull distribution with user-defined shape parameters

Description

Calculates the Gini indices for the Weibull distribution with shape parameters $a$ .

Usage

gweibull(shape)

Arguments

`shape`	A vector of positive real numbers specifying the shape parameters $a$ of the Weibull distribution.

Details

The Weibull distribution with scale parameter $\sigma$ , shape parameter $a$ , and denoted as $Weibull(\sigma, a)$ , where $\sigma>0$ and $a>0$ , has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995; Yee, 2022)

$f(y) = \displaystyle \frac{a}{\sigma}\left(\frac{y}{\sigma}\right)^{a-1}e^{-(y/\sigma)^{a}},$

and a cumulative distribution function given by

$F(y) = \displaystyle 1 - e^{-(y/\sigma)^{a}},$

where $y \geq 0$ .

The Gini index can be computed as

$G = 1-2^{-1/a}.$
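
The closed form can be checked numerically, along with the fact that the scale parameter cancels. The sketch below assumes the identity $G = 1 - E[\min(y_1,y_2)]/E[y]$ for non-negative variables, with both expectations written as integrals of the survival function $S(y) = 1 - F(y)$ . The name `gini_weibull_numeric` is hypothetical, and only base R functions (`pweibull`, `integrate`) are used.

```r
# Hypothetical numerical check (not a giniVarCI function).
# For y >= 0:  E[y] = integral of S(y) dy  and  E[min(y1, y2)] = integral of S(y)^2 dy,
# so that  G = 1 - E[min(y1, y2)] / E[y].
gini_weibull_numeric <- function(shape, scale = 1) {
  surv <- function(y) pweibull(y, shape = shape, scale = scale, lower.tail = FALSE)
  mean_y   <- integrate(surv, lower = 0, upper = Inf)$value
  mean_min <- integrate(function(y) surv(y)^2, lower = 0, upper = Inf)$value
  1 - mean_min / mean_y
}

# Closed form from the Details section.
gini_weibull_closed <- function(shape) 1 - 2^(-1 / shape)

gini_weibull_numeric(shape = 1)               # exponential case, close to 0.5
gini_weibull_numeric(shape = 2.5, scale = 1)  # close to gini_weibull_closed(2.5)
gini_weibull_numeric(shape = 2.5, scale = 5)  # same value: the scale cancels
```

The minimum of two independent $Weibull(\sigma, a)$ variables is again Weibull with scale $\sigma 2^{-1/a}$ , which is why the ratio of means equals $2^{-1/a}$ whatever the value of $\sigma$ .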

Value

A numeric vector with the Gini indices. A NA is returned when a shape parameter is non-numeric or non-positive.

Note

The Gini index of the Weibull distribution does not depend on its scale parameter.

Author(s)

Juan F Munoz [email protected]

Jose M Pavia [email protected]

Encarnacion Alvarez [email protected]

References

Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.

Yee, T. W. (2022). VGAM: Vector Generalized Linear and Additive Models. R package version 1.1-7, https://CRAN.R-project.org/package=VGAM.

Examples

# Gini index for the Weibull distribution with 'shape = 1'.
gweibull(shape = 1)

# Gini indices for the Weibull distribution and different shape parameters.
gweibull(shape = 1:10)

Comparisons of variance estimators and confidence intervals for the Gini index in infinite populations

Description

Compares variance estimates and confidence intervals for the Gini index in infinite populations.

Usage

icompareCI(
 y,
 B = 1000L,
 alpha = 0.05,
 plotCI = TRUE,
 digitsgini = 2L,
 digitsvar = 4L,
 cum.sums = NULL,
 na.rm = TRUE,
 precisionEL = 1e-4,
 maxiterEL = 100L,
 line.types = c(1L, 2L),
 colors = c("red", "green"),
 save.plot = FALSE
)

Arguments

`y`	A vector with the non-negative real numbers to be used for estimating the Gini index. This argument can be missing if argument `cum.sums` is provided.
`B`	A single integer specifying the number of bootstrap replicates. The default value is `B = 1000L`.
`alpha`	A single numeric value between 0 and 1 specifying the confidence level 1-`alpha` to be used for computing the confidence interval for the Gini. Some authors call `alpha` the significance level. The default value is `alpha = 0.05`.
`plotCI`	A 'TRUE/FALSE' logical value indicating whether confidence intervals are compared using a plot. The default value is `plotCI = TRUE`.
`digitsgini`	A single integer specifying the number of decimals used in the estimation of the Gini index and confidence intervals. The default value is `digitsgini = 2L`.
`digitsvar`	A single integer specifying the number of decimals used in the variance estimation of the Gini index. The default value is `digitsvar = 4L`.
`cum.sums`	A numeric vector of non-negative real numbers specifying the cumulative sums of the variable used to estimate the Gini index. This argument can be `NULL` if argument `y` is provided. The default value is `cum.sums = NULL`.
`na.rm`	A 'TRUE/FALSE' logical value indicating whether `NA` values should be removed before the computation proceeds. The default value is `na.rm = TRUE`.
`precisionEL`	A single numeric value specifying the precision for the confidence interval based on the empirical likelihood method. The default value is `precisionEL = 1e-4`, i.e., limits of the confidence interval have a total of 4 decimal places.
`maxiterEL`	A single integer specifying the maximum number of iterations allowed for the convergence in the empirical likelihood method. The default value is `maxiterEL = 100L`.
`line.types`	A numeric vector of length 2 specifying the line types. See the function `plot` for the different line types. The default value is `line.types = c(1L, 2L)`.
`colors`	A vector of length 2 specifying the colors for the lines of the plot. The default value is `colors = c("red", "green")`.
`save.plot`	A 'TRUE/FALSE' logical value indicating whether the ggplot object of the plot comparing the confidence intervals should be saved in the output. The default value is `save.plot = FALSE`.

Details

For a sample $S$ , with size $n$ , derived from an infinite population, the Gini index is estimated by two different versions (see Muñoz et al., 2023 for more details):

$\widehat{G} = \displaystyle \frac{2}{\overline{y}n^{2}}\sum_{i \in S}iy_{(i)} - \frac{n+1}{n};$

$\widehat{G}^{bc} = \displaystyle \frac{2}{\overline{y}n(n-1)}\sum_{i \in S}iy_{(i)} - \frac{n+1}{n-1},$

where the label $bc$ indicates that the bias correction is applied. The table below summarises the various types of variances and confidence intervals that this function computes. Methods based on the jackknife technique use the fast algorithm suggested by Ogwang (2000). The linearization technique for variance estimation (Deville, 1999) has been applied to the following estimators of the Gini index (Berger, 2008; Langel and Tille, 2013):

$\widehat{G}^{a} = \displaystyle \frac{1}{2\overline{y}n^{2}}\sum_{i \in S}\sum_{j\in S} |y_i-y_j|$

and

$\widehat{G}^{b} = \displaystyle \frac{2}{\overline{y}n}\sum_{i \in S}y_{i}\widehat{F}_{n}(y_{i}) - 1,$

where

$\widehat{F}_{n}(y_i)=\frac{1}{n}\sum_{j \in S}\delta(y_j \leq y_i).$

zalinearization and zblinearization linearize the estimators $\widehat{G}^{a}$ and $\widehat{G}^{b}$ , respectively. The percentile bootstrap (see Qin et al., 2010) is computed using pbootstrap. BCa is the bias-corrected and accelerated bootstrap confidence interval (Efron and Tibshirani, 1993). ELchisq and ELboot are the confidence intervals based on the empirical likelihood method. See vignette("GiniVarInterval") for a detailed description of the various methods for variance estimation and confidence intervals for the Gini index.

Interval	Variance	Critical values	References
_______________	____________	__________________	__________________________
`zjackknife`	Jackknife	Normal	Berger (2008)
`tjackknife`	Jackknife	Studentized bootstrap	Biewen (2002); Berger (2008)
`zalinearization`	Linearization	Normal	Langel and Tille (2013)
`zblinearization`	Linearization	Normal	Berger (2008)
`talinearization`	Linearization	Studentized bootstrap	Langel and Tille (2013)
`tblinearization`	Linearization	Studentized bootstrap	Biewen (2002); Berger (2008)
`pBootstrap`	Bootstrap	Percentile bootstrap	Qin et al. (2010)
`BCa`	Bootstrap	BCa bootstrap	Davison and Hinkley (1997)
`ELchisq`	Linearization	Chi-Squared	Qin et al. (2010)
`ELboot`	Bootstrap	Percentile bootstrap	Qin et al. (2010)
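
The four point estimators that appear in the Details section can be written in a few lines of base R. The sketch below is only an illustration of the formulas; `gini_estimators` is a hypothetical helper and not the package's implementation (which uses 'Rcpp').

```r
# Hypothetical helper (not a giniVarCI function): the estimators from the Details section.
gini_estimators <- function(y) {
  y    <- sort(y)          # order statistics y_(1) <= ... <= y_(n)
  n    <- length(y)
  ybar <- mean(y)
  s    <- sum(seq_len(n) * y)   # sum of i * y_(i)
  Fn   <- ecdf(y)(y)            # empirical distribution function at each y_i
  c(
    G    = 2 * s / (ybar * n^2) - (n + 1) / n,                 # plain estimator
    G_bc = 2 * s / (ybar * n * (n - 1)) - (n + 1) / (n - 1),   # bias-corrected estimator
    G_a  = sum(abs(outer(y, y, "-"))) / (2 * ybar * n^2),      # mean-difference form
    G_b  = 2 * sum(y * Fn) / (ybar * n) - 1                    # distribution-function form
  )
}

gini_estimators(c(1, 2, 3, 4, 10))
# G = 0.4, G_bc = 0.5, G_a = 0.4, G_b = 0.6
```

In this example $\widehat{G}^{a}$ coincides with $\widehat{G}$ , $\widehat{G}^{bc}$ equals $\widehat{G}\,n/(n-1)$ , and, with no ties, $\widehat{G}^{b}$ exceeds $\widehat{G}$ by $1/n$ .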

Value

If save.plot = FALSE, a data frame with columns:

interval. The method used to construct the confidence interval.
bc. A 'TRUE/FALSE' logical value indicating whether the bias correction is applied.
gini. The estimation of the Gini index.
lowerlimit. The lower limit of the confidence interval.
upperlimit. The upper limit of the confidence interval.
var.gini. The variance estimation for the estimator of the Gini index.

If save.plot = TRUE, a list with two components: (i) 'base.CI' a data frame of six columns as just described and (ii) 'plot' a (ggplot) description of the plot, which is a list with components that contain the plot itself, the data, information about the scales, panels, etc. As a side-effect, a plot that compares the various methods for constructing confidence intervals for the Gini index is displayed. The **ggplot2** package must be installed for this option to work.

Author(s)

Juan F Munoz [email protected]

Jose M Pavia [email protected]

Encarnacion Alvarez [email protected]

References

Berger, Y. G. (2008). A note on the asymptotic equivalence of jackknife and linearization variance estimation for the Gini Coefficient. Journal of Official Statistics, 24(4), 541-555.

Biewen, M. (2002). Bootstrap inference for inequality, mobility and poverty measurement. Journal of Econometrics, 108(2), 317-342.

Davison, A. C., and Hinkley, D. V. (1997). Bootstrap Methods and Their Application. Cambridge Series in Statistical and Probabilistic Mathematics, No. 1. Cambridge University Press.

Deville, J.C. (1999). Variance Estimation for Complex Statistics and Estimators: Linearization and Residual Techniques. Survey Methodology, 25, 193–203.

Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap. Chapman and Hall, New York, London.

Ogwang, T. (2000). A convenient method of computing the Gini index and its standard error. Oxford Bulletin of Economics and Statistics, 62(1), 123-129.

Qin, Y., Rao, J. N. K., and Wu, C. (2010). Empirical likelihood confidence intervals for the Gini measure of income inequality. Economic Modelling, 27(6), 1429-1435.

Examples

# Sample, with size 50, from a Lognormal distribution. The true Gini index is 0.5.
set.seed(123)
y <- gsample(n = 50, gini = 0.5, distribution = "lognormal")

# Estimation of the Gini index and confidence intervals using different methods.
icompareCI(y)

Gini index, variances and confidence intervals in infinite populations

Description

Estimation of the Gini index and computation of variances and confidence intervals for infinite populations.

Usage

igini(
  y,
  bias.correction = TRUE,
  interval = NULL,
  B = 1000L,
  alpha = 0.05,
  cum.sums = NULL,
  na.rm = TRUE,
  precisionEL = 1e-04,
  maxiterEL = 100L,
  large.sample = FALSE
)

Arguments

`y`	A vector with the non-negative real numbers to be used for estimating the Gini index. This argument can be missing if argument `cum.sums` is provided.
`bias.correction`	A 'TRUE/FALSE' logical value indicating whether the bias correction should be applied to the estimation of the Gini index. The default value is `bias.correction = TRUE`.
`interval`	A character string specifying the type of variance estimation and confidence interval to be used, or `NULL` (the default value) to omit the computation of both variance and confidence interval. Possible values are `"zjackknife"`, `"tjackknife"`, `"zalinearization"`, `"zblinearization"`, `"talinearization"`, `"tblinearization"`, `"pbootstrap"`, `"BCa"`, `"ELchisq"` and `"ELboot"`. The default value is `interval = NULL`.
`B`	A single integer specifying the number of bootstrap replicates. The default value is `B = 1000L`.
`alpha`	A single numeric value between 0 and 1. If `interval` is not `NULL`, the confidence level to be used for computing the confidence interval for the Gini is `1-alpha`. Some authors call `alpha` the significance level. The default value is `alpha = 0.05`.
`cum.sums`	A vector with the non-negative real numbers specifying the cumulative sums of the variable used to estimate the Gini index. This argument can be `NULL` if argument `y` is provided. The default value is `cum.sums = NULL`.
`na.rm`	A 'TRUE/FALSE' logical value indicating whether `NA`'s should be removed before the computation proceeds. The default value is `na.rm = TRUE`.
`precisionEL`	A single numeric value specifying the precision for the confidence interval based on the empirical likelihood method. The default value is `precisionEL = 1e-4`, i.e., limits of the confidence interval have a total of 4 decimal places.
`maxiterEL`	A single integer specifying the maximum number of iterations allowed for the convergence of the empirical likelihood method. The default value is `maxiterEL = 100L`.
`large.sample`	A 'TRUE/FALSE' logical value indicating whether the sample is large enough that a faster algorithm should be used to sort the sample values. The default value is `large.sample = FALSE`.

Details

For a sample $S$ , with size $n$ , derived from an infinite population, the Gini index is estimated by

$\widehat{G} = \displaystyle \frac{2}{\overline{y}n^{2}}\sum_{i \in S}iy_{(i)} - \frac{n+1}{n}$

when bias.correction = FALSE, and by

$\widehat{G}^{bc} = \displaystyle \frac{2}{\overline{y}n(n-1)}\sum_{i \in S}iy_{(i)} - \frac{n+1}{n-1}$

when bias.correction = TRUE. For more details, see Muñoz et al. (2023). The table below summarises the various types of variances and confidence intervals that this function computes. Methods based on the jackknife technique use the fast algorithm suggested by Ogwang (2000). The linearization technique for variance estimation (Deville, 1999) has been applied to the following estimators of the Gini index (Berger, 2008; Langel and Tille, 2013):

$\widehat{G}^{a} = \displaystyle \frac{1}{2\overline{y}n^{2}}\sum_{i \in S}\sum_{j\in S} |y_i-y_j|$

and

$\widehat{G}^{b} = \displaystyle \frac{2}{\overline{y}n}\sum_{i \in S}y_{i}\widehat{F}_{n}(y_{i}) - 1,$

where

$\widehat{F}_{n}(y_i)=\frac{1}{n}\sum_{j \in S}\delta(y_j \leq y_i).$
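
The estimators above can be sketched in base R as follows. This is an illustrative transcription of the formulas, not the package's internal code; the function names `gini_rank`, `gini_a` and `gini_b` are hypothetical.

```r
# Base R sketch of the estimators in this section (hypothetical helper names).

# Rank-based estimator, with or without the bias correction.
gini_rank <- function(y, bias.correction = TRUE) {
  y <- sort(y)                          # order statistics y_(1) <= ... <= y_(n)
  n <- length(y)
  s <- sum(seq_len(n) * y)              # sum of i * y_(i)
  if (bias.correction) {
    2 * s / (mean(y) * n * (n - 1)) - (n + 1) / (n - 1)
  } else {
    2 * s / (mean(y) * n^2) - (n + 1) / n
  }
}

# Estimator G^a: mean absolute difference over all pairs.
gini_a <- function(y) {
  n <- length(y)
  sum(abs(outer(y, y, "-"))) / (2 * mean(y) * n^2)
}

# Estimator G^b: based on the empirical distribution function F_n.
gini_b <- function(y) {
  n <- length(y)
  Fn <- sapply(y, function(t) mean(y <= t))   # F_n evaluated at each y_i
  2 * sum(y * Fn) / (mean(y) * n) - 1
}

y <- c(1, 2, 3, 4)
gini_rank(y, bias.correction = FALSE)   # 0.25
gini_rank(y)                            # 1/3
gini_a(y)                               # 0.25
gini_b(y)                               # 0.5
```

For this small sample, the uncorrected rank-based estimator and $\widehat{G}^{a}$ coincide, while $\widehat{G}^{b}$, as written above, differs from them.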

Interval	Variance	Critical values	References
_______________	____________	__________________	__________________________
`zjackknife`	Jackknife	Normal	Berger (2008)
`tjackknife`	Jackknife	Studentized bootstrap	Biewen (2002); Berger (2008)
`zalinearization`	Linearization	Normal	Langel and Tille (2013)
`zblinearization`	Linearization	Normal	Berger (2008)
`talinearization`	Linearization	Studentized bootstrap	Langel and Tille (2013)
`tblinearization`	Linearization	Studentized bootstrap	Biewen (2002); Berger (2008)
`pBootstrap`	Bootstrap	Percentile bootstrap	Qin et al. (2010)
`BCa`	Bootstrap	BCa bootstrap	Davison and Hinkley (1997)
`ELchisq`	Linearization	Chi-Squared	Qin et al. (2010)
`ELboot`	Bootstrap	Percentile bootstrap	Qin et al. (2010)
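
As an illustration of the `zjackknife` row, here is a minimal base R sketch. It assumes the textbook delete-one jackknife variance and normal critical values. It recomputes the estimator $n$ times rather than using the fast algorithm of Ogwang (2000), the helper names are hypothetical, and the variance formula used by the package may differ in detail.

```r
# Bias-corrected Gini estimator (rank-based formula from the Details section).
gini_bc <- function(y) {
  y <- sort(y)
  n <- length(y)
  2 * sum(seq_len(n) * y) / (mean(y) * n * (n - 1)) - (n + 1) / (n - 1)
}

# Naive delete-one jackknife variance with a normal-based confidence interval.
# Assumption: textbook jackknife formula, not necessarily the package's own.
jackknife_ci <- function(y, alpha = 0.05) {
  n <- length(y)
  g <- gini_bc(y)
  # Leave-one-out estimates G_(-i), i = 1, ..., n.
  g_loo <- vapply(seq_len(n), function(i) gini_bc(y[-i]), numeric(1))
  v <- (n - 1) / n * sum((g_loo - mean(g_loo))^2)
  z <- qnorm(1 - alpha / 2)             # normal critical value
  list(gini = g,
       variance = v,
       interval = c(lower = g - z * sqrt(v), upper = g + z * sqrt(v)))
}

y <- c(12, 7, 3, 25, 9, 14, 40, 5, 18, 11)
res <- jackknife_ci(y)
res
```

The returned list mirrors the three pieces described in the Value section (estimate, variance, interval limits), but its component names are chosen here for illustration only.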

Value

When interval = NULL, a single numeric value between 0 and 1, containing the estimation of the Gini index based on the vector y or the vector cum.sums. When interval is not NULL, a list of 3 components: a single numeric value with the estimation of the Gini index; a single numeric value with the variance estimation of the Gini index; and a numeric matrix with 1 row and 2 columns containing the lower and upper limits of the confidence intervals for the Gini index.

Author(s)

Juan F Munoz

Jose M Pavia

Encarnacion Alvarez

References

Berger, Y. G. (2008). A note on the asymptotic equivalence of jackknife and linearization variance estimation for the Gini Coefficient. Journal of Official Statistics, 24(4), 541-555.

Biewen, M. (2002). Bootstrap inference for inequality, mobility and poverty measurement. Journal of Econometrics, 108(2), 317-342.

Davison, A. C., and Hinkley, D. V. (1997). Bootstrap Methods and Their Application (Cambridge Series in Statistical and Probabilistic Mathematics, No. 1). Cambridge University Press.

Deville, J.C. (1999). Variance Estimation for Complex Statistics and Estimators: Linearization and Residual Techniques. Survey Methodology, 25, 193–203.

Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap. Chapman and Hall, New York, London.

Ogwang, T. (2000). A convenient method of computing the Gini index and its standard error. Oxford Bulletin of Economics and Statistics, 62(1), 123-129.

Qin, Y., Rao, J. N. K., and Wu, C. (2010). Empirical likelihood confidence intervals for the Gini measure of income inequality. Economic Modelling, 27(6), 1429-1435.

Examples

# Sample, with size 50, from a Lognormal distribution. The true Gini index is 0.5.
set.seed(123)
y <- gsample(n = 50, gini = 0.5, distribution = "lognormal")

# Bias corrected estimation of the Gini index.
igini(y)

# Estimation of the Gini index and confidence interval based on jackknife and studentized bootstrap.
igini(y, interval = "tjackknife")


Gini index for infinite populations and different estimation methods.

Description

Estimates the Gini index in infinite populations, using different methods.

Usage

iginindex(
  y,
  method = 5L,
  bias.correction = TRUE,
  cum.sums = NULL,
  na.rm = TRUE,
  useRcpp = TRUE
)

Arguments

`y`	A vector with the non-negative real numbers to be used for estimating the Gini index. This argument can be missing if argument `cum.sums` is provided.
`method`	An integer between 1 and 10 selecting one of the 10 methods detailed below for estimating the Gini index in infinite populations. The default method is `method = 5L`.
`bias.correction`	A 'TRUE/FALSE' logical value indicating whether the bias correction should be applied to the estimation of the Gini index. The default value is `bias.correction = TRUE`.
`cum.sums`	A vector with the non-negative real numbers specifying the cumulative sums of the variable used to estimate the Gini index. This argument can be `NULL` if argument `y` is provided. The default value is `cum.sums = NULL`.
`na.rm`	A 'TRUE/FALSE' logical value indicating whether `NA`'s should be removed before the computation proceeds. The default value is `na.rm = TRUE`.
`useRcpp`	A 'TRUE/FALSE' logical value indicating whether `Rcpp` (`useRcpp = TRUE`) or `R` (`useRcpp = FALSE`) is used for computation. The default value is `useRcpp = TRUE`.

Details

For a sample $S$ , with size $n$ , derived from an infinite population, different formulations of the Gini index have been proposed in the literature, but they yield only two distinct values: one without and one with the bias correction.

This function estimates the Gini index using the various formulations, and both R and C++ implementations are provided. This is useful for research purposes and for comparing computation times. The argument cum.sums does not require that the cumulative sums are based on the non-decreasing order of the variable y.

The different methods for estimating the Gini index are (see Wang et al., 2016; Giorgi and Gigliarano, 2017; Mukhopadhyay and Sengupta, 2021; Muñoz et al., 2023):

method = 1

$\widehat{G}_1 = \displaystyle \frac{1}{2\overline{y}n^{2}}\sum_{i \in S}\sum_{j\in S} |y_i-y_j|;$

$\widehat{G}_{1}^{bc} = \displaystyle \frac{1}{2\overline{y}n(n-1)}\sum_{i \in S} \sum_{j \in S} |y_i-y_j|,$

where $\overline{y} = n^{-1}\sum_{i \in S}y_i$ is the sample mean and the label $bc$ indicates that the bias correction is applied to the estimation of the Gini index.
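
A minimal base R sketch of method 1 follows. The helper names are hypothetical and the code is independent of the package's implementation. The second function uses the identity $\sum_{i}\sum_{j}|y_i-y_j| = 2\sum_{i}(2i-n-1)y_{(i)}$ to avoid building the $n \times n$ matrix of differences.

```r
# Method 1 via the full matrix of pairwise absolute differences (O(n^2) memory).
gini1_pairs <- function(y, bias.correction = TRUE) {
  n <- length(y)
  total <- sum(abs(outer(y, y, "-")))             # double sum of |y_i - y_j|
  denom <- if (bias.correction) n * (n - 1) else n^2
  total / (2 * mean(y) * denom)
}

# Same quantity computed from the order statistics (O(n log n) time).
gini1_sorted <- function(y, bias.correction = TRUE) {
  y <- sort(y)
  n <- length(y)
  total <- 2 * sum((2 * seq_len(n) - n - 1) * y)  # equals the double sum above
  denom <- if (bias.correction) n * (n - 1) else n^2
  total / (2 * mean(y) * denom)
}

y <- c(4, 1, 3, 2)
gini1_pairs(y, bias.correction = FALSE)    # 0.25
gini1_sorted(y, bias.correction = FALSE)   # 0.25
gini1_pairs(y)                             # 1/3
```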

method = 2

$\widehat{G}_{2} = \displaystyle \frac{n-1}{n}\frac{\sum_{i=1}^{n-1}(p_i-q_i)}{\sum_{i=1}^{n-1}p_i};$

$\widehat{G}_{2}^{bc} = \displaystyle \frac{\sum_{i=1}^{n-1}(p_i-q_i)}{\sum_{i=1}^{n-1}p_i},$

where

$p_i= \displaystyle \frac{i}{n}; \quad q_i= \frac{y_{i}^{+}}{y_{n}^{+}},$

and $y_{i}^{+}=\sum_{j=1}^{i}y_{(j)}$ , with $i=\{1,\ldots,n\}$ , are the cumulative sums of the ordered values $y_{(i)}$ (in non-decreasing order) of the variable of interest $y$ .

method = 3

$\widehat{G}_{3} = \displaystyle \frac{n-1}{n} - \frac{2}{n}\sum_{i=1}^{n-1}q_i;$

$\widehat{G}_{3}^{bc} = 1 - \displaystyle \frac{2}{n-1}\sum_{i=1}^{n-1}q_i.$

method = 4

$\widehat{G}_{4} = 1 - \displaystyle \sum_{i=0}^{n-1}(q_{i+1} + q_i)(p_{i+1} - p_i);$

$\widehat{G}_{4}^{bc} = \displaystyle \frac{n}{n-1}\left[1 - \sum_{i=0}^{n-1}(q_{i+1} + q_i)(p_{i+1} - p_i)\right],$

where $p_0=q_0=0.$
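
Methods 2 to 4 all work with the points $(p_i, q_i)$ of the empirical Lorenz curve. The following base R sketch (hypothetical function name, not the package's code) computes the three versions side by side. In each case the bias-corrected formula equals $n/(n-1)$ times the uncorrected one, which the code exploits.

```r
# Methods 2, 3 and 4 from the Lorenz-curve points (p_i, q_i).
lorenz_gini <- function(y, bias.correction = TRUE) {
  y <- sort(y)
  n <- length(y)
  p <- seq_len(n) / n                 # p_i = i / n
  q <- cumsum(y) / sum(y)             # q_i = y_i^+ / y_n^+
  k <- seq_len(n - 1)                 # sums run over i = 1, ..., n - 1

  g2 <- (n - 1) / n * sum(p[k] - q[k]) / sum(p[k])
  g3 <- (n - 1) / n - 2 / n * sum(q[k])

  p0 <- c(0, p)                       # prepend p_0 = 0
  q0 <- c(0, q)                       # prepend q_0 = 0
  g4 <- 1 - sum((q0[-1] + q0[-(n + 1)]) * diff(p0))

  out <- c(method2 = g2, method3 = g3, method4 = g4)
  # Each bias-corrected formula is n / (n - 1) times its uncorrected version.
  if (bias.correction) out * n / (n - 1) else out
}

y <- c(1, 2, 3, 4)
lorenz_gini(y, bias.correction = FALSE)   # all three equal 0.25
lorenz_gini(y)                            # all three equal 1/3
```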

method = 5

$\widehat{G}_{5} = \displaystyle \frac{2}{\overline{y}n^{2}}\sum_{i \in S}iy_{(i)} - \frac{n+1}{n};$

$\widehat{G}_{5}^{bc} = \displaystyle \frac{2}{\overline{y}n(n-1)}\sum_{i \in S}iy_{(i)} - \frac{n+1}{n-1}.$

method = 6

$\widehat{G}_{6} = \displaystyle \frac{2}{\overline{y}n}cov(i,y_{(i)});$

$\widehat{G}_{6}^{bc} = \displaystyle \frac{2}{\overline{y}(n-1)}cov(i,y_{(i)}).$
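
A short base R sketch of method 6 is given below. It assumes that $cov(i,y_{(i)})$ denotes the covariance with divisor $n$, which is the reading under which method 6 reproduces method 5. R's built-in `cov()` uses divisor $n-1$ and therefore has to be rescaled. The function name is hypothetical.

```r
# Method 6: covariance between the ranks i and the order statistics y_(i).
# Assumption: the covariance in the formula uses divisor n, not n - 1.
gini6 <- function(y, bias.correction = TRUE) {
  y <- sort(y)
  n <- length(y)
  i <- seq_len(n)
  cov_n <- mean(i * y) - mean(i) * mean(y)   # divisor-n covariance
  denom <- if (bias.correction) n - 1 else n
  2 * cov_n / (mean(y) * denom)
}

y <- c(1, 2, 3, 4)
gini6(y, bias.correction = FALSE)   # 0.25
gini6(y)                            # 1/3

# Equivalent computation from stats::cov(), rescaled from divisor n - 1 to n.
n <- length(y)
2 * cov(seq_len(n), sort(y)) * (n - 1) / n / (mean(y) * n)   # 0.25
```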

method = 7

$\widehat{G}_{7} = \displaystyle \frac{1}{\overline{y}n^2}\sum_{i \in S}\sum_{j\in S}|y_i-y_j|\cdot |\widehat{F}_{n}^{\ast}(y_{i})-\widehat{F}_{n}^{\ast}(y_{j})|;$

$\widehat{G}_{7}^{bc} = \displaystyle \frac{1}{\overline{y}n(n-1)}\sum_{i\in S}\sum_{j \in S}|y_i-y_j|\cdot |\widehat{F}_{n}^{\ast}(y_{i})-\widehat{F}_{n}^{\ast}(y_{j})|,$

where

$\widehat{F}_{n}^{\ast}(t)= \displaystyle \frac{1}{n}\sum_{i \in S}[\delta(y_i < t) + 0.5\delta(y_i = t)]$

is the smooth (mid-point) distribution function.
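
The mid-point distribution function and method 7 can be sketched in base R as follows. The names `mid_ecdf` and `gini7` are hypothetical, and the code is a direct transcription of the formulas rather than the package's implementation.

```r
# Mid-point distribution function F*_n evaluated at each sample value:
# tied observations contribute one half each.
mid_ecdf <- function(y) {
  vapply(y, function(t) mean((y < t) + 0.5 * (y == t)), numeric(1))
}

# Method 7: pairwise differences in y weighted by pairwise differences in F*_n.
gini7 <- function(y, bias.correction = TRUE) {
  n <- length(y)
  Fs <- mid_ecdf(y)
  s <- sum(abs(outer(y, y, "-")) * abs(outer(Fs, Fs, "-")))
  denom <- if (bias.correction) n * (n - 1) else n^2
  s / (mean(y) * denom)
}

mid_ecdf(c(1, 1, 2))                            # 1/3, 1/3, 5/6
gini7(c(1, 2, 3, 4), bias.correction = FALSE)   # 0.25
gini7(c(1, 2, 3, 4))                            # 1/3
```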

method = 8

$\widehat{G}_{8} = 1 - \displaystyle \frac{1}{\overline{y}n^2}\sum_{i \in S}\sum_{j \in S}\min(y_i,y_j);$

$\widehat{G}_{8}^{bc} = 1 - \displaystyle \frac{1}{\overline{y}n(n-1)}\sum_{i \in S}\sum_{\substack{j \in S\\ j\neq i} }\min(y_i,y_j).$
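
Method 8 can be sketched in base R with a matrix of pairwise minima. The function name is hypothetical. Note that the bias-corrected version excludes the diagonal terms $j = i$.

```r
# Method 8: one minus the scaled sum of pairwise minima.
gini8 <- function(y, bias.correction = TRUE) {
  n <- length(y)
  m <- outer(y, y, pmin)                # m[i, j] = min(y_i, y_j)
  if (bias.correction) {
    # Sum over j != i only: drop the diagonal.
    1 - (sum(m) - sum(diag(m))) / (mean(y) * n * (n - 1))
  } else {
    1 - sum(m) / (mean(y) * n^2)
  }
}

y <- c(1, 2, 3, 4)
gini8(y, bias.correction = FALSE)   # 0.25
gini8(y)                            # 1/3
```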

method = 9

$\widehat{G}_{9} = \displaystyle \frac{2}{\overline{y}n}\sum_{i \in S}y_{i}\widehat{F}_{n}^{\ast}(y_{i}) - 1;$

$\widehat{G}_{9}^{bc} = \displaystyle \frac{2}{\overline{y}(n-1)}\sum_{i \in S}y_{i}\widehat{F}_{n}^{\ast}(y_{i}) - \frac{n}{n-1}.$
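
A base R sketch of method 9, using the mid-point distribution function defined under method 7. The function name is hypothetical and the code simply transcribes the two formulas.

```r
# Method 9: weighted sum of y_i by the mid-point distribution function F*_n.
gini9 <- function(y, bias.correction = TRUE) {
  n <- length(y)
  # F*_n at each y_i: strict inequalities count fully, ties count one half.
  Fs <- vapply(y, function(t) mean((y < t) + 0.5 * (y == t)), numeric(1))
  s <- sum(y * Fs)
  if (bias.correction) {
    2 * s / (mean(y) * (n - 1)) - n / (n - 1)
  } else {
    2 * s / (mean(y) * n) - 1
  }
}

y <- c(1, 2, 3, 4)
gini9(y, bias.correction = FALSE)   # 0.25
gini9(y)                            # 1/3
```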

method = 10

$\widehat{G}_{10} = \displaystyle \frac{n-1}{2\overline{y}n}\binom{n}{2}^{-1}\sum_{1 \leq i_{1} < i_{2} \leq n}|y_{i_{1}}-y_{i_{2}}|;$

$\widehat{G}_{10}^{bc} = \displaystyle \frac{1}{2\overline{y}}\binom{n}{2}^{-1}\sum_{1 \leq i_{1} < i_{2} \leq n}|y_{i_{1}}-y_{i_{2}}|.$
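
Method 10 averages $|y_{i_1}-y_{i_2}|$ over all $\binom{n}{2}$ unordered pairs. A minimal base R sketch, with a hypothetical function name and `utils::combn()` used to enumerate the pairs:

```r
# Method 10: mean absolute difference over all pairs i1 < i2.
gini10 <- function(y, bias.correction = TRUE) {
  n <- length(y)
  pairs <- utils::combn(n, 2)           # 2 x choose(n, 2) matrix of index pairs
  u <- sum(abs(y[pairs[1, ]] - y[pairs[2, ]])) / choose(n, 2)
  g_bc <- u / (2 * mean(y))             # bias-corrected version
  if (bias.correction) g_bc else (n - 1) / n * g_bc
}

y <- c(1, 2, 3, 4)
gini10(y, bias.correction = FALSE)   # 0.25
gini10(y)                            # 1/3
```

Enumerating all pairs costs $O(n^2)$ time and memory, so this form is mainly of interest for small samples or for checking other implementations.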

Value

A single numeric value between 0 and 1 containing the estimation of the Gini index based on the vector y or the vector cum.sums.

Author(s)

Juan F Munoz

Jose M Pavia

Encarnacion Alvarez

References

Giorgi, G. M., and Gigliarano, C. (2017). The Gini concentration index: a review of the inference literature. Journal of Economic Surveys, 31(4), 1130-1148.

Mukhopadhyay, N., and Sengupta, P. P. (Eds.). (2021). Gini inequality index: Methods and applications. CRC Press.

Wang, D., Zhao, Y., and Gilmore, D. W. (2016). Jackknife empirical likelihood confidence interval for the Gini index. Statistics & Probability Letters, 110, 289-295.

Examples

# Sample, with size 50, from a Lognormal distribution. The true Gini index is 0.5.
set.seed(123)
y <- gsample(n = 50, gini = 0.5, meanlog = 5)

# Estimation of the Gini index using the method = 5, bias correction, and Rcpp.
iginindex(y)

# Estimation of the Gini index using the method = 5, bias correction, and R.
iginindex(y, useRcpp = FALSE)

# Comparing the computation time for the various estimation methods using R
# (requires the microbenchmark package)
microbenchmark::microbenchmark(
  iginindex(y, method = 1,  useRcpp = FALSE),
  iginindex(y, method = 2,  useRcpp = FALSE),
  iginindex(y, method = 3,  useRcpp = FALSE),
  iginindex(y, method = 4,  useRcpp = FALSE),
  iginindex(y, method = 5,  useRcpp = FALSE),
  iginindex(y, method = 6,  useRcpp = FALSE),
  iginindex(y, method = 7,  useRcpp = FALSE),
  iginindex(y, method = 8,  useRcpp = FALSE),
  iginindex(y, method = 9,  useRcpp = FALSE),
  iginindex(y, method = 10, useRcpp = FALSE)
)

# Comparing the computation time for the various estimation methods using Rcpp
microbenchmark::microbenchmark(
  iginindex(y, method = 1),
  iginindex(y, method = 2),
  iginindex(y, method = 3),
  iginindex(y, method = 4),
  iginindex(y, method = 5),
  iginindex(y, method = 6),
  iginindex(y, method = 7),
  iginindex(y, method = 8),
  iginindex(y, method = 9),
  iginindex(y, method = 10)
)

Help Index

Comparisons of variance estimates and confidence intervals for the Gini index in finite populations

Gini index, variances and confidence intervals in finite populations

Gini index for finite populations and different estimation methods.

Gini index for the Beta distribution with user-defined shape parameters

Gini index for the Burr Type XII (Singh-Maddala) distribution with user-defined scale and shape parameters

Gini index for the Chi-Squared distribution with user-defined degrees of freedom

Gini index for the Dagum distribution with user-defined shape parameters

Gini index for the F distribution with user-defined degrees of freedom