Package 'giniVarCI'

Title: Gini Indices, Variances and Confidence Intervals for Finite and Infinite Populations
Description: Estimates the Gini index and computes variances and confidence intervals for finite and infinite populations, using different methods; also computes Gini index for continuous probability distributions, draws samples from continuous probability distributions with Gini indices set by the user; uses 'Rcpp'. References: Muñoz et al. (2023) <doi:10.1177/00491241231176847>. Álvarez et al. (2021) <doi:10.3390/math9243252>. Giorgi and Gigliarano (2017) <doi:10.1111/joes.12185>. Langel and Tillé (2013) <doi:10.1111/j.1467-985X.2012.01048.x>.
Authors: Juan Francisco Muñoz [aut, cre], Jose M. Pavía [aut], Encarnación Álvarez Verdejo [aut], MCIN-AEI and ERDF. Reference PID2022-136235NB-I00 [fnd]
Maintainer: Juan Francisco Muñoz <[email protected]>
License: GPL
Version: 0.0.1-3
Built: 2025-02-03 03:55:39 UTC
Source: https://github.com/cran/giniVarCI

Comparisons of variance estimates and confidence intervals for the Gini index in finite populations

Description

Compares variance estimates and confidence intervals for the Gini index in finite populations.

Usage

fcompareCI(
  y,
  w,
  Pi = NULL,
  Pij = NULL,
  PiU,
  alpha = 0.05,
  B = 1000L,
  digitsgini = 2L,
  digitsvar = 4L,
  na.rm = TRUE,
  plotCI = TRUE,
  line.types = c(1L, 2L, 4L),
  colors = c("red", "green", "blue"),
  shapes = c(8L, 4L, 3L),
  save.plot = FALSE,
  large.sample = FALSE)

Arguments

y

A vector with the non-negative real numbers to be used for estimating the Gini index.

w

A numeric vector with the survey weights to be used for estimating the Gini index, the variance estimation and the confidence interval. This argument can be missing if argument Pi is provided.

Pi

A numeric vector with the (sample) first inclusion probabilities to be used for estimating the Gini index, the variance estimation and the confidence interval. This argument can be NULL if argument w is provided. The default value is Pi = NULL.

Pij

A numeric square matrix with the (sample) second (joint) inclusion probabilities to be used for the variance estimation and the confidence interval. The Hajek approximation is used when Pij = NULL. This argument is used by the intervals "zjackknife", "zalinearization" and "zblinearization". The default value is Pij = NULL.

PiU

A numeric vector with the (population) first inclusion probabilities. The Hartley-Rao (HR) expression for the variance estimation is also computed if this argument is provided.

alpha

A single numeric value between 0 and 1 specifying the confidence level 1-alpha to be used for computing the confidence interval for the Gini index. Some authors call alpha the significance level. The default value is alpha = 0.05.

B

A single integer specifying the number of bootstrap replicates. The default value is B = 1000L.

digitsgini

A single integer specifying the number of decimals used in the estimation of the Gini index and confidence intervals. The default value is digitsgini = 2L.

digitsvar

A single integer specifying the number of decimals used in the variance estimation of the Gini index. The default value is digitsvar = 4L.

na.rm

A 'TRUE/FALSE' logical value indicating whether NA values should be removed before the computation proceeds. The default value is na.rm = TRUE.

plotCI

A 'TRUE/FALSE' logical value indicating whether confidence intervals are compared using a plot. The default value is plotCI = TRUE.

line.types

A numeric vector of length 3 specifying the line types. See the function plot for the different line types. The default value is line.types = c(1L, 2L, 4L).

colors

A vector of length 3 specifying the colors for lines of the plot. The default value is colors = c("red", "green", "blue").

shapes

A numeric vector specifying the point shapes for the limits of intervals. If PiU is missing, the function uses the first two components of shapes, so it must have at least length 2. If PiU is provided, shapes must have at least length 3. See the function plot for the different point shapes. The default value is shapes = c(8L, 4L, 3L).

save.plot

A 'TRUE/FALSE' logical value indicating whether the ggplot object of the plot comparing the confidence intervals should be saved in the output. The default value is save.plot = FALSE.

large.sample

A 'TRUE/FALSE' logical value indicating whether the sample is large, in which case a faster algorithm is applied to sort the sample values in the computation of the Gini index. The default value is large.sample = FALSE.

Details

For a sample $S$, with size $n$ and inclusion probabilities $\pi_i=P(i\in S)$ (argument Pi), derived from a finite population $U$, with size $N$, different formulations of the Gini index have been proposed in the literature. This function estimates the Gini index, variances and confidence intervals using various formulations. The different methods for estimating the Gini index are (see also Muñoz et al., 2023):

Gini index formulae.

Method 1 (Langel and Tillé, 2013)

$$\widehat{G}_{w1}= \frac{1}{2\widehat{N}^{2}\overline{y}_{w}}\sum_{i \in S}\sum_{j \in S}w_{i}w_{j}|y_{i}-y_{j}|,$$

where $\widehat{N}=\sum_{i \in S}w_i$, $\overline{y}_{w}=\widehat{N}^{-1}\sum_{i \in S}w_{i}y_{i}$, and $w_i$ are the survey weights. For example, the survey weights can be $w_i=\pi_{i}^{-1}$. Either w or Pi must be provided; it is not necessary to provide both. When both w and Pi are provided, it is required that $w_i = \pi_i^{-1}$, for $i \in S$.

Method 2 (Alfons and Templ, 2012; Langel and Tillé, 2013)

$$\widehat{G}_{w2} = \frac{2\sum_{i \in S}w_{(i)}^{+}\widehat{N}_{(i)}y_{(i)} - \sum_{i \in S}w_{i}^{2}y_{i} }{\widehat{N}^{2}\overline{y}_{w}}-1,$$

where $y_{(i)}$ are the values $y_i$ sorted in increasing order, $w_{(i)}^{+}$ are the values $w_i$ sorted according to the increasing order of the values $y_i$, and $\widehat{N}_{(i)}=\sum_{j=1}^{i}w_{(j)}^{+}$. Langel and Tillé (2013) show that $\widehat{G}_{w1} = \widehat{G}_{w2}$, so the computation of $\widehat{G}_{w1}$ is omitted in the results.
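As an informal check of the equality between Methods 1 and 2, the following base R sketch transcribes both formulas directly. The helper names gini_m1 and gini_m2 are hypothetical and are not functions of the package; the sketch assumes no missing values and strictly positive weights.

```r
# Hypothetical helpers (not package functions): direct transcriptions of the
# Method 1 and Method 2 formulas, assuming no NA values and positive weights.
gini_m1 <- function(y, w) {
  Nhat <- sum(w)
  ybar <- sum(w * y) / Nhat
  # Double sum over all pairs (i, j) of w_i * w_j * |y_i - y_j|
  sum(outer(w, w) * abs(outer(y, y, "-"))) / (2 * Nhat^2 * ybar)
}

gini_m2 <- function(y, w) {
  ord <- order(y)
  ys <- y[ord]          # y_(i): values sorted in increasing order
  ws <- w[ord]          # w_(i)^+: weights arranged in the same order
  Ncum <- cumsum(ws)    # N_(i): cumulative sums of the sorted weights
  Nhat <- sum(w)
  ybar <- sum(w * y) / Nhat
  (2 * sum(ws * Ncum * ys) - sum(w^2 * y)) / (Nhat^2 * ybar) - 1
}

y <- c(30428.83, 14976.54, 18094.09, 29476.79, 20381.93, 6876.17,
       10360.96, 8239.82, 29476.79, 32230.71)
w <- c(357.86, 480.99, 480.99, 476.01, 498.58, 498.58, 476, 498.58, 476.01, 476.01)

g1 <- gini_m1(y, w)
g2 <- gini_m2(y, w)
isTRUE(all.equal(g1, g2))   # the two formulas should agree up to rounding
```

The pairwise version needs memory of order n^2, whereas the sorted version only needs one sort and a cumulative sum, which is one reason to prefer the Method 2 form for larger samples.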

Method 3 (Berger, 2008)

$$\widehat{G}_{w3} = \frac{2}{\widehat{N}\overline{y}_{w}}\sum_{i \in S}w_{i}y_{i}\widehat{F}_{w}^{\ast}(y_{i})-1,$$

where

$$\widehat{F}_{w}^{\ast}(t) = \frac{1}{\widehat{N}}\sum_{i \in S}w_{i}[\delta(y_i < t) + 0.5\delta(y_i = t)]$$

is the smooth (mid-point) distribution function, and $\delta(\cdot)$ is the indicator variable that takes the value 1 when its argument is true, and 0 otherwise. It can be seen that $\widehat{G}_{w2} = \widehat{G}_{w3}$, so the computation of $\widehat{G}_{w3}$ is omitted in the results.

Method 4 (Berger and Gedik-Balay, 2020)

$$\widehat{G}_{w4} = 1 - \frac{\overline{v}_{w}}{\overline{y}_{w}},$$

where $\overline{v}_{w}=\widehat{N}^{-1}\sum_{i \in S}w_{i}v_{i}$ and

$$v_{i} = \frac{1}{\widehat{N} - w_{i}}\sum_{\substack{j \in S\\ j\neq i}}\min(y_{i},y_{j}).$$

Method 5 (Lerman and Yitzhaki, 1989)

$$\widehat{G}_{w5} = \frac{2}{\widehat{N}\overline{y}_{w}} \sum_{i \in S} w_{(i)}^{+}[y_{(i)} - \overline{y}_{w}]\left[ \widehat{F}_{w}^{LY}(y_{(i)}) - \overline{F}_{w}^{LY} \right],$$

where

$$\widehat{F}_{w}^{LY}(y_{(i)}) = \frac{1}{\widehat{N}}\left(\widehat{N}_{(i-1)} + \frac{w_{(i)}^{+}}{2} \right)$$

and $\overline{F}_{w}^{LY}=\widehat{N}^{-1}\sum_{i \in S}w_{(i)}^{+}\widehat{F}_{w}^{LY}(y_{(i)})$.

Variances and confidence intervals.

For a given estimator $\widehat{G}_{w}$ and variable $z$, the Horvitz-Thompson type variance estimator (Horvitz and Thompson, 1952) is given by

$$\widehat{V}_{HT}(\widehat{G}_{w}) = \sum_{i\in S}\sum_{j\in S}\breve{\Delta}_{ij}w_{i}w_{j}z_{i}z_{j},$$

where

$$\breve{\Delta}_{ij}=\frac{\pi_{ij}-\pi_{i}\pi_{j}}{\pi_{ij}}$$

and $\pi_{ij}$ is the second (joint) inclusion probability of the individuals $i$ and $j$, i.e., $\pi_{ij}=P\{(i,j)\in S\}$ (argument Pij).

The Sen-Yates-Grundy type variance estimator (Sen, 1953; Yates and Grundy, 1953) is defined as

$$\widehat{V}_{SYG}(\widehat{G}_{w}) = - \frac{1}{2}\sum_{i\in S}\sum_{j\in S}\breve{\Delta}_{ij}(w_{i}z_i-w_{j}z_{j})^{2}.$$

The Hartley-Rao type variance estimator (Hartley and Rao, 1962) is given by

$$\widehat{V}_{HR}(\widehat{G}_{w}) = \frac{1}{n-1}\sum_{i\in S}\sum_{\substack{j \in S\\ j < i}}\left(1-\pi_i-\pi_j + \frac{1}{n}\sum_{k\in U}\pi_{k}^{2} \right)(w_{i}z_i-w_{j}z_{j})^{2}.$$
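The three variance formulas above can be transcribed into base R as sketched below. This is an illustration under stated assumptions, not the package's internal code: the function names are hypothetical, the variable z is taken as given (its construction depends on the interval method and is not shown), and the diagonal of Pij is assumed to hold the first inclusion probabilities, so that $\pi_{ii}=\pi_i$. Under simple random sampling without replacement the three expressions should coincide with the textbook value $N^{2}(1-n/N)s_{z}^{2}/n$, which gives a convenient sanity check.

```r
# Hypothetical helpers (not package functions).
# Assumption: diag(Pij) contains the first inclusion probabilities Pi.
delta_breve <- function(Pi, Pij) (Pij - outer(Pi, Pi)) / Pij

v_ht <- function(z, w, Pi, Pij) {
  u <- w * z
  sum(delta_breve(Pi, Pij) * outer(u, u))
}

v_syg <- function(z, w, Pi, Pij) {
  u <- w * z
  -0.5 * sum(delta_breve(Pi, Pij) * outer(u, u, "-")^2)
}

v_hr <- function(z, w, Pi, PiU) {
  n <- length(z)
  u <- w * z
  coef <- 1 - outer(Pi, Pi, "+") + sum(PiU^2) / n
  sq <- outer(u, u, "-")^2
  lower <- lower.tri(sq)             # pairs (i, j) with j < i
  sum(coef[lower] * sq[lower]) / (n - 1)
}

# Sanity check under simple random sampling without replacement.
N <- 20; n <- 5
z <- c(3, 1, 4, 1, 5)                # illustrative values only
Pi <- rep(n / N, n)
w <- 1 / Pi
Pij <- matrix(n * (n - 1) / (N * (N - 1)), n, n)
diag(Pij) <- Pi
PiU <- rep(n / N, N)

srs <- N^2 * (1 - n / N) * var(z) / n
c(HT = v_ht(z, w, Pi, Pij), SYG = v_syg(z, w, Pi, Pij),
  HR = v_hr(z, w, Pi, PiU), SRS = srs)
```

For unequal-probability designs the three values generally differ, and only the Hartley-Rao form avoids the joint inclusion probabilities.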

Note that the Horvitz-Thompson variance estimator can give negative values. Both the Horvitz-Thompson and Sen-Yates-Grundy variance estimators depend on the second (joint) inclusion probabilities (argument Pij). The Hajek (1964) approximation

$$\pi_{ij}\cong \pi_{i}\pi_{j}\left[1- \frac{(1-\pi_{i})(1-\pi_{j})}{\sum_{i \in S}(1-\pi_{i})} \right]$$

is used when the second (joint) inclusion probabilities are not available (Pij = NULL). Note that the Hajek approximation is suggested for large-entropy sampling designs, large samples, and large populations (see Tillé, 2006; Berger and Tillé, 2009; Haziza et al., 2008; Berger, 2011). For instance, this approximation is not recommended for highly stratified samples (Berger, 2005). The Hartley-Rao variance estimator requires the first inclusion probabilities at the population level (argument PiU). zjackknife computes the confidence interval based on the jackknife technique with critical values based on the Normal approximation. zalinearization and zblinearization compute the confidence intervals based on the linearization technique applied to the estimators

$$\widehat{G}_{w}^{a} = \widehat{G}_{w1}$$

and

$$\widehat{G}_{w}^{b} = \frac{2}{\widehat{N}\overline{y}_{w}}\sum_{i \in S}w_{i}y_{i}\widehat{F}_{w}(y_{i})-1,$$

respectively, where

$$\widehat{F}_{w}(t)=\frac{1}{\widehat{N}}\sum_{i \in S}w_i\delta(y_i \leq t).$$

Critical values are also based on the Normal approximation. pbootstrap computes the variance using the rescaled bootstrap, and the confidence interval is constructed using the percentile method. The vignette vignette("GiniVarInterval") contains a detailed description of the various methods for variance estimation and confidence intervals for the Gini index.
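The Hajek approximation used when Pij = NULL can be sketched in a few lines of base R. The helper name hajek_pij is hypothetical, and filling the diagonal with the first inclusion probabilities is an assumption of this sketch rather than documented behaviour of the package.

```r
# Hypothetical helper (not a package function): Hajek approximation of the
# joint inclusion probabilities from the sample first inclusion probabilities.
hajek_pij <- function(Pi) {
  q <- 1 - Pi
  Pij <- outer(Pi, Pi) * (1 - outer(q, q) / sum(q))
  diag(Pij) <- Pi    # assumption of this sketch: pi_ii = pi_i
  Pij
}

Pi <- c(0.2, 0.4, 0.5, 0.25)   # illustrative probabilities only
Pij <- hajek_pij(Pi)
Pij
```

Every off-diagonal entry is smaller than the product of the corresponding first inclusion probabilities, so the resulting values of $\breve{\Delta}_{ij}$ are negative for $i \neq j$.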

The following table summarises the various types of variances and confidence intervals that the function fcompareCI computes.

Interval         Variance            Critical values       References
_______________  __________________  ____________________  _____________________________
zjackknife       Jackknife           Normal                Berger (2008)
zalinearization  Linearization       Normal                Langel and Tillé (2013)
zblinearization  Linearization       Normal                Berger (2008)
pbootstrap       Rescaled bootstrap  Percentile bootstrap  Berger and Gedik-Balay (2020)

Value

If save.plot = FALSE, a data frame with columns:

  1. interval. The method used to construct the confidence interval.

  2. method. The method used to estimate the Gini index.

  3. varformula. The type of formula for the variance estimator. Possible values are HT and SYG if argument PiU is missing, and HT, SYG and HR if argument PiU is provided.

  4. gini. The estimation of the Gini index.

  5. lowerlimit. The lower limit of the confidence interval.

  6. upperlimit. The upper limit of the confidence interval.

  7. var.gini. The variance estimation for the estimator of the Gini index.

If save.plot = TRUE, a list with two components: (i) 'base.CI', a data frame with the seven columns just described, and (ii) 'plot', a (ggplot) description of the plot, which is a list with components that contain the plot itself, the data, information about the scales, panels, etc. As a side effect, a plot that compares the various methods for constructing confidence intervals for the Gini index is displayed. The package ggplot2 must be installed for this option to work.

If plotCI = TRUE, as a side effect, a plot that compares the various methods for constructing confidence intervals for the Gini index is displayed. The package ggplot2 must be installed for this option to work.

Author(s)

Juan F Munoz [email protected]

Jose M Pavia [email protected]

Encarnacion Alvarez [email protected]

References

Alfons, A., and Templ, M. (2012). Estimation of social exclusion indicators from complex surveys: The R package laeken. KU Leuven, Faculty of Business and Economics Working Paper.

Berger, Y. G. (2005). Variance estimation with highly stratified sampling designs with unequal probabilities. Australian & New Zealand Journal of Statistics, 47, 365–373.

Berger, Y. G. (2008). A note on the asymptotic equivalence of jackknife and linearization variance estimation for the Gini Coefficient. Journal of Official Statistics, 24(4), 541-555.

Berger, Y. G. (2011). Asymptotic consistency under large entropy sampling designs with unequal probabilities. Pakistan Journal of Statistics, 27, 407–426.

Berger, Y., and Gedik-Balay, İ. (2020). Confidence intervals of Gini coefficient under unequal probability sampling. Journal of Official Statistics, 36(2), 237-249.

Berger, Y. G. and Tillé, Y. (2009). Sampling with unequal probabilities. In Sample Surveys: Design, Methods and Applications (eds. D. Pfeffermann and C. R. Rao), 39–54. Elsevier, Amsterdam.

Hajek, J. (1964). Asymptotic theory of rejective sampling with varying probabilities from a finite population. The Annals of Mathematical Statistics, 35, 4, 1491–1523.

Hartley, H. O., and Rao, J. N. K. (1962). Sampling with unequal probabilities and without replacement. The Annals of Mathematical Statistics, 350-374.

Haziza, D., Mecatti, F. and Rao, J. N. K. (2008). Evaluation of some approximate variance estimators under the Rao-Sampford unequal probability sampling design. Metron, LXVI, 91–108.

Horvitz, D. G. and Thompson, D. J. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47, 663–685.

Langel, M., and Tillé, Y. (2013). Variance estimation of the Gini index: revisiting a result several times published. Journal of the Royal Statistical Society: Series A (Statistics in Society), 176(2), 521-540.

Lerman, R. I., and Yitzhaki, S. (1989). Improving the accuracy of estimates of Gini coefficients. Journal of econometrics, 42(1), 43-47.

Muñoz, J. F., Moya-Fernández, P. J., and Álvarez-Verdejo, E. (2023). Exploring and Correcting the Bias in the Estimation of the Gini Measure of Inequality. Sociological Methods & Research. https://doi.org/10.1177/00491241231176847

Sen, A. R. (1953). On the estimate of the variance in sampling with varying probabilities. Journal of the Indian Society of Agricultural Statistics, 5, 119–127.

Tillé, Y. (2006). Sampling Algorithms. Springer, New York.

Yates, F., and Grundy, P. M. (1953). Selection without replacement from within strata with probability proportional to size. Journal of the Royal Statistical Society B, 15, 253–261.

See Also

fgini, fginindex

Examples

# Income and weights (region 'Burgenland') from the 2006 Austrian EU-SILC (Package 'laeken').
data(eusilc, package="laeken")
y <- eusilc$eqIncome[eusilc$db040 == "Burgenland"]
w <- eusilc$rb050[eusilc$db040 == "Burgenland"]

# Estimation of the Gini index and confidence intervals using different methods.
fcompareCI(y, w)

y <- c(30428.83, 14976.54, 18094.09, 29476.79, 20381.93, 6876.17,
       10360.96, 8239.82, 29476.79, 32230.71)
w <- c(357.86, 480.99, 480.99, 476.01, 498.58, 498.58, 476, 498.58, 476.01, 476.01)
fcompareCI(y, w, plotCI = FALSE)

Gini index, variances and confidence intervals in finite populations

Description

Estimates the Gini index and computes variances and confidence intervals in finite populations.

Usage

fgini(
  y,
  w,
  method = 2L,
  interval = NULL,
  Pi = NULL,
  Pij = NULL,
  PiU,
  alpha = 0.05,
  B = 1000L,
  na.rm = TRUE,
  varformula = "SYG",
  large.sample = FALSE
)

Arguments

y

A vector with the non-negative real numbers to be used for estimating the Gini index.

w

A numeric vector with the survey weights to be used for estimating the Gini index, the variance and the confidence interval. This argument can be missing if argument Pi is provided.

method

An integer between 1 and 5 selecting one of the 5 methods detailed below for estimating the Gini index in finite populations. The default method is method = 2L.

interval

A character string specifying the type of variance estimation and confidence interval to be used. Possible values are "zjackknife", "zalinearization", "zblinearization" and "pbootstrap". interval = NULL omits the computation of both variance and confidence interval. The default value is interval = NULL.

Pi

A numeric vector with the (sample) first inclusion probabilities to be used for estimating the Gini index, the variance and the confidence interval. This argument can be NULL if argument w is provided. The default value is Pi = NULL.

Pij

A numeric square matrix with the (sample) second (joint) inclusion probabilities to be used for the variance estimation and the confidence interval. The Hajek approximation is used when Pij = NULL. This argument is used when interval is "zjackknife", "zalinearization" or "zblinearization". The default value is Pij = NULL.

PiU

A numeric vector with the (population) first inclusion probabilities. This argument is only required when the Hartley-Rao expression for the variance estimation is selected (varformula = "HR").

alpha

A single numeric value between 0 and 1. If interval is not NULL, the confidence level to be used for computing the confidence interval for the Gini is 1-alpha. Some authors call alpha the significance level. The default value is alpha = 0.05.

B

A single integer specifying the number of bootstrap replicates. This argument is required when interval = "pbootstrap". The default value is B = 1000L.

na.rm

A 'TRUE/FALSE' logical value indicating whether NA's should be removed before the computation proceeds. The default value is na.rm = TRUE.

varformula

A character string specifying the type of formula to be used for the variance estimator when interval is "zjackknife", "zalinearization" or "zblinearization". Possible values are "HT" (Horvitz-Thompson), "SYG" (Sen-Yates-Grundy) and "HR" (Hartley-Rao). The default value is varformula = "SYG".

large.sample

A 'TRUE/FALSE' logical value indicating whether the sample is large, in which case a faster algorithm is applied to sort the sample values in the computation of the Gini index. The default value is large.sample = FALSE.

Details

For a sample $S$, with size $n$ and inclusion probabilities $\pi_i=P(i\in S)$ (argument Pi), derived from a finite population $U$, with size $N$, different formulations of the Gini index have been proposed in the literature. This function estimates the Gini index, variances and confidence intervals using various formulations. The different methods for estimating the Gini index are (see also Muñoz et al., 2023):

Gini index formulae.

method = 1 (Langel and Tillé, 2013)

$$\widehat{G}_{w1}= \frac{1}{2\widehat{N}^{2}\overline{y}_{w}}\sum_{i \in S}\sum_{j \in S}w_{i}w_{j}|y_{i}-y_{j}|,$$

where $\widehat{N}=\sum_{i \in S}w_i$, $\overline{y}_{w}=\widehat{N}^{-1}\sum_{i \in S}w_{i}y_{i}$, and $w_i$ are the survey weights. For example, the survey weights can be $w_i=\pi_{i}^{-1}$. Either w or Pi must be provided; it is not necessary to provide both. When both w and Pi are provided, it is required that $w_i = \pi_i^{-1}$, for $i \in S$.

method = 2 (Alfons and Templ, 2012; Langel and Tillé, 2013)

$$\widehat{G}_{w2} = \frac{2\sum_{i \in S}w_{(i)}^{+}\widehat{N}_{(i)}y_{(i)} - \sum_{i \in S}w_{i}^{2}y_{i} }{\widehat{N}^{2}\overline{y}_{w}}-1,$$

where $y_{(i)}$ are the values $y_i$ sorted in increasing order, $w_{(i)}^{+}$ are the values $w_i$ sorted according to the increasing order of the values $y_i$, and $\widehat{N}_{(i)}=\sum_{j=1}^{i}w_{(j)}^{+}$. Langel and Tillé (2013) show that $\widehat{G}_{w1} = \widehat{G}_{w2}$.

method = 3 (Berger, 2008)

$$\widehat{G}_{w3} = \frac{2}{\widehat{N}\overline{y}_{w}}\sum_{i \in S}w_{i}y_{i}\widehat{F}_{w}^{\ast}(y_{i})-1,$$

where

$$\widehat{F}_{w}^{\ast}(t) = \frac{1}{\widehat{N}}\sum_{i \in S}w_{i}[\delta(y_i < t) + 0.5\delta(y_i = t)]$$

is the smooth (mid-point) distribution function, and $\delta(\cdot)$ is the indicator variable that takes the value 1 when its argument is true, and the value 0 otherwise. It can be seen that $\widehat{G}_{w2} = \widehat{G}_{w3}$.

method = 4 (Berger and Gedik-Balay, 2020)

$$\widehat{G}_{w4} = 1 - \frac{\overline{v}_{w}}{\overline{y}_{w}},$$

where $\overline{v}_{w}=\widehat{N}^{-1}\sum_{i \in S}w_{i}v_{i}$ and

$$v_{i} = \frac{1}{\widehat{N} - w_{i}}\sum_{\substack{j \in S\\ j\neq i}}\min(y_{i},y_{j}).$$

method = 5 (Lerman and Yitzhaki, 1989)

$$\widehat{G}_{w5} = \frac{2}{\widehat{N}\overline{y}_{w}} \sum_{i \in S} w_{(i)}^{+}[y_{(i)} - \overline{y}_{w}]\left[ \widehat{F}_{w}^{LY}(y_{(i)}) - \overline{F}_{w}^{LY} \right],$$

where

$$\widehat{F}_{w}^{LY}(y_{(i)}) = \frac{1}{\widehat{N}}\left(\widehat{N}_{(i-1)} + \frac{w_{(i)}^{+}}{2} \right)$$

and $\overline{F}_{w}^{LY}=\widehat{N}^{-1}\sum_{i \in S}w_{(i)}^{+}\widehat{F}_{w}^{LY}(y_{(i)})$.
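The formula for method = 5 can be followed step by step in base R. The sketch below is a direct transcription under the assumptions that there are no missing values and that $\widehat{N}_{(0)}=0$; the helper name gini_ly is hypothetical and is not a function of the package.

```r
# Hypothetical helper (not a package function): transcription of the
# Lerman-Yitzhaki form, assuming no NA values and N_(0) = 0.
gini_ly <- function(y, w) {
  ord <- order(y)
  ys <- y[ord]                       # y_(i)
  ws <- w[ord]                       # w_(i)^+
  Nhat <- sum(ws)
  ybar <- sum(ws * ys) / Nhat
  Nprev <- cumsum(ws) - ws           # N_(i-1): weight accumulated before unit i
  F_ly <- (Nprev + ws / 2) / Nhat    # F^LY evaluated at y_(i)
  Fbar <- sum(ws * F_ly) / Nhat      # weighted mean of F^LY
  2 / (Nhat * ybar) * sum(ws * (ys - ybar) * (F_ly - Fbar))
}

gini_ly(c(1, 2, 3), c(1, 1, 1))
```

The estimator has the form of a weighted covariance between the sorted values and their mid-point ranks, which is why both factors are centred before being multiplied.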

Variances and confidence intervals.

For a given estimator $\widehat{G}_{w}$ and variable $z$, the Horvitz-Thompson type variance estimator (Horvitz and Thompson, 1952)

$$\widehat{V}_{HT}(\widehat{G}_{w}) = \sum_{i\in S}\sum_{j\in S}\breve{\Delta}_{ij}w_{i}w_{j}z_{i}z_{j}$$

is computed when varformula = "HT", where

$$\breve{\Delta}_{ij}=\frac{\pi_{ij}-\pi_{i}\pi_{j}}{\pi_{ij}}$$

and $\pi_{ij}$ is the second (joint) inclusion probability of the individuals $i$ and $j$, i.e., $\pi_{ij}=P\{(i,j)\in S\}$ (argument Pij).

The Sen-Yates-Grundy type variance estimator (Sen, 1953; Yates and Grundy, 1953)

$$\widehat{V}_{SYG}(\widehat{G}_{w}) = - \frac{1}{2}\sum_{i\in S}\sum_{j\in S}\breve{\Delta}_{ij}(w_{i}z_i-w_{j}z_{j})^{2}$$

is computed when varformula = "SYG", and the Hartley-Rao type variance estimator (Hartley and Rao, 1962)

$$\widehat{V}_{HR}(\widehat{G}_{w}) = \frac{1}{n-1}\sum_{i\in S}\sum_{\substack{j \in S\\ j < i}}\left(1-\pi_i-\pi_j + \frac{1}{n}\sum_{k\in U}\pi_{k}^{2} \right)(w_{i}z_i-w_{j}z_{j})^{2}$$

is computed when varformula = "HR". Note that the Horvitz-Thompson variance estimator can give negative values. Both the Horvitz-Thompson and Sen-Yates-Grundy variance estimators depend on the second (joint) inclusion probabilities (argument Pij). The Hajek (1964) approximation

$$\pi_{ij}\cong \pi_{i}\pi_{j}\left[1- \frac{(1-\pi_{i})(1-\pi_{j})}{\sum_{i \in S}(1-\pi_{i})} \right]$$

is used when the second (joint) inclusion probabilities are not available (Pij = NULL). Note that the Hajek approximation is suggested for large-entropy sampling designs, large samples, and large populations (see Tillé, 2006; Berger and Tillé, 2009; Haziza et al., 2008; Berger, 2011). For instance, this approximation is not recommended for highly stratified samples (Berger, 2005). The Hartley-Rao variance estimator requires the first inclusion probabilities at the population level (argument PiU). zjackknife computes the confidence interval based on the jackknife technique with critical values based on the Normal approximation. zalinearization and zblinearization compute the confidence intervals based on the linearization technique applied to the estimators

$$\widehat{G}_{w}^{a} = \widehat{G}_{w1}$$

and

$$\widehat{G}_{w}^{b} = \frac{2}{\widehat{N}\overline{y}_{w}}\sum_{i \in S}w_{i}y_{i}\widehat{F}_{w}(y_{i})-1,$$

respectively, where

$$\widehat{F}_{w}(t)=\frac{1}{\widehat{N}}\sum_{i \in S}w_i\delta(y_i \leq t).$$

Critical values are also based on the Normal approximation. pbootstrap computes the variance using the rescaled bootstrap, and the confidence interval is constructed using the percentile method. The vignette vignette("GiniVarInterval") contains a detailed description of the various methods for variance estimation and confidence intervals for the Gini index.

The following table summarises the various types of variances and confidence intervals that the function fgini computes. The argument varformula only applies for the jackknife and linearization techniques (see Berger, 2008; Langel and Tillé, 2013).

Interval         Variance            Critical values       References
_______________  __________________  ____________________  _____________________________
zjackknife       Jackknife           Normal                Berger (2008)
zalinearization  Linearization       Normal                Langel and Tillé (2013)
zblinearization  Linearization       Normal                Berger (2008)
pbootstrap       Rescaled bootstrap  Percentile bootstrap  Berger and Gedik-Balay (2020)
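Once a point estimate and a variance estimate (or a set of bootstrap replicates) are available, the two kinds of interval listed in the table can be formed as sketched below. This is a generic base R illustration, not the package's code: the function names z_interval and p_interval are hypothetical, the numeric inputs are made up, and the replicates are simulated placeholders rather than output of the rescaled bootstrap.

```r
# Hypothetical helpers (not package functions).

# Interval with Normal critical values: estimate -/+ z_{1 - alpha/2} * sqrt(variance)
z_interval <- function(gini, var.gini, alpha = 0.05) {
  gini + c(-1, 1) * qnorm(1 - alpha / 2) * sqrt(var.gini)
}

# Percentile interval: empirical quantiles of the bootstrap replicates
p_interval <- function(boot.gini, alpha = 0.05) {
  unname(quantile(boot.gini, probs = c(alpha / 2, 1 - alpha / 2)))
}

ci_z <- z_interval(gini = 0.30, var.gini = 0.0004)   # made-up inputs

set.seed(1)
boot.gini <- rnorm(1000, mean = 0.30, sd = 0.02)     # placeholder replicates
ci_p <- p_interval(boot.gini)
```

The Normal interval is always symmetric around the point estimate, whereas the percentile interval follows the shape of the replicate distribution and can be asymmetric.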

Value

When interval = NULL, the function returns a single numeric value between 0 and 1 informing about the estimation of the Gini index. When interval is not NULL, the function returns a list with 3 components: a single numeric value with the estimation of the Gini index; a single numeric value with the variance estimation of the Gini index; and a vector of length two containing the lower and upper limits of the confidence interval for the Gini index.

Author(s)

Juan F Munoz [email protected]

Jose M Pavia [email protected]

Encarnacion Alvarez [email protected]

References

Alfons, A., and Templ, M. (2012). Estimation of social exclusion indicators from complex surveys: The R package laeken. KU Leuven, Faculty of Business and Economics Working Paper.

Berger, Y. G. (2005). Variance estimation with highly stratified sampling designs with unequal probabilities. Australian & New Zealand Journal of Statistics, 47, 365–373.

Berger, Y. G. (2008). A note on the asymptotic equivalence of jackknife and linearization variance estimation for the Gini Coefficient. Journal of Official Statistics, 24(4), 541-555.

Berger, Y. G. (2011). Asymptotic consistency under large entropy sampling designs with unequal probabilities. Pakistan Journal of Statistics, 27, 407–426.

Berger, Y. G. and Tillé, Y. (2009). Sampling with unequal probabilities. In Sample Surveys: Design, Methods and Applications (eds. D. Pfeffermann and C. R. Rao), 39–54. Elsevier, Amsterdam.

Berger, Y., and Gedik-Balay, İ. (2020). Confidence intervals of Gini coefficient under unequal probability sampling. Journal of Official Statistics, 36(2), 237-249.

Hajek, J. (1964). Asymptotic theory of rejective sampling with varying probabilities from a finite population. The Annals of Mathematical Statistics, 35, 4, 1491–1523.

Hartley, H. O., and Rao, J. N. K. (1962). Sampling with unequal probabilities and without replacement. The Annals of Mathematical Statistics, 350-374.

Haziza, D., Mecatti, F. and Rao, J. N. K. (2008). Evaluation of some approximate variance estimators under the Rao-Sampford unequal probability sampling design. Metron, LXVI, 91–108.

Horvitz, D. G. and Thompson, D. J. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47, 663–685.

Langel, M., and Tillé, Y. (2013). Variance estimation of the Gini index: revisiting a result several times published. Journal of the Royal Statistical Society: Series A (Statistics in Society), 176(2), 521-540.

Lerman, R. I., and Yitzhaki, S. (1989). Improving the accuracy of estimates of Gini coefficients. Journal of econometrics, 42(1), 43-47.

Muñoz, J. F., Moya-Fernández, P. J., and Álvarez-Verdejo, E. (2023). Exploring and Correcting the Bias in the Estimation of the Gini Measure of Inequality. Sociological Methods & Research. https://doi.org/10.1177/00491241231176847

Sen, A. R. (1953). On the estimate of the variance in sampling with varying probabilities. Journal of the Indian Society of Agricultural Statistics, 5, 119–127.

Tillé, Y. (2006). Sampling Algorithms. Springer, New York.

Yates, F., and Grundy, P. M. (1953). Selection without replacement from within strata with probability proportional to size. Journal of the Royal Statistical Society B, 15, 253–261.

See Also

fginindex, fcompareCI

Examples

# Income and weights (region 'Burgenland') from the 2006 Austrian EU-SILC (Package 'laeken').
data(eusilc, package="laeken")
y <- eusilc$eqIncome[eusilc$db040 == "Burgenland"]
w <- eusilc$rb050[eusilc$db040 == "Burgenland"]

# Estimation of the Gini index using 'method = 2'.
fgini(y, w)


y <- c(30428.83, 14976.54, 18094.09, 29476.79, 20381.93, 6876.17,
       10360.96, 8239.82, 29476.79, 32230.71)
w <- c(357.86, 480.99, 480.99, 476.01, 498.58, 498.58, 476, 498.58, 476.01, 476.01)

# Gini index estimation and confidence interval using:
 ## a: The method 2 for point estimation.
 ## b: The method 'zjackknife' for variance estimation.
 ## c: The Sen-Yates-Grundy type variance estimator.
 ## d: The Hajek approximation for the joint inclusion probabilities.
fgini(y, w, interval = "zjackknife")

# Gini index estimation and confidence interval using:
 ## a: The method 2 for point estimation.
 ## b: The method 'zalinearization' for variance estimation.
 ## c: The Sen-Yates-Grundy type variance estimator.
 ## d: The Hajek approximation for the joint inclusion probabilities.
fgini(y, w, interval = "zalinearization")

# Gini index estimation and confidence interval using:
 ## a: The method 3 for point estimation.
 ## b: The method 'zblinearization' for variance estimation.
 ## c: The Sen-Yates-Grundy type variance estimator.
 ## d: The Hajek approximation for the joint inclusion probabilities.
fgini(y, w, method = 3L, interval = "zblinearization")

# Gini index estimation and confidence interval using:
 ## a: The method 2 for point estimation.
 ## b: The method 'pbootstrap' for variance estimation.
 ## c: The percentile bootstrap method for the confidence interval.
fgini(y, w, interval = "pbootstrap")

Gini index for finite populations and different estimation methods

Description

Estimates the Gini index in finite populations, using different methods.

Usage

fginindex(
 y,
 w,
 method = 2L,
 Pi = NULL,
 na.rm = TRUE,
 useRcpp = TRUE
)

Arguments

y

A vector with the non-negative real numbers to be used for estimating the Gini index.

w

A numeric vector with the survey weights to be used for estimating the Gini index. This argument can be missing if argument Pi is provided.

method

An integer between 1 and 5 selecting one of the 5 methods detailed below for estimating the Gini index in finite populations. The default method is method = 2L.

Pi

A numeric vector with the (sample) first inclusion probabilities to be used for estimating the Gini index. This argument can be NULL if argument w is provided. The default value is Pi = NULL.

na.rm

A 'TRUE/FALSE' logical value indicating whether NA's should be removed before the computation proceeds. The default value is na.rm = TRUE.

useRcpp

A 'TRUE/FALSE' logical value indicating whether Rcpp (useRcpp = TRUE), or R (useRcpp = FALSE), is used for computation. The default value is useRcpp = TRUE.

Details

For a sample $S$, with size $n$ and inclusion probabilities $\pi_i=P(i\in S)$ (argument Pi), derived from a finite population $U$, with size $N$, different formulations of the Gini index have been proposed in the literature. This function estimates the Gini index using various formulations, and both R and C++ implementations are provided. This can be useful for research purposes, and it allows speed comparisons to be made. The different methods for estimating the Gini index are (see also Muñoz et al., 2023):

method = 1 (Langel and Tillé, 2013)

$$\widehat{G}_{w1} = \frac{1}{2\widehat{N}^{2}\overline{y}_{w}}\sum_{i \in S}\sum_{j \in S}w_{i}w_{j}|y_{i}-y_{j}|,$$

where $\widehat{N}=\sum_{i \in S}w_i$, $\overline{y}_{w}=\widehat{N}^{-1}\sum_{i \in S}w_{i}y_{i}$, and $w_i$ are the survey weights. For example, the survey weights can be $w_i=\pi_{i}^{-1}$. At least one of w and Pi must be provided. If both are provided, it is required that $w_i = \pi_i^{-1}$, for $i \in S$.

method = 2 (Alfons and Templ, 2012; Langel and Tillé, 2013)

$$\widehat{G}_{w2} = \frac{2\sum_{i \in S}w_{(i)}^{*}\widehat{N}_{(i)}y_{(i)} - \sum_{i \in S}w_{i}^{2}y_{i}}{\widehat{N}^{2}\overline{y}_{w}}-1,$$

where $y_{(i)}$ are the values $y_i$ sorted in increasing order, $w_{(i)}^{*}$ are the values $w_i$ sorted according to the increasing order of the values $y_i$, and $\widehat{N}_{(i)}=\sum_{j=1}^{i}w_{(j)}^{*}$. Langel and Tillé (2013) show that $\widehat{G}_{w1} = \widehat{G}_{w2}$.

method = 3 (Berger, 2008)

$$\widehat{G}_{w3} = \frac{2}{\widehat{N}\overline{y}_{w}}\sum_{i \in S}w_{i}y_{i}\widehat{F}_{w}^{\ast}(y_{i})-1,$$

where

$$\widehat{F}_{w}^{\ast}(t) = \frac{1}{\widehat{N}}\sum_{i \in S}w_{i}[\delta(y_i < t) + 0.5\delta(y_i = t)]$$

is the smooth (mid-point) distribution function, and $\delta(\cdot)$ is the indicator variable that takes the value 1 when its argument is true, and the value 0 otherwise. It can be seen that $\widehat{G}_{w2} = \widehat{G}_{w3}$.

method = 4 (Berger and Gedik-Balay, 2020)

$$\widehat{G}_{w4} = 1 - \frac{\overline{z}_{w}}{\overline{y}_{w}},$$

where $\overline{z}_{w}=\widehat{N}^{-1}\sum_{i \in S}w_{i}z_{i}$ and

$$z_{i} = \frac{1}{\widehat{N} - w_{i}}\sum_{\substack{j \in S\\ j\neq i}}\min(y_{i},y_{j}).$$

method = 5 (Lerman and Yitzhaki, 1989)

$$\widehat{G}_{w5} = \frac{2}{\widehat{N}\overline{y}_{w}} \sum_{i \in S} w_{i}[y_{i} - \overline{y}_{w}]\left[ \widehat{F}_{w}^{LY}(y_{i}) - \overline{F}_{w}^{LY} \right],$$

where

$$\widehat{F}_{w}^{LY}(y_{i}) = \frac{1}{\widehat{N}}\left(\widehat{N}_{(i-1)} + \frac{w_{(i)}^{\ast}}{2} \right)$$

and $\overline{F}_{w}^{LY}=\widehat{N}^{-1}\sum_{i \in S}w_{i}\widehat{F}_{w}^{LY}(y_{i})$.
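As an illustration of the formulas for methods 1 to 3, the following base R sketch implements them directly and checks on a toy sample that they return the same value. The helper names gini_m1, gini_m2 and gini_m3 and the toy data are assumptions made for this example; they are not part of the package, and the package's own R and C++ implementations may be organized differently.

```r
# Hypothetical helpers written from the formulas above (not the package code).

# Method 1: weighted double sum of absolute differences.
gini_m1 <- function(y, w) {
  N_hat <- sum(w)
  y_bar <- sum(w * y) / N_hat
  sum(outer(w, w) * abs(outer(y, y, "-"))) / (2 * N_hat^2 * y_bar)
}

# Method 2: sorted values with cumulative weights.
gini_m2 <- function(y, w) {
  N_hat <- sum(w)
  y_bar <- sum(w * y) / N_hat
  o <- order(y)
  N_cum <- cumsum(w[o])  # N_hat_(i): cumulative sorted weights
  (2 * sum(w[o] * N_cum * y[o]) - sum(w^2 * y)) / (N_hat^2 * y_bar) - 1
}

# Method 3: mid-point (smooth) distribution function.
gini_m3 <- function(y, w) {
  N_hat <- sum(w)
  y_bar <- sum(w * y) / N_hat
  F_mid <- vapply(
    y,
    function(t) sum(w * ((y < t) + 0.5 * (y == t))) / N_hat,
    numeric(1)
  )
  2 / (N_hat * y_bar) * sum(w * y * F_mid) - 1
}

# Toy sample (assumed data, for illustration only).
y <- c(1, 2, 4)
w <- c(1, 2, 1)
c(m1 = gini_m1(y, w), m2 = gini_m2(y, w), m3 = gini_m3(y, w))
```

The double sum in gini_m1 needs on the order of n^2 operations, whereas gini_m2 only needs a sort and a cumulative sum, which is one plausible reason why timings differ across methods.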

Value

A single numeric value between 0 and 1 with the estimate of the Gini index.

Author(s)

Juan F Munoz

Jose M Pavia

Encarnacion Alvarez

References

Alfons, A., and Templ, M. (2012). Estimation of social exclusion indicators from complex surveys: The R package laeken. KU Leuven, Faculty of Business and Economics Working Paper.

Berger, Y. G. (2008). A note on the asymptotic equivalence of jackknife and linearization variance estimation for the Gini Coefficient. Journal of Official Statistics, 24(4), 541-555.

Berger, Y. G., and Gedik-Balay, İ. (2020). Confidence intervals of Gini coefficient under unequal probability sampling. Journal of official statistics, 36(2), 237-249.

Langel, M., and Tillé, Y. (2013). Variance estimation of the Gini index: revisiting a result several times published. Journal of the Royal Statistical Society: Series A (Statistics in Society), 176(2), 521-540.

Lerman, R. I., and Yitzhaki, S. (1989). Improving the accuracy of estimates of Gini coefficients. Journal of econometrics, 42(1), 43-47.

Muñoz, J. F., Moya-Fernández, P. J., and Álvarez-Verdejo, E. (2023). Exploring and Correcting the Bias in the Estimation of the Gini Measure of Inequality. Sociological Methods & Research. https://doi.org/10.1177/00491241231176847

See Also

fgini, fcompareCI

Examples

# Income and weights (region "Burgenland") from the 2006 Austrian EU-SILC (Package 'laeken').
data(eusilc, package="laeken")
y <- eusilc$eqIncome[eusilc$db040 == "Burgenland"]
w <- eusilc$rb050[eusilc$db040 == "Burgenland"]

# Comparing the computation time for the various estimation methods and using R
microbenchmark::microbenchmark(
  fginindex(y, w, method = 1L, useRcpp = FALSE),
  fginindex(y, w, method = 2L, useRcpp = FALSE),
  fginindex(y, w, method = 3L, useRcpp = FALSE),
  fginindex(y, w, method = 4L, useRcpp = FALSE),
  fginindex(y, w, method = 5L, useRcpp = FALSE)
)

# Comparing the computation time for the various estimation methods and using Rcpp
microbenchmark::microbenchmark(
  fginindex(y, w, method = 1L),
  fginindex(y, w, method = 2L),
  fginindex(y, w, method = 3L),
  fginindex(y, w, method = 4L),
  fginindex(y, w, method = 5L)
)



# Estimation of the Gini index using 'method = 4'.
y <- c(30428.83, 14976.54, 18094.09, 29476.79, 20381.93, 6876.17,
       10360.96, 8239.82, 29476.79, 32230.71)
w <- c(357.86, 480.99, 480.99, 476.01, 498.58, 498.58, 476, 498.58, 476.01, 476.01)
fginindex(y, w, method = 4L)

Gini index for the Beta distribution with user-defined shape parameters

Description

Calculates the Gini index for the Beta distribution with shape parameters $a$ (shape1) and $b$ (shape2).

Usage

gbeta(shape1, shape2)

Arguments

shape1

A positive real number specifying the shape1 parameter $a$ of the Beta distribution.

shape2

A positive real number specifying the shape2 parameter $b$ of the Beta distribution.

Details

The Beta distribution with shape parameters $a$ (argument shape1) and $b$ (argument shape2) and denoted as $Beta(a,b)$, where $a>0$ and $b>0$, has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995; Yee, 2022)

$$f(y) = \frac{1}{B(a,b)}y^{a-1}(1-y)^{b-1},$$

and a cumulative distribution function given by

$$F(y) = \frac{B(y;a,b)}{B(a,b)}$$

where $0 \leq y \leq 1$,

$$B(a,b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}$$

is the beta function,

$$\Gamma(\alpha) = \int_{0}^{\infty}t^{\alpha-1}e^{-t}dt$$

is the gamma function, and

$$B(y;a,b) = \int_{0}^{y}t^{a-1}(1-t)^{b-1}dt$$

is the incomplete beta function.

The Gini index can be computed as

$$G = \frac{2}{a}\frac{B(a+b,a+b)}{B(a,a)B(b,b)}.$$
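The closed-form expression for the Gini index can be evaluated with the base R beta() function. The following is a minimal sketch; beta_gini is a hypothetical helper, not the package's gbeta, and it omits the input validation described in the Value section.

```r
# Hypothetical helper: Gini index of Beta(a, b) from the closed-form expression.
beta_gini <- function(a, b) {
  (2 / a) * beta(a + b, a + b) / (beta(a, a) * beta(b, b))
}

# Beta(1, 1) is the uniform distribution on [0, 1], whose Gini index is 1/3.
all.equal(beta_gini(1, 1), 1 / 3)

# Same parameter values as in the Examples section.
beta_gini(2, 1)
beta_gini(1, 2)
```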

Value

A numeric value with the Gini index. A NA is returned when a shape parameter is non-numeric or non-positive.

Author(s)

Juan F Munoz

Jose M Pavia

Encarnacion Alvarez

References

Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995). Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.

Yee, T. W. (2022). VGAM: Vector Generalized Linear and Additive Models. R package version 1.1-7, https://CRAN.R-project.org/package=VGAM.

See Also

gf, gunif, gweibull, ggamma, gchisq

Examples

# Gini index for the Beta distribution with shape parameters 'a = 2' and 'b = 1'.
gbeta(shape1 = 2, shape2 = 1)

# Gini index for the Beta distribution with shape parameters 'a = 1' and 'b = 2'.
gbeta(shape1 = 1, shape2 = 2)

Gini index for the Burr Type XII (Singh-Maddala) distribution with user-defined scale and shape parameters

Description

Calculates the Gini index for the Burr Type XII (Singh-Maddala) distribution with scale parameter $b$ and shape parameters $g$ (shape.g) and $s$ (shape.s).

Usage

gburr(
 scale = 1,
 shape.g = 1,
 shape.s = 1
)

Arguments

scale

A positive real number specifying the scale parameter $b$ of the Burr Type XII (Singh-Maddala) distribution. The default value is scale = 1.

shape.g

A positive real number specifying the shape parameter $g$ of the Burr Type XII (Singh-Maddala) distribution. The default value is shape.g = 1.

shape.s

A positive real number specifying the shape parameter $s$ of the Burr Type XII (Singh-Maddala) distribution. The default value is shape.s = 1.

Details

The Burr Type XII (Singh-Maddala) distribution with scale parameter $b$, shape parameters $g$ (argument shape.g) and $s$ (argument shape.s) and denoted as $BurrXII(b,g,s)$, where $b>0$, $g>0$ and $s>0$, has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995; Rodriguez, 1977; Yee, 2022)

$$f(y) = \frac{gs}{b}\left(\frac{y}{b}\right)^{g-1}\left[1 + \left(\frac{y}{b}\right)^{g}\right]^{-(s+1)},$$

and a cumulative distribution function given by

$$F(y) = 1-\left[1 + \left( \frac{y}{b}\right)^{g} \right]^{-s},$$

where $y>0$.

The Gini index can be computed as

$$G = 2\left(0.5 - \frac{1}{E[y]}\int_{0}^{1}\int_{0}^{Q(p)}yf(y)\,dy\,dp\right),$$

where $Q(p)$ is the quantile function of the Burr Type XII (Singh-Maddala) distribution, and $E[y]$ is the expectation of the distribution. The Burr Type XII (Singh-Maddala) distribution is related to the Pareto (IV) distribution: $BurrXII(b,g,s) = ParetoIV(0,b,1/g,s)$.
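The double integral in the Gini formula can be evaluated numerically with nested calls to integrate(). The sketch below is one possible way to do so in base R, under the assumption that shape.g * shape.s > 1 so that the expectation is finite. The helper name burr_gini_numeric is hypothetical, and this is not necessarily how gburr is implemented.

```r
# Hypothetical helper: numerical Gini index of BurrXII(b, g, s) from the
# Lorenz-curve expression. Assumes shape.g * shape.s > 1 (finite mean).
burr_gini_numeric <- function(scale = 1, shape.g = 1, shape.s = 1) {
  # Density f(y) as given in the Details section.
  dens <- function(y) {
    (shape.g * shape.s / scale) * (y / scale)^(shape.g - 1) *
      (1 + (y / scale)^shape.g)^(-(shape.s + 1))
  }
  # Quantile function obtained by inverting F(y).
  quant <- function(p) scale * ((1 - p)^(-1 / shape.s) - 1)^(1 / shape.g)

  # E[y], computed numerically.
  mean_y <- integrate(function(y) y * dens(y), 0, Inf)$value

  # Inner integral: partial mean up to the p-th quantile (vectorised in p).
  partial_mean <- function(p) {
    vapply(
      p,
      function(pp) integrate(function(y) y * dens(y), 0, quant(pp))$value,
      numeric(1)
    )
  }

  2 * (0.5 - integrate(partial_mean, 0, 1)$value / mean_y)
}

burr_gini_numeric(scale = 1, shape.g = 5, shape.s = 3)
```

Changing the scale argument in this sketch rescales both the partial mean and the expectation by the same factor, so the result should not change beyond numerical error.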

Value

A numeric value with the Gini index. A NA is returned when any of the parameters is non-numeric or non-positive.

Author(s)

Juan F Munoz

Jose M Pavia

Encarnacion Alvarez

References

Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995). Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.

Rodriguez, R. N. (1977). A guide to the Burr type XII distributions. Biometrika, 64(1), 129-134.

Yee, T. W. (2022). VGAM: Vector Generalized Linear and Additive Models. R package version 1.1-7, https://CRAN.R-project.org/package=VGAM.

See Also

gparetoIV, gpareto, gparetoI, gparetoII, gparetoIII, gfisk

Examples

# Gini index for the Burr Type XII distribution with 'scale = 1', 'shape.g = 2', 'shape.s = 1'.
gburr(scale = 1, shape.g = 2, shape.s = 1)

# Gini index for the Burr Type XII distribution with 'scale = 1', 'shape.g = 5', 'shape.s = 3'.
gburr(scale = 1, shape.g = 5, shape.s = 3)

Gini index for the Chi-Squared distribution with user-defined degrees of freedom

Description

Calculates Gini indices for the Chi-Squared distribution with degrees of freedom $n$ (df).

Usage

gchisq(df)

Arguments

df

A vector of positive real numbers specifying degrees of freedom of the Chi-Squared distribution.

Details

The Chi-Squared distribution with degrees of freedom $n$ (argument df) and denoted as $\chi_{n}^2$, where $n>0$, has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995)

$$f(y) = \frac{1}{2^{n/2}\Gamma\left(\frac{n}{2}\right)}y^{n/2-1}e^{-y/2},$$

and a cumulative distribution function given by

$$F(y) = \frac{\gamma\left(\frac{n}{2}, \frac{y}{2}\right)}{\Gamma\left(\frac{n}{2}\right)},$$

where $y \geq 0$, the gamma function is defined by

$$\Gamma(\alpha) = \int_{0}^{\infty}t^{\alpha-1}e^{-t}dt,$$

and the lower incomplete gamma function is given by

$$\gamma(\alpha,y) = \int_{0}^{y}t^{\alpha-1}e^{-t}dt.$$

The Gini index can be computed as

$$G = \frac{2\Gamma\left( \frac{1+n}{2}\right)}{n\Gamma\left(\frac{n}{2}\right)\sqrt{\pi}}.$$

The Chi-Squared distribution is related to the Gamma distribution: $\chi_{n}^2 = Gamma(n/2, 2)$.
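A vectorised evaluation of this closed form might look as follows. This is only a sketch: chisq_gini and gamma_gini are hypothetical helpers, not the package functions, and the use of lgamma() instead of gamma() is a design choice made here to avoid overflow for large degrees of freedom. The second helper uses the Gini expression documented for the Gamma distribution (see ggamma) to illustrate the stated relation between both distributions.

```r
# Hypothetical helpers based on the closed-form expressions in this manual.
chisq_gini <- function(df) {
  # Ratio of gamma functions computed on the log scale for numerical stability.
  2 * exp(lgamma((1 + df) / 2) - lgamma(df / 2)) / (df * sqrt(pi))
}

gamma_gini <- function(shape) {
  exp(lgamma((2 * shape + 1) / 2) - lgamma(shape)) / (shape * sqrt(pi))
}

df <- c(2, 5:10)
chisq_gini(df)

# Chi-Squared with n degrees of freedom corresponds to Gamma with shape n / 2.
all.equal(chisq_gini(df), gamma_gini(df / 2))
```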

Value

A numeric vector with the Gini indices. A NA is returned when degrees of freedom are non-numeric or non-positive.

Author(s)

Juan F Munoz

Jose M Pavia

Encarnacion Alvarez

References

Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.

See Also

ggamma, gf, gbeta, glnorm

Examples

# Gini index for the Chi-Squared distribution with degrees of freedom equal to 2.
gchisq(df = 2)

# Gini indices for the Chi-Squared distribution and different degrees of freedom.
gchisq(df = 5:10)

Gini index for the Dagum distribution with user-defined shape parameters

Description

Calculates the Gini index for the Dagum distribution with shape parameters $a$ (shape1.a) and $p$ (shape2.p).

Usage

gdagum(shape1.a, shape2.p)

Arguments

shape1.a

A positive real number specifying the shape parameter $a$ of the Dagum distribution.

shape2.p

A positive real number specifying the shape parameter $p$ of the Dagum distribution.

Details

The Dagum distribution with scale parameter $b$, shape parameters $a$ (argument shape1.a) and $p$ (argument shape2.p) and denoted as $Dagum(b,a,p)$, where $b>0$, $a>0$ and $p>0$, has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995; Rodriguez, 1977; Yee, 2022)

$$f(y) = \frac{ap}{y}\frac{\left(\frac{y}{b}\right)^{ap}}{\left[\left(\frac{y}{b} \right)^{a} + 1 \right]^{p+1}},$$

and a cumulative distribution function given by

$$F(y) = \left[1 + \left( \frac{y}{b}\right)^{-a} \right]^{-p},$$

where $y > 0$.

The Gini index can be computed as

$$G = \frac{\Gamma(p)\Gamma(2p+1/a)}{\Gamma(2p)\Gamma(p+1/a)}-1,$$

where the gamma function is defined as

$$\Gamma(\alpha) = \int_{0}^{\infty}t^{\alpha-1}e^{-t}dt.$$

The Dagum distribution is also known as the Burr III, inverse Burr, beta-K, or 3-parameter kappa distribution. The Dagum distribution is related to the Fisk (Log Logistic) distribution: $Dagum(b,a,1) = Fisk(b,a)$. The Dagum distribution is also related to the inverse Lomax distribution and the inverse paralogistic distribution (see Kleiber and Kotz, 2003; Johnson et al., 1995; Yee, 2022).
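The closed-form Gini index and the relation with the Fisk distribution can be checked with a few lines of base R. The helper dagum_gini below is hypothetical (it is not the package's gdagum and performs no input validation); the comparison value 1/a is the Gini index documented for the Fisk distribution when $a \geq 1$.

```r
# Hypothetical helper: Gini index of Dagum(b, a, p); the scale b plays no role.
dagum_gini <- function(a, p) {
  # Gamma-function ratio evaluated on the log scale.
  exp(lgamma(p) + lgamma(2 * p + 1 / a) - lgamma(2 * p) - lgamma(p + 1 / a)) - 1
}

# Same parameter values as in the Examples section.
dagum_gini(a = 2, p = 20)

# With p = 1 the Dagum distribution reduces to Fisk(b, a), with Gini index 1 / a.
all.equal(dagum_gini(a = 4, p = 1), 1 / 4)
```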

Value

A numeric value with the Gini index. A NA is returned when a shape parameter is non-numeric or non-positive.

Note

The Gini index of the Dagum distribution does not depend on its scale parameter.

Author(s)

Juan F Munoz

Jose M Pavia

Encarnacion Alvarez

References

Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.

Yee, T. W. (2022). VGAM: Vector Generalized Linear and Additive Models. R package version 1.1-7, https://CRAN.R-project.org/package=VGAM.

See Also

gburr, gpareto, gfisk, ggompertz, gfrechet

Examples

# Gini index for the Dagum distribution with shape parameters 'a = 2' and 'p = 20'.
gdagum(shape1.a = 2, shape2.p = 20)

Gini index for the F distribution with user-defined degrees of freedom

Description

Calculates the Gini index for the F distribution with degrees of freedom $\nu_1$ (df1) and $\nu_2$ (df2).

Usage

gf(df1, df2)

Arguments

df1

A positive real number specifying the degrees of freedom $\nu_1$ of the F distribution.

df2

A positive real number greater than or equal to 2 specifying the degrees of freedom $\nu_2$ of the F distribution.

Details

The F distribution with $\nu_1$ (argument df1) and $\nu_2$ (argument df2) degrees of freedom and denoted as $F_{\nu_1,\nu_2}$, where $\nu_1>0$ and $\nu_2 > 0$, has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995)

$$f(y) = \frac{\Gamma\left(\frac{\nu_{1}}{2} + \frac{\nu_{2}}{2}\right)}{\Gamma\left(\frac{\nu_{1}}{2}\right)\Gamma\left(\frac{\nu_{2}}{2}\right)}\left( \frac{\nu_{1}}{\nu_{2}}\right)^{\nu_{1}/2}y^{\nu_{1}/2-1}\left(1 + \frac{\nu_{1}y}{\nu_{2}}\right)^{-(\nu_{1}+\nu_{2})/2},$$

and a cumulative distribution function given by

$$F(y) = I_{\nu_{1}y/(\nu_{1}y + \nu_{2})}\left( \frac{\nu_{1}}{2}, \frac{\nu_{2}}{2} \right),$$

where $y \geq 0$,

$$\Gamma(\alpha) = \int_{0}^{\infty}t^{\alpha-1}e^{-t}dt$$

is the gamma function,

$$I_{y}(a,b) = \frac{B(y;a,b)}{B(a,b)}$$

is the regularized incomplete beta function,

$$B(a,b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}$$

is the beta function, and

$$B(y;a,b) = \int_{0}^{y}t^{a-1}(1-t)^{b-1}dt$$

is the incomplete beta function.

The Gini index, for $\nu_2 \geq 2$, can be computed as

$$G = 2\left(0.5 - \frac{\nu_{2} - 2}{\nu_{2}}\int_{0}^{1}\int_{0}^{Q(p)}yf(y)\,dy\,dp\right),$$

where $Q(p)$ is the quantile function of the F distribution.

Value

A numeric value with the Gini index. A NA is returned when degrees of freedom are non-numeric or $df1 \leq 0$ or $df2 < 2$.

Author(s)

Juan F Munoz

Jose M Pavia

Encarnacion Alvarez

References

Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.

See Also

gchisq, ggamma, ggompertz, glnorm

Examples

# Gini index for the F distribution with 'df1 = 10' and 'df2 = 20' degrees of freedom.
gf(df1 = 10, df2 = 20)

Gini index for the Fisk (Log Logistic) distribution with user-defined shape parameters

Description

Calculates the Gini indices for the Fisk (Log Logistic) distribution with shape parameters $a$ (shape1.a).

Usage

gfisk(shape1.a)

Arguments

shape1.a

A vector of positive real numbers specifying shape parameters $a$ of the Fisk (Log Logistic) distribution.

Details

The Fisk (Log Logistic) distribution with scale parameter $b$, shape parameter $a$ (argument shape1.a) and denoted as $Fisk(b,a)$, where $b>0$ and $a>0$, has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995; Yee, 2022)

$$f(y) = \frac{a}{y}\frac{\left(\frac{y}{b}\right)^{a}}{\left[\left(\frac{y}{b} \right)^{a} + 1 \right]^{2}},$$

and a cumulative distribution function given by

$$F(y) = 1-\left[1 + \left( \frac{y}{b}\right)^{a} \right]^{-1},$$

where $y \geq 0$.

The Gini index can be computed as

$$G = \left\{ \begin{array}{cl} 1, & 0< a <1; \\ \displaystyle \frac{1}{a}, & a \geq 1. \end{array} \right.$$

The Fisk (Log Logistic) distribution is related to the Dagum distribution: $Fisk(b,a) = Dagum(b,a,1)$.
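The piecewise definition can be written as a vectorised function. The sketch below is an assumption about how such a rule might be coded, not the package's gfisk: the helper fisk_gini is hypothetical, it expects a numeric vector, and it returns NA for non-positive shapes to mirror the behaviour described in the Value section.

```r
# Hypothetical helper: vectorised piecewise Gini index of Fisk(b, a).
fisk_gini <- function(a) {
  out <- rep(NA_real_, length(a))        # NA for missing or non-positive shapes
  pos <- !is.na(a) & a > 0
  out[pos] <- ifelse(a[pos] < 1, 1, 1 / a[pos])
  out
}

fisk_gini(c(-1, 0.5, 1, 2, 4))
```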

Value

A numeric vector with the Gini indices. A NA is returned when a shape parameter is non-numeric or non-positive.

Note

The Gini index of the Fisk (Log Logistic) distribution does not depend on its scale parameter.

Author(s)

Juan F Munoz

Jose M Pavia

Encarnacion Alvarez

References

Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.

Yee, T. W. (2022). VGAM: Vector Generalized Linear and Additive Models. R package version 1.1-7, https://CRAN.R-project.org/package=VGAM.

See Also

gdagum, gburr, gpareto, ggompertz

Examples

# Gini index for the Fisk distribution with a shape parameter 'a = 2'.
gfisk(shape1.a = 2)

# Gini indices for the Fisk distribution and different shape parameters.
gfisk(shape1.a = 1:10)

Gini index for the Frechet distribution with user-defined shape parameters

Description

Calculates the Gini indices for the Frechet distribution with shape parameters $s$.

Usage

gfrechet(shape)

Arguments

shape

A vector of real numbers greater than or equal to 1 specifying shape parameters $s$ of the Frechet distribution.

Details

The Frechet distribution with location parameter $a$, scale parameter $b$, shape parameter $s$ and denoted as $Frechet(a,b,s)$, where $a>0$, $b>0$ and $s>0$, has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995)

$$f(y) = \frac{sb}{(y-a)^{2}} \left(\frac{b}{y-a}\right)^{s-1} \exp\left[- \left(\frac{b}{y-a}\right)^{s} \right],$$

and a cumulative distribution function given by

$$F(y) = \exp\left[- \left(\frac{b}{y-a}\right)^{s} \right],$$

where $y > a$.

The Gini index, for $s \geq 1$, can be computed as

$$G = 2^{1/s} - 1.$$

Value

A numeric vector with the Gini indices. A NA is returned when a shape parameter is non-numeric or smaller than 1.

Note

The Gini index of the Frechet distribution does not depend on its location and scale parameters and is defined only when its shape parameter is at least 1.

Author(s)

Juan F Munoz

Jose M Pavia

Encarnacion Alvarez

References

Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.

See Also

gdagum, gburr, gfisk, gpareto, ggompertz

Examples

# Gini index for the Frechet distribution with a shape parameter 's = 1'.
gfrechet(shape = 1)

# Gini indices for the Frechet distribution and different shape parameters.
gfrechet(shape = 1:10)

Gini index for the Gamma distribution with user-defined shape parameter

Description

Calculates the Gini indices for the Gamma distribution with shape parameters $\alpha$.

Usage

ggamma(shape)

Arguments

shape

A vector of positive real numbers specifying the shape parameters $\alpha$ of the Gamma distribution.

Details

The Gamma distribution with shape parameter $\alpha$, scale parameter $\sigma$ and denoted as $Gamma(\alpha, \sigma)$, where $\alpha>0$ and $\sigma>0$, has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995)

$$f(y) = \frac{1}{\sigma^{\alpha}\Gamma(\alpha)}y^{\alpha-1}e^{-y/\sigma},$$

and a cumulative distribution function given by

$$F(y) = \frac{\gamma\left(\alpha, \frac{y}{\sigma}\right)}{\Gamma(\alpha)},$$

where $y \geq 0$, the gamma function is defined by

$$\Gamma(\alpha) = \int_{0}^{\infty}t^{\alpha-1}e^{-t}dt,$$

and the lower incomplete gamma function is given by

$$\gamma(\alpha,y) = \int_{0}^{y}t^{\alpha-1}e^{-t}dt.$$

The Gini index can be computed as

$$G = \frac{\Gamma\left(\frac{2\alpha+1}{2}\right)}{\alpha\Gamma(\alpha)\sqrt{\pi}}.$$

The Gamma distribution is related to the Chi-squared distribution: $Gamma(n/2, 2) = \chi_{n}^2$.
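One way to build intuition for the closed-form Gini index is to compare it with the Gini index of a large simulated sample. The following base R sketch does this for $\alpha = 2$. The seed, sample size and scale value are arbitrary choices made for this illustration, and the sample Gini index is computed with the standard rank-based formula for unweighted data rather than with any function of this package.

```r
set.seed(1)   # arbitrary seed, only for reproducibility of the illustration
shape <- 2

# Simulated Gamma sample; the scale (3 here) should not affect the Gini index.
y <- sort(rgamma(1e5, shape = shape, scale = 3))
n <- length(y)

# Rank-based sample Gini index for unweighted, sorted data.
sample_gini <- 2 * sum(seq_len(n) * y) / (n * sum(y)) - (n + 1) / n

# Closed-form value from the expression above.
closed_form <- gamma((2 * shape + 1) / 2) / (shape * gamma(shape) * sqrt(pi))

c(sample = sample_gini, closed_form = closed_form)
```

The two values should agree up to sampling error, which shrinks as the sample size grows.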

Value

A numeric vector with the Gini indices. A NA is returned when a shape parameter is non-numeric or non-positive.

Note

The Gini index of the Gamma distribution does not depend on its scale parameter.

Author(s)

Juan F Munoz

Jose M Pavia

Encarnacion Alvarez

References

Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.

See Also

gchisq, gf, gbeta, gweibull, glnorm

Examples

# Gini index for the Gamma distribution with 'shape = 1'.
ggamma(shape = 1)

# Gini indices for the Gamma distribution and different shape parameters.
ggamma(shape = 1:10)

Gini index for the Gompertz distribution with user-defined scale and shape parameters

Description

Calculates the Gini index for the Gompertz distribution with scale parameter $\beta$ and shape parameter $\alpha$.

Usage

ggompertz(
 scale = 1,
 shape
)

Arguments

scale

A positive real number specifying the scale parameter $\beta$ of the Gompertz distribution. The default value is scale = 1.

shape

A positive real number specifying the shape parameter $\alpha$ of the Gompertz distribution.

Details

The Gompertz distribution with scale parameter $\beta$, shape parameter $\alpha$ and denoted as $Gompertz(\beta, \alpha)$, where $\beta>0$ and $\alpha>0$, has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995; Rodriguez, 1977; Yee, 2022)

$$f(y) = \alpha e^{\beta y} \exp\left[ - \frac{\alpha}{\beta}\left(e^{\beta y} - 1 \right) \right],$$

and a cumulative distribution function given by

$$F(y) = 1 -\exp\left[ - \frac{\alpha}{\beta}\left(e^{\beta y} - 1 \right) \right],$$

where $y \geq 0$.

The Gini index can be computed as

$$G = 2\left(0.5 - \frac{1}{E[y]}\int_{0}^{1}\int_{0}^{Q(p)}yf(y)\,dy\,dp\right),$$

where $Q(p)$ is the quantile function of the Gompertz distribution, and $E[y]$ is the expectation of the distribution. If scale is not specified, it assumes the default value of 1.

Value

A numeric value with the Gini index. A NA is returned when a parameter is non-numeric or non-positive.

Author(s)

Juan F Munoz

Jose M Pavia

Encarnacion Alvarez

References

Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.

Yee, T. W. (2022). VGAM: Vector Generalized Linear and Additive Models. R package version 1.1-7, https://CRAN.R-project.org/package=VGAM.

See Also

ggamma, gbeta, gchisq, gpareto

Examples

# Gini index for the Gompertz distribution with 'scale = 1' and 'shape = 3'.
ggompertz(scale = 1, shape = 3)

Gini index for the Log Normal distribution with user-defined standard deviations

Description

Calculates the Gini indices for the Log Normal distribution with standard deviations $\sigma$ (sdlog).

Usage

glnorm(sdlog)

Arguments

sdlog

A vector of positive real numbers specifying standard deviations $\sigma$ of the Log Normal distribution.

Details

The Log Normal distribution with mean $\mu$ and standard deviation $\sigma$ on the log scale (argument sdlog), denoted as $logNormal(\mu, \sigma)$, has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995)

$$f(y) = \frac{1}{\sqrt{2\pi}\sigma y}\exp\left[- \frac{(\ln(y) - \mu)^2}{2\sigma^2} \right],$$

and a cumulative distribution function given by

$$F(y) = \Phi\left(\frac{\ln(y) - \mu}{\sigma}\right),$$

where $y > 0$ and

$$\Phi(y) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{y} e^{-t^{2}/2}dt$$

is the cumulative distribution function of a standard Normal distribution.

The Gini index can be computed as

$$G = 2\Phi\left( \frac{\sigma}{\sqrt{2}}\right) - 1.$$
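The closed form only needs pnorm(). As a hedged cross-check, the sketch below also evaluates the Gini index numerically through the general identity $G = 1 - E[y]^{-1}\int_{0}^{\infty}(1-F(y))^{2}dy$, which holds for non-negative variables with finite mean and is not taken from this manual. Both helper names are hypothetical, and the non-zero meanlog is chosen only to illustrate that the mean on the log scale does not affect the result.

```r
# Hypothetical helper: closed-form Gini index of the Log Normal distribution.
lnorm_gini <- function(sdlog) 2 * pnorm(sdlog / sqrt(2)) - 1

# Hypothetical numerical cross-check based on the survival function.
lnorm_gini_numeric <- function(sdlog, meanlog = 0) {
  mean_y <- exp(meanlog + sdlog^2 / 2)   # expectation of the Log Normal
  tail_sq <- integrate(
    function(y) plnorm(y, meanlog, sdlog, lower.tail = FALSE)^2,
    0, Inf
  )$value
  1 - tail_sq / mean_y
}

lnorm_gini(c(0.2, 0.5, 1:3))
lnorm_gini_numeric(0.5, meanlog = 1)
```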

Value

A numeric vector with the Gini indices. A NA is returned when a standard deviation is non-numeric or non-positive.

Note

The Gini index of the logNormal distribution does not depend on the mean parameter.

Author(s)

Juan F Munoz

Jose M Pavia

Encarnacion Alvarez

References

Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.

See Also

ggamma, gpareto, gchisq, gweibull

Examples

# Gini index for the Log Normal distribution with standard deviation 'sdlog = 2'.
glnorm(sdlog = 2)

# Gini indices for the Log Normal distribution with different standard deviations.
glnorm(sdlog = c(0.2, 0.5, 1:3))

Gini index for the Pareto distribution with user-defined shape parameters

Description

Calculates the Gini indices for the Pareto distribution with shape parameters $\alpha$.

Usage

gpareto(shape)

Arguments

shape

A vector of positive real numbers specifying shape parameters $\alpha$ of the Pareto distribution.

Details

The Pareto distribution with scale parameter $k$, shape parameter $\alpha$ and denoted as $Pareto(k, \alpha)$, where $k>0$ and $\alpha>0$, has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995; Yee, 2022)

$$f(y) = \frac{\alpha k^{\alpha}}{y^{\alpha +1}},$$

and a cumulative distribution function given by

$$F(y) = 1 - \left(\frac{k}{y}\right)^{\alpha},$$

where $y \geq k$.

The Gini index can be computed as

$$G = \left\{ \begin{array}{cl} 1, & 0<\alpha <1; \\ \displaystyle \frac{1}{2\alpha-1}, & \alpha \geq 1. \end{array} \right.$$
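For $\alpha > 1$ the closed form can be related to the Lorenz curve of the Pareto distribution, $L(p) = 1 - (1-p)^{1-1/\alpha}$, through $G = 1 - 2\int_{0}^{1}L(p)dp$. The Lorenz curve expression is a standard result that is not stated in this manual, and the helper names below are hypothetical; the sketch only illustrates that both routes give the same value.

```r
# Lorenz curve of Pareto(k, shape) for shape > 1 (does not involve k).
pareto_lorenz <- function(p, shape) 1 - (1 - p)^(1 - 1 / shape)

# Hypothetical helper: Gini index as one minus twice the area under the Lorenz curve.
pareto_gini_lorenz <- function(shape) {
  1 - 2 * integrate(pareto_lorenz, 0, 1, shape = shape)$value
}

# Hypothetical helper: piecewise closed form from the expression above.
pareto_gini_closed <- function(shape) ifelse(shape < 1, 1, 1 / (2 * shape - 1))

pareto_gini_lorenz(2)
pareto_gini_closed(c(0.5, 1:5))
```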

Value

A numeric vector with the Gini indices. A NA is returned when a shape parameter is non-numeric or non-positive.

Note

The Gini index of the Pareto distribution does not depend on the scale parameter.

Author(s)

Juan F Munoz

Jose M Pavia

Encarnacion Alvarez

References

Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.

Yee, T. W. (2022). VGAM: Vector Generalized Linear and Additive Models. R package version 1.1-7, https://CRAN.R-project.org/package=VGAM.

See Also

gparetoI, gparetoII, gparetoIII, gparetoIV, gdagum, gburr, gfisk

Examples

# Gini index for the Pareto distribution with 'shape = 2'.
gpareto(shape = 2)

# Gini indices for the Pareto distribution and different shape parameters.
gpareto(shape = 1:5)

Gini index for the Pareto (I) distribution with user-defined scale and shape parameters

Description

Calculates the Gini index for the Pareto (I) distribution with scale parameter $b$ and shape parameter $s$.

Usage

gparetoI(
 scale = 1,
 shape = 1
)

Arguments

scale

A positive real number specifying the scale parameter $b$ of the Pareto (I) distribution. The default value is scale = 1.

shape

A positive real number specifying the shape parameter $s$ of the Pareto (I) distribution. The default value is shape = 1.

Details

The Pareto (I) distribution with scale parameter $b$, shape parameter $s$ and denoted as $ParetoI(b,s)$, where $b>0$ and $s>0$, has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995; Yee, 2022)

$$f(y) = \frac{s}{b} \left(\frac{y}{b}\right)^{-(s+1)},$$

and a cumulative distribution function given by

$$F(y) = 1 - \left(\frac{y}{b}\right)^{-s},$$

where $y>b$.

The Gini index can be computed as

$$G = 2\left(0.5 - \frac{1}{E[y]}\int_{0}^{1}\int_{0}^{Q(p)}yf(y)\,dy\,dp\right),$$

where $Q(p)$ is the quantile function of the Pareto (I) distribution, and $E[y]$ is the expectation of the distribution. If scale or shape are not specified, they assume the default value of 1. The Pareto (I) distribution is related to the Pareto (IV) distribution: $ParetoI(b,s) = ParetoIV(b,b,1,s)$.

Value

A numeric value with the Gini index. A NA is returned when a parameter is non-numeric or non-positive.

Author(s)

Juan F Munoz

Jose M Pavia

Encarnacion Alvarez

References

Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.

Yee, T. W. (2022). VGAM: Vector Generalized Linear and Additive Models. R package version 1.1-7, https://CRAN.R-project.org/package=VGAM.

See Also

gpareto, gparetoII, gparetoIII, gparetoIV, gdagum, gburr, gfisk

Examples

# Gini index for the Pareto (I) distribution with scale 'b = 1' and shape 's = 3'.
gparetoI(scale = 1, shape = 3)

Gini index for the Pareto (II) distribution with user-defined location, scale and shape parameters

Description

Calculates the Gini index for the Pareto (II) distribution with location parameter $a$, scale parameter $b$ and shape parameter $s$.

Usage

gparetoII(
 location = 0,
 scale = 1,
 shape = 1
)

Arguments

location

A non-negative real number specifying the location parameter $a$ of the Pareto (II) distribution. The default value is location = 0.

scale

A positive real number specifying the scale parameter $b$ of the Pareto (II) distribution. The default value is scale = 1.

shape

A positive real number specifying the shape parameter $s$ of the Pareto (II) distribution. The default value is shape = 1.

Details

The Pareto (II) distribution with location parameter $a$, scale parameter $b$ and shape parameter $s$, denoted as $ParetoII(a,b,s)$, where $a \geq 0$, $b>0$ and $s>0$, has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995; Yee, 2022)

$$f(y)= \frac{s}{b} \left[1 + \left( \frac{y-a}{b}\right)\right]^{-(s+1)},$$

and a cumulative distribution function given by

$$F(y)=1-\left(1 + \frac{y-a}{b} \right)^{-s},$$

where $y>a$.

The Gini index can be computed as

$$G = 2\left(0.5 - \frac{1}{E[y]}\int_{0}^{1}\int_{0}^{Q(y)}yf(y)dy\right),$$

where $Q(y)$ is the quantile function of the Pareto (II) distribution, and $E[y]$ is the expectation of the distribution. If location is not specified it assumes the default value of 0, and scale and shape assume the default value of 1. The Pareto (II) distribution is related to the Pareto (IV) distribution: $ParetoII(a,b,s) = ParetoIV(a,b,1,s)$.
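
For $s > 1$ the integrals above can be evaluated explicitly. Inverting the cumulative distribution function gives the quantile function $Q(u) = a + b[(1-u)^{-1/s} - 1]$, from which $E[y] = a + b/(s-1)$ and $G = 1 - \left(a + b/(2s-1)\right) / \left(a + b/(s-1)\right)$. The base R sketch below encodes this derivation. The function name is hypothetical, the case $s \leq 1$ (infinite mean) is simply rejected, and the output of gparetoII itself has not been checked against it.

```r
# Gini index of ParetoII(a, b, s) from the closed-form integrals,
# assuming shape > 1 so that the expectation is finite.
gini_pareto2_sketch <- function(location = 0, scale = 1, shape) {
  stopifnot(location >= 0, scale > 0, shape > 1)
  mean_y <- location + scale / (shape - 1)       # E[y]
  # 2 * integral over [0, 1] of (1 - u) * Q(u), with Q the quantile function
  twice_area <- location + scale / (2 * shape - 1)
  1 - twice_area / mean_y
}

gini_pareto2_sketch(location = 1, scale = 1, shape = 3)  # 0.2, as for ParetoI(1, 3)
gini_pareto2_sketch(location = 0, scale = 1, shape = 3)  # 0.6, i.e. s / (2s - 1)
```

The first call uses $a = b$, in which case the Pareto (II) distribution coincides with the Pareto (I) distribution with the same scale and shape.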

Value

A numeric value with the Gini index. NA is returned when a parameter is non-numeric or non-positive, except the location parameter, which can be equal to 0.

Author(s)

Juan F Munoz

Jose M Pavia

Encarnacion Alvarez

References

Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.

Yee, T. W. (2022). VGAM: Vector Generalized Linear and Additive Models. R package version 1.1-7, https://CRAN.R-project.org/package=VGAM.

See Also

gpareto, gparetoI, gparetoIII, gparetoIV, gdagum, gburr, gfisk

Examples

# Gini index for the Pareto (II) distribution with parameters 'a = 1', 'b = 1' and 's = 3'.
gparetoII(location = 1, scale = 1, shape = 3)

Gini index for the Pareto (III) distribution with user-defined inequality parameters

Description

Calculates the Gini index for the Pareto (III) distribution with inequality parameters $g$.

Usage

gparetoIII(
 inequality = 1
)

Arguments

inequality

A vector of real numbers in the interval $[0,1]$ specifying the inequality parameters $g$ of the Pareto (III) distribution. The default value is inequality = 1.

Details

The Pareto (III) distribution with location parameter $a$, scale parameter $b$ and inequality parameter $g$, denoted as $ParetoIII(a,b,g)$, where $a>0$, $b>0$, and $g \in [0,1]$, has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995; Yee, 2022)

$$f(y)= \frac{1}{bg} \left( \frac{y-a}{b}\right)^{1/g-1} \left[1 + \left( \frac{y-a}{b}\right)^{1/g} \right]^{-2},$$

and a cumulative distribution function given by

$$F(y)=1-\left[1 + \left( \frac{y-a}{b}\right)^{1/g} \right]^{-1},$$

where $y>a$.

The Gini index is $G = g$.

If inequality is not specified it assumes the default value of 1. The Pareto (III) distribution is related to the Pareto (IV) distribution: $ParetoIII(a,b,g) = ParetoIV(a,b,g,1)$.
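
The identity $G = g$ can be verified from the quantile function $Q(u) = a + b[u/(1-u)]^{g}$, obtained by inverting the cumulative distribution function. Taking $a = 0$ (an assumption of this sketch), both $E[y]$ and the Lorenz-curve integral reduce to Beta functions, and their ratio simplifies to $g$ for $0 < g < 1$. The function name below is hypothetical and the code is only a numerical illustration in base R, not the implementation of gparetoIII.

```r
# Gini index of ParetoIII(0, b, g) written with Beta functions, for 0 < g < 1.
#   E[y] / b                                   = B(1 + g, 1 - g)
#   (integral over [0, 1] of (1 - u) Q(u)) / b = B(1 + g, 2 - g)
# so that G = 1 - 2 * B(1 + g, 2 - g) / B(1 + g, 1 - g).
gini_pareto3_sketch <- function(inequality) {
  stopifnot(all(inequality > 0), all(inequality < 1))
  1 - 2 * beta(1 + inequality, 2 - inequality) /
    beta(1 + inequality, 1 - inequality)
}

g <- seq(0.1, 0.9, by = 0.1)
all.equal(gini_pareto3_sketch(g), g)  # TRUE
```

The scale $b$ cancels in the ratio, which is why it does not appear as an argument.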

Value

A numeric vector with the Gini indices. NA is returned when an inequality parameter is non-numeric or is outside the interval $[0,1]$.

Note

The Gini index of the Pareto (III) distribution does not depend on its location and scale parameters.

Author(s)

Juan F Munoz

Jose M Pavia

Encarnacion Alvarez

References

Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.

Yee, T. W. (2022). VGAM: Vector Generalized Linear and Additive Models. R package version 1.1-7, https://CRAN.R-project.org/package=VGAM.

See Also

gpareto, gparetoI, gparetoII, gparetoIV, gdagum, gburr, gfisk

Examples

# Gini index for the Pareto (III) distribution with inequality parameter 'g = 0.3'.
gparetoIII(inequality = 0.3)

# Gini indices for the Pareto (III) distribution with different inequality parameters.
gparetoIII(inequality = seq(0.1, 0.9, by=0.1))

Gini index for the Pareto (IV) distribution with user-defined location, scale, inequality and shape parameters

Description

Calculates the Gini index for the Pareto (IV) distribution with location parameter $a$, scale parameter $b$, inequality parameter $g$ and shape parameter $s$.

Usage

gparetoIV(
 location = 0,
 scale = 1,
 inequality = 1,
 shape = 1
)

Arguments

location

A non-negative real number specifying the location parameter $a$ of the Pareto (IV) distribution. The default value is location = 0.

scale

A positive real number specifying the scale parameter $b$ of the Pareto (IV) distribution. The default value is scale = 1.

inequality

A positive real number specifying the inequality parameter $g$ of the Pareto (IV) distribution. The default value is inequality = 1.

shape

A positive real number specifying the shape parameter $s$ of the Pareto (IV) distribution. The default value is shape = 1.

Details

The Pareto (IV) distribution with location parameter $a$, scale parameter $b$, inequality parameter $g$ and shape parameter $s$, denoted as $ParetoIV(a,b,g,s)$, where $a \geq 0$, $b>0$, $g>0$ and $s>0$, has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995; Yee, 2022)

$$f(y)= \frac{s}{bg} \left( \frac{y-a}{b}\right)^{1/g-1} \left[1 + \left( \frac{y-a}{b}\right)^{1/g} \right]^{-(s+1)},$$

and a cumulative distribution function given by

$$F(y)=1- \left[1 + \left( \frac{y-a}{b}\right)^{1/g} \right]^{-s},$$

where $y>a$.

The Gini index can be computed as

$$G = 2\left(0.5 - \frac{1}{E[y]}\int_{0}^{1}\int_{0}^{Q(y)}yf(y)dy\right),$$

where $Q(y)$ is the quantile function of the Pareto (IV) distribution, and $E[y]$ is the expectation of the distribution. If location is not specified it assumes the default value of 0, and the remaining parameters assume the default value of 1. The Pareto (IV) distribution is related to:

1. The Burr distribution: $ParetoIV(0,b,g,s) = BurrXII(b,1/g,s)$.

2. The Pareto (I) distribution: $ParetoIV(b,b,1,s) = ParetoI(b,s)$.

3. The Pareto (II) distribution: $ParetoIV(a,b,1,s) = ParetoII(a,b,s)$.

4. The Pareto (III) distribution: $ParetoIV(a,b,g,1) = ParetoIII(a,b,g)$.
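
The relationships listed above can be illustrated numerically. Inverting the cumulative distribution function gives the quantile function $Q(u) = a + b[(1-u)^{-1/s} - 1]^{g}$, and swapping the order of integration in the expression for $G$ gives $G = 1 - \frac{2}{E[y]}\int_{0}^{1}(1-u)Q(u)du$. The base R sketch below uses integrate() for both integrals. The function names are hypothetical, the condition $s > g$ is imposed here so that $E[y]$ is finite, and this is not claimed to be how gparetoIV works internally.

```r
# Quantile function of ParetoIV(a, b, g, s), obtained by inverting F(y).
q_pareto4 <- function(u, a, b, g, s) {
  a + b * ((1 - u)^(-1 / s) - 1)^g
}

# Gini index by numerical integration:
#   G = 1 - (2 / E[y]) * integral over [0, 1] of (1 - u) * Q(u).
gini_pareto4_numeric <- function(a = 0, b = 1, g, s) {
  stopifnot(a >= 0, b > 0, g > 0, s > g)  # s > g keeps E[y] finite
  mean_y <- integrate(function(u) q_pareto4(u, a, b, g, s), 0, 1)$value
  area <- integrate(function(u) (1 - u) * q_pareto4(u, a, b, g, s), 0, 1)$value
  1 - 2 * area / mean_y
}

# ParetoIV(1, 1, 1, 3) = ParetoI(1, 3), whose Gini index is 1 / (2 * 3 - 1).
gini_pareto4_numeric(a = 1, b = 1, g = 1, s = 3)    # approximately 0.2

# ParetoIV(0, 1, 0.5, 1) = ParetoIII(0, 1, 0.5), whose Gini index is g.
gini_pareto4_numeric(a = 0, b = 1, g = 0.5, s = 1)  # approximately 0.5
```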

Value

A numeric value with the Gini index. NA is returned when a parameter is non-numeric or non-positive, except for the location parameter, which can be equal to 0.

Author(s)

Juan F Munoz

Jose M Pavia

Encarnacion Alvarez

References

Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.

Yee, T. W. (2022). VGAM: Vector Generalized Linear and Additive Models. R package version 1.1-7, https://CRAN.R-project.org/package=VGAM.

See Also

gpareto, gparetoI, gparetoII, gparetoIII, gdagum, gburr, gfisk

Examples

# Gini index for the Pareto (IV) distribution with 'a = 1', 'b = 1',  'g = 0.5', 's = 1'.
gparetoIV(location = 1, scale = 1, inequality = 0.5, shape = 1)

# Gini index for the Pareto (IV) distribution with 'a = 1', 'b = 1',  'g = 2', 's = 3'.
gparetoIV(location = 1, scale = 1, inequality = 2, shape = 3)

Samples from a set of continuous probability distributions with user-defined Gini indices

Description

Draws samples from a continuous probability distribution with Gini indices set by the user.

Usage

gsample(
  n,
  gini,
  distribution = c("pareto", "dagum", "lognormal", "fisk", "weibull", "gamma",
  "chisq", "frechet"),
  scale = 1,
  meanlog = 0,
  shape2.p = 1,
  location = 0
)

Arguments

n

An integer specifying the size of each sample.

gini

A numeric vector of values between 0 and 1, indicating the Gini indices for the continuous distribution from which samples are generated.

distribution

A character string specifying the continuous probability distribution to be used to generate the sample. Possible values are "pareto", "dagum", "lognormal", "fisk", "weibull", "gamma", "chisq" and "frechet" for the Pareto, Dagum, logNormal, Fisk (Log-logistic), Weibull, Gamma, Chi-Squared and Frechet distributions, respectively.

scale

The scale parameter for the Pareto, Dagum, Fisk, Weibull, Gamma and Frechet distributions. The default value is scale = 1.

meanlog

The mean for the logNormal distribution on the log scale. The default value is meanlog = 0.

shape2.p

The shape parameter p for the Dagum distribution. The default value is shape2.p = 1.

location

The location parameter for the Frechet distribution. The default value is location = 0.

Details

For each continuous probability distribution, the parameters involved in the theoretical formulation of the Gini index ($G$) are selected such that $G$ takes the values set in the argument gini. Additional parameters required by the distribution can be set by the user, and default values are provided. scale is the scale parameter for the Pareto, Dagum, Fisk, Weibull, Gamma and Frechet distributions, meanlog is the mean for the Lognormal distribution on the log scale, shape2.p is the shape parameter p for the Dagum distribution, and location is the location parameter for the Frechet distribution. Additional information on the continuous probability distributions used by this function can be found in Kleiber and Kotz (2003), Johnson et al. (1995) and Yee (2022).
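
As an illustration of this parameter selection, consider the Pareto case and assume the classical Pareto (I) form, for which $G = 1/(2s-1)$. Solving for the shape gives $s = (1 + 1/G)/2$, and a sample can then be drawn by inverse-transform sampling. The helper below is a hypothetical base R sketch of that idea; the package may select parameters and generate values differently.

```r
# Hypothetical helper: draw n values from a Pareto (I) distribution whose
# theoretical Gini index equals `gini`, using G = 1 / (2 * shape - 1).
rpareto_gini_sketch <- function(n, gini, scale = 1) {
  stopifnot(n >= 1, gini > 0, gini < 1, scale > 0)
  shape <- (1 + 1 / gini) / 2       # invert G = 1 / (2 * shape - 1)
  scale * runif(n)^(-1 / shape)     # inverse-transform sampling
}

set.seed(123)
y <- rpareto_gini_sketch(n = 10, gini = 0.3)
length(y)    # 10
all(y >= 1)  # TRUE: the support starts at the scale parameter
```

With a sample this small, the empirical Gini index of y can differ noticeably from the target value 0.3.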

Value

A numeric vector (or a matrix of order $n \times$ size(gini), with one sample per column) drawn from the continuous probability distribution stated in distribution, with Gini indices given by the vector gini.

Note

The Gini index may be underestimated for heavy-tailed distributions (Pareto, Dagum, Lognormal, Fisk and Frechet) and large values of gini. A larger sample size may solve or reduce this problem.

Author(s)

Juan F Munoz

Jose M Pavia

Encarnacion Alvarez

References

Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.

Yee, T. W. (2022). VGAM: Vector Generalized Linear and Additive Models. R package version 1.1-7, https://CRAN.R-project.org/package=VGAM.

See Also

gpareto, gdagum, glnorm, gfisk, gweibull, ggamma, gchisq, gfrechet

Examples

# Sample from the Pareto distribution and parameter selected such that the Gini index is 0.3.
gsample(n = 10, gini = 0.3, "pareto")

# Samples from the Pareto distribution and gini indices 0.2 and 0.5.
gsample(n = 10, gini = c(0.2,0.5), "par", scale = 2)

# Samples from the Lognormal distribution and gini indices 0.2 and 0.5.
gsample(n = 10, gini = c(0.2,0.5), "lognormal", meanlog = 5)

# Samples from the Dagum distribution and gini indices 0.2 and 0.5.
gsample(n = 10, gini = c(0.2,0.5), "dagum")

# Samples from the Fisk (Log-logistic) distribution and gini indices 0.3 and 0.6.
gsample(n = 10, gini = c(0.3,0.6), "fisk")

# Sample from the Weibull distribution and parameter selected such that the Gini index is 0.2.
gsample(n = 10, gini = 0.2, "weibull")

# Sample from the Gamma distribution and parameter selected such that the Gini index is 0.2.
gsample(n = 10, gini = 0.2, "gamma")

# Samples from the Chi-Squared distribution and gini indices 0.3 and 0.6.
gsample(n = 10, gini = c(0.3,0.6), "chi")

# Samples from the Frechet distribution and gini indices 0.3 and 0.6.
gsample(n = 10, gini = c(0.3,0.6), "fre")

Gini index for the Uniform distribution with user-defined lower and upper limits

Description

Calculates the Gini index for the Uniform distribution with lower limit min and upper limit max.

Usage

gunif(
 min = 0,
 max = 1
)

Arguments

min

A non-negative real number specifying the lower limit of the Uniform distribution. The default value is min = 0.

max

A positive real number higher than min specifying the upper limit of the Uniform distribution. The default value is max = 1.

Details

The Uniform distribution with lower and upper limits $\min$ and $\max$, denoted as $U(\min,\max)$, where $\min \geq 0$, $\max >0$, $\min < \max$ and both must be finite, has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995; Yee, 2022)

$$f(y)= \frac{1}{\max - \min},$$

where $y \in [\min, \max]$. The cumulative distribution function is given by

$$F(y) = \left\{ \begin{array}{cl} 0 , & y < \min; \\ \displaystyle \frac{y-\min}{\max - \min}, & y \in [\min, \max]; \\ 1 , & y > \max. \end{array} \right.$$

The Gini index can be computed as

$$G = \frac{\max - \min}{3(\min + \max)}.$$

If min or max are not specified they assume the default values of 0 and 1, respectively.
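
The closed form above is straightforward to evaluate directly. The following base R sketch (the function name is hypothetical, and scalar limits are assumed) applies the formula and returns NA for invalid limits, mirroring the behaviour described for gunif without claiming to reproduce its implementation.

```r
# Gini index of U(min, max) from G = (max - min) / (3 * (min + max)).
gini_unif_sketch <- function(min = 0, max = 1) {
  valid <- is.numeric(min) && is.numeric(max) &&
    is.finite(min) && is.finite(max) && min >= 0 && min < max
  if (!valid) return(NA_real_)
  (max - min) / (3 * (min + max))
}

gini_unif_sketch()                     # 1/3
gini_unif_sketch(min = 10, max = 190)  # 0.3
gini_unif_sketch(min = 2, max = 1)     # NA
```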

Value

A numeric value with the Gini index. An NA value is returned when a limit is non-numeric or negative, or when $\min \geq \max$.

Author(s)

Juan F Munoz

Jose M Pavia

Encarnacion Alvarez

References

Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.

Yee, T. W. (2022). VGAM: Vector Generalized Linear and Additive Models. R package version 1.1-7, https://CRAN.R-project.org/package=VGAM.

See Also

gbeta, ggamma, gchisq, gf

Examples

# Gini index for the Uniform distribution with lower limit 0 and upper limit 1.
gunif()

# Gini index for the Uniform distribution with lower limit 10 and upper limit 190.
gunif(min = 10, max = 190)

Gini index for the Weibull distribution with user-defined shape parameters

Description

Calculates the Gini indices for the Weibull distribution with shape parameters $a$.

Usage

gweibull(shape)

Arguments

shape

A vector of positive real numbers specifying the shape parameters $a$ of the Weibull distribution.

Details

The Weibull distribution with scale parameter $\sigma$ and shape parameter $a$, denoted as $Weibull(\sigma, a)$, where $\sigma>0$ and $a>0$, has a probability density function given by (Kleiber and Kotz, 2003; Johnson et al., 1995; Yee, 2022)

$$f(y) = \frac{a}{\sigma}\left(\frac{y}{\sigma}\right)^{a-1}e^{-(y/\sigma)^{a}},$$

and a cumulative distribution function given by

$$F(y) = 1 - e^{-(y/\sigma)^{a}},$$

where $y \geq 0$.

The Gini index can be computed as

$$G = 1-2^{-1/a}.$$
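
Because the formula depends only on the shape parameter, it can also be inverted to find the shape that yields a given Gini index: $a = -\log 2 / \log(1-G)$. The base R sketch below shows both directions; the function names are hypothetical and are not part of the package.

```r
# Gini index of the Weibull distribution from G = 1 - 2^(-1 / shape);
# NA is returned for non-positive or non-finite shape values.
gini_weibull_sketch <- function(shape) {
  ifelse(is.finite(shape) & shape > 0, 1 - 2^(-1 / shape), NA_real_)
}

# Inverse relation: shape parameter that produces a given Gini index in (0, 1).
shape_from_gini_sketch <- function(gini) {
  stopifnot(all(gini > 0), all(gini < 1))
  -log(2) / log(1 - gini)
}

gini_weibull_sketch(1)                               # 0.5
shape_from_gini_sketch(0.5)                          # 1
gini_weibull_sketch(shape_from_gini_sketch(0.2))     # 0.2
```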

Value

A numeric vector with the Gini indices. NA is returned when a shape parameter is non-numeric or non-positive.

Note

The Gini index of the Weibull distribution does not depend on its scale parameter.

Author(s)

Juan F Munoz

Jose M Pavia

Encarnacion Alvarez

References

Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences, Hoboken, NJ, USA: Wiley-Interscience.

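Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.

Yee, T. W. (2022). VGAM: Vector Generalized Linear and Additive Models. R package version 1.1-7, https://CRAN.R-project.org/package=VGAM.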

See Also

gbeta, ggamma, gchisq, gunif

Examples

# Gini index for the Weibull distribution with 'shape = 1'.
gweibull(shape = 1)

# Gini indices for the Weibull distribution and different shape parameters.
gweibull(shape = 1:10)

Comparisons of variance estimators and confidence intervals for the Gini index in infinite populations

Description

Compares variance estimates and confidence intervals for the Gini index in infinite populations.

Usage

icompareCI(
 y,
 B = 1000L,
 alpha = 0.05,
 plotCI = TRUE,
 digitsgini = 2L,
 digitsvar = 4L,
 cum.sums = NULL,
 na.rm = TRUE,
 precisionEL = 1e-4,
 maxiterEL = 100L,
 line.types = c(1L, 2L),
 colors = c("red", "green"),
 save.plot = FALSE
)

Arguments

y

A vector with the non-negative real numbers to be used for estimating the Gini index. This argument can be missing if argument cum.sums is provided.

B

A single integer specifying the number of bootstrap replicates. The default value is B = 1000L.

alpha

A single numeric value between 0 and 1 specifying the confidence level 1-alpha to be used for computing the confidence interval for the Gini index. Some authors call alpha the significance level. The default value is alpha = 0.05.

plotCI

A 'TRUE/FALSE' logical value indicating whether confidence intervals are compared using a plot. The default value is plotCI = TRUE.

digitsgini

A single integer specifying the number of decimals used in the estimation of the Gini index and confidence intervals. The default value is digitsgini = 2L.

digitsvar

A single integer specifying the number of decimals used in the variance estimation of the Gini index. The default value is digitsvar = 4L.

cum.sums

A numeric vector of non-negative real numbers specifying the cumulative sums of the variable used to estimate the Gini index. This argument can be NULL if argument y is provided. The default value is cum.sums = NULL.

na.rm

A 'TRUE/FALSE' logical value indicating whether NA values should be removed before the computation proceeds. The default value is na.rm = TRUE.

precisionEL

A single numeric value specifying the precision for the confidence interval based on the empirical likelihood method. The default value is precisionEL = 1e-4, i.e., limits of the confidence interval have a total of 4 decimal places.

maxiterEL

A single integer specifying the maximum number of iterations allowed for the convergence in the empirical likelihood method. The default value is maxiterEL = 100L.

line.types

A numeric vector of length 2 specifying the line types. See the function plot for the different line types. The default value is line.types = c(1L, 2L).

colors

A vector of length 2 specifying the colors for the lines of the plot. The default value is colors = c("red", "green").

save.plot

A 'TRUE/FALSE' logical value indicating whether the ggplot object of the plot comparing the confidence intervals should be saved in the output. The default value is save.plot = FALSE.

Details

For a sample $S$, with size $n$, derived from an infinite population, the Gini index is estimated by two different versions (see Muñoz et al., 2023 for more details):

$$\widehat{G} = \frac{2}{\overline{y}n^{2}}\sum_{i \in S}iy_{(i)} - \frac{n+1}{n};$$

$$\widehat{G}^{bc} = \frac{2}{\overline{y}n(n-1)}\sum_{i \in S}iy_{(i)} - \frac{n+1}{n-1},$$

where the label $bc$ indicates that the bias correction is applied. The table below summarises the various types of variances and confidence intervals that this function computes. Methods based on the jackknife technique use the fast algorithm suggested by Ogwang (2000). The linearization technique for variance estimation (Deville, 1999) has been applied to the following estimators of the Gini index (Berger, 2008; Langel and Tille, 2013):

$$\widehat{G}^{a} = \frac{1}{2\overline{y}n^{2}}\sum_{i \in S}\sum_{j\in S} |y_i-y_j|$$

and

$$\widehat{G}^{b} = \frac{2}{\overline{y}n}\sum_{i \in S}y_{i}\widehat{F}_{n}(y_{i}) - 1,$$

where

$$\widehat{F}_{n}(y_i)=\frac{1}{n}\sum_{j \in S}\delta(y_j \leq y_i).$$

zalinearization and zblinearization linearize, respectively, the estimators $\widehat{G}^{a}$ and $\widehat{G}^{b}$. The percentile bootstrap (see Qin et al., 2010) is computed using pbootstrap. BCa is the bias corrected bootstrap confidence interval (Efron and Tibshirani, 1993). ELchisq and ELboot are the confidence intervals based on the empirical likelihood method. The vignette vignette("GiniVarInterval") contains a detailed description of the various methods for variance estimation and confidence intervals for the Gini index.

Interval         Variance       Critical values        References
_______________  _____________  _____________________  ____________________________
zjackknife       Jackknife      Normal                 Berger (2008)
tjackknife       Jackknife      Studentized bootstrap  Biewen (2002); Berger (2008)
zalinearization  Linearization  Normal                 Langel and Tille (2013)
zblinearization  Linearization  Normal                 Berger (2008)
talinearization  Linearization  Studentized bootstrap  Langel and Tille (2013)
tblinearization  Linearization  Studentized bootstrap  Biewen (2002); Berger (2008)
pBootstrap       Bootstrap      Percentile bootstrap   Qin et al. (2010)
BCa              Bootstrap      BCa bootstrap          Davison and Hinkley (1997)
ELchisq          Linearization  Chi-Squared            Qin et al. (2010)
ELboot           Bootstrap      Percentile bootstrap   Qin et al. (2010)
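
The four point estimators defined in this section can be written in a few lines of base R, which makes their relationships easy to inspect on a small data set. The function name below is hypothetical and the code is only a direct transcription of the formulas; it does not use the package's compiled routines.

```r
# Direct transcription of the estimators G, G^bc, G^a and G^b.
gini_estimators_sketch <- function(y) {
  y <- sort(y)
  n <- length(y)
  ybar <- mean(y)
  rank_sum <- sum(seq_len(n) * y)                 # sum of i * y_(i)
  G    <- 2 * rank_sum / (ybar * n^2) - (n + 1) / n
  G_bc <- 2 * rank_sum / (ybar * n * (n - 1)) - (n + 1) / (n - 1)
  G_a  <- sum(abs(outer(y, y, "-"))) / (2 * ybar * n^2)
  Fn   <- ecdf(y)(y)                              # proportion of y_j <= y_i
  G_b  <- 2 * sum(y * Fn) / (ybar * n) - 1
  c(G = G, G_bc = G_bc, G_a = G_a, G_b = G_b)
}

est <- gini_estimators_sketch(c(1, 2, 3, 4, 10))
est  # G = 0.4, G_bc = 0.5, G_a = 0.4, G_b = 0.6
```

In this example $\widehat{G}^{a}$ coincides with $\widehat{G}$, the bias-corrected version equals $\widehat{G}\,n/(n-1)$, and, since there are no ties, $\widehat{G}^{b} = \widehat{G} + 1/n$.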

Value

If save.plot = FALSE, a data frame with columns:

  1. interval. The method used to construct the confidence interval.

  2. bc. A 'TRUE/FALSE' logical value indicating whether the bias correction is applied.

  3. gini. The estimation of the Gini index.

  4. lowerlimit. The lower limit of the confidence interval.

  5. upperlimit. The upper limit of the confidence interval.

  6. var.gini. The variance estimation for the estimator of the Gini index.

If save.plot = TRUE, a list with two components: (i) 'base.CI', a data frame of six columns as just described, and (ii) 'plot', a (ggplot) description of the plot, which is a list with components that contain the plot itself, the data, information about the scales, panels, etc. As a side-effect, a plot that compares the various methods for constructing confidence intervals for the Gini index is displayed. **ggplot2** must be installed for this option to work.

If plotCI = TRUE, as a side-effect, a plot that compares the various methods for constructing confidence intervals for the Gini index is displayed. **ggplot2** must be installed for this option to work.

Author(s)

Juan F Munoz

Jose M Pavia

Encarnacion Alvarez

References

Berger, Y. G. (2008). A note on the asymptotic equivalence of jackknife and linearization variance estimation for the Gini Coefficient. Journal of Official Statistics, 24(4), 541-555.

Biewen, M. (2002). Bootstrap inference for inequality, mobility and poverty measurement. Journal of Econometrics, 108(2), 317-342.

Davison, A. C., and Hinkley, D. V. (1997). Bootstrap Methods and Their Application (Cambridge Series in Statistical and Probabilistic Mathematics, No 1). Cambridge University Press.

Deville, J.C. (1999). Variance Estimation for Complex Statistics and Estimators: Linearization and Residual Techniques. Survey Methodology, 25, 193–203.

Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap. Chapman and Hall, New York, London.

Langel, M., and Tille, Y. (2013). Variance estimation of the Gini index: revisiting a result several times published. Journal of the Royal Statistical Society: Series A (Statistics in Society), 176(2), 521-540.

Muñoz, J. F., Moya-Fernández, P. J., and Álvarez-Verdejo, E. (2023). Exploring and Correcting the Bias in the Estimation of the Gini Measure of Inequality. Sociological Methods & Research. https://doi.org/10.1177/00491241231176847

Ogwang, T. (2000). A convenient method of computing the Gini index and its standard error. Oxford Bulletin of Economics and Statistics, 62(1), 123-129.

Qin, Y., Rao, J. N. K., and Wu, C. (2010). Empirical likelihood confidence intervals for the Gini measure of income inequality. Economic Modelling, 27(6), 1429-1435.

See Also

igini, iginindex

Examples

# Sample, with size 50, from a Lognormal distribution. The true Gini index is 0.5.
set.seed(123)
y <- gsample(n = 50, gini = 0.5, distribution = "lognormal")

# Estimation of the Gini index and confidence intervals using different methods.
icompareCI(y)

Gini index, variances and confidence intervals in infinite populations

Description

Estimates the Gini index and computes variances and confidence intervals for infinite populations.

Usage

igini(
  y,
  bias.correction = TRUE,
  interval = NULL,
  B = 1000L,
  alpha = 0.05,
  cum.sums = NULL,
  na.rm = TRUE,
  precisionEL = 1e-04,
  maxiterEL = 100L,
  large.sample = FALSE
)

Arguments

y

A vector with the non-negative real numbers to be used for estimating the Gini index. This argument can be missing if argument cum.sums is provided.

bias.correction

A 'TRUE/FALSE' logical value indicating whether the bias correction should be applied to the estimation of the Gini index. The default value is bias.correction = TRUE.

interval

A character string specifying the type of variance estimation and confidence interval to be used, or NULL (the default value) to omit the computation of both variance and confidence interval. Possible values are "zjackknife", "tjackknife", "zalinearization", "zblinearization", "talinearization", "tblinearization", "pbootstrap", "BCa", "ELchisq" and "ELboot". The default value is interval = NULL.

B

A single integer specifying the number of bootstrap replicates. The default value is B = 1000L.

alpha

A single numeric value between 0 and 1. If interval is not NULL, the confidence level to be used for computing the confidence interval for the Gini index is 1-alpha. Some authors call alpha the significance level. The default value is alpha = 0.05.

cum.sums

A vector with the non-negative real numbers specifying the cumulative sums of the variable used to estimate the Gini index. This argument can be NULL if argument y is provided. The default value is cum.sums = NULL.

na.rm

A 'TRUE/FALSE' logical value indicating whether NA's should be removed before the computation proceeds. The default value is na.rm = TRUE.

precisionEL

A single numeric value specifying the precision for the confidence interval based on the empirical likelihood method. The default value is precisionEL = 1e-4, i.e., limits of the confidence interval have a total of 4 decimal places.

maxiterEL

A single integer specifying the maximum number of iterations allowed for the convergence of the empirical likelihood method. The default value is maxiterEL = 100L.

large.sample

A 'TRUE/FALSE' logical value indicating whether the sample is large enough to warrant a faster algorithm for sorting the sample values. The default value is large.sample = FALSE.

Details

For a sample $S$, with size $n$, derived from an infinite population, the Gini index is estimated by

$$\widehat{G} = \frac{2}{\overline{y}n^{2}}\sum_{i \in S}iy_{(i)} - \frac{n+1}{n}$$

when bias.correction = FALSE, and by

$$\widehat{G}^{bc} = \frac{2}{\overline{y}n(n-1)}\sum_{i \in S}iy_{(i)} - \frac{n+1}{n-1}$$

when bias.correction = TRUE. For more details, see Muñoz et al. (2023). The table below summarises the various types of variances and confidence intervals that this function computes. Methods based on the jackknife technique use the fast algorithm suggested by Ogwang (2000). The linearization technique for variance estimation (Deville, 1999) has been applied to the following estimators of the Gini index (Berger, 2008; Langel and Tille, 2013):

$$\widehat{G}^{a} = \frac{1}{2\overline{y}n^{2}}\sum_{i \in S}\sum_{j\in S} |y_i-y_j|$$

and

$$\widehat{G}^{b} = \frac{2}{\overline{y}n}\sum_{i \in S}y_{i}\widehat{F}_{n}(y_{i}) - 1,$$

where

$$\widehat{F}_{n}(y_i)=\frac{1}{n}\sum_{j \in S}\delta(y_j \leq y_i).$$

zalinearization and zblinearization linearize, respectively, the estimators $\widehat{G}^{a}$ and $\widehat{G}^{b}$. The percentile bootstrap (see Qin et al., 2010) is computed using pbootstrap. BCa is the bias corrected bootstrap confidence interval (Efron and Tibshirani, 1993). ELchisq and ELboot are the confidence intervals based on the empirical likelihood method. The vignette vignette("GiniVarInterval") contains a detailed description of the various methods for variance estimation and confidence intervals for the Gini index.

Interval         Variance       Critical values        References
_______________  _____________  _____________________  ____________________________
zjackknife       Jackknife      Normal                 Berger (2008)
tjackknife       Jackknife      Studentized bootstrap  Biewen (2002); Berger (2008)
zalinearization  Linearization  Normal                 Langel and Tille (2013)
zblinearization  Linearization  Normal                 Berger (2008)
talinearization  Linearization  Studentized bootstrap  Langel and Tille (2013)
tblinearization  Linearization  Studentized bootstrap  Biewen (2002); Berger (2008)
pBootstrap       Bootstrap      Percentile bootstrap   Qin et al. (2010)
BCa              Bootstrap      BCa bootstrap          Davison and Hinkley (1997)
ELchisq          Linearization  Chi-Squared            Qin et al. (2010)
ELboot           Bootstrap      Percentile bootstrap   Qin et al. (2010)
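
To make the jackknife row of the table concrete, the sketch below computes a leave-one-out jackknife variance for the bias-corrected estimator and combines it with Normal critical values. It is a naive base R version that recomputes the estimator n times, whereas the package is stated to use the fast algorithm of Ogwang (2000). The function names are hypothetical, and the variance formula used is the textbook jackknife form, assumed here rather than taken from the package.

```r
# Bias-corrected Gini estimator, as defined in this section.
gini_bc_sketch <- function(y) {
  y <- sort(y)
  n <- length(y)
  2 * sum(seq_len(n) * y) / (mean(y) * n * (n - 1)) - (n + 1) / (n - 1)
}

# Naive leave-one-out jackknife variance with a Normal-based interval.
zjackknife_sketch <- function(y, alpha = 0.05) {
  n <- length(y)
  loo <- vapply(seq_len(n), function(i) gini_bc_sketch(y[-i]), numeric(1))
  v <- (n - 1) / n * sum((loo - mean(loo))^2)   # textbook jackknife variance
  est <- gini_bc_sketch(y)
  half <- qnorm(1 - alpha / 2) * sqrt(v)
  list(gini = est, var = v, ci = c(lower = est - half, upper = est + half))
}

jk <- zjackknife_sketch(c(1, 2, 3, 4, 10, 12, 15, 30))
jk
```

Nothing in this sketch constrains the limits to the interval [0, 1]; whether the package truncates them is not stated in this section.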

Value

When interval = NULL, a single numeric value between 0 and 1, containing the estimation of the Gini index based on the vector y or the vector cum.sums. When interval is not NULL, a list of 3 components: a single numeric value with the estimation of the Gini index; a single numeric value with the variance estimation of the Gini index; and a numeric matrix with 1 row and 2 columns containing the lower and upper limits of the confidence intervals for the Gini index.
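
The three-component return value described above can be pictured with a small percentile-bootstrap sketch in base R. The function and component names below are hypothetical (the names used by igini are not given in this section), and the resampling scheme is the plain nonparametric bootstrap, assumed for illustration only.

```r
# Bias-corrected Gini estimator, as defined in the Details section.
gini_bc_sketch <- function(y) {
  y <- sort(y)
  n <- length(y)
  2 * sum(seq_len(n) * y) / (mean(y) * n * (n - 1)) - (n + 1) / (n - 1)
}

# Percentile bootstrap returning an estimate, a variance and a 1 x 2 matrix.
pbootstrap_sketch <- function(y, B = 1000L, alpha = 0.05) {
  boot <- replicate(B, gini_bc_sketch(sample(y, replace = TRUE)))
  limits <- quantile(boot, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  list(
    Gini = gini_bc_sketch(y),
    Variance = var(boot),
    Interval = matrix(limits, nrow = 1,
                      dimnames = list(NULL, c("lower", "upper")))
  )
}

set.seed(123)
res <- pbootstrap_sketch(c(1, 2, 3, 4, 10, 12, 15, 30), B = 200L)
str(res)
```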

Author(s)

Juan F Munoz

Jose M Pavia

Encarnacion Alvarez

References

Berger, Y. G. (2008). A note on the asymptotic equivalence of jackknife and linearization variance estimation for the Gini Coefficient. Journal of Official Statistics, 24(4), 541-555.

Biewen, M. (2002). Bootstrap inference for inequality, mobility and poverty measurement. Journal of Econometrics, 108(2), 317-342.

Davison, A. C., and Hinkley, D. V. (1997). Bootstrap Methods and Their Application (Cambridge Series in Statistical and Probabilistic Mathematics, No 1). Cambridge University Press.

Deville, J.C. (1999). Variance Estimation for Complex Statistics and Estimators: Linearization and Residual Techniques. Survey Methodology, 25, 193–203.

Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap. Chapman and Hall, New York, London.

Langel, M., and Tillé, Y. (2013). Variance estimation of the Gini index: revisiting a result several times published. Journal of the Royal Statistical Society: Series A (Statistics in Society), 176(2), 521-540.

Muñoz, J. F., Moya-Fernández, P. J., and Álvarez-Verdejo, E. (2023). Exploring and Correcting the Bias in the Estimation of the Gini Measure of Inequality. Sociological Methods & Research. https://doi.org/10.1177/00491241231176847

Ogwang, T. (2000). A convenient method of computing the Gini index and its standard error. Oxford Bulletin of Economics and Statistics, 62(1), 123-129.

Qin, Y., Rao, J. N. K., and Wu, C. (2010). Empirical likelihood confidence intervals for the Gini measure of income inequality. Economic Modelling, 27(6), 1429-1435.

See Also

icompareCI, iginindex

Examples

# Sample, with size 50, from a Lognormal distribution. The true Gini index is 0.5.
set.seed(123)
y <- gsample(n = 50, gini = 0.5, distribution = "lognormal")

# Bias corrected estimation of the Gini index.
igini(y)

# Estimation of the Gini index and confidence interval based on jackknife and studentized bootstrap.
igini(y, interval = "tjackknife")

Gini index for infinite populations and different estimation methods

Description

Estimates the Gini index in infinite populations, using different methods.

Usage

iginindex(
  y,
  method = 5L,
  bias.correction = TRUE,
  cum.sums = NULL,
  na.rm = TRUE,
  useRcpp = TRUE
)

Arguments

y

A vector with the non-negative real numbers to be used for estimating the Gini index. This argument can be missing if argument cum.sums is provided.

method

An integer between 1 and 10 selecting one of the 10 methods detailed below for estimating the Gini index in infinite populations. The default method is method = 5L.

bias.correction

A 'TRUE/FALSE' logical value indicating whether the bias correction should be applied to the estimation of the Gini index. The default value is bias.correction = TRUE.

cum.sums

A vector with the non-negative real numbers specifying the cumulative sums of the variable used to estimate the Gini index. This argument can be NULL if argument y is provided. The default value is cum.sums = NULL.

na.rm

A 'TRUE/FALSE' logical value indicating whether NA's should be removed before the computation proceeds. The default value is na.rm = TRUE.

useRcpp

A 'TRUE/FALSE' logical value indicating whether Rcpp (useRcpp = TRUE) or R (useRcpp = FALSE) is used for computation. The default value is useRcpp = TRUE.

Details

For a sample $S$, with size $n$, derived from an infinite population, different formulations of the Gini index have been proposed in the literature, but they yield only two distinct values (with and without the bias correction).

This function estimates the Gini index using the various formulations, with both R and C++ implementations provided. This can be useful for research purposes and allows speed comparisons. The argument cum.sums does not require that the cumulative sums are based on the non-decreasing order of the variable y.

The different methods for estimating the Gini index are (see Wang et al., 2016; Giorgi and Gigliarano, 2017; Mukhopadhyay and Sengupta, 2021; Muñoz et al., 2023):

method = 1

$$\widehat{G}_{1} = \frac{1}{2\overline{y}n^{2}}\sum_{i \in S}\sum_{j \in S} |y_i-y_j|;$$

$$\widehat{G}_{1}^{bc} = \frac{1}{2\overline{y}n(n-1)}\sum_{i \in S}\sum_{j \in S} |y_i-y_j|,$$

where $\overline{y} = n^{-1}\sum_{i \in S}y_i$ is the sample mean and the label $bc$ indicates that the bias correction is applied to the estimation of the Gini index.
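
The two expressions for method = 1 can be sketched in base R as follows. The function gini_pairwise is a hypothetical helper, not the package's implementation; it is a direct transcription of the formulas and assumes a numeric vector with at least two non-negative values.

```r
# Hypothetical helper, not part of giniVarCI: method = 1 in base R.
gini_pairwise <- function(y, bias.correction = TRUE) {
  n <- length(y)
  # Sum of |y_i - y_j| over all ordered pairs (i, j); uses an n x n matrix.
  total_abs_diff <- sum(abs(outer(y, y, "-")))
  denom <- if (bias.correction) n * (n - 1) else n^2
  total_abs_diff / (2 * mean(y) * denom)
}

y <- c(1, 2, 3, 10)
g1    <- gini_pairwise(y, bias.correction = FALSE)
g1_bc <- gini_pairwise(y, bias.correction = TRUE)
```

The bias-corrected value is the uncorrected one multiplied by n / (n - 1), which here is 4 / 3.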

method = 2

$$\widehat{G}_{2} = \frac{n-1}{n}\frac{\sum_{i=1}^{n-1}(p_i-q_i)}{\sum_{i=1}^{n-1}p_i};$$

$$\widehat{G}_{2}^{bc} = \frac{\sum_{i=1}^{n-1}(p_i-q_i)}{\sum_{i=1}^{n-1}p_i},$$

where

$$p_i = \frac{i}{n}; \quad q_i = \frac{y_{i}^{+}}{y_{n}^{+}},$$

and $y_{i}^{+}=\sum_{j=1}^{i}y_{(j)}$, with $i \in \{1,\ldots,n\}$, are the cumulative sums of the ordered values $y_{(i)}$ (in non-decreasing order) of the variable of interest $y$.
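
A minimal base R sketch of method = 2, building $p_i$ and $q_i$ from the sorted cumulative sums. The name gini_ratio is hypothetical and the code assumes $n \geq 2$; it only illustrates the formulas and is not the package's code.

```r
# Hypothetical helper, not part of giniVarCI: method = 2 in base R.
gini_ratio <- function(y, bias.correction = TRUE) {
  n <- length(y)
  p <- seq_len(n) / n               # p_i = i / n
  q <- cumsum(sort(y)) / sum(y)     # q_i = y_i^+ / y_n^+
  idx <- seq_len(n - 1)             # both sums run over i = 1, ..., n - 1
  ratio <- sum(p[idx] - q[idx]) / sum(p[idx])
  if (bias.correction) ratio else (n - 1) / n * ratio
}

y <- c(1, 2, 3, 10)
g2    <- gini_ratio(y, bias.correction = FALSE)
g2_bc <- gini_ratio(y, bias.correction = TRUE)
```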

method = 3

$$\widehat{G}_{3} = \frac{n-1}{n} - \frac{2}{n}\sum_{i=1}^{n-1}q_i;$$

$$\widehat{G}_{3}^{bc} = 1 - \frac{2}{n-1}\sum_{i=1}^{n-1}q_i.$$

method = 4

$$\widehat{G}_{4} = 1 - \sum_{i=0}^{n-1}(q_{i+1} + q_i)(p_{i+1} - p_i);$$

$$\widehat{G}_{4}^{bc} = \frac{n}{n-1}\left[1 - \sum_{i=0}^{n-1}(q_{i+1} + q_i)(p_{i+1} - p_i)\right],$$

where $p_0=q_0=0$.
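
Method = 4 can be read as one minus twice the trapezoidal area under the empirical Lorenz curve through the points $(p_i, q_i)$. The sketch below, with the hypothetical name gini_lorenz, transcribes the formula in base R and is not the package's implementation.

```r
# Hypothetical helper, not part of giniVarCI: method = 4 in base R.
gini_lorenz <- function(y, bias.correction = TRUE) {
  n <- length(y)
  p <- c(0, seq_len(n) / n)              # p_0 = 0, p_1, ..., p_n
  q <- c(0, cumsum(sort(y)) / sum(y))    # q_0 = 0, q_1, ..., q_n
  # (q_{i+1} + q_i) * (p_{i+1} - p_i) for i = 0, ..., n - 1
  trapezoids <- (q[-1] + q[-length(q)]) * diff(p)
  g <- 1 - sum(trapezoids)
  if (bias.correction) n / (n - 1) * g else g
}

y <- c(1, 2, 3, 10)
g4    <- gini_lorenz(y, bias.correction = FALSE)
g4_bc <- gini_lorenz(y, bias.correction = TRUE)
```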

method = 5

$$\widehat{G}_{5} = \frac{2}{\overline{y}n^{2}}\sum_{i \in S}iy_{(i)} - \frac{n+1}{n};$$

$$\widehat{G}_{5}^{bc} = \frac{2}{\overline{y}n(n-1)}\sum_{i \in S}iy_{(i)} - \frac{n+1}{n-1}.$$
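
The rank-based form of method = 5 needs only a sort and a weighted sum, so its cost is dominated by the O(n log n) sort rather than by a double sum over pairs. A base R sketch follows; gini_rank is a hypothetical name, not the package's code.

```r
# Hypothetical helper, not part of giniVarCI: method = 5 in base R.
gini_rank <- function(y, bias.correction = TRUE) {
  n <- length(y)
  ys <- sort(y)                         # y_(1) <= ... <= y_(n)
  weighted_sum <- sum(seq_len(n) * ys)  # sum of i * y_(i)
  if (bias.correction) {
    2 * weighted_sum / (mean(y) * n * (n - 1)) - (n + 1) / (n - 1)
  } else {
    2 * weighted_sum / (mean(y) * n^2) - (n + 1) / n
  }
}

y <- c(1, 2, 3, 10)
g5    <- gini_rank(y, bias.correction = FALSE)
g5_bc <- gini_rank(y, bias.correction = TRUE)
```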

method = 6

$$\widehat{G}_{6} = \frac{2}{\overline{y}n}\mathrm{cov}(i,y_{(i)});$$

$$\widehat{G}_{6}^{bc} = \frac{2}{\overline{y}(n-1)}\mathrm{cov}(i,y_{(i)}).$$

method = 7

$$\widehat{G}_{7} = \frac{1}{\overline{y}n^2}\sum_{i \in S}\sum_{j \in S}|y_i-y_j|\cdot |\widehat{F}_{n}^{\ast}(y_{i})-\widehat{F}_{n}^{\ast}(y_{j})|;$$

$$\widehat{G}_{7}^{bc} = \frac{1}{\overline{y}n(n-1)}\sum_{i \in S}\sum_{j \in S}|y_i-y_j|\cdot |\widehat{F}_{n}^{\ast}(y_{i})-\widehat{F}_{n}^{\ast}(y_{j})|,$$

where

$$\widehat{F}_{n}^{\ast}(t) = \frac{1}{n}\sum_{i \in S}[\delta(y_i < t) + 0.5\,\delta(y_i = t)]$$

is the smooth (mid-point) distribution function and $\delta(\cdot)$ denotes the indicator function.
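
The mid-point distribution function counts values strictly below $t$ fully and ties at $t$ with weight one half. The sketch below evaluates it at the sample points and plugs it into method = 7. The names midpoint_ecdf and gini_midpoint are hypothetical, written in base R for illustration only.

```r
# Hypothetical helpers, not part of giniVarCI.
# Mid-point distribution function F*_n evaluated at each sample value.
midpoint_ecdf <- function(y) {
  vapply(y, function(t) mean((y < t) + 0.5 * (y == t)), numeric(1))
}

# method = 7 in base R.
gini_midpoint <- function(y, bias.correction = TRUE) {
  n <- length(y)
  Fs <- midpoint_ecdf(y)
  # |y_i - y_j| * |F*(y_i) - F*(y_j)| summed over all ordered pairs
  total <- sum(abs(outer(y, y, "-")) * abs(outer(Fs, Fs, "-")))
  denom <- if (bias.correction) n * (n - 1) else n^2
  total / (mean(y) * denom)
}

y <- c(1, 2, 3, 10)
g7 <- gini_midpoint(y, bias.correction = FALSE)

# With ties, F*_n matches (average rank - 0.5) / n.
y_ties <- c(2, 2, 5)
f_ties <- midpoint_ecdf(y_ties)
```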

method = 8

$$\widehat{G}_{8} = 1 - \frac{1}{\overline{y}n^2}\sum_{i \in S}\sum_{j \in S}\min(y_i,y_j);$$

$$\widehat{G}_{8}^{bc} = 1 - \frac{1}{\overline{y}n(n-1)}\sum_{i \in S}\sum_{\substack{j \in S\\ j\neq i}}\min(y_i,y_j).$$
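
In method = 8 the two versions differ in whether the diagonal terms $\min(y_i, y_i) = y_i$ enter the double sum. A base R sketch under that reading follows; gini_min is a hypothetical name, not the package's code.

```r
# Hypothetical helper, not part of giniVarCI: method = 8 in base R.
gini_min <- function(y, bias.correction = TRUE) {
  n <- length(y)
  m <- outer(y, y, pmin)          # m[i, j] = min(y_i, y_j)
  if (bias.correction) {
    # Exclude j == i: the diagonal of m is y itself.
    1 - (sum(m) - sum(y)) / (mean(y) * n * (n - 1))
  } else {
    1 - sum(m) / (mean(y) * n^2)
  }
}

y <- c(1, 2, 3, 10)
g8    <- gini_min(y, bias.correction = FALSE)
g8_bc <- gini_min(y, bias.correction = TRUE)
```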

method = 9

$$\widehat{G}_{9} = \frac{2}{\overline{y}n}\sum_{i \in S}y_{i}\widehat{F}_{n}^{\ast}(y_{i}) - 1;$$

$$\widehat{G}_{9}^{bc} = \frac{2}{\overline{y}(n-1)}\sum_{i \in S}y_{i}\widehat{F}_{n}^{\ast}(y_{i}) - \frac{n}{n-1}.$$

method = 10

$$\widehat{G}_{10} = \frac{n-1}{2\overline{y}n}\binom{n}{2}^{-1}\sum_{1 \leq i_{1} < i_{2} \leq n}|y_{i_{1}}-y_{i_{2}}|;$$

$$\widehat{G}_{10}^{bc} = \frac{1}{2\overline{y}}\binom{n}{2}^{-1}\sum_{1 \leq i_{1} < i_{2} \leq n}|y_{i_{1}}-y_{i_{2}}|.$$
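
Method = 10 averages $|y_{i_1} - y_{i_2}|$ over the $\binom{n}{2}$ unordered pairs, which is a U-statistic form of the mean difference. The base R sketch below uses the hypothetical name gini_ustat and sums over the strict upper triangle of the difference matrix; it is an illustration, not the package's code.

```r
# Hypothetical helper, not part of giniVarCI: method = 10 in base R.
gini_ustat <- function(y, bias.correction = TRUE) {
  n <- length(y)
  d <- abs(outer(y, y, "-"))
  # Sum over pairs with i1 < i2 only (strict upper triangle)
  pair_sum <- sum(d[upper.tri(d)])
  g_bc <- pair_sum / (2 * mean(y) * choose(n, 2))
  if (bias.correction) g_bc else (n - 1) / n * g_bc
}

y <- c(1, 2, 3, 10)
g10    <- gini_ustat(y, bias.correction = FALSE)
g10_bc <- gini_ustat(y, bias.correction = TRUE)
```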

Value

A single numeric value between 0 and 1 containing the estimation of the Gini index based on the vector y or the vector cum.sums.

Author(s)

Juan F Munoz

Jose M Pavia

Encarnacion Alvarez

References

Giorgi, G. M., and Gigliarano, C. (2017). The Gini concentration index: a review of the inference literature. Journal of Economic Surveys, 31(4), 1130-1148.

Mukhopadhyay, N., and Sengupta, P. P. (Eds.). (2021). Gini inequality index: Methods and applications. CRC press.

Muñoz, J. F., Moya-Fernández, P. J., and Álvarez-Verdejo, E. (2023). Exploring and Correcting the Bias in the Estimation of the Gini Measure of Inequality. Sociological Methods & Research. https://doi.org/10.1177/00491241231176847

Wang, D., Zhao, Y., and Gilmore, D. W. (2016). Jackknife empirical likelihood confidence interval for the Gini index. Statistics & Probability Letters, 110, 289-295.

See Also

igini, icompareCI

Examples

# Sample, with size 50, from a Lognormal distribution. The true Gini index is 0.5.
set.seed(123)
y <- gsample(n = 50, gini = 0.5, distribution = "lognormal", meanlog = 5)

# Estimation of the Gini index using the method = 5, bias correction, and Rcpp.
iginindex(y)

# Estimation of the Gini index using the method = 5, bias correction, and R.
iginindex(y, useRcpp = FALSE)

# Comparing the computation time for the various estimation methods and using R
microbenchmark::microbenchmark(
iginindex(y, method = 1,  useRcpp = FALSE),
iginindex(y, method = 2,  useRcpp = FALSE),
iginindex(y, method = 3,  useRcpp = FALSE),
iginindex(y, method = 4,  useRcpp = FALSE),
iginindex(y, method = 5,  useRcpp = FALSE),
iginindex(y, method = 6,  useRcpp = FALSE),
iginindex(y, method = 7,  useRcpp = FALSE),
iginindex(y, method = 8,  useRcpp = FALSE),
iginindex(y, method = 9,  useRcpp = FALSE),
iginindex(y, method = 10, useRcpp = FALSE)
)

# Comparing the computation time for the various estimation methods and using Rcpp
microbenchmark::microbenchmark(
iginindex(y, method = 1),
iginindex(y, method = 2),
iginindex(y, method = 3),
iginindex(y, method = 4),
iginindex(y, method = 5),
iginindex(y, method = 6),
iginindex(y, method = 7),
iginindex(y, method = 8),
iginindex(y, method = 9),
iginindex(y, method = 10)
)