Why does the pgmm function throw an error every time?
I want to use the pgmm function in R with a lag of 1, but every time I get the following error:

```
Error in pdim.default(index[[1L]], index[[2L]]) :
  duplicate couples (id-time)
In addition: Warning message:
In pdata.frame(data, index) :
  duplicate couples (id-time) in resulting pdata.frame
 to find out which, use, e.g., table(index(your_pdataframe), useNA = "ifany")
```
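From what I have read, I understand that this happens because there are several year-id combinations and some of them are identical. However, I don't understand what the ID column is in my case, so I don't know how I should merge them (which is the fix I have seen suggested on the internet).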
I run the following code:
```r
growthregressionpgmm <- pgmm(
  HDI ~ ND + GDPpcap + Fertility_rate + Life_expectancy + GDP_growth + CO2_emissions |
    lag(ND, 1) + lag(GDPpcap, 1) + lag(Fertility_rate, 1) + lag(Life_expectancy, 1) +
    lag(GDP_growth, 1) + lag(CO2_emissions, 1),
  data = fulldata, effect = "twoways", model = "twosteps"
)
```
Here is part of my data:
```r
> dput(fulldata)
structure(list(Year = c("1990", "1992", "1993", "1991", "1991", "1992", "1993", "1993", "1993", "1993", "1990", "1990", "1990", "1991", "1992", "1992", "1993", "1994", "1996", "1996", "1996", "1997", "1997", "1997", "1997", "1998", "1997", "1998", "1998", "1998", "1998", "1999", "1999", "1999", "2000", "2000", "2000", "1998", "1998", "1999", "2001", "2001", "2001", "2002", "2002", "2002", "2002", "2002", "2002"), `Disaster Type` = c("Storm", "Storm", "Storm", "Storm", "Landslide", "Flood", "Earthquake", "Storm", "Flood", "Storm", "Storm", "Earthquake", "Storm", "Storm", "Storm", "Storm", "Storm", "Volcanic activity", "Volcanic activity", "Volcanic activity", "Landslide", "Drought", "Storm", "Storm", "Wildfire", "Earthquake", "Storm", "Storm", "Storm", "Drought", "Drought", "Storm", "Drought", "Flood", "Earthquake", "Epidemic", "Epidemic", "Drought", "Storm", "Earthquake", "Earthquake", "Epidemic", "Storm", "Earthquake", "Storm", "Storm", "Storm", "Earthquake", "Volcanic activity"), Country = c("Fiji", "Fiji", "Fiji", "Marshall Islands (the)", "Papua New Guinea", "Papua New Guinea", "Papua New Guinea", "Papua New Guinea", "Papua New Guinea", "Solomon Islands", "Tonga", "Vanuatu", "Samoa", "Samoa", "Vanuatu", "Vanuatu", "Vanuatu", "Papua New Guinea", "Papua New Guinea", "Papua New Guinea", "Papua New Guinea", "Papua New Guinea", "Fiji", "Papua New Guinea", "Papua New Guinea", "Papua New Guinea", "Tonga", "Tonga", "Vanuatu", "Fiji", "Micronesia (Federated States of)", "Fiji", "Kiribati", "Papua New Guinea", "Papua New Guinea", "Micronesia (Federated States of)", "Marshall Islands (the)", "Solomon Islands", "Tonga", "Vanuatu", "Papua New Guinea", "Papua New Guinea", "Tonga", "Papua New Guinea", "Micronesia (Federated States of)", "Solomon Islands", "Micronesia (Federated States of)", "Papua New Guinea", "Papua New Guinea"), `Damage-to-GDP` = c(0.00468994375259065, 0.000726874446693152, 0.0444821683115519, NA, NA, NA, 0.000527417655715239, 0.000158225296714572, 
0.000263708827857619, NA, 0.0102282455341558, NA, 0.51022968442747, 0.725915383444357, NA, NA, 0.014519768008217, 0.0109523857215256, NA, NA, NA, NA, 0.0108773473588107, NA, NA, NA, NA, NA, NA, NA, NA, 0.00127934872374966, NA, 0.00438688566628981, NA, NA, NA, NA, NA, NA, NA, NA, 0.146837549271082, NA, NA, NA, 0.00167377438200721, NA, NA), `Affected-people-to-total-population` = c(NA, NA, NA, 0.123956697793571, 0.00105807856741241, 0.0186095867906672, NA, NA, NA, 0.260674395588859, NA, 1.3645077879282e-05, NA, NA, 6.44454469291745e-05, 0.00741122639685506, NA, NA, 5.64520135304186e-05, 0.0002877170956267, 1.50538702747783e-06, 0.0917996981993122, NA, 0.00137699547298968, 0.001468795171189, 0.00176772137543665, 0.0310497935188731, 0.00515293923654052, 0.01348413086349, 0.329254133876227, 0.26615407363596, NA, 1.01238972183387, NA, 0.000855053692241551, 0.0319454013891734, 0.00429531259235907, 0.000972630684450451, 0.0316493527908319, 0.0777866659310954, 3.36422562806829e-05, 0.0002334873010525, NA, 0.00073295258059158, 0.00163505559189012, 0.0025491925260431, NA, 0.000163971494539503, 0.00213162942901354 ), ND = c(0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0), HDI = c(0.662, 0.672, 0.675, NA, 0.389, 0.398, 0.411, 0.411, 0.411, NA, 0.654, NA, 0.633, 0.634, NA, NA, NA, 0.417, 0.433, 0.433, 0.433, 0.435, 0.687, 0.435, 0.435, 0.442, 0.675, 0.678, NA, 0.687, NA, 0.692, NA, 0.445, 0.45, 0.546, NA, NA, 0.678, NA, 0.456, 0.456, 0.679, 0.462, 0.559, 0.486, 0.559, 0.462, 0.462), CO2_emissions = c(1.11735406060889, 1.0048343181516, 0.995724293773337, NA, 0.459388934233434, 0.453425890525591, 0.443088179935911, 0.443088179935911, 0.443088179935911, 0.421240986851407, 0.810011675730259, 0.450328505249944, 0.630676338888104, 0.648444788624182, 0.401746471611781, 0.401746471611781, 0.390243139022436, 0.430716741605883, 0.41194897189593, 0.41194897189593, 0.41194897189593, 
0.47330106757541, 0.999162394542445, 0.47330106757541, 0.47330106757541, 0.513086548656141, 1.02473633550337, 0.906999752658917, 0.453257822200498, 0.962398629269006, 1.0844299866923, 0.973254471333977, 0.353565058091886, 0.427890152149318, 0.455899066725996, 1.16085361538891, NA, 0.366049558092927, 0.906999752658917, 0.46529114831876, 0.537041714221921, 0.537041714221921, 0.893600170580889, 0.571820580423017, 1.33619545921704, 0.362125429458561, 1.33619545921704, 0.571820580423017, 0.571820580423017), GDPpcap = c(2926.57253822332, 2956.74562527877, 2977.75102732399, NA, 1490.76034619757, 1658.37860177333, 1915.54964077492, 1915.54964077492, 1915.54964077492, 1739.45172258361, 2570.98733940999, 2644.90678139038, 2407.69721594666, 2335.19360351875, 2643.63498560777, 2643.63498560777, 2586.82481239303, 1982.65811460788, 1968.63071447196, 1968.63071447196, 1968.63071447196, 1845.78398296477, 3131.2099301146, 1845.78398296477, 1845.78398296477, 1733.21642938823, 3251.4184493856, 3317.1836235216, 2778.23593092383, 3142.49796380533, 2528.7225952912, 3392.9763183624, 1648.96379800244, 1723.86945516225, 1643.08412576665, 2707.56199905807, NA, 1834.61951190637, 3317.1836235216, 2737.19559474216, 1606.1955026128, 1606.1955026128, 3547.32785335305, 1571.03933225238, 2791.04965138618, 1256.29997638498, 2791.04965138618, 1571.03933225238, 1571.03933225238), Fertility_rate = c(3.398, 3.352, 3.33, NA, 4.756, 4.723, 4.7, 4.7, 4.7, 5.461, 4.644, 4.926, 5.118, 5.034, 4.841, 4.841, 4.798, 4.683, 4.653, 4.653, 4.653, 4.632, 3.209, 4.632, 4.632, 4.604, 4.34, 4.3, 4.573, 3.171, 4.471, 3.132, 4.11, 4.569, 4.525, 4.3, NA, 4.872, 4.3, 4.531, 4.475, 4.475, 4.236, 4.422, 4.105, 4.606, 4.105, 4.422, 4.422), Life_expectancy = c(65.379, 65.278, 65.218, NA, 56.823, 57.152, 57.473, 57.473, 57.473, 64.961, 68.935, 64.721, 66.281, 66.47, 65.349, 65.349, 65.633, 57.781, 58.344, 58.344, 58.344, 58.594, 65.246, 58.594, 58.594, 58.828, 69.471, 69.535, 66.899, 65.36, 64.298, 65.512, 62.829, 59.049, 
59.265, 64.55, NA, 66.665, 69.535, 67.134, 59.487, 59.487, 69.725, 59.722, 64.888, 68.175, 64.888, 59.722, 59.722), CPI = c(49.9250037293591, 59.4846188641146, 61.8987673569764, NA, NA, NA, NA, NA, NA, 45.2759095332228, 32.005120180595, 55.8702936919443, 41.5819326210445, 22.8740281444522, 38.2055963679636, 38.2055963679636, 38.5727156476084, NA, NA, NA, NA, NA, 67.6520051713937, NA, NA, NA, NA, NA, 42.9430895548233, 69.5670319917053, NA, 71.8503332005384, 69.3852858719752, NA, NA, NA, NA, 56.7139321922187, NA, 44.8602213817088, 74.0722891566265, 74.0722891566265, NA, 73.1084337349398, NA, 64.4171779141104, NA, 73.1084337349398, 73.1084337349398), GDP_growth = c(5.80000271926605, 6.10000190987678, 2.13003242138274, NA, 9.54689770861241, 13.8490852689481, 18.2022859527298, 18.2022859527298, 18.2022859527298, 3.99925595238095, -2.04409143450813, 11.6956997985937, -4.42145094868896, -2.30000940480883, 2.5854137275348, 2.5854137275348, 0.735447995455772, 5.94210905967769, 7.73369579796399, 7.73369579796399, 7.73369579796399, -3.90438965639359, -2.1999993686925, -3.90438965639359, -3.90438965639359, -3.76911321783457, 1.22344961737785, 2.45875910300609, 1.17685436113621, 1.30000045153811, 2.8505158599975, 8.79999871927239, -1.53846153846153, 1.85555399408817, -2.49484199260023, 4.83390378617111, NA, 1.29870129870129, 2.45875910300609, 0.337293221894313, -0.121288605564772, -0.121288605564772, 3.74892569298142, -0.158900533082658, 0.546997672499344, -2.79654654654654, 0.546997672499344, -0.158900533082658, -0.158900533082658), Health_expenditure = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 189516095.258765, 22580755.7012392, NA, NA, NA, NA, 215735067.486362, 215735067.486362, 11653482.0216929, 253236754.502794, 23591006.3055499, 58972354.1993895, 23591006.3055499, 253236754.502794, 253236754.502794 )), row.names = c(NA, -49L), class = c("tbl_df", "tbl", "data.frame" ))
```
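Could someone give me code that merges the id with the year so that the regression works? It would really help me, because I have been struggling with this for a month!…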
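The plm package is for panel data, so the dataset must contain variables **(id, time)** that indicate the panel structure. If you do not use the *index* argument when calling an estimation routine such as *pgmm()*, the first two columns are automatically assumed to indicate the panel structure, i.e. to be **(id, time)**.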
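A minimal sketch of that default, assuming the plm package is installed; the data frame *toy* and its columns are made up for illustration. *pdata.frame()* is the constructor plm uses to build the panel structure, and *index()* returns the (id, time) pair it ended up using.

```r
library(plm)

# Hypothetical toy panel: 2 countries observed in 3 years,
# deliberately stored with 'year' as the FIRST column
toy <- data.frame(
  year    = rep(2000:2002, times = 2),
  country = rep(c("A", "B"), each = 3),
  y       = c(1.0, 1.2, 1.1, 2.0, 2.1, 2.3)
)

# No index given: the first two columns become (id, time),
# so 'year' is treated as the individual and 'country' as the period
p_default <- pdata.frame(toy)
names(index(p_default))

# Explicit index: individual = country, time = year
p_explicit <- pdata.frame(toy, index = c("country", "year"))
names(index(p_explicit))
```

The same *index* argument can be passed directly to *pgmm()*, so the column order of the data no longer matters.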
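The error you get here is caused by duplicates in the panel structure. They can occur if there are NAs in the variables that indicate **(id, time)**, or simply because some observations are repeated. So do the following:
- Check for NAs in the variables id and time and find a way to handle them, for example by deleting them (from an estimation point of view this is not necessarily the best option, but it will make the code run).
- After removing the NAs, check for duplicates and remove them (again, from an estimation point of view, deleting is not necessarily the best way to handle duplicates).
Once there are no NAs and no duplicates, the error you are getting will go away.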
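The two checks can be sketched in base R as follows. The data frame *panel* with columns id, time and y is a hypothetical stand-in for your own data, and keeping the first occurrence of each (id, time) pair is just one way (not necessarily the best) to resolve a duplicate.

```r
# Hypothetical panel with one missing id and one repeated (id, time) pair
panel <- data.frame(
  id   = c(1, 1, 2, 2, 2, NA),
  time = c(1, 2, 1, 2, 2, 3),
  y    = c(0.5, 0.7, 1.1, 1.3, 1.3, 0.9)
)

# Step 1: drop rows whose id or time is NA
panel <- panel[!is.na(panel$id) & !is.na(panel$time), ]

# Step 2: flag rows whose (id, time) pair already appeared in an earlier row
dup <- duplicated(panel[, c("id", "time")])
panel[dup, ]  # inspect the duplicates before deciding what to do with them

# Keep only the first occurrence of each (id, time) pair
panel_clean <- panel[!dup, ]
```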
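To remove the duplicates, use the following code:

```r
library(data.table)

dt <- as.data.table(dt)  # dt is your data, with columns id and time
dt[, n := 1]
# running count of rows within each (id, time) pair: 1, 2, 3, ...
dt[, count := cumsum(n), by = .(id, time)]
# keep only the first row of each (id, time) pair
dt_no_dups <- dt[count == 1, ]
```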
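The same first-row-per-pair result can also be obtained with data.table's *unique()* method and its *by* argument, which avoids the helper columns n and count. A minimal sketch on made-up data:

```r
library(data.table)

# Hypothetical data with one repeated (id, time) pair
dt <- data.table(id   = c(1, 1, 2, 2, 2),
                 time = c(1, 2, 1, 2, 2),
                 y    = c(0.5, 0.7, 1.1, 1.3, 1.4))

# First row of each (id, time) pair, like the cumsum() approach
dt_no_dups <- unique(dt, by = c("id", "time"))

# 0 means no (id, time) pair is repeated any more
anyDuplicated(dt_no_dups, by = c("id", "time"))
```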
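However, if you do not use the *index* argument to say which variables are id and time, the *pgmm()* routine assumes that your first two variables give the panel structure, even if they do not. And of course those two variables may still contain duplicates even after you have removed the duplicates from the variables you know indicate the panel structure. So if your columns are ordered as

$$ \text{id},\ \text{dependent},\ \text{time},\ \text{independent} $$

then *pgmm()* will look for duplicates in **(id, dependent)**, which may or may not throw an error.

Here is R code that simulates an Arellano-Bond model and estimates it. The simulated model is

$$ y_{it} = \rho y_{i,t-1} + x_{it}\beta + \mu_i + \epsilon_{it}. $$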
If the variable with_duplicates is TRUE, a duplicate observation is added to the data before estimation, which produces the error you are getting.
```r
library(data.table)
library(plm)

# Simulate a simple dynamic panel model:
# y_it = rho * y_{i,t-1} + x_it * beta + mu_i + epsilon_it
with_duplicates <- TRUE  # if TRUE, a duplicate row is added and pgmm() throws the error

T <- 10   # number of periods
N <- 100  # number of individuals (countries)
dt <- data.table(id = rep(1:N, each = T), time = rep(1:T, N))

# Choose y0 for all N individuals
y0 <- rnorm(N)

# Set model parameters
rho <- 0.5
b <- -0.5
mu <- 2 * runif(N)  # individual effects mu_i

# Simulate epsilon and x
epsilon <- rnorm(N * T)
x <- rnorm(N * T)
z <- b * x + epsilon

# Build y recursively, individual by individual
index <- 1
y <- rep(0, N * T)
for (i in 1:N) {
  for (t in 1:T) {
    if (t == 1) {
      y[index] <- rho * y0[i] + z[index] + mu[i]
    } else {
      y[index] <- rho * y[index - 1] + z[index] + mu[i]
    }
    index <- index + 1
  }
}
dt$y <- y
dt$x <- x
dt[, lag_y := shift(y, 1), by = id]

# Add a duplicate observation to generate the error
if (with_duplicates) {
  dt <- rbind(dt, dt[4, ])
}

# id and time are the first two columns of dt, so no index argument is needed
model <- pgmm(y ~ lag(y, 1) + x | lag(y, 2:4), effect = "individual",
              model = "twosteps", data = dt)
lm(y ~ lag_y + x, data = dt)  # pooled OLS for comparison (ignores mu_i)
summary(model)
```
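Applied to the data in the question: the first two columns of fulldata are Year and `Disaster Type`, so without *index* the call treats Year as the individual and Disaster Type as the time period. The panel you presumably want is Country by Year, but the data hold one row per disaster, so a country can appear several times in the same year (for example Papua New Guinea in 1993). The sketch below uses five rows copied from the posted dput output (only a subset of columns) and collapses them to one row per (Country, Year). Taking *max()* is an assumption, not the only option: for ND it means "at least one disaster that year", and for country-level variables such as HDI, which look identical across disasters of the same country-year in the posted sample, it simply keeps that value. Whether to aggregate like this or keep one disaster per year is a modelling choice.

```r
# Five rows copied from the question's dput(fulldata), subset of columns
fulldata <- data.frame(
  Year            = c("1991", "1992", "1993", "1993", "1993"),
  `Disaster Type` = c("Landslide", "Flood", "Earthquake", "Storm", "Flood"),
  Country         = "Papua New Guinea",
  ND              = c(0, 1, 0, 0, 0),
  HDI             = c(0.389, 0.398, 0.411, 0.411, 0.411),
  check.names     = FALSE
)

# Rows that share their (Country, Year) pair with another row
key <- paste(fulldata$Country, fulldata$Year)
fulldata[duplicated(key) | duplicated(key, fromLast = TRUE), ]

# Assumption: one row per (Country, Year), taking the max within each pair
collapsed <- aggregate(cbind(ND, HDI) ~ Country + Year, data = fulldata, FUN = max)
collapsed

# Then tell pgmm() explicitly which columns form the panel, e.g.
# pgmm(..., data = collapsed, index = c("Country", "Year"), ...)
```

Note that the formula method of *aggregate()* drops rows with NA in any of the listed variables by default (na.action = na.omit), which matters for the full data set where several columns contain NA.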