4.8 Groupby and Combine

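function all_grades()
    # Assumes grades_2020() and grades_2021() from earlier in the chapter.
    df1 = select(grades_2020(), :name, :grade_2020 => :grade)
    df2 = select(grades_2021(), :name, :grade_2021 => :grade)
    # Normalize "Bob 2" to "Bob" so both years fall into one group.
    rename_bob2(data_col) = replace.(data_col, "Bob 2" => "Bob")
    df2 = transform(df2, :name => rename_bob2 => :name)
    return vcat(df1, df2)
end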
all_grades()
name grade
Sally 1.0
Bob 5.0
Alice 8.5
Hank 4.0
Bob 9.5
Sally 9.5
Hank 6.0

groupby(all_grades(), :name)
GroupedDataFrame with 4 groups based on key: name
Group 1 (2 rows): name = "Sally"
 Row │ name    grade
     │ String  Float64
─────┼─────────────────
   1 │ Sally       1.0
   2 │ Sally       9.5
Group 2 (2 rows): name = "Bob"
 Row │ name    grade
     │ String  Float64
─────┼─────────────────
   1 │ Bob         5.0
   2 │ Bob         9.5
Group 3 (1 row): name = "Alice"
 Row │ name    grade
     │ String  Float64
─────┼─────────────────
   1 │ Alice       8.5
Group 4 (2 rows): name = "Hank"
 Row │ name    grade
     │ String  Float64
─────┼─────────────────
   1 │ Hank        4.0
   2 │ Hank        6.0

The mean function comes from the Statistics module in Julia's standard library:

using Statistics

gdf = groupby(all_grades(), :name)
combine(gdf, :grade => mean)
name grade_mean
Sally 5.25
Bob 7.25
Alice 8.5
Hank 5.0
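
To make the split-apply-combine idea behind groupby and combine more concrete, here is a minimal sketch in plain Julia that does not use DataFrames at all. The helper split_by_key and the variables student_names and student_grades are hypothetical names introduced only for this illustration, and the sketch is not meant to describe how DataFrames.jl implements grouping internally.

```julia
using Statistics

# The same rows as the all_grades() output, stored as two plain vectors.
student_names = ["Sally", "Bob", "Alice", "Hank", "Bob", "Sally", "Hank"]
student_grades = [1.0, 5.0, 8.5, 4.0, 9.5, 9.5, 6.0]

# Split: collect the values that belong to each key (hypothetical helper).
function split_by_key(keys, values)
    groups = Dict{eltype(keys), Vector{eltype(values)}}()
    for (k, v) in zip(keys, values)
        push!(get!(groups, k, Vector{eltype(values)}()), v)
    end
    return groups
end

groups = split_by_key(student_names, student_grades)

# Apply and combine: reduce each group to one number and gather the results.
grade_mean = Dict(name => mean(vals) for (name, vals) in groups)
```

Looking up grade_mean["Sally"] gives the same 5.25 as the combine call. Unlike a GroupedDataFrame, a Dict does not keep the groups in order of first appearance, which is one reason to prefer groupby for real work.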

4.8.1 Multiple Source Columns

group = [:A, :A, :B, :B]
X = 1:4
Y = 5:8
df = DataFrame(; group, X, Y)
group X Y
A 1 5
A 2 6
B 3 7
B 4 8

gdf = groupby(df, :group)
combine(gdf, [:X, :Y] .=> mean; renamecols=false)
group X Y
A 1.5 5.5
B 3.5 7.5
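
The [:X, :Y] .=> mean argument may look unusual at first. As a small sketch using only Base Julia and Statistics, the snippet below shows that broadcasting => over the column names simply builds one source => function pair per column. The variable names pairs_broadcast and pairs_explicit are made up for this illustration.

```julia
using Statistics

# Broadcasting => pairs every column name with the same function.
pairs_broadcast = [:X, :Y] .=> mean

# Writing the pairs out by hand gives an equal vector.
pairs_explicit = [:X => mean, :Y => mean]

# Each element is an ordinary Pair: column name first, function second.
first_pair = pairs_broadcast[1]
source_column = first(first_pair)
applied_function = last(first_pair)
```

So combine receives a vector of pairs and applies each one per group, which is why the result has one output column for X and one for Y.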

gdf = groupby(df, :group)
rounded_mean(data_col) = round(Int, mean(data_col))
combine(gdf, [:X, :Y] .=> rounded_mean; renamecols=false)
group X Y
A 2 6
B 4 8
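
One detail worth checking when reusing rounded_mean: round(Int, x) uses Julia's default rounding mode, which rounds ties to the nearest even integer. The short sketch below repeats the rounded_mean definition and applies it to the group values from this example plus one extra input, [2, 3], which is not part of the example data and was chosen only to show a tie that rounds down.

```julia
using Statistics

rounded_mean(data_col) = round(Int, mean(data_col))

# Group A and group B values from the example data.
a_x = rounded_mean(1:2)   # mean is 1.5
a_y = rounded_mean(5:6)   # mean is 5.5
b_x = rounded_mean(3:4)   # mean is 3.5
b_y = rounded_mean(7:8)   # mean is 7.5

# Extra input (not in the example data): the mean 2.5 is a tie,
# and ties go to the even neighbor, so the result is 2 rather than 3.
tie_case = rounded_mean([2, 3])
```

If ties should always round up instead, round accepts an explicit rounding mode, for example round(Int, x, RoundNearestTiesUp).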

CC BY-NC-SA 4.0 Jose Storopoli, Rik Huijzer, Lazaro Alonso, 刘贵欣 (Chinese translation), 田俊 (Chinese review)