Hardcore! A Look at Flink's Commonly Used Stream-Computing Operators

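Transform operators work on the output of a Source operator, so we first build the Flink execution environment and a Source operator. All the Transform operations that follow build on them.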
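A minimal sketch of that setup, assuming the Scala DataSet API (`org.apache.flink.api.scala._`) is on the classpath. The sample words are only illustrative data, chosen to match the snippets later in this post:

```scala
import org.apache.flink.api.scala._

// Obtain the batch execution environment (local when run from an IDE, cluster otherwise)
val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment

// Source operator: build a DataSet from a fixed set of elements
val source: DataSet[String] = env.fromElements("java", "scala", "java")

// print() triggers execution and writes the elements to stdout
source.print()
```

The wildcard import also brings in the implicit `TypeInformation` that the Scala API needs for every operator, so it is worth keeping at the top of each example.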

1. map

Converts each element of the DataSet into exactly one other element.
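As a small illustration of the one-in, one-out behaviour, the sketch below upper-cases every word. The input data and the name `upper` are made up for this example:

```scala
import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment
val source: DataSet[String] = env.fromElements("java", "scala", "java")

// map emits exactly one output element for each input element
val upper: DataSet[String] = source.map(word => word.toUpperCase)

upper.print()
```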

2. flatMap

Converts each element of the DataSet into 0 to n elements.
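One way to see the "0 to n" part is to split lines into words. In this sketch (the input lines are hypothetical) the first line yields two elements, the empty line yields none, and the last line yields one:

```scala
import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment
val lines: DataSet[String] = env.fromElements("java scala", "", "flink")

// Each line is split into words; empty strings are dropped,
// so an empty line contributes zero output elements
val words: DataSet[String] = lines.flatMap { _.split(" ").filter { _.nonEmpty } }

words.print()
```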

3. mapPartition

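Transforms the elements one partition at a time: the function is called once per partition and receives an iterator over all of that partition's elements.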
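A rough sketch of the per-partition call, using made-up data: instead of touching each element, the function below sees a whole partition at once and emits a single count for it. How many partitions there are depends on the parallelism, so only the total is predictable:

```scala
import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment
val source: DataSet[String] = env.fromElements("java", "scala", "java")

// The function runs once per partition; iter walks over that partition's elements.
// Here each partition emits one number: how many elements it held.
val counts: DataSet[Int] = source.mapPartition(iter => Iterator(iter.size))

counts.print()
```

This is typically where mapPartition pays off over map: expensive setup, such as opening a connection, can be done once per partition rather than once per element.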

4. filter

Keeps only the elements that satisfy a condition, that is, the elements for which the predicate returns true:

val source: DataSet[String] = env.fromElements("java", "scala", "java")
// Keep only the elements that contain "java"
val filter: DataSet[String] = source.filter(line => line.contains("java"))
filter.print()

5. reduce

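Aggregates a whole DataSet, or each group of a grouped DataSet, by repeatedly combining two elements into one until a single element remains.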
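Both variants can be sketched as follows. The numbers and the (word, count) pairs are illustrative data only:

```scala
import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment

// reduce over the whole DataSet: sum all numbers into one element
val numbers: DataSet[Int] = env.fromElements(1, 2, 3, 4)
val total: DataSet[Int] = numbers.reduce(_ + _)

// reduce per group: sum the counts of tuples that share the same word
val pairs: DataSet[(String, Int)] = env.fromElements(("java", 1), ("scala", 1), ("java", 1))
val perKey: DataSet[(String, Int)] =
  pairs.groupBy(0).reduce((a, b) => (a._1, a._2 + b._2))

total.print()
perKey.print()
```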

6. reduceGroup

Aggregates a whole DataSet, or each group of a grouped DataSet, into one or more elements.

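import org.apache.flink.api.scala._
import org.apache.flink.util.Collector

// Build the data source with fromElements
val source: DataSet[(String, Int)] = env.fromElements(("java", 1), ("scala", 1), ("java", 1))
// Group by the first tuple field
val groupData = source.groupBy(_._1)
// Aggregate each group with reduceGroup
val result: DataSet[(String, Int)] = groupData.reduceGroup {
  (in: Iterator[(String, Int)], out: Collector[(String, Int)]) =>
    // Sum the counts of the group, then emit one tuple for it
    val tuple = in.reduce((x, y) => (x._1, x._2 + y._2))
    out.collect(tuple)
}
// Print the result to check it
result.print()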

7. minBy and maxBy

Selects the element that has the minimum or maximum value in the given field(s).
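A short sketch on a tuple DataSet, with made-up (language, score) pairs. `minBy(1)` compares on the second tuple field and returns the whole winning tuple. After a `groupBy` the selection happens per group:

```scala
import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment
val scores: DataSet[(String, Int)] =
  env.fromElements(("java", 3), ("scala", 5), ("java", 1))

// The single tuple with the smallest value in field 1
val lowest: DataSet[(String, Int)] = scores.minBy(1)

// Per language, the tuple with the largest value in field 1
val highestPerKey: DataSet[(String, Int)] = scores.groupBy(0).maxBy(1)

lowest.print()
highestPerKey.print()
```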

8. Aggregate

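Runs an aggregation over the data set to find an extreme value (maximum or minimum).

Aggregate can only be applied to tuples.

Note:

To use aggregate, the grouping must be done by field index or field name, for example groupBy(0). Otherwise the following error is thrown: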

Exception in thread "main" java.lang.UnsupportedOperationException: Aggregate does not support grouping with KeySelector functions, yet.
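The note above can be sketched like this, assuming the `Aggregations` enum from `org.apache.flink.api.java.aggregation`. The (language, score) pairs are illustrative data. Grouping by index works, while the commented-out KeySelector-style grouping is the form that leads to the exception shown above:

```scala
import org.apache.flink.api.scala._
import org.apache.flink.api.java.aggregation.Aggregations

val env = ExecutionEnvironment.getExecutionEnvironment
val scores: DataSet[(String, Int)] =
  env.fromElements(("java", 3), ("scala", 5), ("java", 1))

// Group by field index 0, then take the maximum of field 1 within each group
val maxPerKey: DataSet[(String, Int)] =
  scores.groupBy(0).aggregate(Aggregations.MAX, 1)

// Grouping with a function instead of an index is what triggers the error:
// scores.groupBy(_._1).aggregate(Aggregations.MAX, 1)

maxPerKey.print()
```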
