
Big Data Exam Question-Guessing Series

Preface

The exam is coming up, so I'm jotting down a round of likely question types. It might be useful, or it might not.

Summary of question types that may appear in the Big Data exam

Spark series

Spark word count

Reference:

Prerequisites:

On your own HDFS server, create the file /1900301538/spark/input/wordcount.txt

The file content can be anything, as long as it is made up of individual words; other reference material works too, for example:

wordcount.txt

hello world
hello hadoop
hello mapreduce
hello spark

The code is as follows:

val text = sc.textFile("hdfs://hadoop01:9000/1900301538/spark/input/wordcount.txt")
val counts = text.flatMap(line => line.split(" "))
var wordcount = counts.map(counts => (counts, 1))
wordcount = wordcount.reduceByKey(_ + _)
wordcount.foreach(println)
wordcount.saveAsTextFile("hdfs://hadoop01:9000/1900301538/spark/output")

Explanation:

val text = sc.textFile("hdfs://hadoop01:9000/1900301538/spark/input/wordcount.txt")
//read the file on hadoop01 into the RDD text
val counts = text.flatMap(line => line.split(" "))
//split each line on single spaces into individual words
var wordcount = counts.map(counts => (counts, 1))
//turn each word into a (key, 1) pair
wordcount = wordcount.reduceByKey(_ + _)
//add up the counts for each key, i.e. for each word
wordcount.foreach(println)
//print the result in the terminal
wordcount.saveAsTextFile("hdfs://hadoop01:9000/1900301538/spark/output")
//save the result to the given output directory (it must not exist yet)

The output is as follows:

(spark,1)
(hadoop,1)
(mapreduce,1)
(hello,4)
(world,1)

Spark: calculating Pi

Reference article:

The code is as follows:

var NUM_SAMPLES = 100000
val count = sc.parallelize(1 to NUM_SAMPLES).filter { _ =>
  val x = math.random
  val y = math.random
  x*x + y*y < 1
}.count()
println(s"Pi is roughly ${4.0 * count / NUM_SAMPLES}")

Explanation:

var NUM_SAMPLES = 100000
//the number of random samples to draw; the more samples, the better the estimate
val count = sc.parallelize(1 to NUM_SAMPLES).filter { _ =>
//distribute the numbers 1 to NUM_SAMPLES as an RDD and keep only the samples that pass the filter below
val x = math.random
//draw a random x coordinate in [0, 1)
val y = math.random
//draw a random y coordinate in [0, 1)
x*x + y*y < 1
//keep the sample if the point (x, y) lies inside the unit circle
}.count()
//count how many points landed inside the circle
println(s"Pi is roughly ${4.0 * count / NUM_SAMPLES}")
//a random point in the unit square lands inside the circle with probability π/4, so 4.0 * count / NUM_SAMPLES estimates π
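
The whole trick is that a random point in the unit square falls inside the quarter circle with probability π/4, so 4 * count / NUM_SAMPLES approaches π as the sample count grows. If the Spark wrapping gets in the way, here is the same sampling loop as a minimal plain-Java sketch, with no Spark involved (the class and variable names are my own, not from any exam material):

import java.util.Random;

public class MonteCarloPi {
    public static void main(String[] args) {
        int numSamples = 100000;         // same sample size as the Spark example
        Random rnd = new Random();
        long inside = 0;
        for (int i = 0; i < numSamples; i++) {
            double x = rnd.nextDouble(); // random x in [0, 1)
            double y = rnd.nextDouble(); // random y in [0, 1)
            if (x * x + y * y < 1) {     // the point falls inside the quarter circle
                inside++;
            }
        }
        System.out.println("Pi is roughly " + 4.0 * inside / numSamples);
    }
}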

MapReduce series

Source:

Data deduplication

Prerequisites:

Create the following files:

a.txt

2012-3-1 a
2012-3-2 b
2012-3-3 c
2012-3-4 d
2012-3-5 a
2012-3-6 b
2012-3-7 c
2012-3-3 c

b.txt

2012-3-1 b
2012-3-2 a
2012-3-3 b
2012-3-4 d
2012-3-5 a
2012-3-6 c
2012-3-7 d
2012-3-3 c

The code is as follows:

package hadoopdemo;

import java.io.IOException;
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Remove_same {

    private static final String HDFS = "hdfs://hadoop01:9000/";

    // map copies the input value to the output key and emits it directly
    public static class Map extends Mapper<Object, Text, Text, Text> {

        private static Text line = new Text(); // one line of input

        // the map function
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            line = value;
            context.write(line, new Text(""));
        }
    }

    // reduce copies the input key to the output key and emits it directly,
    // so all duplicate lines collapse into a single output line
    public static class Reduce extends Reducer<Text, Text, Text, Text> {

        // the reduce function
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            context.write(key, new Text(""));
        }
    }

    public static void main(String[] args) throws Exception {

        Properties properties = System.getProperties();
        properties.setProperty("HADOOP_USER_NAME", "hadoop");

        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", HDFS);
        conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
        conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
        conf.set("dfs.client.use.datanode.hostname", "true");

        Tools tool = new Tools(HDFS, conf);
        if (tool.exists("/1900301538/Remove_same"))
            tool.rmr("/1900301538/Remove_same");
        tool.mkdirs("/1900301538/Remove_same");
        tool.mkdirs("/1900301538/Remove_same/input");
        tool.copyFile("D:\\li\\a.txt", "/1900301538/Remove_same/input/a.txt");
        tool.copyFile("D:\\li\\b.txt", "/1900301538/Remove_same/input/b.txt");

        // Job job = new Job(conf, "数据去重"); // deprecated constructor
        Job job = Job.getInstance(conf, "数据去重");
        // job.setJarByClass(Remove_same.class);

        // set the Map, Combine and Reduce classes
        job.setMapperClass(Map.class);
        job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);

        // set the output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // set the input and output directories
        FileInputFormat.addInputPath(job, new Path("/1900301538/Remove_same/input/"));
        FileOutputFormat.setOutputPath(job, new Path("/1900301538/Remove_same/output/"));

        // wait for the job to finish before reading its output
        boolean ok = job.waitForCompletion(true);
        tool.cat("/1900301538/Remove_same/output/part-r-00000");
        System.exit(ok ? 0 : 1);
    }
}
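
Note that this driver (and every MapReduce example below) relies on a helper class called Tools that is not shown in this post. Purely as a hedge, here is a minimal sketch of what such a helper could look like, written against the standard Hadoop FileSystem API; the method names exists/rmr/mkdirs/copyFile/cat are taken from the calls above, but the implementation itself is my own guess, not the original code:

package hadoopdemo;

import java.io.IOException;
import java.io.InputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// Hypothetical helper class; only its method names are taken from the drivers in this post.
public class Tools {

    private final FileSystem fs;

    public Tools(String hdfsUri, Configuration conf) throws IOException {
        // connect to the HDFS instance named by hdfsUri, e.g. hdfs://hadoop01:9000/
        this.fs = FileSystem.get(URI.create(hdfsUri), conf);
    }

    public boolean exists(String path) throws IOException {
        return fs.exists(new Path(path));
    }

    public void rmr(String path) throws IOException {
        // recursive delete, like "hadoop fs -rm -r"
        fs.delete(new Path(path), true);
    }

    public void mkdirs(String path) throws IOException {
        fs.mkdirs(new Path(path));
    }

    public void copyFile(String localPath, String hdfsPath) throws IOException {
        // upload a local file to HDFS
        fs.copyFromLocalFile(new Path(localPath), new Path(hdfsPath));
    }

    public void cat(String path) throws IOException {
        // print the contents of an HDFS file to stdout
        try (InputStream in = fs.open(new Path(path))) {
            IOUtils.copyBytes(in, System.out, 4096, false);
        }
    }
}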

The output:

2012-3-1 a	
2012-3-1 b
2012-3-2 a
2012-3-2 b
2012-3-3 b
2012-3-3 c
2012-3-4 d
2012-3-5 a
2012-3-6 b
2012-3-6 c
2012-3-7 c
2012-3-7 d

Computing average scores

Prerequisites:

database.txt

小明 95
小红 81
小新 89
小丽 85

python.txt

小明 82
小红 83
小新 94
小丽 91

c++.txt

小明 92
小红 87
小新 82
小丽 90

The code is as follows:

package hadoopdemo;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;
import java.util.Properties;
import java.util.StringTokenizer;

public class average_s {

    private static final String HDFS = "hdfs://hadoop01:9000/";

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {

        // the map function
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {

            // convert the input text to a String
            String line = value.toString();
            // first split the input by line
            StringTokenizer tokenizerArticle = new StringTokenizer(line, "\n");

            // process each line separately
            while (tokenizerArticle.hasMoreElements()) {

                // split each line on whitespace
                StringTokenizer tokenizerLine = new StringTokenizer(tokenizerArticle.nextToken());

                String strName = tokenizerLine.nextToken();  // the student name
                String strScore = tokenizerLine.nextToken(); // the score

                Text name = new Text(strName);
                int scoreInt = Integer.parseInt(strScore);

                // emit (name, score)
                context.write(name, new IntWritable(scoreInt));
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

        // the reduce function
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {

            int sum = 0;
            int count = 0;

            for (IntWritable value : values) {
                sum += value.get(); // total score
                count++;            // number of subjects
            }
            int average = sum / count; // average score (integer division)

            context.write(key, new IntWritable(average));
        }
    }

    public static void main(String[] args) throws Exception {

        Properties properties = System.getProperties();
        properties.setProperty("HADOOP_USER_NAME", "hadoop");

        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", HDFS);
        conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
        conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
        conf.set("dfs.client.use.datanode.hostname", "true");

        Tools tool = new Tools(HDFS, conf);
        if (tool.exists("/1900301538/average_s"))
            tool.rmr("/1900301538/average_s");
        tool.mkdirs("/1900301538/average_s");
        tool.mkdirs("/1900301538/average_s/input");
        tool.copyFile("D:\\li\\python.txt", "/1900301538/average_s/input/python.txt");
        tool.copyFile("D:\\li\\c++.txt", "/1900301538/average_s/input/c++.txt");
        tool.copyFile("D:\\li\\database.txt", "/1900301538/average_s/input/database.txt");

        Job job = Job.getInstance(conf, "求平均");

        // set the Map and Reduce classes
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        // set the output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // set the input and output directories
        FileInputFormat.addInputPath(job, new Path("/1900301538/average_s/input/"));
        FileOutputFormat.setOutputPath(job, new Path("/1900301538/average_s/output/"));
        job.waitForCompletion(true);
        tool.cat("/1900301538/average_s/output/part-r-00000");
    }
}

The output:

小丽	88
小新 88
小明 89
小红 83
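
Because sum / count in the reducer is integer division, the fractional part is dropped; 小丽, for example, really averages (85 + 91 + 90) / 3 ≈ 88.67 but is shown as 88. If an exact average is wanted, one possible variant (my own sketch, not part of the original answer) is to have the reducer emit a DoubleWritable; the driver would then also need job.setMapOutputValueClass(IntWritable.class) and job.setOutputValueClass(DoubleWritable.class):

// Requires an extra import: org.apache.hadoop.io.DoubleWritable
// Drop-in replacement for the Reduce class above that keeps the fractional part.
public static class AvgReduce extends Reducer<Text, IntWritable, Text, DoubleWritable> {

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        int count = 0;
        for (IntWritable value : values) {
            sum += value.get(); // total score
            count++;            // number of subjects
        }
        // divide as doubles so 266 / 3 stays 88.666... instead of being truncated to 88
        context.write(key, new DoubleWritable((double) sum / count));
    }
}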

Multi-table join

Prepare the files:

factory.txt

factoryname                    addressed
Beijing Red Star     1
Shenzhen Thunder     3
Guangzhou Honda     2
Beijing Rising     1
Guangzhou Development Bank 2
Tencent         3
Back of Beijing      1

address.txt

addressID    addressname
1     Beijing
2     Guangzhou
3     Shenzhen
4     Xian

The code:

package hadoopdemo;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Factory_where {

    private static final String HDFS = "hdfs://hadoop01:9000/";
    public static int time = 0;

    /*
     * In map, first decide whether the input line belongs to the left table
     * (factory) or the right table (address), then split the two columns:
     * the join column (the address ID) goes into the key, and the remaining
     * column plus a left/right table tag goes into the value.
     */
    public static class Map extends Mapper<Object, Text, Text, Text> {

        // the map function
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {

            String line = value.toString();     // one input line
            String relationtype = new String(); // left/right table tag

            // skip the header line of either input file
            if (line.contains("factoryname") == true
                    || line.contains("addressed") == true) {
                return;
            }

            // tokenize the input line
            StringTokenizer itr = new StringTokenizer(line);
            String mapkey = new String();
            String mapvalue = new String();
            int i = 0;

            while (itr.hasMoreTokens()) {
                // read one token
                String token = itr.nextToken();

                // a token that starts with a digit is the address ID; use it as the key
                if (token.charAt(0) >= '0' && token.charAt(0) <= '9') {
                    mapkey = token;
                    if (i > 0) {
                        relationtype = "1"; // ID came after the name: a factory line (left table)
                    } else {
                        relationtype = "2"; // ID came first: an address line (right table)
                    }
                    continue;
                }

                // accumulate the factory/address name
                mapvalue += token + " ";
                i++;
            }

            // emit the tagged record (left or right table)
            context.write(new Text(mapkey), new Text(relationtype + "+" + mapvalue));
        }
    }

    /*
     * reduce parses the map output, stores the left-table and right-table
     * values separately, then outputs their Cartesian product.
     */
    public static class Reduce extends Reducer<Text, Text, Text, Text> {

        // the reduce function
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {

            // output the header row once
            if (0 == time) {
                context.write(new Text("factoryname"), new Text("addressname"));
                time++;
            }

            int factorynum = 0;
            String[] factory = new String[10];
            int addressnum = 0;
            String[] address = new String[10];

            Iterator<Text> ite = values.iterator();
            while (ite.hasNext()) {
                String record = ite.next().toString();
                int len = record.length();
                int i = 2;
                if (0 == len) {
                    continue;
                }

                // read the left/right table tag
                char relationtype = record.charAt(0);

                // left table: factory names
                if ('1' == relationtype) {
                    factory[factorynum] = record.substring(i);
                    factorynum++;
                }

                // right table: address names
                if ('2' == relationtype) {
                    address[addressnum] = record.substring(i);
                    addressnum++;
                }
            }

            // Cartesian product of the two lists
            if (0 != factorynum && 0 != addressnum) {
                for (int m = 0; m < factorynum; m++) {
                    for (int n = 0; n < addressnum; n++) {
                        // output one joined pair
                        context.write(new Text(factory[m]), new Text(address[n]));
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {

        Properties properties = System.getProperties();
        properties.setProperty("HADOOP_USER_NAME", "hadoop");

        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", HDFS);
        conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
        conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
        conf.set("dfs.client.use.datanode.hostname", "true");

        Tools tool = new Tools(HDFS, conf);
        if (tool.exists("/1900301538/Factory_where"))
            tool.rmr("/1900301538/Factory_where");
        tool.mkdirs("/1900301538/Factory_where");
        tool.mkdirs("/1900301538/Factory_where/input");
        tool.copyFile("D:\\li\\factory.txt", "/1900301538/Factory_where/input/factory.txt");
        tool.copyFile("D:\\li\\address.txt", "/1900301538/Factory_where/input/address.txt");

        Job job = Job.getInstance(conf, "公司位置汇总");

        // set the Map and Reduce classes
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        // set the output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // set the input and output directories
        FileInputFormat.addInputPath(job, new Path("/1900301538/Factory_where/input/"));
        FileOutputFormat.setOutputPath(job, new Path("/1900301538/Factory_where/output/"));
        job.waitForCompletion(true);
        tool.cat("/1900301538/Factory_where/output/part-r-00000");
    }
}

The output:

factoryname	addressname
Back of Beijing     Beijing
Beijing Rising     Beijing
Beijing Red Star     Beijing
Guangzhou Development Bank     Guangzhou
Guangzhou Honda     Guangzhou
Tencent     Shenzhen
Shenzhen Thunder     Shenzhen
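
To see why the reducer produces these rows, it can help to replay its logic by hand for a single key. The small standalone sketch below (plain Java, no Hadoop; the sample strings are the tagged records the Map above would emit for address ID 1) does the same split-by-tag and Cartesian product:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class JoinReduceDemo {
    public static void main(String[] args) {
        // tagged values the map would emit for key "1" ("1+" = factory, "2+" = address)
        List<String> values = Arrays.asList(
                "1+Beijing Red Star ", "1+Beijing Rising ",
                "1+Back of Beijing ", "2+Beijing ");

        List<String> factories = new ArrayList<>();
        List<String> addresses = new ArrayList<>();

        // split the records by their left/right table tag, as the reducer does
        for (String record : values) {
            char tag = record.charAt(0);
            String name = record.substring(2);
            if (tag == '1') {
                factories.add(name);
            } else if (tag == '2') {
                addresses.add(name);
            }
        }

        // Cartesian product: every factory with every address for this ID
        // (ordering may differ from the actual job output)
        for (String factory : factories) {
            for (String address : addresses) {
                System.out.println(factory + "\t" + address);
            }
        }
    }
}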

Single-table join (grandchild-grandparent)

Required file:

c_p.txt

child        parent
Tom Lucy
Tom Jack
Jone Lucy
Jone Jack
Lucy Mary
Lucy Ben
Jack Alice
Jack Jesse
Terry Alice
Terry Jesse
Philip Terry
Philip Alma
Mark Terry
Mark Alma

The code is as follows:

package hadoopdemo;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class c_p {

    private static final String HDFS = "hdfs://hadoop01:9000/";
    public static int time = 0;

    /*
     * map splits each line into child and parent, then outputs it once in
     * forward order as the right table and once reversed as the left table.
     * The value must carry a tag marking which table it belongs to.
     * For example, the row "Tom Lucy" is emitted both as (Lucy, "1+Tom+Lucy")
     * and as (Tom, "2+Tom+Lucy"), so on the key Lucy the children of Lucy
     * meet the parents of Lucy.
     */
    public static class Map extends Mapper<Object, Text, Text, Text> {

        // the map function
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {

            String childname = "";    // child name
            String parentname = "";   // parent name
            String relationtype = ""; // left/right table tag

            // tokenize the input line
            StringTokenizer itr = new StringTokenizer(value.toString());
            String[] values = new String[2];
            int i = 0;
            while (itr.hasMoreTokens()) {
                values[i] = itr.nextToken();
                i++;
            }

            // skip the header line
            if (values[0].compareTo("child") != 0) {
                childname = values[0];
                parentname = values[1];

                // output the left table, keyed by the parent
                relationtype = "1";
                context.write(new Text(values[1]),
                        new Text(relationtype + "+" + childname + "+" + parentname));

                // output the right table, keyed by the child
                relationtype = "2";
                context.write(new Text(values[0]),
                        new Text(relationtype + "+" + childname + "+" + parentname));
            }
        }
    }

    public static class Reduce extends Reducer<Text, Text, Text, Text> {

        // the reduce function
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {

            // output the header row once
            if (0 == time) {
                context.write(new Text("grandchild"), new Text("grandparent"));
                time++;
            }

            int grandchildnum = 0;
            String[] grandchild = new String[10];
            int grandparentnum = 0;
            String[] grandparent = new String[10];

            Iterator<Text> ite = values.iterator();
            while (ite.hasNext()) {
                String record = ite.next().toString();
                int len = record.length();
                int i = 2;
                if (0 == len) {
                    continue;
                }

                // read the left/right table tag
                char relationtype = record.charAt(0);

                // child and parent fields of this record
                String childname = new String();
                String parentname = new String();

                // read the child part of the value
                while (record.charAt(i) != '+') {
                    childname += record.charAt(i);
                    i++;
                }

                i = i + 1;

                // read the parent part of the value
                while (i < len) {
                    parentname += record.charAt(i);
                    i++;
                }

                // left table: the child is a child of this key, so a candidate grandchild
                if ('1' == relationtype) {
                    grandchild[grandchildnum] = childname;
                    grandchildnum++;
                }

                // right table: the parent is a parent of this key, so a candidate grandparent
                if ('2' == relationtype) {
                    grandparent[grandparentnum] = parentname;
                    grandparentnum++;
                }
            }

            // Cartesian product of the grandchild and grandparent arrays
            if (0 != grandchildnum && 0 != grandparentnum) {
                for (int m = 0; m < grandchildnum; m++) {
                    for (int n = 0; n < grandparentnum; n++) {
                        // output one (grandchild, grandparent) pair
                        context.write(new Text(grandchild[m]), new Text(grandparent[n]));
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {

        Properties properties = System.getProperties();
        properties.setProperty("HADOOP_USER_NAME", "hadoop");

        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", HDFS);
        conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
        conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
        conf.set("dfs.client.use.datanode.hostname", "true");

        Tools tool = new Tools(HDFS, conf);
        if (tool.exists("/1900301538/c_p"))
            tool.rmr("/1900301538/c_p");
        tool.mkdirs("/1900301538/c_p");
        tool.mkdirs("/1900301538/c_p/input");
        tool.copyFile("D:\\li\\c_p.txt", "/1900301538/c_p/input/c_p.txt");

        Job job = Job.getInstance(conf, "爷孙关系");

        // set the Map and Reduce classes
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        // set the output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // set the input and output directories
        FileInputFormat.addInputPath(job, new Path("/1900301538/c_p/input/"));
        FileOutputFormat.setOutputPath(job, new Path("/1900301538/c_p/output/"));
        job.waitForCompletion(true);
        tool.cat("/1900301538/c_p/output/part-r-00000");
    }
}

The output is as follows:

grandchild	grandparent
Tom Alice
Tom Jesse
Jone Alice
Jone Jesse
Tom Ben
Tom Mary
Jone Ben
Jone Mary
Philip Alice
Philip Jesse
Mark Alice
Mark Jesse

Closing remarks

If you have any other questions, please leave a comment.
