Hadoop/MapReduce Friend Recommendation Solution
Published: 2019-05-11


Goal: if user A and user C are both friends with user B, but A and C are not yet friends with each other, then recommend C to A and A to C, and also report which mutual friends A and C share.

For example:

Given the following friend relationships:

1 2,3,4,5,6,7,8
2 1,3,4,5,7
3 1,2
4 1,2,6
5 1,2
6 1,4
7 1,2
8 1

In each line, the element before the space is a user ID, and the comma-separated list after the space contains that user's friend IDs.

The corresponding friend-relationship graph was shown here as a figure (omitted).

Expected output, where each entry reads recommendedUser(mutualFriendCount:[mutualFriendList]):

1
2 6(2:[4, 1]),8(1:[1]),
3 4(2:[1, 2]),5(2:[2, 1]),6(1:[1]),7(2:[1, 2]),8(1:[1]),
4 3(2:[2, 1]),5(2:[1, 2]),7(2:[1, 2]),8(1:[1]),
5 3(2:[2, 1]),4(2:[1, 2]),6(1:[1]),7(2:[1, 2]),8(1:[1]),
6 2(2:[1, 4]),3(1:[1]),5(1:[1]),7(1:[1]),8(1:[1]),
7 3(2:[1, 2]),4(2:[2, 1]),5(2:[2, 1]),6(1:[1]),8(1:[1]),
8 2(1:[1]),3(1:[1]),4(1:[1]),5(1:[1]),6(1:[1]),7(1:[1]),

For user 1: since user 1 is already friends with 2, 3, 4, 5, 6, 7, and 8, no one is recommended to it.

For user 2: recommend 6, because 2 and 6 can get to know each other through 4 or 1; recommend 8, because 2 and 8 can get to know each other through 1.

For user 3: recommend 4, because 3 and 4 can get to know each other through 1 or 2; recommend 5, because 3 and 5 can get to know each other through 2 or 1; recommend 6, because 3 and 6 can get to know each other through 1; recommend 7, because 3 and 7 can get to know each other through 1 or 2; and recommend 8, because 3 and 8 can get to know each other through 1.

...

Approach:

For each input line, e.g. 4 1,2,6:

Map phase:

Emit direct-friend key-value pairs: (4,[1,-1]) (4,[2,-1]) (4,[6,-1])

Emit indirect-friend (candidate) key-value pairs: (1,[2,4]) (2,[1,4]) (1,[6,4]) (6,[1,4]) (2,[6,4]) (6,[2,4]). Here (1,[2,4]) means "recommend user 2 to user 1, because they can get to know each other through user 4"; the other pairs are analogous.
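To make the emission rule concrete, here is a minimal stand-alone sketch of the same logic in plain Java, outside of Hadoop; the emit callback simply stands in for context.write, and the class and method names are illustrative only.

import java.util.function.BiConsumer;

public class EmitSketch {
    // For one line such as "4 1,2,6": emit (owner, [friend, -1]) for every direct friend,
    // and (f1, [f2, owner]) / (f2, [f1, owner]) for every unordered pair of the owner's friends.
    static void emitPairs(String line, BiConsumer<Long, long[]> emit) {
        String[] parts = line.split(" ");
        long owner = Long.parseLong(parts[0]);
        String[] friends = parts[1].split(",");

        for (String f : friends) {                      // direct friends, marked with -1
            emit.accept(owner, new long[]{Long.parseLong(f), -1L});
        }
        for (int i = 0; i < friends.length; i++) {      // candidate (indirect) friends
            for (int j = i + 1; j < friends.length; j++) {
                long f1 = Long.parseLong(friends[i]);
                long f2 = Long.parseLong(friends[j]);
                emit.accept(f1, new long[]{f2, owner}); // recommend f2 to f1, met through owner
                emit.accept(f2, new long[]{f1, owner}); // recommend f1 to f2, met through owner
            }
        }
    }

    public static void main(String[] args) {
        // prints (4,[1,-1]) (4,[2,-1]) (4,[6,-1]) followed by the six candidate pairs above
        emitPairs("4 1,2,6",
                (k, v) -> System.out.println("(" + k + ",[" + v[0] + "," + v[1] + "])"));
    }
}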

Reduce phase:

All direct-friend and indirect-friend key-value pairs keyed by the same user reach the same reducer.

For example, for user 4:

key=4

The following set of values arrives at that reduce call:

t2= FriendPair [user1=7, user2=1]
t2= FriendPair [user1=3, user2=2]
t2= FriendPair [user1=2, user2=-1]
t2= FriendPair [user1=6, user2=-1]
t2= FriendPair [user1=1, user2=2]
t2= FriendPair [user1=8, user2=1]
t2= FriendPair [user1=6, user2=1]
t2= FriendPair [user1=5, user2=1]
t2= FriendPair [user1=3, user2=1]
t2= FriendPair [user1=1, user2=6]
t2= FriendPair [user1=2, user2=1]
t2= FriendPair [user1=1, user2=-1]
t2= FriendPair [user1=7, user2=2]
t2= FriendPair [user1=5, user2=2]

For user 4, maintain a Map<Long,List<Long>> that stores every candidate friend to recommend to user 4 together with the list of mutual friends they share.

Clearly, user 4's direct friends (records whose user2 is -1) must never be recommended; simply put <user1, null> into the map.

For user 4's indirect friends, the mutual friends of all records that share the same recommended ID are accumulated. For example, given

t2= FriendPair [user1=3, user2=2]

t2= FriendPair [user1=3, user2=1]

all mutual friends of user 3 (the candidate recommended to user 4), namely users 2 and 1, are accumulated, and <3, [2, 1]> is put into the map.
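A minimal stand-alone sketch of this accumulation step, with each incoming record reduced to a (toUser, mutualFriend) pair of longs and -1 meaning "already a direct friend"; the names are illustrative only and this is not part of the job itself.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AggregateSketch {
    // Collapse the values seen by one reduce call into candidate -> mutual-friend list.
    // A null list marks a user who is already a direct friend and must never be recommended.
    static Map<Long, List<Long>> aggregate(long[][] pairs) {
        Map<Long, List<Long>> mutualFriends = new HashMap<>();
        for (long[] p : pairs) {
            long toUser = p[0];
            long mutualFriend = p[1];
            if (mutualFriend == -1) {
                mutualFriends.put(toUser, null);               // direct friend: poison the entry
            } else if (!mutualFriends.containsKey(toUser)) {
                List<Long> list = new ArrayList<>();
                list.add(mutualFriend);
                mutualFriends.put(toUser, list);               // first mutual friend seen
            } else if (mutualFriends.get(toUser) != null) {
                mutualFriends.get(toUser).add(mutualFriend);   // accumulate; skip poisoned entries
            }
        }
        return mutualFriends;
    }

    public static void main(String[] args) {
        // the two records for candidate 3 from the example above, plus direct friend 2
        long[][] pairs = { {3, 2}, {3, 1}, {2, -1} };
        System.out.println(aggregate(pairs)); // {2=null, 3=[2, 1]}
    }
}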

Code:

1. Custom friend-pair Writable (FriendPair)

package FriendRecommendation;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;

public class FriendPair implements Writable, WritableComparable<FriendPair> {

    // user1 = the user being recommended; user2 = the mutual friend (-1 marks a direct friend)
    private LongWritable user1 = new LongWritable();
    private LongWritable user2 = new LongWritable();

    public FriendPair() {}

    public FriendPair(Long user1, Long user2) {
        this.user1.set(user1);
        this.user2.set(user2);
    }

    public LongWritable getUser1() {
        return user1;
    }

    public void setUser1(LongWritable user1) {
        this.user1 = user1;
    }

    public LongWritable getUser2() {
        return user2;
    }

    public void setUser2(LongWritable user2) {
        this.user2 = user2;
    }

    @Override
    public int compareTo(FriendPair pair) {
        int compareValue = this.user1.compareTo(pair.user1);
        if (compareValue == 0) {          // if user1 is equal, compare user2
            compareValue = this.user2.compareTo(pair.user2);
        }
        // return compareValue;           // to sort ascending
        return -1 * compareValue;         // to sort descending
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        user1.readFields(in);
        user2.readFields(in);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        user1.write(out);
        user2.write(out);
    }

    @Override
    public String toString() {
        return "FriendPair [user1=" + user1.get() + ", user2=" + user2.get() + "]";
    }
}
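Since FriendPair is the map output value, it must survive Hadoop's serialization. A quick way to convince yourself that write and readFields round-trip correctly is the small local test below; it is only a sanity-check sketch and not part of the job.

package FriendRecommendation;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;

public class FriendPairRoundTrip {
    public static void main(String[] args) throws Exception {
        FriendPair original = new FriendPair(3L, 2L);

        // serialize the same way Hadoop does: through write(DataOutput)
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        original.write(new DataOutputStream(bytes));

        // deserialize into a fresh instance through readFields(DataInput)
        FriendPair copy = new FriendPair();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));

        System.out.println(copy); // FriendPair [user1=3, user2=2]
    }
}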

2. Mapper, reducer, and driver

package FriendRecommendation;

import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Main extends Configured implements Tool {

    public static class FriendRecommendationMapper
            extends Mapper<LongWritable, Text, LongWritable, FriendPair> {

        private final LongWritable outputKey = new LongWritable();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // input line, e.g. "1 2,3,4,5,6,7,8"
            System.out.println("map key=" + key);
            System.out.println("map value=" + value);
            Long userID = Long.valueOf(value.toString().split(" ")[0]);
            String[] friends = value.toString().split(" ")[1].split(",");

            // emit all direct friend relations: (userID, [friend, -1])
            for (String friend : friends) {
                FriendPair directFriend = new FriendPair(Long.valueOf(friend), -1L);
                outputKey.set(userID);
                System.out.println("directFriend:" + userID + "," + directFriend);
                context.write(outputKey, directFriend);
            }

            // emit all candidate (indirect) friend relations:
            // every two friends of userID may get to know each other through userID
            for (int i = 0; i < friends.length; i++) {
                for (int j = i + 1; j < friends.length; j++) {
                    Long friend1 = Long.valueOf(friends[i]);
                    Long friend2 = Long.valueOf(friends[j]);
                    outputKey.set(friend1);
                    context.write(outputKey, new FriendPair(friend2, userID));
                    outputKey.set(friend2);
                    context.write(outputKey, new FriendPair(friend1, userID));
                }
            }
        }
    }

    public static class FriendRecommendationReducer
            extends Reducer<LongWritable, FriendPair, Text, Text> {

        @Override
        protected void reduce(LongWritable key, Iterable<FriendPair> values, Context context)
                throws IOException, InterruptedException {
            System.out.println("reduce key = " + key);
            // recommended friend -> list of mutual friends; null marks an existing direct friend
            Map<Long, List<Long>> mutualFriends = new HashMap<Long, List<Long>>();
            Iterator<FriendPair> iterator = values.iterator();
            while (iterator.hasNext()) {
                FriendPair t2 = iterator.next();
                System.out.println("t2= " + t2);
                Long toUser = t2.getUser1().get();
                Long mutualFriend = t2.getUser2().get();
                boolean alreadyFriend = (mutualFriend == -1);
                if (mutualFriends.containsKey(toUser)) {
                    if (alreadyFriend) {
                        mutualFriends.put(toUser, null);
                    } else if (mutualFriends.get(toUser) != null) {
                        mutualFriends.get(toUser).add(mutualFriend);
                    }
                } else {
                    if (alreadyFriend) {
                        mutualFriends.put(toUser, null);
                    } else {
                        List<Long> list = new ArrayList<Long>();
                        list.add(mutualFriend);
                        mutualFriends.put(toUser, list);
                    }
                }
            }
            String reducerOutput = buildOutput(mutualFriends);
            context.write(new Text("" + key), new Text(reducerOutput));
        }
    }

    public static String buildOutput(Map<Long, List<Long>> map) {
        String output = "";
        for (Map.Entry<Long, List<Long>> entry : map.entrySet()) {
            Long k = entry.getKey();
            List<Long> v = entry.getValue();
            if (v != null) {
                output += k + "(" + v.size() + ":" + v + "),";
            }
        }
        return output;
    }

    public static void main(String[] args) throws Exception {
        args = new String[2];
        args[0] = "input/friends2.txt";
        args[1] = "output/friends2";
        System.exit(submitJob(args));
    }

    public static int submitJob(String[] args) throws Exception {
        return ToolRunner.run(new Main(), args);
    }

    @Override
    public int run(String[] args) throws Exception {
        Job job = new Job(getConf());
        job.setJobName("CommonFriendsDriver");
        job.setJarByClass(Main.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        // map output types differ from the final reducer output types
        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(FriendPair.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setMapperClass(FriendRecommendationMapper.class);
        job.setReducerClass(FriendRecommendationReducer.class);
        // args[0] = input directory, args[1] = output directory
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        boolean status = job.waitForCompletion(true);
        return status ? 0 : 1;
    }
}
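To run the job outside an IDE, package the classes into a jar and submit it with hadoop jar; the jar name below is only a placeholder, and note that main() above overwrites args with hard-coded paths (input/friends2.txt and output/friends2), so the trailing arguments are ignored unless that override is removed.

hadoop jar friend-recommendation.jar FriendRecommendation.Main input/friends2.txt output/friends2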
Result:

1
2 6(2:[4, 1]),8(1:[1]),
3 4(2:[1, 2]),5(2:[2, 1]),6(1:[1]),7(2:[1, 2]),8(1:[1]),
4 3(2:[2, 1]),5(2:[1, 2]),7(2:[1, 2]),8(1:[1]),
5 3(2:[2, 1]),4(2:[1, 2]),6(1:[1]),7(2:[1, 2]),8(1:[1]),
6 2(2:[1, 4]),3(1:[1]),5(1:[1]),7(1:[1]),8(1:[1]),
7 3(2:[1, 2]),4(2:[2, 1]),5(2:[2, 1]),6(1:[1]),8(1:[1]),
8 2(1:[1]),3(1:[1]),4(1:[1]),5(1:[1]),6(1:[1]),7(1:[1]),

A detailed walkthrough of the execution on another, simpler input was given here as a figure (omitted).
