最近在读Recsys2021上的paper:Transformers4Rec: Bridging the Gap between NLP and Sequential / Session-Based Recommendation,文章也聊到了NLP发展和推荐系统发展的关系,可以说推荐系统是在NLP的肩膀上前进的。结论是,关注不同领域,尤其是NLP领域的发展,对于推荐系统的研究和工作很可能都会有帮助。
我们不妨以这篇论文为例,来看看作者的研究动机以及推荐系统和NLP发展的关系。
近年来,序列/会话推荐上的进展很多都源自于NLP模型和预训练技术的发展,可以说推荐系统是在NLP的肩膀上前进的。尤其是Transformers架构,在BERT等预训练模型中广泛应用,也在序列推荐中初露端倪。然而,推荐系统的发展实际上是滞后于NLP的。NLP中花式Transformers架构层出不穷,比如:邱锡鹏组的survey[2]集中展示了Transformer架构的演进;NLP中的开源社区也十分活跃,比如HuggingFace的开源库Transformers[4]涵盖了大部分主流的Transformer实现,总之,Transformers在NLP研究中正如火如荼地开展着。
然而,在推荐系统中的应用很多只停留在最原始的Transformer[3]。很大程度上是缺乏一个类似HuggingFace的统一轮子,研究者想在此基础上做迭代实际上相比于NLP会困难不少。
为了弥补这种发展鸿沟,作者开源了一个基于HuggingFace开源库[4]的序列推荐包Transformers4Rec,目的是希望推荐系统社区能够更快地follow到NLP社区在Transformers中的进展,并在序列/会话推荐任务中实现开箱即用。完整的代码开源在了github。
近年来,关于序列推荐的工作也是层出不穷,survey[5]集中展示了基于深度学习的序列推荐研究进展。在大部分的序列推荐场景中,可能都只用了用户最新的交互行为数据,或者由于用户是匿名的,所以只能用到当前会话session下的序列行为,这也就是典型的session会话推荐场景,属于序列推荐中的一种。
在最近10年内,大部分序列推荐的工作是在NLP发展的基础上开展的,个人认为主要是因为三方面:
NLP发展始终快推荐系统一步,而二者的抽象问题又非常相似,且文本是推荐系统中重要的side information,因此很容易出现推荐系统社区研究人员在NLP社区的研究进展的基础上,直接应用或改进后用到推荐系统中。
早期非深度学习时代,推荐系统受NLP的影响工作例如:
进入深度学习时代后,NLP对推荐系统的影响就更加深远了,先看一张图:
除了上述研究之外,还有不少最新前沿的推荐系统研究实际上也是深受NLP领域的影响,有一些还有着推荐系统领域"独特的味道"。
Transfromers4Rec工作实际上也是受到Transfomers、BERT以及NLP开源社区的影响所产生的。不同领域之间互相影响当然也是件好事,例如NLP领域也经常借鉴CV领域的成果,例如残差网络、CNN等等。
从上面的介绍,我们可以看到NLP研究进展如何深远的影响着推荐系统的研究,可以说推荐系统的发展很大程度上是在NLP的肩膀上前进的。
到目前为止,我们看到的大多是单向影响,即:起步较早的领域深度影响着新兴领域,那么是否存在反哺的现象呢?也算是本篇文章的一个遗留问题,例如:推荐系统社区是否有反哺NLP社区呢? 这个问题可能也等价于推荐系统是否有一些独创/开创性的研究,是已经或有可能潜在影响其他领域的研究?(可能是个知乎好问题)。
这个问题也很大,需要花一些时间去学习和积累才能感受到。以读者目前的浅薄认知而言,我觉得有,至少我认为推荐系统在工程架构方面的进展是前沿的,是领先其他领域的。比如:实时推理引擎如TF Serving那一套,最早是服务于大规模实时推荐系统的,但对NLP尤其是预训练模型做实时serving还是有帮助的,小规模场景开箱即用,大规模场景做TF Serving优化后,也能使用。还有比如推荐系统中算首创的工作协同过滤或特征交互的研究进展,经典的FM等,也能一定程度上启发NLP研究。还有比如搜索/推荐场景常用的多阶段pipeline,召回+粗排+精排,也经常用在对话系统中。更多的以后有机会可以一起探讨探讨。
[1] de Souza Pereira Moreira G, Rabhi S, Lee J M, et al. Transformers4Rec: Bridging the Gap between NLP and Sequential/Session-Based Recommendation[C]//Fifteenth ACM Conference on Recommender Systems. 2021: 143-153.
[2] Lin T, Wang Y, Liu X, et al. A Survey of Transformers[J]. arXiv preprint arXiv:2106.04554, 2021.
[3] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C]//Advances in neural information processing systems. 2017: 5998-6008.
[4] State-of-the-art Natural Language Processing for Jax, PyTorch and TensorFlow: https://github.com/huggingface/transformers
[5] Hui Fang, Danning Zhang, Yiheng Shu, and Guibing Guo. 2020. Deep Learning for Sequential Recommendation: Algorithms, Influential Factors, and Evaluations. ACM Transactions on Information Systems (TOIS) 39, 1 (2020), 1–42.
[6] 研究推荐系统要对NLP很了解吗?付鹏的回答:https://www.zhihu.com/question/317441966/answer/633112869
[7] 自然语言处理 NLP 的百年发展史, http://imgtec.eetrend.com/blog/2020/100052025.html
[8] 一文尽览推荐系统模型演变史, https://cloud.tencent.com/developer/article/1652169
[9] Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. Session-based recommendations with recurrent neural networks. arXiv preprint arXiv:1511.06939 (2015).
[10] Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate[J]. arXiv preprint arXiv:1409.0473, 2014.
[11] Zhou G, Zhu X, Song C, et al. Deep interest network for click-through rate prediction[C]//Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2018: 1059-1068.
[12] Wu S, Tang Y, Zhu Y, et al. Session-based recommendation with graph neural networks[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2019, 33(01): 346-353.
[13] Transformer 在美团搜索排序中的实践: https://tech.meituan.com/2020/04/16/transformer-in-meituan.html
[14] Mihajlo Grbovic, Vladan Radosavljevic, Nemanja Djuric, Narayan Bhamidipati, Jaikit Savla, Varun Bhagwan, and Doug Sharp. 2015. E-commerce in your inbox: Product recommendations at scale. In Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining. 1809–1818.
[15] Jing Li, Pengjie Ren, Zhumin Chen, Zhaochun Ren, and Jun Ma. 2017. Neural Attentive Session-based Recommendation. arXiv:1711.04725 [cs.IR]
[16] Jun Xiao, Hao Ye, Xiangnan He, Hanwang Zhang, Fei Wu, and Tat-Seng Chua. Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks. arXiv:1708.04617 [cs.LG]
[17] Fuyu Lv, Taiwei Jin, Changlong Yu, Fei Sun, Quan Lin, Keping Yang, and Wilfred Ng. 2019. SDM: Sequential deep matching model for online large-scale recommender system. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 2635–2643.
[18] Fei Sun, Jun Liu, Jian Wu, Changhua Pei, Xiao Lin, Wenwu Ou, and Peng Jiang. 2019. BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer. In Proceedings of the 28th ACM international conference on information and knowledge management. 1441–1450.
[19] Qiu Z, Wu X, Gao J, et al. U-BERT: Pre-training User Representations for Improved Recommendation[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2021, 35(5): 4320-4327.
[20] Kang W C, Cheng D Z, Yao T, et al. Learning to Embed Categorical Features without Embedding Tables for Recommendation[C]//Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 2021: 840-850.
[21] Guo H, Chen B, Tang R, et al. An Embedding Learning Framework for Numerical Features in CTR Prediction[J]. arXiv preprint arXiv:2012.08986, 2020.
[22] Wu J, Wang X, Feng F, et al. Self-supervised graph learning for recommendation[C]//Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2021: 726-735.
[23] KDD2021, Deep Learning on Graphs for Natural Language Processing: https://dlg4nlp.github.io/
[24] Gao T, Yao X, Chen D. SimCSE: Simple Contrastive Learning of Sentence Embeddings[J]. arXiv preprint arXiv:2104.08821, 2021.
[25] Zhou C, Ma J, Zhang J, et al. Contrastive learning for debiased candidate generation in large-scale recommender systems[C]//Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 2021: 3985-3995.
[26] Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network[J]. arXiv preprint arXiv:1503.02531, 2015.
[27] 张俊林,知识蒸馏在推荐系统的应用:https://zhuanlan.zhihu.com/p/143155437
[28] 张俊林,利用Contrastive Learning对抗数据噪声:对比学习在微博场景的实践,https://zhuanlan.zhihu.com/p/370782081
[29] Zeng Z, Xiao C, Yao Y, et al. Knowledge transfer via pre-training for recommendation: A review and prospect[J]. Frontiers in big Data, 2021, 4
[30] Wang C, Blei D M. Collaborative topic modeling for recommending scientific articles[C]//Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. 2011: 448-456.
[31] Blei D M, Ng A Y, Jordan M I. Latent dirichlet allocation[J]. the Journal of machine Learning research, 2003, 3: 993-1022.
[32] Bordes A, Usunier N, Garcia-Duran A, et al. Translating embeddings for modeling multi-relational data[J]. Advances in neural information processing systems, 2013, 26.
[33] Wang X, He X, Cao Y, et al. Kgat: Knowledge graph attention network for recommendation[C]//Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2019: 950-958.
[34] Hongwei Wang, https://scholar.google.com/citations?user=3C__4wsAAAAJ&hl=zh-CN
原文也可参见我的知乎专栏:蘑菇先生学习记。