
What do you think of MT-NLG, the largest and most powerful language model trained to date, released by Microsoft and NVIDIA?

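Answer from user huangzhe:

Thanks for the invite, @lokinko @Serendipity.

First, a digression. This may be the most powerful language model, but it is not actually the largest one.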

In January 2021, Google published Switch Transformer, which has 1.6 trillion parameters.

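However, Switch Transformer is not a monolithic (dense) model; it is a sparse mixture-of-experts model. Among monolithic models, MT-NLG is indeed the largest for now.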
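To make the monolithic-versus-mixture distinction concrete, here is a minimal sketch. It is not Switch Transformer's actual code: it only counts the parameters of a toy feed-forward layer, assuming top-1 routing so that each token passes through a single expert. The layer sizes and the function names `ffn_params` and `moe_layer_params` are made up for illustration.

```python
def ffn_params(d_model: int, d_ff: int) -> int:
    """Weights and biases of one two-layer feed-forward block."""
    return d_model * d_ff + d_ff + d_ff * d_model + d_model


def moe_layer_params(d_model: int, d_ff: int, num_experts: int) -> tuple[int, int]:
    """Return (total, active_per_token) for a top-1 routed expert layer."""
    router = d_model * num_experts          # linear router, bias omitted
    expert = ffn_params(d_model, d_ff)      # every expert is a full FFN
    total = router + num_experts * expert   # what gets reported as model size
    active = router + expert                # what one token actually uses
    return total, active


# Hypothetical sizes, chosen only to show the gap.
dense = ffn_params(1024, 4096)
total, active = moe_layer_params(1024, 4096, num_experts=64)

print(f"dense layer:          {dense:,} parameters, all used per token")
print(f"expert layer, total:  {total:,} parameters")
print(f"expert layer, active: {active:,} parameters per token")
```

In a dense model every parameter takes part in every token's computation, while in the routed layer most of the total sits idle for any given token. That is why a 1.6-trillion-parameter mixture model and a 530-billion-parameter dense model are not directly comparable on size alone.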

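NLP tasks fall roughly into two groups: NLU (natural language understanding) and NLG (natural language generation). To be precise, MT-NLG is the largest and most powerful generative language model.

NVIDIA's blog post [1] puts it the same way: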

Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World’s Largest and Most Powerful Generative Language Model

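Now, back to the question.

I like this model's name. It descends from two models, Megatron-LM [2] and Turing-NLG [3], named after Megatron (the Transformers character) and Alan Turing. Put the two together and quite an image comes to mind.

On parameter count, it is somewhat ahead of GPT-3. But compared with a few years ago, when model size grew by one or two orders of magnitude (10x, 100x) every year, recent growth has been only two- or threefold. My feeling is that scaling up to reach the best performance is a road that will no longer be so easy to travel.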

According to NVIDIA's blog, MT-NLG achieved "unmatched" accuracy in the following areas:

Completion prediction
Reading comprehension
Commonsense reasoning
Natural language inferences
Word sense disambiguation

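For example, the model can infer mathematical operators, and it can make inferences across different syntactic structures. That seems like a pretty solid level.

(Back to work for a bit; I'll come back later to slack off and keep writing.)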

References

[1] Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World’s Largest and Most Powerful Generative Language Model https://developer.nvidia.com/blog/using-deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-language-model/
[2] https://github.com/NVIDIA/Megatron-LM
[3] https://www.microsoft.com/en-us/research/blog/turing-nlg-a-17-billion-parameter-language-model-by-microsoft/

Answer from user jzwa:

Thanks, @lokinko @Serendipity.

I'm only doing a bit of translation here; for the full details, see the original post.

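MT-NLG stands for Megatron-Turing Natural Language Generation model, which the blog describes as "DeepSpeed- and Megatron-powered". It is the largest and most powerful monolithic Transformer language model trained to date, with 530 billion parameters. It is the result of a joint effort by Microsoft and NVIDIA to advance the state of the art in AI for natural language generation.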

GPT-3, the model that drew so much attention earlier, has 175 billion parameters; MT-NLG has about three times as many.

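That is roughly a threefold increase. The vertical axis of the model-size chart is worth noting: it is logarithmic, with each interval representing a factor of 10.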
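The arithmetic behind "about three times" and the log-scale axis can be checked with a few lines. This is a quick sketch using the publicly reported parameter counts: 17 billion for Turing-NLG (from the Microsoft blog title), 175 billion for GPT-3, and 530 billion for MT-NLG.

```python
import math

# Reported parameter counts.
params = {"Turing-NLG": 17e9, "GPT-3": 175e9, "MT-NLG": 530e9}

ratio = params["MT-NLG"] / params["GPT-3"]
decades = math.log10(ratio)  # distance on an axis where one interval = 10x

earlier_ratio = params["GPT-3"] / params["Turing-NLG"]
earlier_decades = math.log10(earlier_ratio)

print(f"MT-NLG / GPT-3:     {ratio:.2f}x, {decades:.2f} of a 10x interval")
print(f"GPT-3 / Turing-NLG: {earlier_ratio:.2f}x, {earlier_decades:.2f} of a 10x interval")
```

On a logarithmic axis, the step from GPT-3 to MT-NLG covers roughly half of one 10x interval, while the earlier step from Turing-NLG to GPT-3 covered about a full interval.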

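Its training data comes from 15 datasets, each with its own sampling weight and number of epochs.

To measure its quality, the team evaluated it on what the blog describes as eight tasks spanning five areas: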

In the text prediction task LAMBADA, the model predicts the last word of a given paragraph.
In the reading comprehension tasks RACE-h and BoolQ, the model generates answers to questions based on a given paragraph.
In the commonsense reasoning tasks PiQA, HellaSwag, and Winogrande, each required some level of commonsense knowledge beyond statistical patterns of language to solve.
For natural language inference, two hard benchmarks, ANLI-R2 and HANS target the typical failure cases of past models.
The word sense disambiguation task WiC evaluates polysemy understanding from context.

The library used for this was also open-sourced, so that others can easily reproduce the results.

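For the accuracy tests, the zero-shot, one-shot, and few-shot settings from the meta-learning framing were applied to the 9 datasets. On LAMBADA and PiQA, all three settings reached the state of the art.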
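As a rough illustration of what the three settings mean, the sketch below builds a prompt with k solved examples placed before the actual question: k = 0 is zero-shot, k = 1 is one-shot, and a larger k is few-shot. The `Q:`/`A:` template, the sample questions, and the `build_prompt` helper are assumptions made for illustration; this is not the prompt format used in the MT-NLG evaluation.

```python
def build_prompt(examples: list[tuple[str, str]], query: str, k: int) -> str:
    """Prepend the first k solved examples to the query (k=0 is zero-shot)."""
    shots = examples[:k]
    blocks = [f"Q: {question}\nA: {answer}" for question, answer in shots]
    blocks.append(f"Q: {query}\nA:")  # the model is asked to continue from here
    return "\n\n".join(blocks)


# Hypothetical demonstrations; a real benchmark supplies its own.
demos = [
    ("Is water wet?", "yes"),
    ("Is fire cold?", "no"),
    ("Is ice solid?", "yes"),
]

zero_shot = build_prompt(demos, "Is steam hot?", k=0)
one_shot = build_prompt(demos, "Is steam hot?", k=1)
few_shot = build_prompt(demos, "Is steam hot?", k=3)

print(few_shot)
```

In all three settings the model's weights stay fixed; only the number of in-context examples changes.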

That's all.
