{"id":1808,"date":"2021-10-19T17:24:19","date_gmt":"2021-10-19T22:24:19","guid":{"rendered":"https:\/\/singularityumexicosummit.com\/?p=1808"},"modified":"2021-10-19T17:24:19","modified_gmt":"2021-10-19T22:24:19","slug":"microsofts-massive-new-language-ai-is-triple-the-size-of-openais-gpt-3","status":"publish","type":"post","link":"https:\/\/singularityumexico.com\/en\/microsofts-massive-new-language-ai-is-triple-the-size-of-openais-gpt-3\/","title":{"rendered":"Microsoft\u2019s Massive New Language AI Is Triple the Size of OpenAI\u2019s GPT-3"},"content":{"rendered":"<p>Just under a year and a half ago OpenAI announced completion of&nbsp;<a href=\"https:\/\/singularityhub.com\/2020\/06\/18\/openais-new-text-generator-writes-even-more-like-a-human\/\">GPT-3<\/a>, its natural language processing algorithm that was, at the time, the largest and most complex model of its type. This week, Microsoft and Nvidia&nbsp;<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/blog\/using-deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-language-model\/\">introduced<\/a>&nbsp;a new model they\u2019re calling \u201cthe world\u2019s largest and most powerful generative language model.\u201d The Megatron-Turing Natural Language Generation model (MT-NLG) is more than triple the size of GPT-3 at 530 billion parameters.<\/p>\n\n\n\n<p>GPT-3\u2019s 175 billion parameters was already a lot; its predecessor,&nbsp;<a href=\"https:\/\/singularityhub.com\/2019\/03\/07\/openais-eerily-realistic-new-text-generator-writes-like-a-human\/\">GPT-2<\/a>, had a mere 1.5 billion parameters, and Microsoft\u2019s&nbsp;<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/blog\/turing-nlg-a-17-billion-parameter-language-model-by-microsoft\/\">Turing Natural Language Generation<\/a>&nbsp;model, released in February 2020, had 17 billion.<\/p>\n\n\n\n<p>A parameter is an attribute a machine learning model defines based on its training data, and tuning more of them requires upping the amount of data the model is trained on. It\u2019s essentially learning to predict how likely it is that a given word will be preceded or followed by another word, and how much that likelihood changes based on other words in the sentence.<\/p>\n\n\n\n<p>As you can imagine, getting to 530 billion parameters required quite a lot of input data and just as much computing power. The algorithm was trained using an Nvidia supercomputer made up of 560 servers, each holding eight 80-gigabyte GPUs. That\u2019s 4,480 GPUs total, and an&nbsp;<a href=\"https:\/\/www.nextplatform.com\/2021\/02\/11\/the-billion-dollar-ai-problem-that-just-keeps-scaling\/\">estimated cost<\/a>&nbsp;of over $85 million.<\/p>\n\n\n\n<p>For training data, Megatron-Turing\u2019s creators used&nbsp;<a href=\"https:\/\/arxiv.org\/abs\/2101.00027\" target=\"_blank\" rel=\"noreferrer noopener\">The Pile<\/a>, a dataset put together by open-source language model research group Eleuther AI. Comprised of everything from PubMed to Wikipedia to Github, the dataset totals 825GB, broken down into 22 smaller datasets. Microsoft and Nvidia curated the dataset, selecting subsets they found to be \u201cof the highest relative quality.\u201d They added data from&nbsp;<a href=\"https:\/\/commoncrawl.org\/\">Common Crawl<\/a>, a non-profit that scans the open web every month and downloads content from billions of HTML pages then makes it available in a special format for large-scale data mining. GPT-3 was also trained using Common Crawl data.<\/p>\n\n\n\n<p>Microsoft\u2019s&nbsp;<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/blog\/using-deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-language-model\/\">blog post<\/a>&nbsp;on Megatron-Turing says the algorithm is skilled at tasks like completion prediction, reading comprehension, commonsense reasoning, natural language inferences, and word sense disambiguation. But stay tuned\u2014there will likely be more skills added to that list once the model starts being widely utilized.<\/p>\n\n\n\n<p>GPT-3 turned out to have capabilities beyond what its creators anticipated, like writing code, doing math, translating between languages, and autocompleting images (oh, and writing a&nbsp;<a href=\"https:\/\/singularityhub.com\/2020\/10\/23\/an-ai-wrote-this-short-film-and-its-sort-of-fascinating\/\">short film<\/a>&nbsp;with a twist ending). This led some to&nbsp;<a href=\"https:\/\/venturebeat.com\/2020\/09\/03\/were-entering-the-ai-twilight-zone-between-narrow-and-general-ai\/\">speculate<\/a>&nbsp;that GPT-3 might be the gateway to&nbsp;<a href=\"https:\/\/singularityhub.com\/2018\/07\/22\/from-here-to-human-level-artificial-general-intelligence-in-four-not-all-that-simple-steps\/\">artificial general intelligence<\/a>. But the algorithm\u2019s variety of talents, while unexpected, still fell within the language domain (including programming languages), so that\u2019s a bit of a stretch.<\/p>\n\n\n\n<p>However, given the tricks GPT-3 had up its sleeve based on its 175 billion parameters, it\u2019s intriguing to wonder what the Megatron-Turing model may surprise us with at 530 billion. The algorithm likely won\u2019t be commercially available for some time, so it\u2019ll be a while before we find out.<\/p>\n\n\n\n<p>The new model\u2019s creators, though, are highly optimistic. \u201cWe look forward to how MT-NLG will shape tomorrow\u2019s products and motivate the community to push the boundaries of natural language processing even further,\u201d they wrote in the\u00a0<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/blog\/using-deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-language-model\/\">blog post<\/a>. \u201cThe journey is long and far from complete, but we are excited by what is possible and what lies ahead.\u201d<\/p>\n\n\n\n<hr class=\"wp-block-separator has-text-color has-background has-black-background-color has-black-color is-style-wide\"\/>\n\n\n\n<p><em>Image Credit:\u00a0<a href=\"https:\/\/pixabay.com\/users\/kranich17-11197573\/?utm_source=link-attribution&amp;utm_medium=referral&amp;utm_campaign=image&amp;utm_content=6213537\" target=\"_blank\" rel=\"noreferrer noopener\">Kranich17<\/a>\u00a0from\u00a0<a href=\"https:\/\/pixabay.com\/?utm_source=link-attribution&amp;utm_medium=referral&amp;utm_campaign=image&amp;utm_content=6213537\" target=\"_blank\" rel=\"noreferrer noopener\">Pixabay<\/a><\/em><\/p>\n\n\n\n<p><strong>Author:<\/strong><\/p>\n\n\n\n<p>Vanessa is senior editor of Singularity Hub. She&#8217;s interested in renewable energy, health and medicine, international development, and countless other topics. When she&#8217;s not reading or writing you can usually find her outdoors, in water, or on a plane. <a href=\"https:\/\/singularityhub.com\/author\/vbatesramirez\/\">Learn <\/a><a href=\"https:\/\/singularityhub.com\/author\/vbatesramirez\/\" target=\"_blank\" rel=\"noreferrer noopener\">More<\/a><\/p>\n\n\n\n<p class=\"has-text-align-center\"><a href=\"https:\/\/singularityhub.com\/2021\/10\/13\/microsofts-massive-new-language-ai-is-triple-the-size-of-openais-gpt-3\/\" target=\"_blank\" rel=\"noreferrer noopener\">Original Article<\/a><\/p>","protected":false},"excerpt":{"rendered":"<p>Just under a year and a half ago OpenAI announced completion of&nbsp;GPT-3, its natural language processing algorithm that was, at the time, the largest and most complex model of its type. This week, Microsoft and Nvidia&nbsp;introduced&nbsp;a new model they\u2019re calling \u201cthe world\u2019s largest and most powerful generative language model.\u201d The Megatron-Turing Natural Language Generation model [&#8230;]\n","protected":false},"author":1,"featured_media":1809,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"episode_type":"","audio_file":"","podmotor_file_id":"","podmotor_episode_id":"","cover_image":"","cover_image_id":"","duration":"","filesize":"","filesize_raw":"","date_recorded":"","explicit":"","block":"","footnotes":""},"categories":[13],"tags":[26,27],"series":[],"class_list":["post-1808","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-articulos-ingles","tag-artificial-intelligence","tag-inteligencia-artificial-2"],"episode_featured_image":"https:\/\/singularityumexico.com\/wp-content\/uploads\/2021\/10\/natural-language-processing-Microsoft-Megatron-Turing-model-AI.jpg","episode_player_image":"https:\/\/singularityumexico.com\/wp-content\/uploads\/2023\/05\/11711533-1673157178559-89a95be153719-4-scaled.jpg","download_link":"","player_link":"","audio_player":false,"episode_data":{"playerMode":"dark","subscribeUrls":{"apple_podcasts":{"key":"apple_podcasts","url":"","label":"Apple Podcasts","class":"apple_podcasts","icon":"apple-podcasts.png"},"stitcher":{"key":"stitcher","url":"","label":"Stitcher","class":"stitcher","icon":"stitcher.png"},"google_podcasts":{"key":"google_podcasts","url":"","label":"Google Podcasts","class":"google_podcasts","icon":"google-podcasts.png"},"spotify":{"key":"spotify","url":"","label":"Spotify","class":"spotify","icon":"spotify.png"}},"rssFeedUrl":"https:\/\/singularityumexico.com\/en\/feed\/podcast\/the-feedback-loop-by-singularity","embedCode":"<blockquote class=\"wp-embedded-content\" data-secret=\"M4CwrMvRSZ\"><a href=\"https:\/\/singularityumexico.com\/en\/microsofts-massive-new-language-ai-is-triple-the-size-of-openais-gpt-3\/\">Microsoft\u2019s Massive New Language AI Is Triple the Size of OpenAI\u2019s GPT-3<\/a><\/blockquote><iframe sandbox=\"allow-scripts\" security=\"restricted\" src=\"https:\/\/singularityumexico.com\/en\/microsofts-massive-new-language-ai-is-triple-the-size-of-openais-gpt-3\/embed\/#?secret=M4CwrMvRSZ\" width=\"500\" height=\"350\" title=\"&#8220;Microsoft\u2019s Massive New Language AI Is Triple the Size of OpenAI\u2019s GPT-3&#8221; &#8212; Singularity Mexico\" data-secret=\"M4CwrMvRSZ\" frameborder=\"0\" marginwidth=\"0\" marginheight=\"0\" scrolling=\"no\" class=\"wp-embedded-content\"><\/iframe><script type=\"text\/javascript\">\n\/* <![CDATA[ *\/\n\/*! This file is auto-generated *\/\n!function(d,l){\"use strict\";l.querySelector&&d.addEventListener&&\"undefined\"!=typeof URL&&(d.wp=d.wp||{},d.wp.receiveEmbedMessage||(d.wp.receiveEmbedMessage=function(e){var t=e.data;if((t||t.secret||t.message||t.value)&&!\/[^a-zA-Z0-9]\/.test(t.secret)){for(var s,r,n,a=l.querySelectorAll('iframe[data-secret=\"'+t.secret+'\"]'),o=l.querySelectorAll('blockquote[data-secret=\"'+t.secret+'\"]'),c=new RegExp(\"^https?:$\",\"i\"),i=0;i<o.length;i++)o[i].style.display=\"none\";for(i=0;i<a.length;i++)s=a[i],e.source===s.contentWindow&&(s.removeAttribute(\"style\"),\"height\"===t.message?(1e3<(r=parseInt(t.value,10))?r=1e3:~~r<200&&(r=200),s.height=r):\"link\"===t.message&&(r=new URL(s.getAttribute(\"src\")),n=new URL(t.value),c.test(n.protocol))&&n.host===r.host&&l.activeElement===s&&(d.top.location.href=t.value))}},d.addEventListener(\"message\",d.wp.receiveEmbedMessage,!1),l.addEventListener(\"DOMContentLoaded\",function(){for(var e,t,s=l.querySelectorAll(\"iframe.wp-embedded-content\"),r=0;r<s.length;r++)(t=(e=s[r]).getAttribute(\"data-secret\"))||(t=Math.random().toString(36).substring(2,12),e.src+=\"#?secret=\"+t,e.setAttribute(\"data-secret\",t)),e.contentWindow.postMessage({message:\"ready\",secret:t},\"*\")},!1)))}(window,document);\n\/\/# sourceURL=https:\/\/singularityumexico.com\/wp-includes\/js\/wp-embed.min.js\n\/* ]]> *\/\n<\/script>\n"},"_links":{"self":[{"href":"https:\/\/singularityumexico.com\/en\/wp-json\/wp\/v2\/posts\/1808","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/singularityumexico.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/singularityumexico.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/singularityumexico.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/singularityumexico.com\/en\/wp-json\/wp\/v2\/comments?post=1808"}],"version-history":[{"count":0,"href":"https:\/\/singularityumexico.com\/en\/wp-json\/wp\/v2\/posts\/1808\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/singularityumexico.com\/en\/wp-json\/wp\/v2\/media\/1809"}],"wp:attachment":[{"href":"https:\/\/singularityumexico.com\/en\/wp-json\/wp\/v2\/media?parent=1808"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/singularityumexico.com\/en\/wp-json\/wp\/v2\/categories?post=1808"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/singularityumexico.com\/en\/wp-json\/wp\/v2\/tags?post=1808"},{"taxonomy":"series","embeddable":true,"href":"https:\/\/singularityumexico.com\/en\/wp-json\/wp\/v2\/series?post=1808"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}