50 ChatGPT Statistics and Facts You Need to Know
The ChatGPT statistics below suggest that the popular chatbot from OpenAI is just the beginning of something bigger. Since its launch in November 2022, ChatGPT has broken records that few expected. For example, it reached 100 million active users in January 2023, just two months after its release, which reportedly made it the fastest-growing consumer app up to that point.
请放心,通过即将呈现的 ChatGPT 统计数据,您将确认 OpenAI 这款热门聊天机器人只是一个更大事物的开端。自 2022 年 11 月发布以来,ChatGPT 取得了令人意想不到的突破。例如,它在发布后的短短两个月内,即 2023 年 1月,就达到了 1 亿活跃用户,成为史上增长最快的消费者应用程序。
This chatbot has had a major impact on the field of AI by using deep learning techniques to generate human-like text and answer a wide range of questions, often accurately. Its responses range from generating code to creating memes. One of its most common uses is customer service, though ChatGPT can also be helpful for IT support.
这款聊天机器人通过使用深度学习技术生成类似人类的文本,并以高准确性回答各种问题,从而在人工智能领域引起了革命性的变化。它的回复非常多样化,可以涵盖从代码生成到创作迷因的各种需求。其中,最常见的用途之一是客户服务,不过 ChatGPT 在 IT 支持方面也非常有帮助。
But what makes ChatGPT so disruptive? In this article, you’ll discover some of the most significant ChatGPT statistics and facts that will help you understand how far AI has come. Read on to find out!
但是是什么使得 ChatGPT 如此具有颠覆性?在本文中,您将了解一些最重要的 ChatGPT 统计数据和事实,这将帮助您了解人工智能的发展程度。请继续阅读以获取更多信息!
What is ChatGPT?
什么是 ChatGPT?
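Developed by OpenAI, ChatGPT is an artificial intelligence chatbot built on the company’s GPT-3 family of natural language processing (NLP) models; the version released in November 2022 was fine-tuned from a GPT-3.5 model. These models are proprietary rather than open source.
ChatGPT 是由 OpenAI 开发的人工智能聊天机器人,基于该公司的 GPT-3 系列自然语言处理(NLP)模型;2022 年 11 月发布的版本是在 GPT-3.5 模型的基础上微调而成的。这些模型为专有模型,并非开源。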
The chatbot can understand what users say, anticipate their needs, and respond accurately. It interacts conversationally, so users can feel like they are talking to a real person.
这款聊天机器人能够理解用户的言辞,预测他们的需求并准确地作出回应。它以对话的方式进行交互,让用户有一种在与真实人交谈的感觉。
50 ChatGPT statistics and facts you need to know
你需要了解的 50 个 ChatGPT 统计数据和事实
With the rising popularity of ChatGPT, we began to wonder about the numbers behind such a massive phenomenon. So, we started digging, and here’s everything we found out.
随着 ChatGPT 的日益流行,我们开始好奇这一巨大现象背后的数据。于是,我们开始挖掘,以下是我们发现的一切信息。
ChatGPT history
ChatGPT 历史
1. OpenAI has been developing GPT (Generative Pre-trained Transformer) models since 2018.
OpenAI 自 2018 年开始开发 GPT(Generative Pre-trained Transformer,生成式预训练转换器)技术。
2. GPT-1 and GPT-2 laid the foundations for GPT-3.
GPT-1 和 GPT-2 为 GPT-3 奠定了基础。
3. GPT-1 was trained on the BooksCorpus dataset (about 5GB), and its primary focus was language understanding.
GPT-1 是使用 BooksCorpus 数据集(5GB)进行训练的,该数据集的主要关注点是语言理解。
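4. On Valentine’s Day 2019, OpenAI announced GPT-2 and initially withheld the full model as “too dangerous to release.” It was trained on WebText (40GB), a dataset built from web pages linked in Reddit posts that received at least 3 karma. The training cost was reportedly $43,000.
在 2019 年情人节,OpenAI 发布了 GPT-2,并以“太危险而不宜发布”为由暂未公开完整模型。它使用 WebText 数据集(40GB)进行训练,该数据集取自获得至少 3 个 karma 的 Reddit 帖子所链接的网页。据报道,训练成本为 43,000 美元。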
5. Later, the transformer approach behind GPT-2 was applied to music generation in MuseNet and Jukebox.
后来,GPT-2 背后的 Transformer 方法被应用于 MuseNet 和 Jukebox 的音乐生成。
6. In June 2020, GPT-3 was released. It was trained on a much larger and more comprehensive dataset.
在 2020 年 6 月,GPT-3 发布了,它是通过一个更加全面的数据集进行训练的。
7. Related OpenAI models and applications, some built directly on GPT-3 and others on similar techniques, include:
与之相关的 OpenAI 模型和应用(有些直接基于 GPT-3,有些采用类似技术)包括:
DALL-E: creating images from text.
DALL-E:从文本生成图像。
CLIP: connecting text and images.
CLIP:连接文本和图像。
Whisper: multi-lingual voice-to-text.
Whisper:多语言语音转文本。
ChatGPT: chatbot, article writer, code writer.
ChatGPT:聊天机器人、文章撰写、代码生成。
ChatGPT general facts
ChatGPT 的一般事实
8. OpenAI’s GPT-4 was released on March 14, 2023. OpenAI describes it as its most capable model to date but has not disclosed its size.
OpenAI 的 GPT-4 于 2023 年 3 月 14 日发布。OpenAI 称其为该公司迄今能力最强的模型,但未披露其规模。
9. GPT-3 has 175 billion parameters, and ChatGPT reportedly receives about 10 million queries per day.
GPT-3 拥有 1750 亿个参数,据报道 ChatGPT 每天接收约 1000 万次查询。
10. GPT-3 was trained on a massive corpus of text, around 570GB, drawn from web pages, books, and other sources.
GPT-3 是在大规模文本语料上进行训练的,数据量约为 570 GB,来源包括网页、书籍和其他来源。
11. GPT-3 has been fine-tuned for a variety of language tasks, such as translation, summarization, and question-answering.
GPT-3 已经进行了微调,用于各种语言任务,如翻译、摘要和问答。
12. It can generate human-like text, including poems, stories, and even code.
它能够生成类似人类的文本,包括诗歌、故事,甚至代码。
13. The response time of ChatGPT is typically less than a second, making it well-suited for real-time conversations.
ChatGPT 的响应时间通常在一秒以内,非常适合实时对话。
14. ChatGPT has been integrated into a variety of platforms and applications, including websites, messaging apps, virtual assistants, and other AI applications.
ChatGPT 已被集成到各种平台和应用程序中,包括网站、消息应用、虚拟助手和其他人工智能应用。
15. Some experts have called GPT-3 a major step in developing artificial intelligence.
一些专家称 GPT-3 是发展人工智能的重要一步。
16. GPT-3 has been praised for its ability to understand the context and produce relevant responses.
GPT-3 因其理解上下文并产生相关回答的能力而受到赞赏。
17. It has been shown to outperform previous language models and even humans on certain language tasks.
已经证明在某些语言任务上,GPT-3 能够超越以前的语言模型甚至人类表现。
18. GPT-3 has also been criticized for its lack of common sense knowledge and susceptibility to producing biased or misleading responses.
GPT-3 也因缺乏常识知识和易受偏见或误导性回答的问题而受到批评。
19. OpenAI has made GPT-3 available through an API, allowing developers to create their own AI applications.
OpenAI 通过 API 提供了 GPT-3,使开发者能够创建自己的人工智能应用程序。
ChatGPT performance
ChatGPT 性能
20. An area where ChatGPT excels is in the generation of text. The model can generate coherent and fluent text on a wide range of topics, making it a popular choice for applications such as chatbots, language translation, and content generation.
ChatGPT 在文本生成方面表现出色。该模型可以在各种主题上生成连贯流畅的文本,使其成为聊天机器人、语言翻译和内容生成等应用的热门选择。
21. The text generated by ChatGPT is often difficult to distinguish from text written by a human. Generating images from textual descriptions is the job of DALL-E 2, a separate OpenAI model, not of ChatGPT.
ChatGPT 生成的文本往往难以与人类写作的文本区分开。根据文本描述生成图像的是 OpenAI 的另一个模型 DALL-E 2,而不是 ChatGPT。
22. ChatGPT’s performance is also influenced by the amount of training data it has been exposed to. The more data a language model has been trained on, the more information it has available to generate accurate and relevant responses.
ChatGPT 的性能也受到其所接触的训练数据量的影响。语言模型所经过的训练数据越多,它就能够获得更多信息来生成准确和相关的回答。
23. OpenAI has reported that the model’s performance improves significantly when it is fine-tuned on specific domains or tasks, demonstrating flexibility and adaptability.
OpenAI 报告称,当在特定领域或任务上对模型进行微调时,其性能显著提高,展示出了灵活性和适应性。
ChatGPT limitations
ChatGPT 的局限性
24. One of the biggest challenges is its computational requirements. The model requires significant computational resources to run, making it challenging to deploy in real-world applications.
其中一个最大的挑战是其计算要求。该模型需要大量的计算资源来运行,这使得在实际应用中部署它变得具有挑战性。
25. Despite its large size and high accuracy, ChatGPT still makes mistakes and can generate biased or inaccurate responses, particularly when the model has not been fine-tuned on specific domains or tasks.
尽管 ChatGPT 的规模庞大且准确性较高,但它仍然会犯错误并可能生成有偏见或不准确的回答,特别是当模型在特定领域或任务上没有进行精细调整时。
26. ChatGPT cannot access the internet or external links.
ChatGPT 无法访问互联网或外部链接。
27. ChatGPT’s knowledge is limited to its training data, which has a cutoff in 2021.
ChatGPT 的知识仅限于其训练数据,而其训练数据截止于 2021 年。
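To give a rough sense of the computational requirements mentioned in this section, the sketch below estimates the memory needed just to store a model's weights. It uses the 175 billion parameter figure quoted in this article; the function name and the numeric precisions (4, 2, or 1 bytes per parameter) are illustrative assumptions, not details published by OpenAI.

```python
def weight_memory_gb(num_params: int, bytes_per_param: int) -> float:
    """Approximate memory to hold the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


PARAMS = 175_000_000_000  # parameter count quoted in this article

# Assumed storage precisions; real deployments may differ.
for label, nbytes in [("float32", 4), ("float16", 2), ("int8", 1)]:
    print(f"{label}: about {weight_memory_gb(PARAMS, nbytes):.0f} GB")
```

Even at 2 bytes per parameter the weights alone come to roughly 350 GB, which is more than a single typical accelerator holds, so a model of this size has to be split across several devices.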
ChatGPT statistics: research warns of risk of malicious use
ChatGPT 统计数据:研究警告,存在恶意使用的风险
28. According to research released by BlackBerry, 51% of IT decision-makers believe a successful cyberattack will be attributed to ChatGPT within the year.
根据黑莓公司发布的研究报告,51% 的 IT 决策者认为,在未来一年内,成功的网络攻击将归因于 ChatGPT。
29. 71% of those surveyed believe that nation-state actors are already using the technology for malicious purposes.
在接受调查的人中,71% 的人认为国家级行为者已经在将这项技术用于恶意目的上。
30. Regarding how exactly ChatGPT will be used to help foster cyberattacks, 53% of people said it would help hackers create more believable phishing emails.
关于 ChatGPT 将如何帮助促成网络攻击,53% 的人表示它将帮助黑客创建更加真实可信的钓鱼邮件。
31. 49% of respondents pointed to its ability to help hackers improve their coding abilities.
49% 的受访者指出 ChatGPT 的能力将帮助黑客提升他们的编码能力。
32. The survey also revealed that 49% of people believe ChatGPT will be used to spread misinformation and disinformation.
调查还显示,49% 的人认为 ChatGPT 将被用于传播错误信息和虚假信息。
33. 48% of those surveyed think it could be used to craft entirely new malware strains.
调查显示,48% 的受访者认为 ChatGPT 可能被用于创造全新的恶意软件变种。
34. 46% of respondents said ChatGPT could help improve existing attacks.
46% 的受访者表示,ChatGPT 可能有助于改进现有的攻击手段。
35. The same research also revealed that 95% believe governments are responsible for regulating advanced technologies, such as ChatGPT.
同一项研究还揭示出,95% 的受访者认为政府有责任对诸如 ChatGPT 等先进技术进行监管。
ChatGPT statistics: costs
ChatGPT 统计数据:成本
36. ChatGPT is free for users during the research phase while the company gathers feedback.
ChatGPT 在研究阶段对用户是免费的,公司在此期间会收集用户的反馈意见。
37. For developers who want to incorporate this tool into other software, the cost is around a penny per 20,000 words of text and approximately 2 cents per image.
对于希望将 ChatGPT 集成到其他软件中的开发者来说,每 2 万个文字的费用大约为一美分,图片的费用大约为两美分。
38. The GPT-3 AI model reportedly cost OpenAI $12 million for a single training run.
据报道,GPT-3 AI 模型的单次训练就需要 OpenAI 投入 1200 万美元。
39. Tom Goldstein, a professor of AI and machine learning at the University of Maryland, has estimated the daily cost of running ChatGPT to be approximately $100,000 and the monthly cost to be $3 million. His estimates are based on Azure Cloud costs (the server infrastructure on which ChatGPT runs).
马里兰大学的人工智能和机器学习教授 Tom Goldstein(汤姆·戈尔茨坦)估计,运行 ChatGPT 的每日成本约为 10 万美元,月度成本为 300 万美元。他的估算是基于 Azure Cloud 的费用(ChatGPT 运行的服务器基础设施)。
40. OpenAI has recently launched a pilot subscription plan priced at $20 per month. It is invite-only, promises access even during peak times, and provides faster responses and priority access to new features and improvements.
OpenAI 最近推出了每月 20 美元的试点订阅价格。该订阅是邀请制的,承诺在高峰时段也能获得访问权限,并提供更快的响应速度以及对新功能和改进的优先访问。
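The cost figures in this section are easy to sanity-check with a little arithmetic. The sketch below is a minimal illustration that uses only the numbers quoted above (about one cent per 20,000 words, and an estimated $100,000 per day); the function names and the 30-day month are assumptions made for the example, not an official pricing calculator.

```python
def api_text_cost_usd(words: int, usd_per_20k_words: float = 0.01) -> float:
    """Estimated API cost for a number of words at the rate quoted in this article."""
    return words / 20_000 * usd_per_20k_words


def monthly_cost_usd(daily_cost_usd: float, days: int = 30) -> float:
    """Scale an estimated daily running cost to a month (30 days assumed)."""
    return daily_cost_usd * days


# One million words at roughly a penny per 20,000 words.
print(f"1,000,000 words: about ${api_text_cost_usd(1_000_000):.2f}")

# Goldstein's daily estimate scaled to a month.
print(f"Monthly running cost: about ${monthly_cost_usd(100_000):,.0f}")
```

At $100,000 per day, a 30-day month works out to the $3 million monthly figure cited in item 39.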
OpenAI background and investments
OpenAI 的背景和投资情况
41. OpenAI was founded in San Francisco in 2015 by a group that included Elon Musk, Greg Brockman, Ilya Sutskever, Sam Altman, and Wojciech Zaremba.
OpenAI 于 2015 年在旧金山成立,创始团队包括 Elon Musk(埃隆·马斯克)、Greg Brockman(格雷格·布洛克曼)、Ilya Sutskever(伊利亚·苏茨克弗)、Sam Altman(萨姆·奥尔特曼)和 Wojciech Zaremba(伍伊切赫·扎仁巴)等人。
42. The startup has successfully attracted a series of high-profile investors.
这家初创公司成功吸引了一些备受瞩目的投资者。
43. After the launch of ChatGPT, OpenAI was reportedly valued at $29 billion.
在推出 ChatGPT 之后,OpenAI 的估值达到了 290 亿美元。
44. Microsoft recently invested a reported $10 billion. It is the third investment in OpenAI by the company Bill Gates co-founded, which put $1 billion into OpenAI in 2019 and invested again in 2021.
微软最近投资了 100 亿美元,这是这家由比尔·盖茨创立的公司对 OpenAI 的第三次投资。微软在 2019 年投资了 10 亿美元,并在 2021 年再次进行了投资。
45. OpenAI ranks among the most funded machine-learning startup firms in the world, with funding of over 1 billion U.S. dollars as of January 2023.
截至 2023 年 1 月,OpenAI 在全球最受资助的机器学习初创公司中名列前茅,获得的资金超过 10 亿美元。
46. In a recent pitch to investors, OpenAI said it expects $200 million in revenue this year and $1 billion by 2024.
在最近向投资者的推介中,OpenAI 预计今年的收入为 2 亿美元,到 2024 年将达到 10 亿美元。
47. Companies in the technology and education sectors are most likely to take advantage of OpenAI’s solutions. At the same time, business services, manufacturing, and finance are also high on the list of industries utilizing artificial intelligence in their business processes.
技术、教育领域的公司最有可能利用 OpenAI 的解决方案。与此同时,商业服务、制造业和金融业也在利用人工智能来开展业务流程。
ChatGPT statistics: users
ChatGPT统计数据:用户
48. ChatGPT gained 1 million users in under a week.
ChatGPT 在不到一周的时间内吸引了 100 万用户。
49. The chatbot accumulated 57 million monthly active users in its first month of availability.
这个聊天机器人在上线的第一个月积累了 5700 万的月活跃用户。
50. It surpassed 100 million active users in January 2023.
它在 2023年 1 月份就超过了 1 亿的活跃用户。
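To put these user milestones in perspective, the sketch below converts them into average daily growth. It is a rough illustration: the launch date of November 30, 2022 is the public release date, while treating the end of January 2023 as the date of the 100 million mark is an assumption, since the figures above give only the month.

```python
from datetime import date

LAUNCH = date(2022, 11, 30)         # public release of ChatGPT
END_OF_JANUARY = date(2023, 1, 31)  # assumed date for the 100 million milestone


def average_daily_new_users(users: int, days: int) -> float:
    """Average number of new users per day over a period."""
    return users / days


days_to_100m = (END_OF_JANUARY - LAUNCH).days

# "Under a week" means 7 days is an upper bound, so this is a minimum rate.
print(f"First million: at least {average_daily_new_users(1_000_000, 7):,.0f} users per day")
print(f"First 100 million: about {average_daily_new_users(100_000_000, days_to_100m):,.0f} users per day over {days_to_100m} days")
```

Under these assumptions, the average daily growth over the first two months was more than ten times the rate of the first week.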
Key takeaways
要点总结
ChatGPT has swept the world, and there are reasons why it has happened:
ChatGPT 的兴起在全球范围内引起了轰动,并有其原因:
With its 175 billion parameters, it has become one of the most prominent language models.
拥有 1750 亿个参数,成为目前规模最大的语言模型。
It has demonstrated impressive performance on various NLP tasks and applications.
在各种自然语言处理任务和应用中展现出令人印象深刻的性能。
Its large size and diverse training data allow it to generate high-quality text and answer a wide range of questions with high accuracy.
庞大的规模和多样化的训练数据使其能够生成高质量的文本,并能高准确性地回答各种问题。
However, the model’s computational requirements and its potential for bias and error are essential considerations when deploying it in real-world applications. Moreover, cybercriminals could use it to carry out successful attacks.
然而,在将其应用于实际场景时,模型的计算需求和潜在的偏见与错误是必须考虑的重要因素。此外,网络犯罪分子可能会利用它来进行成功的攻击。
But let’s face it: with its pros and cons, ChatGPT has a promising future ahead. The investment it recently received from Microsoft and the launch of the subscription pilot both point in that direction.
然而,让我们面对现实,尽管存在一些优点和缺点,ChatGPT 的未来充满希望,最近获得微软的投资以及订阅试点的推出都表明了这一点。
Note: This article is adapted from a blog post by Brenda Gratas dated February 14, 2013, and is presented here as an English-Chinese parallel translation.