Current state-of-the-art natural language processing (NLP) systems struggle to generalize across different tasks. Every time the task changes even slightly, they must be fine-tuned again on datasets of thousands of examples. Humans, in contrast, can shift to a new language task after seeing only a few examples.
The goal behind GPT-3, and the paper Language Models are Few-Shot Learners, was to address this issue: to make language models more task-agnostic.
GPT-3 is a text-generating model created by a team at OpenAI, and arguably the most talked-about ML model of the year. It has 175 billion parameters, more than 10 times as many as any previous language model, and was trained on a dataset of nearly half a trillion tokens.
Afterward, no extra fine-tuning is performed; the model only receives few-shot demonstrations. That is, at inference time the general model is given a few examples of the specific task at hand directly in its input, as context, without any update to its weights.
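To make this concrete, here is the kind of prompt used in the paper's English-to-French translation example. The task description and a handful of demonstrations are simply part of the text the model reads, and it is expected to continue the pattern (here, producing "fromage") with no gradient updates:

```
Translate English to French:

sea otter => loutre de mer
peppermint => menthe poivrée
plush giraffe => girafe peluche
cheese =>
```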
GPT-3 achieved promising results in the zero-shot and one-shot settings, and in the few-shot setting it occasionally surpassed state-of-the-art fine-tuned models.
The applications of GPT-3 are quite varied given its generalized design. People have already demonstrated its use for tasks such as generating and summarizing emails, generating Python code from plain-English descriptions, and even generating faces from descriptions.
The API is currently in a private beta. OpenAI is still determining the commercial applications of the technology and its longer-term pricing. As a result, access to the API is currently free, but given the model's popularity there is a waitlist for gaining access.
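For those with access, calling the model amounts to sending a prompt and reading back a completion. A minimal sketch using the beta Python client might look like the following; the engine name, parameters, and the example completion are assumptions for illustration, not details from the paper:

```python
import openai  # the beta Python client: pip install openai

openai.api_key = "YOUR_API_KEY"  # issued when you clear the waitlist

# A few-shot prompt: two demonstrations of the task, then the input
# we want GPT-3 to complete.
prompt = (
    "Correct the grammar of each sentence.\n\n"
    "Sentence: He go to school.\n"
    "Corrected: He goes to school.\n\n"
    "Sentence: She have two cat.\n"
    "Corrected: She has two cats.\n\n"
    "Sentence: They was happy.\n"
    "Corrected:"
)

response = openai.Completion.create(
    engine="davinci",   # assumed: the largest GPT-3 engine in the beta
    prompt=prompt,
    max_tokens=20,
    temperature=0.0,    # low temperature for a deterministic correction
    stop="\n",          # stop after the corrected sentence
)

print(response.choices[0].text.strip())  # e.g. "They were happy."
```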
You can access the full research paper on arXiv: https://arxiv.org/abs/2005.14165v2