With 2020 in the rear-view mirror, it's a good time to reflect on what happened during the year, and more particularly for us, on how the data science and artificial intelligence fields have developed.

Despite the challenges that faced the world in 2020, tech and software continued to thrive, and a great deal of impressive research was released in the field of artificial intelligence. Arxiv.org, the popular open-access repository, received tens of thousands of machine learning paper uploads over the year. That should give you some idea of how much research is happening.

The research that generates the biggest buzz typically comes from well-established companies, universities, and research labs. With so much being published, it can be difficult to keep track of it all. That's why conferences on machine learning and computer vision are a great way to discover new work.

For this article, we’ve compiled some of the most interesting machine learning and artificial intelligence papers of 2020.

KEY ML MODELS OF 2020

  • GPT-3 (NATURAL LANGUAGE PROCESSING)
  • PULSE (IMAGE SUPER-RESOLUTION PROCESSING)
  • PIFuHD (HD 3D MODELING FROM 2D IMAGES)
  • YOLOv4 (REAL-TIME OBJECT DETECTION)
  • GameGAN (SECOND-HAND LEARNING FROM VIDEO FOOTAGE)

LANGUAGE MODELS ARE FEW-SHOT LEARNERS

Current state-of-the-art natural language processing (NLP) systems struggle to generalize across different tasks. Every time the task changes even slightly, they need to be fine-tuned again on datasets of thousands of examples. In contrast, humans can shift to a new language task after seeing only a few examples.

The goal behind GPT-3, and the paper Language Models are Few-Shot Learners, was to address this issue; namely, to make language models more task-agnostic.

GPT-3, created by a team at OpenAI, is a text-generating model and the most talked-about ML model of the year. It was trained on a dataset of roughly half a trillion words and has 175 billion parameters, about ten times more than any previous language model.

Afterward, no extra fine-tuning is performed, only few-shot demonstrations. That is to say, the general model is given a few examples of the specific task at hand to provide context.
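To make that concrete, here is a minimal Python sketch of what a few-shot prompt can look like. The translation task and the demonstration pairs are illustrative only; the point is that they are assembled into a single prompt for the model to complete, with no weight updates involved.

    few_shot_examples = [          # demonstration pairs shown to the model
        ("cheese", "fromage"),
        ("house", "maison"),
        ("cat", "chat"),
    ]

    def build_prompt(query: str) -> str:
        """Assemble a translation prompt from the demonstrations plus a new query."""
        lines = ["Translate English to French:"]
        for english, french in few_shot_examples:
            lines.append(f"{english} => {french}")
        lines.append(f"{query} =>")   # the model is expected to complete this line
        return "\n".join(lines)

    print(build_prompt("dog"))
    # The assembled prompt is what gets sent to the model; no gradient updates
    # are performed, the demonstrations alone supply the task context.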

The GPT-3 model achieved promising results in the zero-shot and one-shot settings, and in the few-shot setting it occasionally surpassed state-of-the-art fine-tuned models.

The applications of the GPT-3 model are quite varied given its generalized design. People have already demonstrated its use for tasks such as generating and summarizing emails, writing Python code from plain-language descriptions, and even generating faces from descriptions.

The API is currently in private beta. OpenAI is still exploring commercial applications of the technology and working out longer-term pricing. As a result, access to the API is free for now, but given the model's popularity there is a waitlist for gaining access.

You can access the full research paper on Arxiv.org: https://arxiv.org/abs/2005.14165v2


PULSE: SELF-SUPERVISED PHOTO UPSAMPLING VIA LATENT SPACE EXPLORATION OF GENERATIVE MODELS

The main goal of single-image super-resolution is to take a low-resolution input and produce a high-resolution output from it. Models that can perform this task already exist, but PULSE takes it a step further.

Previous approaches to the task generally use supervised learning to train their networks: they start with low-resolution images, infer an upscaled version, and measure the average pixel distance between the output and the high-resolution ground truth. This technique fails to account for important details, leading to blurry results, most noticeably in the areas where the differences between the high- and low-resolution images are greatest.
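For context, here is a bare-bones PyTorch sketch of that conventional supervised setup. The network and the data batches are placeholders rather than any specific paper's code; the point is that the loss averages pixel distances, and averaging over many plausible textures is exactly what washes out fine detail.

    import torch.nn.functional as F

    def supervised_sr_step(sr_net, lr_batch, hr_batch, optimizer):
        """One training step of the conventional approach. All arguments are placeholders."""
        optimizer.zero_grad()
        predicted_hr = sr_net(lr_batch)            # upscale the low-res batch
        loss = F.mse_loss(predicted_hr, hr_batch)  # average pixel distance to ground truth
        loss.backward()                            # minimizing an average over many plausible
        optimizer.step()                           # textures is what blurs the fine details
        return loss.item()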

PULSE addresses that issue with a self-supervised technique. The model traverses the manifold of high-resolution images, searching for images that downscale to match the low-resolution input, and scores the candidates with a loss function. Because a low-resolution image is consistent with many possible originals, PULSE can return several high-resolution images that could all plausibly be correct.
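The core idea can be sketched in a few lines of PyTorch. This is a simplification of PULSE (the real method uses a pretrained StyleGAN generator and places extra constraints on the latent code), and the generator object and its latent_dim attribute here are placeholders, but it shows the "search the manifold, check the downscale" loop.

    import torch
    import torch.nn.functional as F

    def latent_search(generator, lr_image, scale=8, steps=300, step_size=0.1):
        """Search a pretrained generator's latent space for a high-res image that
        downscales to the given low-res input. `generator` is a placeholder."""
        latent = torch.randn(1, generator.latent_dim, requires_grad=True)
        optimizer = torch.optim.Adam([latent], lr=step_size)
        for _ in range(steps):
            optimizer.zero_grad()
            hr_candidate = generator(latent)               # stay on the natural-image manifold
            downscaled = F.interpolate(hr_candidate, scale_factor=1 / scale,
                                       mode="bicubic", align_corners=False)
            loss = F.mse_loss(downscaled, lr_image)        # "does it downscale correctly?"
            loss.backward()
            optimizer.step()
        return generator(latent).detach()                  # one plausible high-res output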

The results of this paper could be used in fields such as medicine, astronomy, and satellite imagery: any application where sharp images are required but capturing them at high resolution is limited by cost or hardware.

You can try the model out for yourself with your own images here: https://colab.research.google.com/drive/1-cyGV0FoSrHcQSVq3gKOymGTMt0g63Xc 

You can access the full research paper on Arxiv.org: https://arxiv.org/abs/2003.03808

PIFuHD: MULTI-LEVEL PIXEL-ALIGNED IMPLICIT FUNCTION FOR HIGH-RESOLUTION 3D HUMAN DIGITIZATION

PIFuHD was developed by researchers at the University of Southern California and Facebook. It uses a 2D image of a person to reconstruct a high-resolution 3D model of that person. The goal of the research was to produce a high-fidelity 3D model of clothed humans that captures detailed information such as facial features and clothing creases.

Previous approaches to this problem didn't use full high-resolution images. Instead, they downscaled them to reduce memory requirements, and the result was a loss of important fine detail.

PIFuHD achieves its results with a multi-level architecture. A coarse level observes the whole image at a lower resolution, focusing on holistic reasoning; this provides the larger spatial context of the picture and captures the global 3D structure. Then, higher-resolution images are observed to estimate the fine-level, detailed geometry of the person.

Since the fine-level details don't require seeing the entire image at high resolution, the background information can be removed. This reduces the load on the computer and helps address the memory and computation issues of working with higher-resolution images.
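Conceptually, the two-level design looks something like the following sketch. The module names are placeholders rather than the actual PIFuHD code; the point is that the coarse network sees the whole downscaled image while the fine network only sees high-resolution crops plus the coarse features.

    import torch.nn as nn

    class TwoLevelReconstructor(nn.Module):
        """Placeholder modules illustrating the coarse-to-fine split, not the PIFuHD code."""
        def __init__(self, coarse_net, fine_net):
            super().__init__()
            self.coarse_net = coarse_net   # sees the whole image, downscaled
            self.fine_net = fine_net       # sees high-res crops, background removed

        def forward(self, low_res_image, high_res_crops):
            global_features = self.coarse_net(low_res_image)        # holistic 3D reasoning
            return self.fine_net(high_res_crops, global_features)   # fine surface detail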

Just as with the previous paper mentioned, you can try this model out for yourself through a public demo: https://colab.research.google.com/drive/11z58bl3meSzo6kFqkahMa35G5jmh2Wgt 

You can access the full research paper here: https://arxiv.org/abs/2004.00452

YOLOv4: OPTIMAL SPEED AND ACCURACY OF OBJECT DETECTION

The You Only Look Once (YOLO) algorithm is a real-time object detection model. The first version was released in 2015, and the fourth version arrived in 2020.

The main goal of the algorithm has always been to build a very fast, high-quality object detector, and YOLOv4 achieves that with the best real-time object detection accuracy to date.

Object detector architectures are typically composed of several components: the image input, a backbone for feature extraction, and a detection head that makes the predictions. The researchers introduced new data augmentation methods for YOLOv4 and performed exhaustive testing across a multitude of hyperparameters to optimize the model. The result is a significant speed and accuracy upgrade over previous versions and other detectors.
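If you want to try the released model, one convenient route is OpenCV's DNN module, which added YOLOv4 support in version 4.4. The file names below are assumptions: the config, weights, and test image have to be downloaded or supplied separately.

    import cv2

    # Assumed file names; the official YOLOv4 config and weights must be downloaded separately.
    net = cv2.dnn.readNetFromDarknet("yolov4.cfg", "yolov4.weights")
    model = cv2.dnn_DetectionModel(net)
    model.setInputParams(size=(416, 416), scale=1 / 255, swapRB=True)

    frame = cv2.imread("street.jpg")                     # any test image
    class_ids, scores, boxes = model.detect(frame, confThreshold=0.5, nmsThreshold=0.4)

    for box in boxes:                                    # draw the detected boxes
        x, y, w, h = box
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imwrite("detections.jpg", frame)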

With the rise of automated machinery, such as autonomous vehicles, these advances are quite important. In those applications, fast and highly accurate real-time detection is crucial.

You can access the full research paper here: https://arxiv.org/abs/2004.10934

LEARNING TO SIMULATE DYNAMIC ENVIRONMENTS WITH GameGAN

Researchers at NVIDIA created a powerful AI model called GameGAN, the first neural network model designed to use GANs (generative adversarial networks) to mimic a computer game engine.

GANs are models composed of two networks, a generator and a discriminator. Trained against each other, they allow the model to learn how to create new content similar to the original data.
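A bare-bones training step for a standard GAN looks roughly like the sketch below (GameGAN itself adds a memory module and a rendering engine on top of the adversarial setup). The generator, discriminator, optimizers, and data batch are all placeholders.

    import torch
    import torch.nn.functional as F

    def gan_step(generator, discriminator, real_batch, g_opt, d_opt, noise_dim=128):
        """One adversarial training step; all models and optimizers are placeholders."""
        noise = torch.randn(real_batch.size(0), noise_dim)
        fake_batch = generator(noise)
        real_labels = torch.ones(real_batch.size(0), 1)
        fake_labels = torch.zeros(real_batch.size(0), 1)

        # Discriminator update: label real samples 1 and generated samples 0.
        d_opt.zero_grad()
        d_loss = (F.binary_cross_entropy(discriminator(real_batch), real_labels)
                  + F.binary_cross_entropy(discriminator(fake_batch.detach()), fake_labels))
        d_loss.backward()
        d_opt.step()

        # Generator update: try to make the discriminator call the fakes real.
        g_opt.zero_grad()
        g_loss = F.binary_cross_entropy(discriminator(fake_batch), real_labels)
        g_loss.backward()
        g_opt.step()
        return d_loss.item(), g_loss.item()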

GameGAN was born out of research into whether a game engine could be emulated using only recorded gameplay footage. The project used 50,000 gameplay episodes of the classic arcade game Pac-Man, and the AI was trained on this footage to see whether it could learn the rules of the game from the visuals alone. And it did.

GameGAN was capable of generating new frames of the game environment in real time. Furthermore, it could generate new layouts when trained on games with multiple levels.

The significance of this paper again comes when we look at automation. Autonomous robots are typically trained in a simulator where the AI can learn the environment’s rules. Creating these types of training simulators is a costly and time-consuming process. The developers need to pay close attention to how objects and light interact. The closer the simulation is to reality the better the training will be for the robot. 

With further development, GameGAN could ease the creation of such simulation systems. And if AI becomes trainable on real-world footage rather than simulations, building those systems might not be needed at all.

You can access the full research paper here: https://arxiv.org/abs/2005.12126

Interested in learning more about artificial intelligence and data science? With the continuous growth and development in these fields, it's never a bad time to enter the industry. Whether you just want to pick up a few new skills or launch a new career, Lantern Institute provides programs in data science to help you get started.
