Python is a widely praised and widely used programming language across industry, academia, and the hobbyist community. This is in large part thanks to its vast open-source community, which continues to produce high-quality, freely available libraries.

In this article, we present a compilation of some of the best Python libraries that were released or gained popularity in the past year.

2020 PYTHON LIBRARIES

  • Diagrams
  • Dear PyGui
  • HiPlot
  • Hummingbird
  • PyCaret
  • PyTorch Lightning
  • Rich
  • Stanza
  • Typer

Diagrams

Whether you’re a developer or a data scientist, proper project documentation and reporting are an important part of the work. Explaining a system architecture typically means resorting to traditional diagramming software. With Diagrams, that’s no longer necessary.

Diagrams lets you draw cloud system architecture in Python code. The library was created for prototyping a new system architecture design without any design tools, but you can also use it to describe or visualize an existing architecture.

Diagrams currently supports the major providers, including AWS, Azure, GCP, Kubernetes, Alibaba Cloud, and Oracle Cloud. Many general and product-specific icons are included to make the process simple.

Having diagrams as code also allows you to easily track the diagram changes using any version control system.
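To give a feel for what “diagrams as code” looks like, here is a minimal sketch modeled on the example in the project’s README. The function name, node labels, and output filename are made up for illustration, and it assumes both the diagrams package and Graphviz are installed.

```python
def render_web_service(filename: str = "web_service") -> None:
    """Write <filename>.png showing a load balancer, a web server, and a database."""
    # Imported inside the function so this file still loads when the
    # third-party `diagrams` package (pip install diagrams) is missing.
    from diagrams import Diagram
    from diagrams.aws.compute import EC2
    from diagrams.aws.database import RDS
    from diagrams.aws.network import ELB

    # show=False writes the image without opening a viewer.
    with Diagram("Web Service", filename=filename, show=False):
        # ">>" draws a directed edge from the node on its left to the node on its right.
        ELB("load balancer") >> EC2("web server") >> RDS("user database")
```

Calling render_web_service() should write web_service.png to the working directory. Because the whole diagram is a few lines of Python, a change to the architecture shows up as an ordinary diff in version control.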

Diagrams GitHub: https://github.com/mingrammer/diagrams

Dear PyGui

Some Python projects aren’t quite complete without a proper, good-looking user interface. Dear PyGui is a simple-to-use but powerful Python GUI framework. It wraps Dear ImGui, the popular GUI library for C++.

Dear PyGui is fundamentally different from other Python GUI frameworks. Under the hood, it uses the immediate mode paradigm and your computer’s GPU to facilitate extremely dynamic interfaces. Dear PyGui is supported on Windows 10, macOS, and Linux through DirectX 11, Metal, and OpenGL 3, respectively.

In the same way that Dear ImGui gives game developers a simple way to create tools, Dear PyGui gives Python developers a simple way to create quick and powerful GUIs for scripts.

HiPlot

Machine learning models keep getting more complex, with a growing number of hyperparameters. Furthermore, high-dimensional data can be difficult to understand intuitively. Facebook AI’s HiPlot library is one solution to this problem.

HiPlot is a lightweight interactive visualization tool. It was created to help AI researchers discover correlations and patterns in high-dimensional data. It does this through the use of parallel plots and other graphical ways to represent information more clearly.

Before releasing it, Facebook AI was using HiPlot for exactly that: exploring and efficiently analyzing hyperparameter tuning of deep neural networks across more than 100,000 experiments.

HiPlot can be used in a standard Jupyter Notebook with no extra setup. Alternatively, it can run its own server to display its output.
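As a small sketch of the notebook workflow, the snippet below follows the pattern shown in the HiPlot README: a list of dictionaries, one per experiment, is turned into an interactive parallel plot. The trial values and the show_trials helper are made up for illustration, and the hiplot package is assumed to be installed.

```python
# Hypothetical results of three hyperparameter trials (made-up numbers).
trials = [
    {"dropout": 0.1, "lr": 0.001, "optimizer": "SGD", "loss": 10.0},
    {"dropout": 0.15, "lr": 0.01, "optimizer": "Adam", "loss": 3.5},
    {"dropout": 0.3, "lr": 0.1, "optimizer": "Adam", "loss": 4.5},
]


def show_trials(rows):
    """Display the trials as an interactive parallel plot inside a notebook."""
    # Imported here so the file loads even without the third-party
    # package (pip install hiplot).
    import hiplot as hip

    # Each dictionary key becomes one vertical axis of the plot.
    return hip.Experiment.from_iterable(rows).display()
```

In a Jupyter cell, show_trials(trials) should render the plot, with one line per trial crossing the dropout, lr, optimizer, and loss axes.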

HiPlot GitHub: https://github.com/facebookresearch/hiplot

Hummingbird

In machine learning, models typically fall into two categories: deep learning and traditional algorithms. These two types of models operate very differently; in particular, deep learning relies on what are called tensor computations. Given the capabilities of its models, the field of deep learning is expanding rapidly, with new advances in software and in specialized hardware that can perform tensor computations very quickly and efficiently.

But what happens if we want to use the new hardware with traditional models that don’t use tensors? That’s something that the Hummingbird library aims to tackle. 

A simple decision tree to be transformed [Left] and the resulting neural network after transformation [Right].

Microsoft’s Hummingbird is a library for compiling trained traditional models into tensor computations. This allows users to seamlessly leverage neural network frameworks like PyTorch to accelerate traditional machine learning models. The benefits are native hardware acceleration, framework optimizations, and no need to re-engineer your models.

Currently, you can use Hummingbird to convert your trained traditional ML models into PyTorch, TorchScript, ONNX, and TVM. Hummingbird supports a variety of ML models and featurizers, including scikit-learn Decision Trees and Random Forests as well as LightGBM and XGBoost Classifiers/Regressors. Support for other neural network backends and models is on Microsoft’s roadmap.
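To make “compiling a model into tensor computations” less abstract, here is a toy sketch of how a two-node decision tree can be evaluated with matrix products and comparisons instead of if/else branches. It is loosely modeled on the matrix-multiplication strategy described by the Hummingbird authors, but it is not Hummingbird’s code: the tree, the matrices, and every name below are made up for illustration, and plain Python lists stand in for tensors.

```python
# Toy tree (hypothetical):
#   node 0: x[0] < 5.0 ? go to node 1 : leaf 2
#   node 1: x[1] < 2.0 ? leaf 0       : leaf 1
A = [[1, 0],        # feature 0 is tested by node 0
     [0, 1]]        # feature 1 is tested by node 1
B = [5.0, 2.0]      # threshold of each internal node
C = [[1, 1, -1],    # node 0: leaves 0 and 1 lie to its left, leaf 2 to its right
     [1, -1, 0]]    # node 1: leaf 0 left, leaf 1 right, leaf 2 not below it
D = [2, 1, 0]       # number of "left" turns on the path to each leaf
LEAF_VALUES = ["small", "medium", "large"]


def matmul_row(row, matrix):
    """Multiply a row vector by a matrix stored as a list of rows."""
    return [sum(r * m for r, m in zip(row, col)) for col in zip(*matrix)]


def predict_tensor(x):
    """Evaluate the tree with matrix products and element-wise comparisons."""
    tested = matmul_row(x, A)                            # value each node compares
    went_left = [int(v < t) for v, t in zip(tested, B)]  # 1 where the test passes
    score = matmul_row(went_left, C)
    # Exactly one leaf has a score equal to its expected number of left turns.
    leaf = [int(s == d) for s, d in zip(score, D)].index(1)
    return LEAF_VALUES[leaf]


def predict_branching(x):
    """The same tree written as ordinary if/else branches, for comparison."""
    if x[0] < 5.0:
        return "small" if x[1] < 2.0 else "medium"
    return "large"
```

Both functions return the same answer for any input, but predict_tensor evaluates every node for every sample and contains no data-dependent branches. That is the kind of computation tensor hardware and frameworks are built to batch.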

Hummingbird GitHub: https://github.com/microsoft/hummingbird

PyCaret

PyCaret is a great library for a wide variety of data scientists, whether you want to increase productivity, build rapid prototypes, or simply prefer low-code solutions. It’s a streamlined Python machine learning library that automates machine learning workflows. As an end-to-end machine learning and model management tool, it speeds up the experiment cycle considerably, reduces the code you need to write, and makes you more productive.

Compared with other open-source machine learning libraries, PyCaret is a low-code alternative that can replace hundreds of lines of code with just a few. This makes experiments much faster and more efficient. PyCaret is essentially a Python wrapper around several machine learning libraries and frameworks such as scikit-learn, XGBoost, LightGBM, CatBoost, spaCy, Optuna, Hyperopt, Ray, and many more.
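As a rough idea of what “low-code” means here, the sketch below follows the pattern used in PyCaret’s introductory classification tutorial. It assumes PyCaret is installed and that its sample dataset “juice” with the target column “Purchase” can be downloaded. The compare_on_juice wrapper is made up for illustration, and exact behavior (for example, whether setup asks for confirmation) may differ between PyCaret versions.

```python
def compare_on_juice():
    """Train and rank several classifiers on PyCaret's sample 'juice' dataset."""
    # Third-party imports are kept local so this file loads without
    # PyCaret installed (pip install pycaret).
    from pycaret.classification import compare_models, setup
    from pycaret.datasets import get_data

    data = get_data("juice")  # fetches a small sample dataset
    # setup() prepares the experiment: preprocessing and the train/test split.
    setup(data=data, target="Purchase", session_id=123)
    # compare_models() cross-validates a range of models and, in recent
    # versions, returns the best one.
    return compare_models()
```

Three library calls stand in for the data loading, preprocessing, model training, and comparison code you would otherwise write by hand.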

The design and simplicity of PyCaret are inspired by the emerging role of citizen data scientists, a term first used by Gartner. Citizen data scientists are power users who can perform both simple and moderately sophisticated analytical tasks that would previously have required more expertise. Seasoned data scientists are often difficult to find and expensive to hire, and citizen data scientists can be an effective way to close this gap and address data-related challenges in a business setting.

PyCaret GitHub: https://github.com/pycaret/pycaret

PyTorch Lightning

PyTorch Lightning is a lightweight PyTorch wrapper meant to boost productivity in high-performance AI research. It disentangles PyTorch code to decouple the science from the engineering. In other words, it makes it even easier for data scientists to build complex AI models by providing a high-level interface for PyTorch.

Lightning is designed with these principles in mind:

  • Enable maximal flexibility. 
  • Abstract away unnecessary boilerplate, but make it accessible when needed. 
  • Systems should be self-contained (i.e., optimizers, computation code, etc.).
  • Deep learning code should be organized into four distinct categories: research code, engineering code, non-essential research code, and data.

With this library, teams can build scalable deep learning models that run on different hardware, such as multiple GPUs, TPUs, and CPUs, all without having to change the code!
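The split between research code and engineering code is easier to see in a toy example. The sketch below is not Lightning’s API: ToyLinearModule and ToyTrainer are hypothetical, dependency-free stand-ins that only mimic the idea. In Lightning itself, the research side lives in a LightningModule and the training loop is handled by a Trainer.

```python
class ToyLinearModule:
    """Research code: the model, its loss, and its update rule."""

    def __init__(self, lr: float = 0.05):
        self.w = 0.0      # single weight of the model y = w * x
        self.lr = lr
        self.grad = 0.0

    def training_step(self, batch):
        x, y = batch
        error = self.w * x - y
        # Gradient of the squared error (w*x - y)**2 with respect to w.
        self.grad = 2 * error * x
        return error ** 2

    def optimizer_step(self):
        self.w -= self.lr * self.grad


class ToyTrainer:
    """Engineering code: the loop over epochs and batches, plus bookkeeping."""

    def __init__(self, max_epochs: int):
        self.max_epochs = max_epochs
        self.history = []  # mean loss per epoch (non-essential logging)

    def fit(self, module, data):
        for _ in range(self.max_epochs):
            epoch_loss = 0.0
            for batch in data:
                epoch_loss += module.training_step(batch)
                module.optimizer_step()
            self.history.append(epoch_loss / len(data))
        return module


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # data: samples of y = 2x
trainer = ToyTrainer(max_epochs=50)
model = trainer.fit(ToyLinearModule(), data)
```

After training, model.w ends up very close to 2. The point of the split is that the module never mentions epochs or loops, so the trainer could change how and where the loop runs without touching the research code.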

PyTorch Lightning GitHub: https://github.com/PyTorchLightning/PyTorch-lightning

Rich

We’ve already mentioned a GUI library for Python, but what about the command-line interface? By default, Python CLI programs print plain text in a single style and color. Rich is a Python library for rich text and beautiful formatting in the terminal.

The Rich API makes it easy to add color and style to terminal output. Rich can also render pretty tables, progress bars, markdown, syntax highlighted source code, tracebacks, and more.

It runs on Windows, Linux, and macOS. The classic Windows terminal is limited to 8 colors, while newer terminals get true color and emoji support.
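Here is a minimal sketch of styled output and a table, using Rich’s Console and Table classes. The rows and the print_library_table helper are made up for illustration, and the rich package is assumed to be installed.

```python
# Rows to display: (library, purpose), taken from this article's list.
LIBRARIES = [
    ("Rich", "terminal formatting"),
    ("Typer", "command-line interfaces"),
    ("Stanza", "natural language processing"),
]


def print_library_table(rows):
    """Print a styled status line and a table to the terminal."""
    # Imported here so the file loads even without the third-party
    # package (pip install rich).
    from rich.console import Console
    from rich.table import Table

    table = Table(title="Python libraries")
    table.add_column("Library", style="bold cyan")
    table.add_column("Purpose")
    for name, purpose in rows:
        table.add_row(name, purpose)

    console = Console()
    # Square-bracket markup applies a style to part of a string.
    console.print("[bold green]Done:[/bold green] table built")
    console.print(table)
```

Calling print_library_table(LIBRARIES) should print a green “Done:” label followed by a bordered table with the library names in cyan.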

Rich GitHub: https://github.com/willmcgugan/rich

Stanza

Stanza is a Python natural language processing library created by the Stanford NLP Group. It contains a collection of accurate and efficient NLP tools for 60+ human languages. It also supports accessing the Java Stanford CoreNLP software from Python.

The library fully supports the NLP pipeline used for text analysis. This includes tokenization, multi-word token expansion, lemmatization, part-of-speech and morphological feature tagging, dependency parsing, and named entity recognition.

Stanza also comes with several pretrained neural models for different tasks. This means scientists looking to implement NLP functions in a project will be able to do so quickly and effectively in a variety of languages.
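As a sketch of what running that pipeline looks like, the function below downloads the pretrained models for a language, builds a pipeline with the default processors, and collects a few annotations. The analyze helper is made up for illustration, and it assumes the stanza package is installed and the models can be downloaded.

```python
def analyze(text: str, lang: str = "en"):
    """Return (word, lemma, POS tag) triples and named entities for the text."""
    # Imported here so the file loads even without the third-party
    # package (pip install stanza).
    import stanza

    stanza.download(lang)        # fetch the pretrained models for the language
    nlp = stanza.Pipeline(lang)  # default processors for that language
    doc = nlp(text)

    words = [
        (word.text, word.lemma, word.upos)
        for sentence in doc.sentences
        for word in sentence.words
    ]
    entities = [(ent.text, ent.type) for ent in doc.ents]
    return words, entities
```

For example, analyze("Barack Obama was born in Hawaii.") returns one triple per word along with the entities the model recognizes. Passing a different language code should switch to the models for that language.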

An overview of Stanza’s neural network NLP pipeline.

Stanza GitHub: https://github.com/stanfordnlp/stanza

Typer

To finish off our list, we have the Typer library, made by the creator of FastAPI. Typer is another library aimed at the CLI. However, rather than improving the design of the output, Typer aims to improve how users interact with the program on the command line.

Typer is intended to make CLI applications intuitive to write through its easy-to-use functionality and autocompletion. Its minimal design keeps developers productive and the number of bugs low. It also lets the complexity grow gracefully into more elaborate command trees as your program demands.

The end results are programs that are easy for end users to use. Through the library, applications can provide users with automatic help and automatic completion in the shell!
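A minimal sketch of the idea: Typer builds the command-line interface from an ordinary function’s type hints. The greeting text and all function names below are made up for illustration, and the typer package is assumed to be installed.

```python
def greeting(name: str, formal: bool = False) -> str:
    """Build the message; plain Python, so it is easy to test."""
    return f"Good day, {name}." if formal else f"Hi {name}!"


def main(name: str, formal: bool = False) -> None:
    """Greet NAME, optionally in a formal tone."""
    print(greeting(name, formal))


def run_cli() -> None:
    """Turn `main` into a command-line program."""
    # Imported here so the file loads even without the third-party
    # package (pip install typer).
    import typer

    # `name` becomes a required argument and `formal` a --formal flag;
    # the docstring of `main` is used for the --help text.
    typer.run(main)
```

If this were saved as a hypothetical greet.py that calls run_cli() at the bottom, then running “python greet.py Ada --formal” would print “Good day, Ada.”, and “python greet.py --help” would show the generated help text.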

Typer GitHub: https://github.com/tiangolo/typer

Interested in learning more Python and data science? With the continuous growth and developments in the fields, it’s never a bad time to enter the industry. Whether you just want to pick up a few new skills or launch a new career, Lantern Institute provides programs in data science to help you get started.
