The pillar of programming is the one that beginners tend to focus on most. This is especially true for any data science program or course that you take. Even our data science program at Lantern starts new students off with a course in Python.
The reason for this is that programming is considered a core necessity for modern data scientists. It’s typically one of the first things that hiring managers will filter and test candidates on. What’s more, of the four pillars introduced, programming is the easiest to teach. This makes it an ideal starting point and allows us at Lantern to incorporate topics of math and communication in a more natural manner.
There are also lower-level programming languages that are sometimes used in data science. For example, many machine learning libraries (and even many Python libraries) are built using C++ or C. The benefit of these languages is that they can perform computations faster, which may be required for massive amounts of data.
Knowing low-level languages can be beneficial for pioneers in the data science space who are writing new libraries and algorithms. However, for most data scientists, especially at the start of their career, a high-level language like Python (which has many data science and machine learning libraries) is more useful to learn. Once you become intimately familiar with programming you can move on to more languages, and you’ll also find that most core concepts carry over from one language to the next.
In addition to languages like Python, there are a few more languages that you will no doubt encounter in your data science journey and should pick up along the way. The first is SQL (Structured Query Language) and the second is HTML (HyperText Markup Language).
As a data scientist you will be working with data, and sometimes lots of it! While working with Excel and CSV files is no doubt the simplest option, there are better ways of storing data, namely databases that can store thousands of tables and millions of records. To access these records in a clear and systematic way, we can use SQL to send requests, or queries, for data and receive the corresponding tables and records.
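To make the idea of a query concrete, here is a minimal sketch that uses Python’s built-in sqlite3 module. The customers table, its columns, and its rows are made up for illustration; a real project would connect to an existing database instead of building one in memory.

```python
import sqlite3

# Build a throwaway in-memory database; the table and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Ada", "London"), ("Grace", "New York"), ("Alan", "London")],
)

# The query: request the names of customers in one city, sorted alphabetically.
query = "SELECT name FROM customers WHERE city = ? ORDER BY name"
london_names = [row[0] for row in conn.execute(query, ("London",))]
conn.close()

print(london_names)
```

The SELECT statement describes which records we want, and the database does the work of finding them. The same statement would work whether the table holds three rows or three million.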
Given the importance of collecting, managing, and formatting data, SQL should be among the first few languages that you learn and continue to practice throughout your studies. If you’re looking for an accelerated course, Lantern offers one as part of our Data Science curriculum.
And let’s not forget HTML. HTML is a language that is used for frontend development, where the frontend refers to what a typical user would see, i.e. the user interface. Specifically, HTML gives web pages their layout and structure. It defines how elements on the page should be ordered and where the content (e.g. text, values, images) should be displayed.
While HTML is not a crucial part of studying data science, being familiar with its basic workings can unlock new areas of the workflow that make you a more functional scientist (and a more attractive candidate).
For example, there is a ton of data available on the internet. Sometimes this data is conveniently placed and easy to download (e.g. a database or CSV files). However, there is also a lot of data that is readily displayed but cannot be downloaded with a single click of a button. For this, you can write programs that use data scraping techniques to collect useful data and place it into a local database for later use. Knowing how data is stored on web pages (i.e. HTML) can make the whole process much easier to figure out.
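As a rough sketch of what scraping looks like, the example below pulls the rows out of an HTML table using only Python’s standard library. The PAGE string and the TableCellParser class are hypothetical stand-ins: in practice the HTML would be downloaded from a real page, and many people reach for a dedicated scraping library rather than writing a parser by hand.

```python
from html.parser import HTMLParser

# A made-up page fragment standing in for HTML you would normally download.
PAGE = """
<table>
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Notebook</td><td>4.50</td></tr>
  <tr><td>Pencil</td><td>0.80</td></tr>
</table>
"""

class TableCellParser(HTMLParser):
    """Collects the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._current_row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._current_row = []
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr":
            # Header rows contain only <th> cells, so they stay empty and are skipped.
            if self._current_row:
                self.rows.append(self._current_row)
            self._current_row = None

    def handle_data(self, data):
        # Only keep text that appears inside a <td> cell.
        if self._in_cell and self._current_row is not None:
            self._current_row.append(data.strip())

parser = TableCellParser()
parser.feed(PAGE)
scraped_rows = parser.rows
print(scraped_rows)
```

Notice that the parser only works because we know the data lives inside tr and td tags. That is the sense in which a little HTML knowledge makes scraping easier.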
Furthermore, on the other side of the workflow, we have the reporting process. Data scientists will often use reporting dashboards to display data and key information. Creating such dashboards involves frontend development and requires a functional understanding of HTML.
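A full dashboard is beyond a short example, but the sketch below shows the basic idea of turning results into HTML that a browser can display. The render_report function and the metrics values are made up for illustration; real dashboards are usually built with dedicated tools.

```python
from html import escape

# Made-up summary figures standing in for the output of an analysis.
metrics = [("Rows processed", 1200), ("Missing values", 35)]

def render_report(title, rows):
    """Return a minimal HTML page with a heading and a two-column table."""
    body_rows = "\n".join(
        f"    <tr><td>{escape(str(label))}</td><td>{escape(str(value))}</td></tr>"
        for label, value in rows
    )
    return (
        "<html>\n<body>\n"
        f"  <h1>{escape(title)}</h1>\n"
        "  <table>\n"
        "    <tr><th>Metric</th><th>Value</th></tr>\n"
        f"{body_rows}\n"
        "  </table>\n"
        "</body>\n</html>"
    )

report_html = render_report("Weekly data quality", metrics)
print(report_html)
```

Saving report_html to a file ending in .html and opening it in a browser would show the heading followed by the table. The call to escape keeps characters such as < in the data from being read as HTML tags.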
The takeaway: programming is an important aspect of the job which you should work on learning first. Start by focusing on understanding the basics of Python, SQL, and some HTML before moving on to more advanced topics.