Friday, January 22, 2021

Introduction to SQL


 SQL is a standard language for accessing and manipulating databases.

What is SQL?

  • SQL stands for Structured Query Language.
  • SQL lets you access and manipulate databases.
  • SQL became a standard of the American National Standards Institute (ANSI) in 1986, and of the International Organization for Standardization (ISO) in 1987.

What Can SQL do?

  • SQL can execute queries against a database
  • SQL can retrieve data from a database
  • SQL can insert records in a database
  • SQL can update records in a database
  • SQL can delete records from a database
  • SQL can create new databases
  • SQL can create new tables in a database
  • SQL can create stored procedures in a database
  • SQL can create views in a database
  • SQL can set permissions on tables, procedures, and views
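
As a rough illustration of several of the operations listed above, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and sample values are made up for this example. SQLite does not support stored procedures or GRANT-style permissions, so those two items are not shown.

```python
import sqlite3

# In-memory database; the table, columns, and values are illustrative only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create a new table.
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# Insert records.
cur.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Ada", "London"), ("Grace", "New York"), ("Linus", "Helsinki")],
)

# Update a record.
cur.execute("UPDATE customers SET city = ? WHERE name = ?", ("Boston", "Grace"))

# Delete a record.
cur.execute("DELETE FROM customers WHERE name = ?", ("Linus",))

# Create a view, then retrieve data through it with a query.
cur.execute("CREATE VIEW customer_cities AS SELECT name, city FROM customers")
rows = cur.execute("SELECT name, city FROM customer_cities ORDER BY name").fetchall()
print(rows)

conn.close()
```

The same SQL statements should carry over to other database systems with little change, although data types and some syntax details vary between products.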

RDBMS

  • RDBMS stands for Relational Database Management System.

  • The data in RDBMS is stored in database objects called tables. A table is a collection of related data entries and it consists of columns and rows.

  • RDBMS is the basis for SQL and for many widely used database systems such as MS SQL Server, IBM DB2, Oracle, MySQL, and Microsoft Access.
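
To make the idea of a table as columns and rows concrete, here is a small sketch using Python's built-in sqlite3 module. The employees table and its contents are hypothetical, chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A table is defined by its columns...
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department TEXT)"
)

# ...and holds its data as rows (one row per related data entry).
conn.executemany(
    "INSERT INTO employees (name, department) VALUES (?, ?)",
    [("Ada", "Engineering"), ("Grace", "Research")],
)

cursor = conn.execute("SELECT id, name, department FROM employees ORDER BY id")
columns = [description[0] for description in cursor.description]  # column names
records = cursor.fetchall()                                       # list of row tuples

print(columns)
for record in records:
    print(record)

conn.close()
```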

Wednesday, September 30, 2020

The 8 Data Science Skills That Will Get You Hired

Programming Skills

No matter what type of company or role you’re interviewing for, you’re likely going to be expected to know how to use the tools of the trade. This means a statistical programming language, like R or Python, and a database querying language like SQL.

Statistics
A good understanding of statistics is vital as a data scientist. You should be familiar with statistical tests, distributions, maximum likelihood estimators, etc. This will also be the case for machine learning, but one of the more important aspects of your statistics knowledge will be understanding when different techniques are (or aren’t) a valid approach. Statistics is important at all company types, but especially data-driven companies where stakeholders will depend on your help to make decisions and design / evaluate experiments.

Machine Learning
If you’re at a large company with huge amounts of data, or working at a company where the product itself is especially data-driven (e.g. Netflix, Google Maps, Uber), you’ll likely want to be familiar with machine learning methods. This can mean things like k-nearest neighbors, random forests, ensemble methods, and more. Many of these techniques can be implemented using R or Python libraries, so it’s not necessary to become an expert on how the algorithms work. It is more important to understand the broad strokes and to know when it is appropriate to use different techniques.
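
To give a feel for the broad strokes of one of these methods, here is a minimal k-nearest neighbors sketch in plain Python. The function name knn_predict, the toy training points, and the choice of k=3 with Euclidean distance are assumptions made for this example; in practice you would more likely reach for an existing library implementation.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest points.

    `train` is a list of (features, label) pairs, where features is a tuple
    of numbers with the same length as `query`.
    """
    # Sort training points by Euclidean distance to the query, keep the k closest.
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    # Majority vote over the labels of those neighbors.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data: two well-separated clusters labeled "a" and "b".
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]

print(knn_predict(train, (1.1, 1.0)))
print(knn_predict(train, (5.1, 5.0)))
```

The sketch also hints at when the technique fits: it needs a meaningful distance between examples and compares the query against every training point, which becomes slow on large data sets.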

Multivariable Calculus & Linear Algebra
Understanding these concepts is most important at companies where the product is defined by the data, and small improvements in predictive performance or algorithm optimization can lead to huge wins for the company. In an interview for a data science role, you may be asked to derive some of the machine learning or statistics results you employ elsewhere. Or, your interviewer may ask you some basic multivariable calculus or linear algebra questions, since they form the basis of a lot of these techniques. You may wonder why a data scientist would need to understand this when there are so many out of the box implementations in Python or R. The answer is that at a certain point, it can become worth it for a data science team to build out their own implementations in house.

Data Wrangling
Often, the data you’re analyzing is going to be messy and difficult to work with. Because of this, it’s really important to know how to deal with imperfections in data. Some examples of data imperfections include missing values, inconsistent string formatting (e.g., ‘New York’ versus ‘new york’ versus ‘ny’), and date formatting (‘2017-01-01’ vs. ‘01/01/2017’, unix time vs. timestamps, etc.). This will be most important at small companies where you’re an early data hire, or data-driven companies where the product is not data-related (particularly because the latter has often grown quickly with not much attention to data cleanliness), but this skill is important for everyone to have.
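
The imperfections mentioned above can be handled along these lines. This is only a sketch using the Python standard library: the alias table, the helper names clean_city and clean_date, and the assumption that ‘01/01/2017’ is month-first are all choices made for this example, not rules that hold for every data set.

```python
from datetime import datetime

# Hypothetical lookup of known spellings; extend it for your own data.
CITY_ALIASES = {"ny": "New York", "new york": "New York"}

# Accepted date layouts; "%m/%d/%Y" assumes month-first dates.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def clean_city(raw):
    """Return a canonical city name, or None for a missing value."""
    if raw is None or not raw.strip():
        return None
    key = raw.strip().lower()
    return CITY_ALIASES.get(key, raw.strip().title())


def clean_date(raw):
    """Return the date as an ISO 'YYYY-MM-DD' string, or None if unparseable."""
    if raw is None:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


records = [
    ("New York", "2017-01-01"),
    ("new york", "01/01/2017"),
    ("ny", "2017-01-01"),
    (None, "not a date"),
]

cleaned = [(clean_city(city), clean_date(date)) for city, date in records]
for row in cleaned:
    print(row)
```

Returning None for values that cannot be cleaned keeps the missing data visible, so a later step can decide whether to drop, fill, or flag those rows.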

Data Visualization & Communication
Visualizing and communicating data is incredibly important, especially with young companies that are making data-driven decisions for the first time, or companies where data scientists are viewed as people who help others make data-driven decisions. When it comes to communicating, this means describing your findings, or the way techniques work, to both technical and non-technical audiences. Visualization-wise, it can be immensely helpful to be familiar with data visualization tools like matplotlib, ggplot, or d3.js. Tableau has become a popular data visualization and dashboarding tool as well. It is important to be familiar not just with the tools necessary to visualize data, but also with the principles behind visually encoding data and communicating information.

Software Engineering
If you’re interviewing at a smaller company and are one of the first data science hires, it can be important to have a strong software engineering background. You’ll be responsible for handling a lot of data logging, and potentially the development of data-driven products.

Data Intuition
Companies want to see that you’re a data-driven problem-solver. At some point during the interview process, you’ll probably be asked about some high-level problem, for example a test the company may want to run or a data-driven product it may want to develop. It’s important to think about what things are important, and what things aren’t. How should you, as the data scientist, interact with the engineers and product managers? What methods should you use? When do approximations make sense?


Careers in data science


  • Machine Learning Scientist: Machine learning scientists research new methods of data analysis and create algorithms.
  • Data Engineer: Data Engineers prepare the “big data” infrastructure to be analyzed by Data Scientists. They are software engineers who design, build, integrate data from various resources, and manage big data.
  • Data Analyst: Data analysts utilize large data sets to gather information that meets their company’s needs.
  • Data Consultant: Data consultants work with businesses to determine the best usage of the information yielded from data analysis.
  • Data Architect: Data architects build data solutions that are optimized for performance and design applications.
  • Applications Architect: Applications architects track how applications are used throughout a business and how they interact with users and other applications.

Tuesday, September 15, 2020

What is Data Science?

Data science is an interdisciplinary field focused on extracting knowledge from data sets, which are typically large. 


The field encompasses analysis, preparing data for analysis, and presenting findings to inform high-level decisions in an organization. As such, it incorporates skills from computer science, mathematics, statistics, information visualization, graphic design, complex systems, communication, and business.
