DAND – Python Introduction continued

I finished the control flow and functions lectures of the Python section today. The lectures are definitely not designed for complete programming newbies; they are rather a quick refresher on the most important concepts needed for the program. The quizzes are also a little trickier than you would expect from a normal Python introductory course. For me, this fits my profile quite well since I already have some knowledge of the key concepts like loops, if conditions, etc., but I definitely have a lot more to learn.

In the control flow lecture, if statements and for and while loops were topics which were already familiar to me, so I will not go into more detail here. New to me were break and continue, zip and enumerate, and list comprehensions.

Break & Continue:

Break stops the loop when a condition is met.

y = ["a", "b", "c", "d"]

for e in y:
    if e == "c":
        break
    print(e)

This will output:

a,b

c and d will not be printed since the break statement is executed when e == "c", which exits the loop.

The continue statement works similarly. If a certain condition is met, the rest of the current iteration is skipped, but unlike the break statement, the loop continues with the next iteration.

for e in y:
    if e == "c":
        continue
    print(e)

The output is a, b, d. c is not printed, but the loop continues after the c iteration. Therefore, the last item of the list, d, is printed as well.

Enumerate and zip

enumerate is a built-in function which returns the index of the current position in the loop together with the element itself.

names = ["Dennis", "Hans", "Rick"]

for i, name in enumerate(names):
    print(i, name)

output: 0 Dennis, 1 Hans, 2 Rick

The zip function is used to combine multiple lists.

names = ["Dennis", "Hans", "Rick"]
age = [22, 44, 55]

personalinfo = list(zip(names, age))
print(personalinfo)

Output:

[('Dennis', 22), ('Hans', 44), ('Rick', 55)]

It returns a list of tuples which combines the information of both lists, names and age.

To print the combined values, the zip object has to be converted into a list first. Otherwise, the print statement will just show the zip object itself. Another option is to loop over the zip object.

names = ["Dennis", "Hans", "Rick"]
age = [22, 44, 55]

personalinfo = zip(names, age)  # the zip object is not turned into a list!

for info in personalinfo:
    print(info)

Output is the 3 different tuples:

('Dennis', 22)
('Hans', 44)
('Rick', 55)

It is also possible to unzip a zipped object with the asterisk symbol: *

name, age = zip(*personalinfo)

This will store the different elements of the zipped object as tuples in the variables name and age and undo the prior zipping. Note that this only works if personalinfo still holds the values, so it has to be kept as a list (or recreated), because a zip object can only be iterated over once.
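Just to make the unzipping step concrete, here is a small self-contained sketch (using the list version of personalinfo from above):

names = ["Dennis", "Hans", "Rick"]
age = [22, 44, 55]
personalinfo = list(zip(names, age))   # [('Dennis', 22), ('Hans', 44), ('Rick', 55)]

# the * unpacks the tuples, and zip groups the first and second elements together again
name, age = zip(*personalinfo)
print(name)  # ('Dennis', 'Hans', 'Rick')
print(age)   # (22, 44, 55)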

List comprehensions

This is a concise way of writing a for loop in one line. The structure is the following:

newlist = [x+2 for x in range(5)]

print (newlist)

Output: [2, 3, 4, 5, 6] # for every x in the range 0-4 (5 is not included), 2 is added to x.

The general structure of a list comprehension is the following: first, you specify what is done to the element that is iterated over: x+2.

Then you specify the iteration itself, i.e. how many iterations should occur: for x in range(5).

As the last step, everything is wrapped in square brackets [] to create the list. This has the big advantage that no empty list has to be created first to which elements are then appended.
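For comparison, here is a small sketch of how the same result looks as a regular for loop with an empty list and append:

# the same result with a regular for loop: create an empty list first, then append
newlist = []
for x in range(5):
    newlist.append(x + 2)

print(newlist)  # [2, 3, 4, 5, 6]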

This basic structure can vary a bit when taking if statements into consideration.

newlistif = [x+2 for x in range(5) if x%2==0]
print (newlistif)

Output: [2,4,6]

Here, the if statement is added at the very end, behind the range argument. Now, 2 is only added to even numbers, and the odd numbers are dropped from the result entirely.

However, if an else statement is necessary, this order changes.

newlistifelse = [x+2 if x%2==0 else x+1 for x in range(5)]
print (newlistifelse)

Output: [2, 2, 4, 4, 6]

Now the if statement is located directly behind the expression which specifies the action (x+2). This makes sense since it is also the expression the if condition refers to: if the condition is true, x+2 is put into the list. The if condition is then followed by the else statement, and the else statement is followed by the expression used in that case (x+1).
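Written out as a plain for loop (just my own sketch to make the order of the expressions clearer), the same logic looks like this:

# the same result written as a regular for loop with if/else in the body
newlistifelse = []
for x in range(5):
    if x % 2 == 0:
        newlistifelse.append(x + 2)
    else:
        newlistifelse.append(x + 1)

print(newlistifelse)  # [2, 2, 4, 4, 6]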

Functions:

I won't go into much detail regarding how functions work in general since this was already a familiar concept to me. In short, functions work as follows:

Multiple inputs can be passed into a function, and inside the function's body it is defined how these inputs are handled. In my exemplary function my_function, the two inputs a and b are passed into the function and then multiplied with each other. The return statement returns the result of the function's calculation.

def my_function(a, b):
    return a * b

result = my_function(5, 2)
print(result)

Output: 10

New to me was the lambda function. This is a quick way of defining short functions. The prior function can be rewritten as a lambda function in the following way:

resultlambda = lambda a, b: a * b
print(resultlambda(2, 3))

Output: 6

As you can see, the function can be written and assigned to a variable in only one line instead of three.
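This is not from the lecture, but a typical place where such a one-line function comes in handy is as a key argument, for example when sorting a list of tuples:

people = [("Hans", 44), ("Rick", 55), ("Dennis", 22)]

# sort the list of tuples by the second element (the age), using a lambda as the key
by_age = sorted(people, key=lambda person: person[1])
print(by_age)  # [('Dennis', 22), ('Hans', 44), ('Rick', 55)]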

Iterators and generators were the last concepts introduced in the functions lecture. This, I have to admit, I did not quite get. Generators are functions which create iterators. They are an alternative to building a full list when you only need to iterate over the elements. The yield statement is what turns a function into a generator: it replaces the return statement and hands out one element at a time while iterating.

def generate(x):
    for i in range(x):
        yield i

var1 = generate(5)

for var in var1:
    print(var)

output: 0,1,2,3,4
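To make the lazy behaviour a bit more tangible (my own experiment, not part of the lecture): a generator only produces the next value when it is asked for one, for example via next():

def generate(x):
    # yields the numbers 0 .. x-1 one at a time
    for i in range(x):
        yield i

gen = generate(3)
print(next(gen))  # 0
print(next(gen))  # 1
print(next(gen))  # 2
# another next(gen) would raise StopIteration because the generator is exhausted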

DAND – Term 1 finished!

As planned, I finished the Term 1 section this weekend. The last remaining section introduced SQL as a tool to access large amounts of data. The basic statements needed to query the required data from a database were taught. These included the following statements: SELECT, FROM, WHERE, LIMIT, OR, AND, LIKE and ORDER BY. The SELECT statement is used to select the columns you are interested in, and with the FROM statement you specify the table of the database. For example, with SELECT channel FROM web_events you would select the channel column from the web_events table.

[Screenshot: basic SELECT ... FROM ... statement]

Since databases can contain huge numbers of rows, which would all be output by a plain SELECT statement, it can make sense to limit the output. This is where the LIMIT statement comes in handy:

[Screenshot: query with a LIMIT clause]

LIMIT 5 limits the output to 5 rows.

The WHERE statement can be used as a filter: when using it, only rows which fulfill the criteria specified in the WHERE statement will be returned:

[Screenshot: query with a WHERE clause]

The order of the statements is important as well. The WHERE statement has to follow the FROM statement; a reversed order would result in an error message.

The LIKE operator, which is used within a WHERE clause, is kind of similar to an exact WHERE match, but as the name says, it also returns matches which are "like" the keyword without matching it 100%. To take our facebook example: a WHERE channel = 'face' condition would return an empty output since the given value does not exactly match the stored value 'facebook'. With LIKE, however, we can return all rows which contain the string 'face'. But to get the right result we need to add a wildcard operator as well: the '%' stands for any sequence of characters before or after the specified input. In our example this would look like this:

[Screenshot: query with LIKE 'face%' and the wildcard]

Any characters after 'face' are matched by the wildcard. Therefore, all rows with a facebook entry will be returned.
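The course runs these queries in its own browser environment, but just as a self-contained sketch, the same statements can be tried out with Python's built-in sqlite3 module and a made-up web_events table (the table contents here are invented for illustration):

import sqlite3

# build a tiny throwaway database with a made-up web_events table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE web_events (id INTEGER, channel TEXT)")
conn.executemany(
    "INSERT INTO web_events VALUES (?, ?)",
    [(1, "facebook"), (2, "adwords"), (3, "facebook"), (4, "direct"), (5, "organic")],
)

# SELECT + FROM pick a column from a table, LIMIT caps the number of returned rows
print(conn.execute("SELECT channel FROM web_events LIMIT 5").fetchall())

# WHERE filters rows by an exact match
print(conn.execute("SELECT * FROM web_events WHERE channel = 'facebook'").fetchall())

# LIKE with the % wildcard matches everything that starts with 'face'
print(conn.execute("SELECT * FROM web_events WHERE channel LIKE 'face%'").fetchall())

conn.close()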

Term 1 Project – Explore Weather Data

The task was to pull the weather data of the closest city from a database with a SQL statement and then compare these temperatures with the global average data. The global average data could also be queried with a SQL statement and then downloaded as a CSV file. To pass the project, a line graph had to be provided which compares the city closest to where you live with the global average over time (the measurements start in the 1750s). Also, the graph should show a moving average rather than the raw yearly values to smooth out the yearly variance.

Overall, the tasks did not seem too difficult, and my feeling is that this project was designed not to overwhelm the learners with a challenging project right at the beginning. The focus lies more on making oneself familiar with the way projects are reviewed, the project rubric, etc.

The tool used to solve this challenge could be chosen freely: Python, Excel, R – all are valid options. I chose R. I just got started with R (I completed a class on Udemy last week), so I wanted to put my newly acquired R knowledge to the test right away. It probably took me a lot longer to complete the project with R than it would have taken in Excel, but I think it was worth it.

I was a little bit concerned about calculating the moving average since I was not sure how to do this in a data frame, but it turned out that there is already a package which does all the work for you. It is called TTR. With its SMA function you can just specify the measure and the interval, and the moving average is calculated. The rest was pretty straightforward. The only complication was that I had the data for Hamburg (the nearest city) and the global average in two different data frames, so plotting both lines in one graph was a challenge I had to solve. I found out that the data for each geom_line can be specified individually, so I could plot the final line graph like this: ggplot(NULL, aes(x = as.Date(ISOdate(year, 1, 1)))) + geom_line(data = df_1, aes(y = df_1$temp), color = "red") + geom_line(data = df_2, aes(y = df_2$temp), color = "blue")

The result looks like this:

[Plot: Hamburg vs. global average temperature over time]

Will continue with the next section – Introduction to Python tomorrow.

Cheers!

Data Analyst Nanodegree has started!

Yesterday, the content for the Data Analyst Nanodegree (DAND) was released. I could not wait to get started, so I already dug into the material. The syllabus for Term 1 is the following:

The first part, Welcome to Term 1, consists of 3 video lectures and a final capstone project which needs to be passed to complete this section.

[Screenshot: Welcome to the Nanodegree Program lesson overview]

Welcome to the Nanodegree Program! is the name of the first video series in the Welcome to Term 1 section. It provides an overview of the course syllabus, an introduction of the different teachers, study tips, and encouragement to join the available study communities (Slack and the forums) in which you can ask your peers from the same cohort for help. Udacity also provides, and this is very different from the other MOOC platforms I know, its own career portal. Here, you can get advice on improving your CV/LinkedIn page, etc. Udacity also fosters collaborations with potential employers and offers a career profile page which can be activated if you are actively looking for a new job.

[Screenshot: Life of a Data Analyst lesson overview]

The second video series is called the Life of a Data Analyst. In this section, there are 3 videos about women who work in the field of data analytics: two data scientists, one of whom works for Hire and one for Facebook, and one data analytics manager who works for Summit Schools and whose job is to create dashboards showing the performance of the different students. In this section, you can already see that Udacity is truly trying to close the gap between studying a new skill and actually getting the chance to apply this knowledge in a new job. By giving context and showing the work of data analysts in their day jobs, a first link between the skill set required for the job and the syllabus is established.

I haven't started the next section yet. It will be about SQL, so after the introductory part, this will be the first section where data analytics skills are taught. I hope to finish this section within this week.

[Screenshot: SQL lesson overview]

Overall, the first impression is good. The quality of the content is very high; the videos are very professional and engaging. Since the cost of the program is quite high (499€), I expected a noticeable difference to the other programs which are available on platforms like Coursera or Udemy. And at first glance, the higher quality is definitely visible, especially in the additional support provided for the students (interviews with professionals, the career portal). There is also a mentor assigned to you who checks in with you regularly. I will write more about the quality of the video lectures in my next post.

See you then!

First steps in the DAND and prep

I purchased the DAND (Data Analyst Nanodegree) today. The first section will cover data analysis with SQL and Python. There is an official start date for the cohort, which is in 12 days on the 13th of February. Until then, I will refresh my statistics knowledge with the free Intro to Descriptive Statistics course. As I understand it, this course is itself part of the Nanodegree syllabus, so I might be able to make up some time and work ahead.

Additionally, I started the R Programming A-Z course on Udemy, taught by Kirill Eremenko. I purchased his advanced Tableau course on Udemy as well and found him to be a fun and enthusiastic teacher. I have no prior experience with R, so I hope completing this course will equip me with basic knowledge of this tool and enable me to leverage the advantages which make it so popular in the data science community.

I will also try to solve one Python quiz on codewars.com to expand my Python coding skills.