3/19/2019

A recap

Back to last week… Good Names are…

  • Informative
  • Distinct
  • Consistent
  • Searchable

Functions: Friend or Foe?

Question: Functions can be used to do which of the following:


  1. Create descriptive landmarks in the flow of logic
  2. Provide a concise summary of the inputs for a task
  3. Encapsulate logic into small, testable blocks
  4. Create reusable blocks of application logic


  • They can do all of these things, but only if you try…

Qualities of a good function

What are some things that make a good function?


  • Has a descriptive name
  • Does one thing
  • Uses fewer arguments
  • Provides a single layer of abstraction
  • Is side-effect free

Use descriptive names

get_em = function(orders){
  flagged_orders = list()
  for (order in orders){
    if (is_flagged_order(order)){
      flagged_orders = append(flagged_orders, order)
    }
  }
  return(flagged_orders)
}


What are some better names for this function?

extract_flagged_orders = function(orders){
  flagged_orders = list()
  for (order in orders){
    if (is_flagged_order(order)){
      flagged_orders = append(flagged_orders, order)
    }
  }
  return(flagged_orders)
}

Do one thing

Single responsibility principle

The single responsibility principle is a computer programming principle that states that every module, class, or function should have responsibility over a single part of the functionality provided by the software, and that responsibility should be entirely encapsulated by the class.


a.k.a. Curly’s Law
a.k.a. Don’t let your code turn into a rat’s nest
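
A minimal sketch of the principle, assuming a hypothetical order-reporting task (the function names and the dict layout with "id" and "flagged" keys are illustrative and not from the original slides). The first function filters and formats in one go; the two that follow each do one thing and can be tested separately.

```python
# Hypothetical data layout: each order is a dict with "id" and "flagged" keys.

def report_flagged_orders(orders):
    """Does two things: filters the orders AND formats a report."""
    lines = []
    for order in orders:
        if order["flagged"]:
            lines.append(f"Order {order['id']} is flagged")
    return "\n".join(lines)


def extract_flagged_orders(orders):
    """Does one thing: keeps only the flagged orders."""
    return [order for order in orders if order["flagged"]]


def format_order_report(orders):
    """Does one thing: turns a list of orders into report text."""
    return "\n".join(f"Order {order['id']} is flagged" for order in orders)


orders = [{"id": 1, "flagged": True}, {"id": 2, "flagged": False}]
flagged = extract_flagged_orders(orders)
report = format_order_report(flagged)
```

Both routes produce the same report, but the split version lets the filtering logic be reused without the formatting.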

Limit your inputs

Fewer inputs require less brain-power to understand.

# Zero-Inputs (niladic function)
data = get_data()
# One Input (monadic function)
data = get_data(path = "../data/test.csv")
# Two Inputs (dyadic function)
data = get_data(path = "../data/test.csv", 
                is_train = FALSE)
# Three Inputs (triadic function)
data = get_data(path = "../data/test.csv",
                is_train = FALSE, 
                scale = FALSE)


  • What might each additional parameter tell us about get_data?
  • What might be some alternative names for some of these functions?
    • Why did you decide to rename them? What ‘rule’ are they violating?
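
One possible reading of the progression above: a flag such as scale = FALSE hints that the function is doing more than one job. The Python sketch below assumes two hypothetical helpers, read_rows and scale_column (neither appears in the original slides), and uses in-memory CSV text in place of a file. It replaces the flag with a separate single-purpose function.

```python
import csv
import io


def read_rows(csv_text):
    """One input, one job: parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def scale_column(rows, column):
    """Return new rows with `column` rescaled to the 0-1 range."""
    values = [float(row[column]) for row in rows]
    low, high = min(values), max(values)
    span = (high - low) or 1.0  # avoid dividing by zero when all values match
    return [{**row, column: (float(row[column]) - low) / span} for row in rows]


# Hypothetical sample data standing in for a CSV file on disk.
csv_text = "amount\n10\n20\n30\n"
rows = read_rows(csv_text)
scaled_rows = scale_column(rows, "amount")
```

The caller now decides whether to scale by calling scale_column or not, instead of passing a boolean that changes what the loader does.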

Abstract a single layer

def get_transformed_credit_card_data(path_to_data, is_training):
    """Get and transform credit card data."""
    data = get_credit_card_data(path_to_data)
    data = handle_duplicate_records(data)
    data = transform_categorical_features(data)
    data = transform_bill_and_pay_amount(data)
    data = generate_total_months_delinquent(data)
    if is_training:
        data = transform_target_variable(data)
    return data


If code == writing, then functions can be framed as structured paragraphs. They have topic sentences, supporting details, and a conclusion.

Don’t create side-effects

Side-effects, a definition

In computer science, an operation, function or expression is said to have a side effect if it modifies some state variable value(s) outside its local environment, that is to say has an observable effect besides returning a value (the main effect) to the invoker of the operation.


Common Side Effects:

  • reliance on a global variable being set
  • updating the value of a global variable
  • saving data to a file, database, or another system
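
A small sketch of the first two items, assuming a hypothetical tax calculation (TAX_RATE, LAST_PRICE, and both function names are made up for illustration). The first function has a hidden input and a hidden output; the second depends only on its arguments.

```python
TAX_RATE = 0.08  # hypothetical module-level (global) setting


def add_tax_with_side_effects(price):
    """Relies on, and updates, state outside the function."""
    global LAST_PRICE
    LAST_PRICE = price             # side effect: writes a global variable
    return price * (1 + TAX_RATE)  # hidden input: reads a global variable


def add_tax(price, tax_rate):
    """Side-effect free: the result depends only on the arguments."""
    return price * (1 + tax_rate)


total = add_tax(100, 0.25)
```

Calling add_tax twice with the same arguments always gives the same answer, which is what makes it easy to test.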

Side-effects, a simple example

import pickle

def calculate_sum(a, b):
    print(a)
    pickle.dump(b, open("second_argument.p", "wb"))
    sum_vals = a + b
    return sum_vals

total = calculate_sum(1, 2)
print(total)
print(sum_vals)

  • Which lines create a side-effect?
  • What should we expect to happen if we were to run this code?
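
For comparison after the discussion, one possible side-effect-free rewrite. This is a sketch, not the only answer; save_value is a hypothetical helper name. The arithmetic and the file write are separated, so the caller chooses when anything touches the disk.

```python
import pickle


def calculate_sum(a, b):
    """Return the sum; no printing and no file writes."""
    return a + b


def save_value(value, path):
    """Writing to disk is this function's whole job, and its name says so."""
    with open(path, "wb") as handle:
        pickle.dump(value, handle)


total = calculate_sum(1, 2)
print(total)
```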

In summary…

Good functions do the following…


  • use descriptive names
  • follow the single responsibility principle
  • use fewer input arguments
  • abstract away logic, one layer at a time
  • don’t create side-effects

Another exercise

Description: Find a partner and exchange a block of code you have written (50-100 lines). Try to refactor the code based on our discussion while maintaining its functionality. For the first 10 minutes, try not to ask any questions, and allow the code to “speak” for itself.

Time: 20 Minutes Total

  • 10 Minutes - Refactor partner code, no questions
  • 10 Minutes - Review your results with your partner