4/2/2019

Don’t jump around…

Nomenclature: Exceptions v. Errors

Error:

  • indicates serious problems that a reasonable application should not try to catch.
  • i.e., you can’t fix it, give up now (think syntax error)

Exception:

  • indicates a condition that we can catch, and possibly recover from
  • Your program runs, but runs into a condition it doesn’t know how to handle, so you throw an exception

python, has InsertNameHereError, R has stop() and warning(). The names may change, but the idea is still the same. Errors can’t be handled, exceptions can be handled.

Clean Exception Handling

  • Start with the Try-Catch-Finally
  • Provide context
  • DON’T RETURN NULL
  • DON’T PASS NULL

Begin with the end in mind

# main.py
try:
    bids = calculate_bid(adgroup_id=an_id)
except ValueError as err:
    print("An invalid value provided, defaulting to 0 bid. ", err)
    bids = 0
except:
    print("Unexpected error:", sys.exc_info()[0])
    raise
  • Which exception block handles the broadest category of things?
  • Which block would you consider exception v. error handling?
  • What does raise do?

Now for the next turtle down…

# bid_calculator.py
def calculate_bid(adgroup_id):

    if type(adgroup_id) != int:
        raise ValueError("adgroup_id must be an integer")

    if adgroup_id < 0:
        raise ValueError("adgroup_id must be greater than 0")

    raise NotImplementedError("Get to work.")

  • What conditions are required to hit:
    • ValueError("adgroup_id must be an integer")
    • ValueError("adgroup_id must be greater than 0")
    • NotImplementedError("This method is not implemented. Get to work.")
  • What is the caller expected to do for each option?

What causes null exceptions?

  • “uninitialized” reference type variables — variables which are initialized with nulls and are assigned their real value afterward. A bug can cause them never to be reassigned.
  • uninitialized reference-type class members.
  • explicit assignment to null or returning null from a function

Why are they bad?

  • It makes tracing (and debugging) the real problem harder
  • It hides the original error





If you don’t believe me, believe Sir Charles Antony Richard Hoare, developer of the QuickSort algorithm. Null References: The Billion Dollar Mistake

An contrived example…

def get_reciprocal(x):
    if x == 0:
        return None
    return 1 / x

def double_a_number(x):
    return 2 * x

def pass_a_number(x):
    return x

def manipulate_number(x):
    reciprocal = get_reciprocal(x)
    another_step = pass_a_number(reciprocal)
    double_reciprocal = double_a_number(another_step)

    return double_reciprocal
  • What would we expect the stack-trace to look like?
  • Where would be a better location for an explicit check and error?
  • Why is it bad?

Boundaries

Boundaries, why do we care?

How can we protect ourselves?

Boundaries are effective construct that can help protect from headaches. Some migraine reducing techniques include:

  • Wrapping, centralizing usage of 3rd party code
  • Interfaces and Dependency Injection Patterns
  • Use of learning tests

Wrapping 3rd Party Code: Part 1

# main.py
def is_most_recent_version(file):
    existing_file_meta_data = dbx.get_meta_data(file.name)
    if existing_file_meta_data is None:
        return True
    if existing_file_meta_data.date > file.date: 
        return False
    return True

if __name__ == "__main__":
    dbx = dropbox.Dropbox(config.access_token)
    file_list = get_files_for_upload()
    for file in file_list:
        if (is_most_recent_version(file)):
            dbx.file_upload(file)
  • What external service is our code dependent on?
  • How could we insulate ourselves from these changes?

Wrapping 3rd Party Code: Part 2

# DocumentWarehouse.py
class DocumentWarehouse:
    def __init__(self, access_token):
        self.dbx = dropbox.Dropbox(access_token)

    def should_be_updated(self, file):
        meta  = self.dbx.get_meta_data(file.name)
        if meta is None: # No file in DBX
            return False
        if meta.date > file.date # existing file is more recent
            return False
        return True

    def upload_file(self, file):
        self.dbx.upload_file(file)

Wrapping 3rd Party Code: Part 3

# main.py (refactored)
import DocumentWarehouse

if __name__ == "__main__":
    dw = DocumentWarehouse(config.access_token)
    file_list - get_files_for_upload()
    for file in file_list:
        if dw.file_should_be_updated(file):
            dw.upload_file(file)
  • What challenges would switching to AWS provide?
  • How would that change if you were asked to maintain access to both AWS and dropbox?

Wrapping 3rd Party Code: Part 4

from abc import abstractmethod

class DocumentWarehouse:
    @abstractmethod
    def should_be_updated(self, file=str):
        pass

    @abstractmethod
    def upload_file(self, file=str):
        pass

from DocumentWarehouse import DocumentWarehouse
from dropbox import dropbox

class DropboxWarehouse(DocumentWarehouse):

    def __init__(self, access_key):
        self.dbx = dropbox.Dropbox(access_key)

    def should_be_updated(self, file=str):
        meta  = self.dbx.get_meta_data(file.name)
        if meta is None: # No file in DBX
            return False
        if meta.date > file.date: # existing file is more recent
            return False
        return True

    def upload_file(self, file=str):
        self.dbx.upload_file(file)

from DocumentWarehouse import DocumentWarehouse
from aws import s3

class S3Warehouse(DocumentWarehouse):

    def __init__(self, access_key):
        self.s3 = s3(access_key)

    def should_be_updated(self, file=str):
        meta  = self.s3.get_details(file.name)
        if meta.file_exists == False:
            return True
        if meta.date > file.date: # existing file is more recent
            return False
        return True

    def upload_file(self, file=str):
        self.s3.save_file(file)

But Why?

Now, we can swap our storage provider with a single line of code…

if __name__ == "__main__":
    dw = DropboxWarehouse(config.access_token)
    file_list - get_files_for_upload()
    for file in file_list:
        if dw.file_should_be_updated(file):
            dw.upload_file(file)
##########################
if __name__ == "__main__":
    dw = S3Warehouse(config.access_token)
    file_list - get_files_for_upload()
    for file in file_list:
        if dw.file_should_be_updated(file):
            dw.upload_file(file)

Think contracts

polymorphism - is the provision of a single interface to entities of different types or the use of a single symbol to represent multiple different types

design-by-contract - prescribes that software designers should define formal, precise and verifiable interface specifications for software components, which extend the ordinary definition of abstract data types


Each language may (or may not) implement some form of this. Some terms you might hear:

  • Abstract Classes (Python, Java, C#)
  • Interfaces (Java, C#, Python, Go)
  • Traits (Scala)
  • S3 + S4 Objects, and RefClases (R)

Comments, Questions, Concerns?