Automating Work with Dictionaries in Python

After some time off to soak in the post-MBA life, take a vacation, and spend more time wrapping my head around the basics of Python programming, I’m back to writing and sharing my experiences. I’ve used several sources to supplement my learning of the language: first wrapping up Python Crash Course (which I wrote a bit about in a previous post), then working through Learn Python 3 the Hard Way (not my favorite book, but it might work for other people), and finally settling on a University of Michigan course on Coursera, which I’ve really been liking. I think the most impactful strategy for learning Python is consistent practice, even if it’s only 20 minutes a day.

Dictionaries in Python

Beyond learning a new language and developing some general programming frameworks and philosophies, I think it’s important to focus on how you can apply what you’re learning to your workflow, your career, or even your whole organization if it’s that impactful. With that in mind, one component of Python that I’ve really enjoyed, and that can be used to automate processes, is the dictionary. This topic has come up in a previous post, but the concept is so powerful that it bears repeating. What makes dictionaries really powerful is that they store key-value pairs (or items, if you want to get technical), and once you’ve gathered those pairs you can look values up by key, update them, count things, and loop over all of them. Dictionaries, when combined with other constructs like while and for loops, also make life easier if you want to gather information from a flat-file source, like a CSV file or a spreadsheet. I know that I’ve been spoiled by all of the incredible search and analytics applications out there that make my life way easier when I’m trying to find or analyze some data, but when I have a file that’s in a different format, I tend to get stuck.
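
To make the key-value idea concrete, here is a minimal sketch of counting values from CSV data with a dictionary and a for loop. The sample data, the column names, and the count_by_column helper are all made up for illustration; io.StringIO just stands in for a real file so the example runs on its own.

```python
import csv
import io

# Hypothetical sample data standing in for a CSV file on disk.
SAMPLE_CSV = """name,department
Ana,Sales
Ben,Finance
Cara,Sales
Dev,Engineering
Eli,Sales
"""

def count_by_column(csv_file, column):
    """Return a dict mapping each value in `column` to how often it appears."""
    counts = {}
    for row in csv.DictReader(csv_file):
        key = row[column]
        # get() returns 0 the first time a key is seen, so no KeyError.
        counts[key] = counts.get(key, 0) + 1
    return counts

department_counts = count_by_column(io.StringIO(SAMPLE_CSV), "department")
for department, count in department_counts.items():
    print(department, count)
```

With a real file, you would pass an open file object (for example, from open("people.csv", newline="")) in place of the StringIO object.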

With that in mind, I wanted to share an incredibly simple program that I created for an assignment. In a nutshell, the program reads the file line by line, picks out the lines that start with ‘From:’, and collects the email address that comes right after it. Once it has gone through the whole file, it checks who sent the most emails and how many they sent. It’s simple, fast, and effective. I think that anyone with a little more programming experience could write functionality that does even more with the file, but you can clearly see how just this one script could help automate your workflow and speed things up a bit.

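# Ask for a file name; fall back to the sample file if nothing is entered.
name = input("Enter file: ")
if len(name) < 1:
    name = "mbox-short.txt"

# Count how many times each sender address appears on a "From:" line.
counts = dict()
with open(name) as handle:
    for line in handle:
        if not line.startswith("From:"):
            continue
        words = line.split()
        if len(words) < 2:
            continue  # skip a "From:" line with no address after it
        sender = words[1]
        counts[sender] = counts.get(sender, 0) + 1

# Find the sender with the highest count.
top_sender = None
top_count = None
for sender, count in counts.items():
    if top_count is None or count > top_count:
        top_sender = sender
        top_count = count

print(top_sender, top_count)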
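
For comparison, the same counting could be sketched with collections.Counter, a dictionary subclass from the standard library that handles the "start at zero and add one" step itself. This is an alternative to the script above, not the assignment's solution; the top_sender function name and the sample lines are made up for illustration.

```python
from collections import Counter

def top_sender(lines):
    """Return (address, count) for the most frequent "From:" address, or None."""
    counts = Counter()
    for line in lines:
        if not line.startswith("From:"):
            continue
        words = line.split()
        if len(words) > 1:  # ignore a "From:" line with no address
            counts[words[1]] += 1
    if not counts:
        return None
    # most_common(1) returns a one-item list of (key, count) tuples.
    return counts.most_common(1)[0]

# Hypothetical lines standing in for the contents of a mail file.
sample_lines = [
    "From: ana@example.com\n",
    "Subject: hello\n",
    "From: ben@example.com\n",
    "From: ana@example.com\n",
]
print(top_sender(sample_lines))
```

Because the function only needs something it can loop over, an open file handle can be passed in directly instead of the sample list.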

If you want to dig a little deeper into the code and how it works, feel free to check it out on GitHub.