Apache Spark is a distributed computing framework that lets data analytics tasks run on a variety of computing platforms and be written in a variety of languages. In this talk, Jonathan covers how to write a data processing program in Clojure and deploy it to a Spark cluster on Kubernetes.
One of the challenges of writing distributed programs is bridging the gap between the development environment and the production cluster. These operational aspects of the developer experience are a focus of this talk, which illustrates how Clojure's REPL-based, test-driven development approach can be applied to Spark programs.
Jonathan is a Software Engineer at Democracy Works. Originally from Lake Worth, Florida, he received his bachelor’s degree in mathematics from the University of New Orleans. Aside from being an avid Saints fan (Who Dat!), he has specialized in web and data processing applications in Clojure for the past five years. Jonathan works on TurboVote to bring modern web technologies to bear on the problem of voter participation.