Writing Files to Persisted Storage from PySpark


Hello World!

So here is the big-ticket item: how in the world do I write files to persisted storage from PySpark? There are tons of docs on RDD.saveAsTextFile() and things of that nature, but those only matter if you are dealing with RDDs or .csv files. What if you have a different set of needs? In this case, I wanted to visualize a decision forest I had built, but I could not find any good bindings between PySpark's MLlib and Matplotlib (or similar) to visualize it.
