I have had to put a solid block on gym time in my calendar. In a professional environment, I feel this warrants an explanation of why I am making zero adjustments to my gym schedule for “important meetings”. I had a similar block on my calendar until about 6 months ago; since removing it I have gained close to 25 lbs, and my own observed performance at work has degraded significantly. In my experience, exercise appears to be a critical component of performing at my best, particularly mentally.
So I can't disclose the problem statement, but what I can disclose is that the solution was a “Siamese LSTM”. I was pointed to Yann LeCun's paper on “Siamese Networks”, in which a shared-weight convolutional neural network was used for facial recognition. What I did was adapt this concept to a forward/reverse-sweep recurrent (LSTM) network to match items together.
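Since the actual problem is under wraps, here is a minimal sketch of the shared-weight idea applied to an LSTM, written with tf.keras. The sequence length, feature count, layer sizes, and the absolute-difference comparison are illustrative assumptions, not the model from the engagement.

```python
# A minimal sketch of a shared-weight ("Siamese") LSTM in tf.keras.
# Shapes, sizes, and the comparison head are assumptions for illustration.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

SEQ_LEN, N_FEATURES = 20, 8  # hypothetical input shape

# One encoder instance = one set of weights, applied to both inputs.
encoder = tf.keras.Sequential([
    layers.Bidirectional(layers.LSTM(32)),  # forward + reverse sweep
    layers.Dense(16, activation="relu"),
])

left_in = layers.Input(shape=(SEQ_LEN, N_FEATURES))
right_in = layers.Input(shape=(SEQ_LEN, N_FEATURES))
left_vec = encoder(left_in)
right_vec = encoder(right_in)

# Compare the two embeddings; absolute difference + sigmoid is one common choice.
diff = layers.Lambda(lambda t: tf.abs(t[0] - t[1]))([left_vec, right_vec])
match = layers.Dense(1, activation="sigmoid")(diff)

model = Model(inputs=[left_in, right_in], outputs=match)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Dummy data just to show the call signature: a pair of sequences in, a match score out.
x1 = np.random.rand(64, SEQ_LEN, N_FEATURES).astype("float32")
x2 = np.random.rand(64, SEQ_LEN, N_FEATURES).astype("float32")
y = np.random.randint(0, 2, size=(64, 1))
model.fit([x1, x2], y, epochs=1, verbose=0)
```

The key point is that `encoder` is built once and called twice, so both inputs are embedded by the same weights; only the comparison head learns how the two embeddings relate.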
This article is kinda exciting for me, because once you internalize how this works, the world really becomes your oyster in terms of what you can model with what kinds of data. In this example we are going to take some sample images and some random vector features and merge them together. In a more realistic scenario you might take something like an image along with some contextual tabular data and want to merge those two data sources into a single prediction.
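Here is a minimal sketch of that merge, assuming a small image input and a vector of contextual features; the shapes, layer sizes, and binary target are placeholders for illustration rather than the article's actual data.

```python
# A minimal sketch of merging an image input with a tabular/vector input
# into a single prediction. Shapes and layer sizes are illustrative assumptions.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

IMG_SHAPE = (32, 32, 3)   # hypothetical image size
N_TABULAR = 10            # hypothetical number of contextual features

# Branch 1: convolutional encoder for the image.
img_in = layers.Input(shape=IMG_SHAPE)
x = layers.Conv2D(16, 3, activation="relu")(img_in)
x = layers.MaxPooling2D()(x)
x = layers.Flatten()(x)
x = layers.Dense(32, activation="relu")(x)

# Branch 2: small dense encoder for the vector features.
vec_in = layers.Input(shape=(N_TABULAR,))
v = layers.Dense(16, activation="relu")(vec_in)

# Merge the two branches and produce one prediction.
merged = layers.concatenate([x, v])
out = layers.Dense(1, activation="sigmoid")(merged)

model = Model(inputs=[img_in, vec_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy")

# Dummy data to show how the two inputs are fed together.
imgs = np.random.rand(8, *IMG_SHAPE).astype("float32")
vecs = np.random.rand(8, N_TABULAR).astype("float32")
labels = np.random.randint(0, 2, size=(8, 1))
model.fit([imgs, vecs], labels, epochs=1, verbose=0)
```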
In this article we will take a light touch on Cosmos DB; specifically the Mongo API from Cosmos DB and using that API from MongoEngine. I think one of the great things about Cosmos DB's Mongo API is that I simply swap out my connection strings and, guess what, it works! This means not only can I use MongoEngine, but I can also use PyMongo or any other framework, in any language, that connects to Mongo.
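As a rough sketch of that swap, assuming MongoEngine and a placeholder Cosmos DB connection string (the account name, key, and document schema below are made up), the code is exactly what you would write against a vanilla MongoDB:

```python
# A minimal sketch of pointing MongoEngine at the Cosmos DB Mongo API.
# The connection string is a placeholder; account, key, db name and the
# document schema are illustrative assumptions.
from mongoengine import Document, StringField, IntField, connect

# Cosmos DB hands you a Mongo-style connection string in the portal;
# swapping it in is the only change from a plain MongoDB setup.
COSMOS_CONNECTION_STRING = "mongodb://<account>:<key>@<account>.documents.azure.com:10255/mydb?ssl=true"

connect(host=COSMOS_CONNECTION_STRING)

class Experiment(Document):
    name = StringField(required=True)
    epochs = IntField(default=10)

# Standard MongoEngine usage, unchanged against Cosmos DB.
Experiment(name="siamese-lstm", epochs=25).save()
print(Experiment.objects(name="siamese-lstm").count())
```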
So Jupyter is a great tool for experimental science. Running a Jupyter notebook can be tricky though, especially if you want to maintain all of the data that is stored in it. I have seen many strategies, but I have come up with one that I like best of all. It is based on my “Micro Services for Data Science” strategy. By decoupling data and compute we can literally trash our Jupyter instance and all of our data and notebooks still live. So why not put it in a self-healing orchestrator and deploy via Kubernetes :D.
In this article we will review the technology stack required to enable this, as well as the logistics behind setting it all up and operating against it. The solution uses Azure Container Service, Docker, Kubernetes, Azure Storage, Jupyter, TensorFlow and TensorBoard. WOW! That is a lot of technology, so we won't do a deep-dive how-to, but rather give some pointers on how to get provisioned and then the high-level process of how to use it.
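To make the decoupling concrete, here is a minimal sketch assuming an Azure Files share is mounted into the Jupyter pod at a hypothetical path: anything worth keeping (notebooks, checkpoints, TensorBoard logs) is written under that path, so the pod itself stays disposable. The mount point and file layout are assumptions for illustration, not the exact setup.

```python
# A minimal sketch of decoupled data and compute: everything worth keeping is
# written under a path backed by Azure Storage (e.g., an Azure Files share
# mounted into the Jupyter pod), so the pod can be destroyed and recreated
# at will. The mount path and layout below are assumptions.
import os
import json

PERSISTENT_ROOT = "/mnt/azurefiles"  # hypothetical mount point of the share

def persistent_dir(*parts):
    """Return (and create) a directory under the mounted share."""
    path = os.path.join(PERSISTENT_ROOT, *parts)
    os.makedirs(path, exist_ok=True)
    return path

# Notebooks, model checkpoints and TensorBoard logs all land on the share.
notebook_dir = persistent_dir("notebooks")
checkpoint_dir = persistent_dir("models", "experiment_01")
tensorboard_log_dir = persistent_dir("logs", "experiment_01")

# Example: record experiment metadata alongside the artifacts.
with open(os.path.join(checkpoint_dir, "run_config.json"), "w") as f:
    json.dump({"learning_rate": 1e-3, "epochs": 25}, f)
```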
Today we are going to do a little exercise around optimizing an algorithm. I was working with a customer who was using open data (and we know how that can be) to perform an initial set of predictions and show some value, while adding in some collection capabilities so they can roll out a version built on more reliable data later.
Alright; so this whole input pipeline thing is, in pretty much every framework, the most under-documented thing in the universe. So this article is about demystifying it. We can break the process down into a few key steps:
Acquire & Label Data
Process Label Files for Record Conversions
Process Label Files for Training a Specific Network Interface
Train the Specific Network Interface
This is part 1. We will focus on the second item in this list: processing the label files into TFRecords, as sketched below. Note you can find more associated code in the TensorFlow section of this git repository: https://github.com/drcrook1/CIFAR10
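To make that concrete, here is a minimal sketch of the conversion step using the TensorFlow 1.x tf.python_io API: each labeled image becomes a tf.train.Example with a raw-bytes image feature and an integer label. The file names, label encoding, and random stand-in data are assumptions for illustration; the repository above has the actual implementation.

```python
# A minimal sketch of converting labeled images into a TFRecord file
# (TensorFlow 1.x style). File names and label encoding are assumptions.
import numpy as np
import tensorflow as tf

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def write_examples(images, labels, out_path):
    """images: uint8 arrays of identical shape; labels: integer class ids."""
    with tf.python_io.TFRecordWriter(out_path) as writer:
        for image, label in zip(images, labels):
            example = tf.train.Example(features=tf.train.Features(feature={
                "image_raw": _bytes_feature(image.tobytes()),
                "label": _int64_feature(int(label)),
            }))
            writer.write(example.SerializeToString())

# Hypothetical usage with random data standing in for the labeled images.
images = [np.random.randint(0, 255, (32, 32, 3), dtype=np.uint8) for _ in range(4)]
labels = [0, 1, 1, 0]
write_examples(images, labels, "train.tfrecords")
```

The training side then only needs to know the feature names and shapes used here in order to parse the records back out.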