
DecadeStyles


Overview

For this task, I wanted to build a classifier for clothing style by decade. I could not find an existing dataset labeling attire by decade, so I built one manually from Google Images. Once this process was complete, I used transfer learning on a pre-trained deep-learning model to build my classifier.

Data Collection

With any machine learning application, dataset quality ultimately determines the quality of the results. There are plenty of well-optimized open-source ML models; quality datasets are the real limiting factor. For this project I could not find a pre-existing dataset, and I could not simply pull results from a Google search because there was too much variation across images. I needed relatively standardized images: a single person wearing a full outfit, with minimal background. The more variation in the images, the harder the learning task and the more images required to reach a given accuracy. I built a dataset containing 100 images per category, restricted to women’s attire with adjacent decades combined into single categories, because there were too few image results for men and relatively few per individual decade. To stretch my small data pool further, I augmented the existing data by applying horizontal flips to the images, as sketched below. With such limited data, the project can only serve as a proof of concept and a framework for building a truly functional application.
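The flip augmentation only takes a couple of lines of TensorFlow. A minimal sketch (the file-loading helper and paths are hypothetical, not the project's actual pipeline):

```python
import tensorflow as tf

def augment_with_flips(image_paths):
    """Yield each image plus its horizontal mirror, doubling a small dataset."""
    for path in image_paths:
        image = tf.io.decode_image(tf.io.read_file(path), channels=3)
        yield image
        yield tf.image.flip_left_right(image)
```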

Machine Learning

Since I was building an image classifier without tens of thousands of images, I used transfer learning. Transfer learning starts from a pre-trained model and replaces its final classification layers so it can be applied to a new set of images. The choice of pre-trained model and the structure of the final layers are hyperparameters to optimize. In this case, I selected the VGG16 model with final layers of flatten, dropout (0.75), and finally a dense layer (5 units, softmax activation). The dropout layer was added to combat overfitting. My selection process was manual, but more advanced methods such as Bayesian optimization and genetic algorithms can perform a more robust hyperparameter search. Here there was little marginal benefit to those methods, because the available data put a cap on performance.
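A sketch of that setup in Keras, with the VGG16 base frozen so only the new head trains. The layer structure follows the text; the optimizer, loss, and 224x224 input shape are my assumptions, not stated in the post:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Pre-trained convolutional base; weights stay frozen during training.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False

# New classification head: flatten -> dropout(0.75) -> dense softmax over 5 decade categories.
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dropout(0.75),
    layers.Dense(5, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["categorical_accuracy"])
```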

Results

The model achieved a categorical accuracy of 73% on the validation set. Accuracy is an appropriate performance metric since the categories were balanced. For more detail see the classification report below:

[Image: classification report]

As this report shows, the major driver of error in the model is the 1940-1950s category, which has low recall (lots of false negatives).
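A per-class report like the one above can be produced with scikit-learn. A minimal sketch, assuming the model from the previous section and hypothetical val_images / one-hot val_labels arrays already loaded:

```python
import numpy as np
from sklearn.metrics import classification_report

# Predicted class = argmax of the softmax outputs; true class = argmax of the one-hot labels.
y_pred = np.argmax(model.predict(val_images), axis=1)
y_true = np.argmax(val_labels, axis=1)

print(classification_report(y_true, y_pred))
```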

Image Preprocessing

Most of the images I collected are rectangular. To fit the VGG16 model, I need to reshape each one into a 224 by 224 square. Naively resizing distorts the aspect ratio and can hurt the model’s learning. Luckily, TensorFlow has a built-in function, tf.image.resize_with_pad, that resizes an image without aspect distortion. It does so by shrinking the image to fit and then padding the surrounding area. The shrinking step interpolates pixel values, and the interpolation can be computed with one of three methods: nearest neighbor, bilinear, or bicubic. I ran a test on these three methods and graphed the results below.

[Images: validation accuracy by interpolation method]

It appears the model using the bicubic method achieved a higher validation accuracy after five epochs. Given the small size of this dataset, I am not confident this result would hold in general, but it suggests that padded resizing is a good idea when working with non-square or variably sized images.
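A minimal sketch of that padded resize, with the interpolation method exposed as a parameter:

```python
import tensorflow as tf

def resize_for_vgg16(image, method="bicubic"):
    """Resize to 224x224 while preserving aspect ratio, zero-padding the remainder.
    method can be "nearest", "bilinear", or "bicubic"."""
    return tf.image.resize_with_pad(image, 224, 224, method=method)
```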

Future Work

In addition to building a larger dataset, focusing on specific elements of clothing could achieve better results. For example, it would be powerful to train the machine to locate pant legs and identify bell bottoms, marking the style as 70s. With enough data, an ML algorithm would learn this on its own, but directed training achieves a better result with fewer images. Training a model to identify pant legs and collars is not trivial, and with time I can build on work that has been done in this area (see DeepFashion).

I could also further expand my data labels and training to other types of categories, like subcultures (hippie, punk) or levels of formality (semi-formal, business-casual). This model forms a nice base of work to expand on.

Skytron


Overview

Skytron was built to detect anomalous flight paths, but its pipeline generalizes to analyzing, and training on, any stream of data. Skytron learns to predict flight paths using real-time data from the Open Sky Network. On the premise that anomalous flights are harder to predict, the model logs anomalies based on prediction error.

Stream Vs Batch

Skytron’s real-time anomaly detection allows immediate detection of problems and thus faster (cheaper) error correction. In the absence of historical data, rather than wait for a sufficiently sized batch, you can hit the ground running and train off each piece of data as it comes in. This also saves on storage: instead of requiring a large stored batch of training data, the model trains in a loop, continuously overwriting a small pool of data.

Training Loop

The training loop makes an API call to the Open Sky Network, collecting a flight vector (longitude, latitude, altitude, velocity, heading) for every flight in the network; waits 10 seconds and collects a second set of vectors; then trains on the two sets. It then predicts off the second set, fetches a new set, evaluates the predictions against it and logs the errors, overwrites the first set with the second and the second with the new set, and repeats by training on the two most recent sets. To help clarify, see the diagram below.

[Image: training loop diagram]
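A simplified sketch of this loop in Python, assuming the OpenSky REST /states/all endpoint and its state-vector field order, plus a Keras model that maps one flight vector to the next. Error handling, feature scaling, and the empty-response case are omitted, and the fixed threshold is a placeholder:

```python
import time
import numpy as np
import requests

API_URL = "https://opensky-network.org/api/states/all"

def fetch_vectors():
    """Return {icao24: [lon, lat, altitude, velocity, heading]} for flights reporting all fields."""
    states = requests.get(API_URL, timeout=30).json()["states"]
    vectors = {}
    for s in states:
        vec = [s[5], s[6], s[7], s[9], s[10]]  # longitude, latitude, baro altitude, velocity, heading
        if None not in vec:
            vectors[s[0]] = np.array(vec, dtype=np.float32)
    return vectors

def training_loop(model, anomalies, interval=10, threshold=1.0):
    """Continuously train on consecutive snapshots and log high-error predictions."""
    prev = fetch_vectors()
    time.sleep(interval)
    curr = fetch_vectors()
    while True:
        # Train on flights present in both snapshots: previous vector -> current vector.
        shared = sorted(prev.keys() & curr.keys())
        model.train_on_batch(np.stack([prev[k] for k in shared]),
                             np.stack([curr[k] for k in shared]))

        # Predict the next snapshot from the current one, then fetch it and score the errors.
        time.sleep(interval)
        new = fetch_vectors()
        shared = sorted(curr.keys() & new.keys())
        preds = model.predict(np.stack([curr[k] for k in shared]), verbose=0)
        for flight, pred in zip(shared, preds):
            error = float(np.mean((pred - new[flight]) ** 2))
            if error > threshold:
                anomalies[flight] = error

        # Slide the window: the current set becomes the previous, the newest becomes the current.
        prev, curr = curr, new
```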

Threading

While the training loop runs, I want to view current anomalies, extract a dictionary of all anomalies detected, and extract the trained model once it is stable. I also want to display a streaming set of plots for model evaluation. Threading lets these processes run concurrently on the same set of variables. My diagram below may be helpful: each box is an individual thread, and I can extract from the global variables in two ways.

[Image: threading diagram]
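A minimal sketch of that layout, assuming the training_loop from the previous sketch and a shared anomaly dictionary (the names are mine, not Skytron's actual API; for strict correctness the loop's writes should take the same lock, which is omitted here for brevity):

```python
import threading

# Globals shared across threads: the running anomaly log and a lock guarding reads.
anomalies = {}
state_lock = threading.Lock()

def start_training(model):
    """Run the training loop in a background (daemon) thread so it dies with the main program."""
    thread = threading.Thread(target=training_loop, args=(model, anomalies), daemon=True)
    thread.start()
    return thread

def current_anomalies():
    """Take a point-in-time copy of the anomaly log for display or export."""
    with state_lock:
        return dict(anomalies)
```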

Model

Skytron’s general framework is simple and modular, so it can be applied to various applications. For example, I could use a tensor instead of a vector and a convolutional neural net instead of a multilayer perceptron to detect anomalies in video streams instead of flight paths. One thing that shouldn’t change is the use of a neural net. Otherwise, we would need a hand-crafted function comparing, say, deviation in latitude against deviation in velocity. A neural net saves us from this because its training algorithm, backpropagation, will not optimize for one variable over another, so the errors end up on similar scales. We can then reasonably take an average of the errors for each prediction and use it to check for anomalies.

[Image: model diagram]
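A sketch of the kind of model this implies: a small multilayer perceptron mapping one flight vector to the next, with the anomaly score taken as the mean of the per-feature squared errors. The layer sizes are my choice; the post does not specify them:

```python
import numpy as np
from tensorflow.keras import layers, models

def build_model(n_features=5):
    """Small MLP that predicts the next flight vector from the current one."""
    model = models.Sequential([
        layers.Input(shape=(n_features,)),
        layers.Dense(32, activation="relu"),
        layers.Dense(32, activation="relu"),
        layers.Dense(n_features),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

def anomaly_score(predicted, observed):
    """Average the squared error across features into a single score per flight."""
    return np.mean((predicted - observed) ** 2, axis=-1)
```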

Streaming Model Evaluation Plots

With streaming data, there is no classic train-test split, because we are not waiting around for batches to accumulate and, worse, we don’t have data labels. To evaluate the model, I created two live streaming plots and judged whether the results matched expectations. Below, at the early stage, the model is unstable with high error, and I am not confident in it. Later, the model stabilizes and there is less variance in the anomalies detected, and I have more confidence. The results still do not make perfect sense: there were not 1,000 anomalous flights, at least not in any meaningful sense. This could be an artifact of the anomaly criterion being just one standard deviation; if I used three, the results might be more in line with what I am looking for.

[Image: streaming model evaluation plots]
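The criterion itself is easy to parameterize. A sketch of flagging errors more than k standard deviations above the mean of recent errors (k=1 matches the behavior described above; k=3 would be stricter):

```python
import numpy as np

def is_anomalous(error, recent_errors, k=1.0):
    """Flag a prediction error more than k standard deviations above the recent mean."""
    errors = np.asarray(recent_errors, dtype=float)
    return error > errors.mean() + k * errors.std()
```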

Future Work

  • Add the ability to run multiple models at once and select the best-performing one.
  • Allow tuning of the anomaly criteria; in certain cases we might want to catch more or fewer anomalies.
  • Test on a different kind of data stream with a different neural net, e.g. golf swing video.