Can we get machine learning to detect corner kicks in football videos?
That was the question we asked ourselves when our friends at Swedish Elite Football, HDR and Vidispine provided us with a catalogue of Swedish football matches and accompanying logging data. In the end the answer was a solid yes, but getting there was not straightforward.
There are already some very good tools for celebrity and object recognition in videos, such as AWS Rekognition, but we wanted to do something different. We wanted to detect custom events in sports videos, and for that we needed a general-purpose machine learning platform. We turned to AWS SageMaker, which offers a choice of built-in algorithms and makes it easy to build, train and deploy models that scale.
Before we could start on our model we needed data for training. This was one of the first stumbling blocks: although the videos from Vidispine had been logged by HDR, the logs were made for humans to search. They were not the frame-accurate logs we needed to train a machine learning model.
First we discarded all existing logs and retagged everything to get frame-accurate log points
Good input data is key to training any machine learning model, and we should perhaps have anticipated this problem. To create as much input data as possible, we also mirrored and flipped images to generate extra variants for the model to train on.
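The mirroring step can be sketched with NumPy (the helper name here is our own illustration, not the actual script). Horizontal mirroring is a safe augmentation for football footage, since a corner kick from the left flank simply becomes one from the right:

```python
import numpy as np

def augment_mirror(frame: np.ndarray) -> list[np.ndarray]:
    """Return the original frame plus a horizontally mirrored variant."""
    return [frame, np.fliplr(frame)]

# Toy 2x3 single-channel "frame" to show the effect
frame = np.array([[1, 2, 3],
                  [4, 5, 6]])
original, mirrored = augment_mirror(frame)
# mirrored is [[3, 2, 1], [6, 5, 4]]
```

Doubling the dataset this way costs nothing at training time, but note that it only helps for events that are left/right symmetric.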
With the data ready we started training the model and tweaking the hyperparameters over rounds of training and validation to increase its accuracy. For information on how to get started with a very simple image classification model, see this getting started guide on AWS SageMaker.
Using only around 100 football matches, selected because they contained a relatively large number of yellow cards, and with frames downscaled from 1280x720 to just 399x224 pixels, we could relatively quickly create a model that classified correctly in 70% of cases.
The trained model picked out yellow cards and corner kicks with 70% accuracy
This obviously isn’t good enough for production use and increasing the accuracy by another 20-30% is likely to be more work than getting to the initial 70%. Nevertheless, initial tests are very promising and it is also clear that deploying machine learning is no longer a big project that requires a large investment. It’s within the reach of every organisation.
There are many more things that can be done to improve the accuracy of event detection. Here are a few simple next steps:
More frames – More frames for training means more events can be captured with better accuracy. The frames need to be representative of the content, though, so as not to introduce bias into the model.
More events – More sample events for goals, yellow cards, etc. This prototype only used 100 games, and there is obviously a limit to how many goals and yellow cards a single game can provide for training.
Single camera – Using a single wide-angle camera would make it easier to ensure that the model picks up on the actual event, for example a yellow card being shown on the pitch, rather than learning a particular camera angle that is often used when a yellow card is given.
Higher resolution – The source material was 1280x720, downscaled to 399x224 to save time and cost when training the model. Even better accuracy could probably have been achieved if the original resolution had been used.
Using the Vidispine API, we extracted the video files and their corresponding tags into an AWS S3 bucket. The logging timestamps were then refined to be usable for training the machine learning model.
From these new timestamps, frames for events and non-events were extracted at a set interval using a Python script with the OpenCV library. The frames were saved into folders named after the events in the S3 bucket. The yellow-card frames were also duplicated and augmented by mirroring them.
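The sampling logic might look roughly like this (function names, window and step sizes are our own illustration, not the actual script). The frame reads themselves go through OpenCV's `cv2.VideoCapture`:

```python
def event_frame_indices(event_s: float, fps: float,
                        window_s: float = 4.0, step_s: float = 1.0) -> list[int]:
    """Frame indices to sample around an event timestamp.

    Samples a frame every `step_s` seconds inside a window of
    `window_s` seconds centred on the event, clamped to the video start.
    """
    start = max(0.0, event_s - window_s / 2)
    n = int(window_s / step_s) + 1
    return [round((start + i * step_s) * fps) for i in range(n)]

def extract_frames(video_path: str, indices: list[int]):
    """Read the selected frames (requires the opencv-python package)."""
    import cv2  # local import: only needed when actually reading video
    cap = cv2.VideoCapture(video_path)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if ok:
            frames.append(frame)
    cap.release()
    return frames

# An event at 10 s in a 25 fps video yields indices [200, 225, 250, 275, 300]
indices = event_frame_indices(10.0, fps=25.0)
```

Non-event frames can be produced with the same helper by feeding it timestamps drawn from stretches of the match with no logged event.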
The frames, previously in .jpg format, were resized and converted to .rec format by running the im2rec script from Apache MXNet. At this stage 15% of the frames were set aside as a validation set. The data was now ready for learning with SageMaker. Amazon offers many ready-made Jupyter notebooks as examples for its algorithms, and one of these was used for the image classification algorithm.
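The hold-out split can be sketched like this (an illustrative helper of our own; in practice im2rec can do the split itself via its train-ratio option):

```python
import random

def train_validation_split(files: list[str], val_fraction: float = 0.15,
                           seed: int = 42) -> tuple[list[str], list[str]]:
    """Shuffle deterministically and hold out a validation fraction."""
    shuffled = sorted(files)      # stable starting order regardless of input
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]

# With 200 frames, 30 end up in the validation set and 170 in training
train, val = train_validation_split([f"frame_{i:04d}.jpg" for i in range(200)])
```

Fixing the random seed matters here: it keeps the validation set identical across rounds of hyperparameter tweaking, so accuracy numbers stay comparable.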
Interacting with the SageMaker services could also have been done in other ways, such as using Boto3, the AWS SDK for Python. This would be more suitable if training were to be re-run regularly in a deployed system. In the Jupyter notebook instance we set the parameters for the input data and the hyperparameters for model training and started a training job. When the job finished we created a model and an endpoint from which we could request inferences.
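With Boto3, a training job could be assembled along these lines. Bucket, role, job name and hyperparameter values below are placeholders of our own, and the actual API call is commented out because it needs AWS credentials and a container image URI for your region:

```python
def build_training_job_request(job_name: str, role_arn: str,
                               bucket: str, training_image: str) -> dict:
    """Assemble a request dict for SageMaker's create_training_job API."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": training_image,  # image-classification container
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": f"s3://{bucket}/train/",
                "S3DataDistributionType": "FullyReplicated",
            }},
            "ContentType": "application/x-recordio",
        }],
        "OutputDataConfig": {"S3OutputPath": f"s3://{bucket}/output/"},
        "ResourceConfig": {"InstanceType": "ml.p2.xlarge",
                           "InstanceCount": 1, "VolumeSizeInGB": 50},
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
        # Illustrative values: e.g. corner kick / yellow card / no event
        "HyperParameters": {"num_classes": "3", "epochs": "30",
                            "image_shape": "3,224,224"},
    }

request = build_training_job_request(
    "football-event-classifier",
    "arn:aws:iam::123456789012:role/SageMakerRole",
    "football-frames",
    "<image-classification-container-uri>")
# import boto3
# boto3.client("sagemaker").create_training_job(**request)
```

Building the request as a plain dict makes it easy to regenerate and resubmit the job on a schedule, which is exactly the scenario where Boto3 beats a hand-driven notebook.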
With this endpoint in place, another script took a football match (a video file), extracted frames, sent them to the endpoint and pieced together a table of timestamps for the events of interest based on the endpoint's answers.
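The last step, collapsing per-frame answers into event timestamps, might look like this. The per-frame call itself would go through `invoke_endpoint` on the `sagemaker-runtime` Boto3 client; the labels and confidence threshold here are illustrative, not taken from the actual prototype:

```python
def frames_to_events(predictions, fps: float,
                     threshold: float = 0.8) -> list[tuple[float, str]]:
    """Collapse per-frame predictions into (timestamp, label) rows.

    `predictions` is a list of (frame_index, label, confidence) tuples in
    frame order. Consecutive confident frames with the same label are
    merged into one event, timestamped at the first frame of the run.
    """
    events = []
    last_label = None
    for idx, label, conf in predictions:
        if conf < threshold or label == "no_event":
            last_label = None
            continue
        if label != last_label:
            events.append((idx / fps, label))
        last_label = label
    return events

preds = [(100, "no_event", 0.90),
         (125, "corner_kick", 0.95),
         (150, "corner_kick", 0.91),
         (300, "yellow_card", 0.88)]
events = frames_to_events(preds, fps=25.0)
# events is [(5.0, "corner_kick"), (12.0, "yellow_card")]
```

Merging runs of identical labels is what turns a noisy per-frame classifier into the kind of log table a human editor would actually want to search.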
Cover photo courtesy of Hayden Schiff/Flickr