The recent uptake of smarter manufacturing through Industry 4.0 and digital manufacturing paradigms has brought with it significant improvements in the productivity and reliability of the processes that underlie several industries. In a number of areas, capturing data from sensors embedded within these manufacturing processes is readily possible, and the data is often used to provide value through data analytics and machine learning. Unfortunately, some tasks, such as condition or process monitoring, require modalities such as imaging or vision, and in some cases the required data can be hard to acquire. This is particularly true of anomaly or process failure detection, where the events of interest are rare and therefore hard to capture. This can hamper the application of machine learning models to vision-based tasks, as they often require significant amounts of data to train accurately.

Looking more specifically at electrical machine manufacture, we find a number of processes and activities where it is beneficial to record and track performance or tolerances over time, or to identify anomalies that give rise to manufacturing errors. The first is the layup and winding of coils. Whether this is performed via a linear winding or needle winding process, tracking and characterising the coil can help us identify when something goes wrong. Examples include a deformation in the expected shape of the coil, or events such as the crossover of wires within a single layer. Another area of interest is the manipulation of cables and wires during the connection and assembly phase of machine manufacture, where tracking and traceability can provide value through operator feedback or anomaly detection.

Figure 1: Cable and wire manipulation activities in electric machine manufacture: (a) assembly of cable connections, (b) placement of wires during manual trickle-fed coil insertion.

The challenge of building systems to address the activities above, particularly when data is limited, has led to a number of strategies within machine learning.

Specific architectures have been engineered for ‘few-shot’ and ‘one-shot’ learning, allowing models to be trained accurately from only a handful of novel examples. Alternatively, approaches that focus on the data itself, through traditional data augmentation or, where possible, synthetic data generation, have also taken hold as a means to improve performance. Here we explore two approaches to boosting performance on a specific task: the first uses a generative model to create novel images, and the second takes a more direct route, using CAD tools to automatically generate synthetic data that matches our use case.
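As a rough illustration of the ‘traditional data augmentation’ route, the sketch below expands one small greyscale image into several variants using flips and a brightness shift. It uses plain Python lists so that it runs without dependencies. The function names (`flip_horizontal`, `flip_vertical`, `shift_brightness`, `augment`) and the shift of 40 grey levels are illustrative assumptions, not part of the work described here, and a real project would more likely rely on an image library.

```python
def flip_horizontal(image):
    """Mirror each row of a 2D greyscale image (a list of rows)."""
    return [list(reversed(row)) for row in image]


def flip_vertical(image):
    """Reverse the row order of a 2D greyscale image."""
    return [list(row) for row in reversed(image)]


def shift_brightness(image, delta):
    """Add delta to every pixel, clamping to the 0-255 range."""
    return [[max(0, min(255, px + delta)) for px in row] for row in image]


def augment(image):
    """Return a copy of the original image plus a few simple variants."""
    return [
        [list(row) for row in image],   # unchanged copy
        flip_horizontal(image),
        flip_vertical(image),
        shift_brightness(image, 40),    # brighter
        shift_brightness(image, -40),   # darker
    ]


# Tiny 2x3 stand-in for a coil photograph.
sample = [[10, 120, 250],
          [0, 60, 200]]
variants = augment(sample)
print(len(variants))  # 5 images from one original
```

Only transformations that leave the class label unchanged should be used; a flip does not turn a ‘pass’ coil into a failure, for instance, but an aggressive crop could remove the fault from view.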

One thing that is always interesting about machine learning and AI, and deep neural networks in particular, is that new methods or architectures regularly appear that shift the needle on performance or open up abilities that once seemed impossible. For classification problems we had AlexNet and the subsequent residual convolutional neural networks, while more recently Transformer models have rapidly come to dominate natural language processing. Another class of model, ‘Generative Adversarial Networks’ or GANs, brought new ways to generate data from a specific distribution: when trained on human faces, for example, they can produce novel faces of extraordinary realism. Starting with this area of generative models, we can train a model whose task is to create new images from a particular distribution and, if possible, under additional conditions, for example to target a specific class of failure.

Figure 2: Some example test coils wound on a 3D printed single tooth bobbin. The example on the right shows one of the failure classes we are interested in, a wire double crossing fault.
Figure 3: GAN generated images through time. A GAN model pre-trained on a ‘faces’ dataset is adapted through transfer learning to generate new images of coil failure. The results improve with training time from left to right.

An example can be seen in figure 2, where we have a number of classes (pass, gap failure, crossover failure) and wish eventually to train a classification model that correctly identifies each one during a real manufacturing process. As discussed earlier, gathering this data is time-consuming and often difficult, even in a toy example such as this, but we can exploit the power of GANs to create novel images of these three classes, even from a small starting dataset. In figure 3 we can see such a model being trained over time to produce the images we seek: a re-purposed model, originally trained to generate faces, is now able to produce a variety of examples of our target coil failures.

Figure 4: Example of how a semantic mask (red=background, green=bobbin, yellow=coil, pink=gap failure) can be used to generate specific images

The next question is how this improves the performance of more traditional deep neural network (DNN) classification models. In short, when a small dataset of original images is ‘boosted’ with additional data, in this case our GAN-generated images, we find an increase in accuracy from 65% to 87% over the three classes of coil winding failure. This is a significant improvement, and in future we can go further with semantically aware models which, when fed a specific ‘mask’ of values, produce targeted images that contain the faults we want at chosen locations within the image, as shown in figure 4.
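To make the idea of a semantic mask more concrete, the sketch below builds a small label grid using the colour legend from figure 4 and places a single ‘gap failure’ cell at a chosen location. The grid size, the rectangular layout and the helper names (`paint`, `build_mask`) are illustrative assumptions; the mask-conditioned image generator itself is not shown.

```python
# Class labels and display colours, following the legend in figure 4.
PALETTE = {
    "background": "red",
    "bobbin": "green",
    "coil": "yellow",
    "gap_failure": "pink",
}


def paint(mask, label, top, left, height, width):
    """Overwrite a rectangular region of the mask with a class label."""
    for r in range(top, top + height):
        for c in range(left, left + width):
            mask[r][c] = label


def build_mask(rows, cols, gap_at):
    """Build a toy mask: bobbin in the centre, coil on it, one gap fault.

    gap_at is the (row, col) of a single-cell gap inside the coil region.
    The layout is hypothetical and only meant to show the idea.
    """
    mask = [["background"] * cols for _ in range(rows)]
    paint(mask, "bobbin", 1, 1, rows - 2, cols - 2)
    paint(mask, "coil", 2, 2, rows - 4, cols - 4)
    gap_row, gap_col = gap_at
    if mask[gap_row][gap_col] != "coil":
        raise ValueError("gap must be placed inside the coil region")
    mask[gap_row][gap_col] = "gap_failure"
    return mask


mask = build_mask(rows=8, cols=8, gap_at=(3, 4))
for row in mask:
    # Print the first letter of each colour as a compact preview.
    print(" ".join(PALETTE[label][0] for label in row))
```

Moving `gap_at` changes where the fault appears, which is the kind of spatial control a mask-conditioned generator would offer.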

The first example using GANs can work, but the texture and resolution of the generated images can sometimes mean the output is not as accurate as we would like. The image datasets used to train the model are also often made up of images that are spatially similar; in this case the coils are central, with little variation beyond the faults themselves. In some problems this uniformity will not be present, which can make such models harder to train. To overcome this, we can produce synthetic images using computer software, as has recently been done in autonomous driving using driving games such as GTA5. Alternatively, we can use CAD software such as Blender to render 3D scenes, and script the image generation process to vary the output widely, which aids the training of machine vision models.
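A scripted generator of this kind typically begins by sampling a random configuration for each image to be rendered. The sketch below covers only that sampling step, in plain Python. The parameter names, value ranges and option lists are hypothetical placeholders rather than the settings of our Blender script, and applying a configuration to a scene through Blender's Python API is deliberately left out.

```python
import random

# Hypothetical option lists; a real script would point at actual assets.
BACKGROUNDS = ["workbench", "steel_plate", "rubber_mat"]
CABLE_COLOURS = ["blue", "red", "black", "yellow"]


def sample_scene(rng, max_cables=3):
    """Draw one random scene configuration for a synthetic image."""
    n_cables = rng.randint(1, max_cables)
    return {
        "background": rng.choice(BACKGROUNDS),
        "cable_colours": [rng.choice(CABLE_COLOURS) for _ in range(n_cables)],
        "camera_distance_m": round(rng.uniform(0.3, 1.0), 2),
        "camera_elevation_deg": round(rng.uniform(20.0, 80.0), 1),
        "light_strength": round(rng.uniform(0.5, 2.0), 2),
    }


def sample_dataset(n_images, seed=0):
    """Sample n_images configurations; a fixed seed makes runs repeatable."""
    rng = random.Random(seed)
    return [sample_scene(rng) for _ in range(n_images)]


configs = sample_dataset(5)
for cfg in configs:
    print(cfg)
```

Keeping the sampling separate from the rendering makes it easy to log exactly which configuration produced each image, and to regenerate a dataset from its seed.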

Figure 5: Cable assembly tasks image dataset for (a) single cable and (b) three cable assembly. For both examples, a number of images were generated, and then appropriate masks and bounding boxes were added for objects of interest.

Producing and labelling images for training machine vision models can be a laborious task. An example of a hand-labelled dataset for a simple cable assembly task can be seen in figure 5; it includes object detection bounding boxes and individual semantic masks (cable / hand). Automating this brings numerous benefits, particularly in cost and time, but we want to ensure that the generated images match reality as closely as possible. Adding complex lighting, textures and shadows can help, and varying the camera angle and distance helps the resulting image dataset generalise to what may exist in the wild. Examples of such images can be seen in figure 6.
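One practical benefit of automated labelling is that one label can be derived from another instead of being drawn by hand. As a minimal sketch, assuming each object is available as a binary mask (1 for object pixels, 0 elsewhere), the bounding box used for object detection follows directly from the mask. The function name `mask_to_bbox` and the inclusive (x_min, y_min, x_max, y_max) convention are illustrative choices, not a description of our tooling.

```python
def mask_to_bbox(mask):
    """Return (x_min, y_min, x_max, y_max) of the non-zero pixels in a mask.

    mask is a list of rows of 0/1 values. Returns None for an empty mask.
    Coordinates are inclusive pixel indices.
    """
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [x for row in mask for x, value in enumerate(row) if value]
    return (min(cols), rows[0], max(cols), rows[-1])


# Toy 5x6 mask of a short diagonal cable segment.
cable_mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print(mask_to_bbox(cable_mask))
```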

Figure 6: Blender generated images of single and three cable assembly examples across a number of backgrounds and textures. The Blender script is able to produce a number of outputs, including masks, normal and depth images.

Figure 7: Example results for models trained with a limited ‘natural’ cable image dataset, or with a mixture of ‘natural’ and ‘synthetic’ cable images (far right).

Identifying cables and wires in a natural scene, for example during the production of an electric motor, can be performed using machine vision models trained to semantically mask out where they are within an image. This allows us to monitor these tasks or activities, track progress, provide feedback or detect anomalies. As in the previous example, such models can be trained on a small, limited dataset, but they often will not perform as well given the lack of data, whereas boosting the available image data can once again significantly increase the performance and generalisation ability of the model.

We used the Blender script to generate an additional set of thousands of images to add to the smaller hand-created dataset, which contains hundreds of examples. We then trained a deep neural network model designed for ‘semantic segmentation’, whose goal is to identify and label the pixels associated with the object we care about, in this case a single cable or a set of cables. Figure 7 shows a small example of the effect that boosting the available image data has on training. The natural images and their ground truth masks are shown on the left. When a model is trained using only the limited ‘natural’ dataset the results are poor (column 3): this dataset contained an abundance of ‘blue cable’ images and masks, so the model fails to generalise to other cable styles and textures. The final column shows a model trained on both the ‘natural’ and ‘synthetic’ datasets, which identifies the cables more accurately and segments them out from the scene.

These results are just a starting point, and we are now investigating how these models can be transitioned to a more industrial environment and run in real time. Please stay tuned for more, or feel free to contact us if you are interested in this research.