This tinyML system helps soothe your dog’s separation anxiety with sounds of your voice
Arduino Team — November 17th, 2021
Due to the ongoing pandemic, Nathaniel Felleke’s family dog, Clairette, had gotten used to having people around her all the time and thus developed separation anxiety when the family would leave the house. But thanks to some clever thinking, Felleke came up with the idea to automatically detect when his dog started to bark and play some sounds of his family speaking to calm her down.
In order to detect when the dog is barking, Felleke collected plenty of audio samples from Google’s Bark Audioset, his own recordings, speech commands, and miscellaneous cat and dog noises to distinguish background audio from a bark. After passing them into Edge Impulse’s Studio and creating a keyword spotting model, he downloaded the resulting model and loaded it onto a Nano 33 BLE Sense. If a bark is detected by the BLE Sense’s onboard microphone, the board toggles a pin high to alert a separate Arduino Nano, which then plays a random human speech sound by sending a command to an attached Music Maker Feather Board.
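As a rough sketch of that two-board handshake (the pin number and the detectBark() helper below are assumptions standing in for Felleke’s actual wiring and Edge Impulse inference code), the Nano 33 BLE Sense side could look something like this:

const int BARK_ALERT_PIN = 2; // assumed pin wired to the second Arduino Nano

// Placeholder so the sketch compiles; the real project runs the keyword spotting model here
bool detectBark() {
  return false;
}

void setup() {
  pinMode(BARK_ALERT_PIN, OUTPUT);
  digitalWrite(BARK_ALERT_PIN, LOW);
}

void loop() {
  if (detectBark()) {
    digitalWrite(BARK_ALERT_PIN, HIGH); // signal the second Nano to play a random voice clip
    delay(100);
    digitalWrite(BARK_ALERT_PIN, LOW);
  }
}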
To see this project in action, you can watch Felleke’s demonstration video below. For the code and resulting tinyML model, his files are available here on GitHub.
Gamifying exercise allows people to become more motivated and participate more often in physical activities while also being distracted by doing something fun at the same time. This inspired a team of students from the Handong Global University in Pohang, South Korea to come up with a system, dubbed “Move!,” that uses a microcontroller to detect various gestures and perform certain actions in mobile games accordingly.
They started by collecting many different gesture samples from a Nano 33 BLE Sense, which is worn by a person on their wrist. This data was then used to train a TensorFlow Lite model that classifies the gesture and sends it via Bluetooth to the host phone running the app. Currently, the team’s mobile app contains three games that a player can choose from.
There is a dinosaur game that operates similarly to the offline dinosaur game in Google Chrome where the user must jump to avoid incoming obstacles. The jumping jack game alternates between different movements that are mirrored by the player in a certain amount of time. And finally, there is a boxing game where the player punches the air when commanded onscreen.
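As a loose illustration of the Bluetooth link described above (not the team’s actual code), a sketch like the one below could use the ArduinoBLE library to expose a one-byte characteristic holding the index of the last classified gesture; the UUIDs and the classifyGesture() placeholder are assumptions.

#include <ArduinoBLE.h>

// Assumed UUIDs; the real project defines its own service layout
BLEService gestureService("180C");
BLEByteCharacteristic gestureChar("2A56", BLERead | BLENotify);

// Placeholder so the sketch compiles; the real classifier runs a TensorFlow Lite model
int classifyGesture() {
  return -1; // -1 means no gesture detected
}

void setup() {
  if (!BLE.begin()) {
    while (1); // BLE failed to start
  }
  BLE.setLocalName("Move");
  BLE.setAdvertisedService(gestureService);
  gestureService.addCharacteristic(gestureChar);
  BLE.addService(gestureService);
  BLE.advertise();
}

void loop() {
  BLEDevice central = BLE.central();
  while (central && central.connected()) {
    int gesture = classifyGesture();
    if (gesture >= 0) {
      gestureChar.writeValue((byte)gesture); // the phone app is notified of the new gesture
    }
  }
}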
You can read more about Move! — which was one of the five winning projects in the TensorFlow Lite for Microcontrollers Challenge — here and view/download the code for both the BLE Sense and mobile app on GitHub.
Getting in your daily exercise is vital to living a healthy life, and proper form when squatting can go a long way toward achieving that goal without the joint pain that comes from doing squats incorrectly. The Squats Counter is a device worn around the thigh that utilizes machine learning and TensorFlow Lite to automatically track the user’s form and count how many squats have been performed.
Creator Manas Pange started his project by flashing the tf4micro-motion-kit code to a Nano 33 BLE Sense, which features an onboard three-axis accelerometer. From there, he opened the Tiny Motion Trainer Experiment by Google, which connects to the Arduino over Bluetooth and captures many successive samples of motion. After gathering enough proper and improper form samples, Manas trained, tested, and deployed the resulting model to the board.
Every time a proper squat is performed, the counter ticks down by one until it reaches a predefined goal.
Monitor the pH levels of a hydroponic plant’s water supply with Arduino and tinyML
Arduino Team — September 2nd, 2021
Many plants are notorious for how picky they are about their environmental conditions. Having the wrong temperature, humidity, soil type, and even elevation can produce devastating effects. But none are perhaps as important and overlooked as water/soil pH, which is a measure of how acidic and/or alkaline the growing medium is. In hydroponics, maintaining optimal growing conditions is how high yields can be ensured without becoming too wasteful. Janet N on Hackster had the idea of harnessing the powers of embedded machine learning to let her know when the water had become unacceptable for her plants.
The device uses an Arduino Nano 33 BLE Sense to continuously monitor the pH of the hydroponics water supply with a simple probe. This data was initially loaded into Edge Impulse’s Studio where it was split into features and then sent to both a Keras classification model and an anomaly detection model for training. After she was satisfied with the performance of both, they were deployed back onto the Arduino.
As the system checks the pH of the water, it aggregates the data and places it into a buffer for classification. If the value is higher than 7, the water is too basic, and a yellow LED is turned on. If the water is too acidic (below 4), a red LED is activated. And finally, a green LED lights up when the optimal pH of around 5 has been reached.
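A minimal sketch of that LED logic might look like the following; the pin numbers and the readPH() helper are assumptions standing in for the probe-and-model pipeline described above.

const int YELLOW_LED = 2; // too basic (pH above 7)
const int RED_LED = 3;    // too acidic (pH below 4)
const int GREEN_LED = 4;  // near the optimal pH of around 5

// Placeholder so the sketch compiles; the real project reads a pH probe
float readPH() {
  return 5.0;
}

void setup() {
  pinMode(YELLOW_LED, OUTPUT);
  pinMode(RED_LED, OUTPUT);
  pinMode(GREEN_LED, OUTPUT);
}

void loop() {
  float ph = readPH();
  digitalWrite(YELLOW_LED, ph > 7.0 ? HIGH : LOW);
  digitalWrite(RED_LED, ph < 4.0 ? HIGH : LOW);
  digitalWrite(GREEN_LED, (ph >= 4.0 && ph <= 7.0) ? HIGH : LOW);
  delay(1000);
}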
You can read more about the process of creating this project here on Hackster.io.
Predicting a lithium-ion battery’s life cycle with tinyML
Arduino Team — August 24th, 2021
Nothing is perhaps more frustrating than suddenly discovering your favorite battery-powered device has shut down due to a lack of charge, and because almost no one finds joy in calculating how long it will live based on current consumption levels/time used, there must be a better way. This problem is what inspired Manivannan S. to create a small project that can predict when a battery is about to go flat using the “magic” of machine learning and a voltage sensor.
The circuit for the project is quite basic, consisting of an Arduino Nano 33 BLE Sense, a 125 ohm rheostat, a voltage sensing module, and finally the rechargeable 18650 Li-ion cell. The discharge current of the battery was set at 1 ampere with the rheostat, at which time the voltage output was sampled continuously for 30 minutes at a rate of one reading per minute. This data was imported into Edge Impulse’s Studio and used to train a regression model that can predict the estimated voltage and therefore also the capacity remaining.
Once tested, the model proved very successful in determining the battery’s voltage after an hour of use, after which Manivannan went on to explain how this data could be further extrapolated to estimate the complete life cycle. By incorporating machine learning into smart battery technology, power management can become more approachable and increasingly efficient.
Within an industrial setting, being able to determine if and/or when a machine malfunctions is vital to maintaining safety and uptime. This challenge is what prompted a maker who goes by javagoza on element14 to enter into their Design for a Cause 2021 contest with his device, which he calls the VenTTracker.
At its heart, the VenTTracker uses an Arduino Nano 33 IoT mounted onto a small protoboard that is attached to a sliding surface, such as a window or vent. Under normal operation, the device does nothing, but once an anomaly is detected, including an obstacle or breakdown, the onboard OLED screen shows an alert message.
Because this project uses machine learning to differentiate between normal operation and an anomaly, javagoza collected a large dataset of motions from an accelerometer and then uploaded it to Edge Impulse’s Studio. From there, he added a time series processing block and flattening block to generate the features that fed into the Keras neural network for training and validation. Once deployed back to the Arduino, the model performed very well at telling the difference between the window opening normally and something being in the way.
He even included Arduino Cloud functionality to display if the window is open and any anomalies that have been detected so far. There was an additional module constructed for environmental monitoring, which consists of a Nano 33 IoT and a BME680 sensor that sends CO2, temperature, and humidity data to another Cloud dashboard to let users know when to open the window.
Shortly after the COVID-19 pandemic began, Samuel Alexander and his housemates purchased a ping pong set and began to play — a lot. Becoming quite good at the game, Alexander realized that his style was not consistent with how more professional table tennis players hit the ball, as he simply taught himself without a coach. Because of this, he was inspired to create a smart paddle that uses an integrated IMU to intelligently classify which moves he makes and correct his form to improve it over time.
Alexander went with the Nano 33 BLE Sense board due to its ease of use and tight integration with TensorFlow Lite Micro, not to mention the onboard 6DOF accelerometer/gyroscope module. He began by designing a small cap that fits over the bottom of a paddle’s handle and contains all the electronics and battery circuitry. With the hardware completed, it was time to get started with the software.
The Tiny Motion Trainer by Google Creative Lab was employed to quickly capture data from the Arduino over Bluetooth and store the samples for each motion. Once all of the movements had been gathered, Alexander trained the model for around 90 epochs and was able to achieve an impressive level of accuracy. His build log and demonstration video below show how this smart paddle can be used to intelligently classify and coach a novice player into using better form while playing, and it will be fun to see just how good the model can get.
Snoring is an annoying problem that affects nearly half of all adults and can cause others to lose sleep. Additionally, the ailment can be a symptom of a more serious underlying condition, so being able to know exactly when it occurs could be lifesaving. To help solve this issue, Naveen built the Snoring Guardian — a device that can automatically detect when someone is snoring and begin to vibrate as an alert.
The Snoring Guardian features a Nano 33 BLE Sense to capture sound from its onboard microphone and determine if it constitutes a snore. He employed Edge Impulse along with the AudioSet dataset that contains hundreds or even thousands of labeled sound samples that can be used to train a TensorFlow Lite Micro model. The dataset within Edge Impulse was split between snoring and noise, with the latter label for filtering out external noise that is not a snore. With the spectrograms created and the model trained, Naveen deployed it to his Nano 33 BLE Sense as an Arduino library.
The program for the Snoring Guardian gathers new microphone data and passes it to the model for inference. If the resulting label is “snoring,” a small vibration motor is activated that can alert the wearer. As an added bonus, the entire thing runs off rechargeable LiPo batteries, making this an ultra-portable device. You can see a real-time demonstration below as well as read more about this project on Hackster.io.
Whether commuting to work or simply having fun around town, riding a bike can be a great way to get exercise while also enjoying the scenery. However, riding around on the road presents a danger as cars or other cyclists / pedestrians might not be paying attention while you try to turn. That is why Alvaro Gonzalez-Vila created VoiceTurn, a set of turn signals that are activated by simply saying which direction you are heading towards.
VoiceTurn works by using the Arduino Nano 33 BLE Sense at its heart to both listen for the “left” or “right” keywords and then activate the appropriate turn signal. Gonzalez-Vila took advantage of edge machine learning through the Edge Impulse Studio. First, he collected audio samples consisting of the words “left,” “right,” and then random noise via the Google Speech Commands Dataset. Next, he sent them through an MFCC block that does some processing to extract human speech features. And finally, the Keras neural network was trained on these features to produce a model.
With the model deployed to the Nano 33 BLE Sense, Gonzalez-Vila developed a simple program that continually reads in a waveform from the microphone and passes it to the model for inference. Based on the result, a string of NeoPixels on either the left or right will begin to light up for a predetermined number of cycles. As seen in his video below, the VoiceTurn works really well at detecting keywords and is easy to see from a distance. You can read more about how this project was built in its write-up here.
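To make the NeoPixel side concrete, here is a hedged sketch using the Adafruit_NeoPixel library; the pin numbers, pixel counts, and the keywordDetected() placeholder are assumptions rather than Gonzalez-Vila’s actual code.

#include <Adafruit_NeoPixel.h>

// Assumed wiring: one short strip per side
Adafruit_NeoPixel leftStrip(8, 5, NEO_GRB + NEO_KHZ800);
Adafruit_NeoPixel rightStrip(8, 6, NEO_GRB + NEO_KHZ800);

// Placeholder so the sketch compiles; the real code runs the Edge Impulse keyword model
int keywordDetected() {
  return -1; // 0 = "left", 1 = "right", -1 = nothing heard
}

void blinkStrip(Adafruit_NeoPixel &strip, int cycles) {
  for (int c = 0; c < cycles; c++) {
    for (uint16_t i = 0; i < strip.numPixels(); i++) {
      strip.setPixelColor(i, strip.Color(255, 120, 0)); // amber turn signal
    }
    strip.show();
    delay(300);
    strip.clear();
    strip.show();
    delay(300);
  }
}

void setup() {
  leftStrip.begin();
  rightStrip.begin();
}

void loop() {
  int keyword = keywordDetected();
  if (keyword == 0) blinkStrip(leftStrip, 10);
  else if (keyword == 1) blinkStrip(rightStrip, 10);
}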
‘Droop, There It Is!’ is a smart irrigation system that uses ML to visually diagnose drought stress
Arduino Team — July 13th, 2021
Throughout the day as the sun evaporates the water from a plant’s leaves via a process called transpiration, observers will notice that they tend to get a little bit droopy. Also known as drought stress, this response to a loss of water results in low turgidity (internal water pressure) and can impact the ability of the plant to grow correctly. Traditional irrigation monitors use soil moisture sensors to determine the soil’s water levels, but Terry Rodriquez and Salma Mayorquin wanted to create something a bit more unique: a visual droop detection system.
Their device, which they affectionately call the “Droop, There It Is”, features a Nano 33 BLE Sense and ArduCam camera module to take pictures of the plant and uses an image classifier to determine if the plant is drooping or not. They started by taking a pre-trained MobileNetV2 base model and fine-tuned it with a set of 6,000 images. After optimizing the result with grayscale reductions and knowledge distillation techniques, the team deployed it onto their Nano 33 BLE Sense for inferencing.
Although the device only signals when the plant needs water over Bluetooth Low Energy for now, it can be augmented in the future to directly control pumps and valves if needed. This project is a great demonstration of how machine learning can be harnessed to reduce overwatering and increase efficiency. You can read more about it here or check out their video below!
A dangerous fall can happen to anyone, but falls are particularly dangerous for the elderly, as that demographic might not have an effective way to get help when needed. Rather than purchasing an expensive device that costs up to $100 per month to use, Nathaniel F. on Hackster wanted to build a project that harnessed the power of embedded machine learning to detect falls and send an alert. His solution involves the Arduino Nano 33 BLE Sense board, which not only has an integrated accelerometer but also offers Bluetooth Low Energy capabilities that let the processor communicate with the accompanying mobile app.
Nathaniel trained his ML model on the SmartFall dataset, which allows the device to respond to a wide variety of falls and ignore non-harmful movements. Once training was completed, he was able to achieve an accuracy of 95%. The Nano 33 BLE Sense samples accelerometer data at 31.25Hz to match the dataset’s frequency, and it makes a prediction every two seconds. If a fall is detected or the built-in emergency button is pressed, the user has 30 seconds to deactivate the alarm; otherwise it sends a BLE message to the phone, which in turn sends an SMS message containing the current location to an emergency contact.
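The timing described above (sampling at 31.25Hz, predicting every two seconds, then a 30-second cancellation window) could be structured roughly as follows; every name in this skeleton is an assumption, not Nathaniel’s actual code.

const unsigned long SAMPLE_INTERVAL_MS = 32;    // ~31.25Hz, matching the SmartFall dataset
const unsigned long PREDICT_INTERVAL_MS = 2000; // one prediction every two seconds
const unsigned long CANCEL_WINDOW_MS = 30000;   // 30 seconds to deactivate the alarm

unsigned long lastSample = 0, lastPredict = 0;

// Placeholders so the sketch compiles; the real project reads the IMU, runs the model,
// debounces the buttons, and sends the BLE alert to the companion app
void bufferAccelerometerSample() {}
bool fallDetected() { return false; }
bool emergencyButtonPressed() { return false; }
bool cancelButtonPressed() { return false; }
void sendBleAlert() {}

void setup() {}

void loop() {
  unsigned long now = millis();
  if (now - lastSample >= SAMPLE_INTERVAL_MS) {
    lastSample = now;
    bufferAccelerometerSample(); // push x/y/z into the model's input window
  }
  if (now - lastPredict >= PREDICT_INTERVAL_MS) {
    lastPredict = now;
    if (fallDetected() || emergencyButtonPressed()) {
      unsigned long alarmStart = millis();
      while (millis() - alarmStart < CANCEL_WINDOW_MS) {
        if (cancelButtonPressed()) return; // alarm deactivated in time
      }
      sendBleAlert(); // phone app forwards an SMS with the current location
    }
  }
}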
Even though this DIY fall detector works well already, Nathaniel plans on making a custom PCB and extending the battery life for longer use time between charging. You can read more about his design here, and you can view his demonstration video below.
One major drawback to the large-scale farming of animals for meat consumption is the tendency for diseases to spread rapidly and decimate the population. This widespread issue is what drove Clinton Oduor to build a tinyML-powered device that can perform precision livestock farming tasks intelligently. His project works by continuously monitoring the noise coming from pigs and making a determination about what it means, such as whether a cough is indicative of a respiratory illness or a squeal denotes stress.
Oduor gathered the sound samples for his dataset by downloading around seven minutes of coughing pig sounds and splitting them up into one-second-long files. After using a trick called data curation that allows more samples to be generated from previous ones, he trained a neural network with Edge Impulse and was able to achieve 99.7% accuracy. As for deployment, the model runs on an Arduino Nano 33 BLE Sense, which has an onboard microphone for picking up ambient sounds. When coughing is detected, it sends some data via I2C to a MKR FOX 1200 board that broadcasts a message over the Sigfox network.
The developer plans on collecting more data from various pig species and at different stages of growth to further enhance the diversity of the model and increase its accuracy. As a more advanced challenge, he would also like to have his device recognize specific cough patterns for certain types of respiratory diseases. You can read more about his project here.
There are thousands of bird species in the world, with numerous different and unique ones living in various areas. Developers Errol Joshua, Mahesh Nayak, Ajith K J, and Supriya Nickam wanted to build a simple device that would allow them to automatically recognize the feathered friends near them and do some simple tracking, such as knowing how often a particular bird makes its call. Their project uses a Nano 33 BLE Sense, along with its onboard microphone, to pick up sounds and make inferences about what they are in real-time.
The team decided to train their tinyML model to detect four different species that are native to their area and then downloaded a sample dataset containing many sound files. After a bit of editing, they transferred the audio clips into Edge Impulse’s Studio and subsequently labeled each one. The Impulse consisted of a Mel-filter-bank energy (MFE) block that took the sounds and produced a spectrogram for each one. With these processed features, the model was able to achieve an impressive 95.9% accuracy.
As seen in their demonstration video below, the current bird sound being played was picked up and identified accurately by the Nano 33 BLE Sense. And with some minor changes to how the model was trained, the accuracy can be increased even more. You can read about this project on its page.
This pocket-sized device uses tinyML to analyze a COVID-19 patient’s health conditions
Arduino Team — June 21st, 2021
In light of the ongoing COVID-19 pandemic, being able to quickly determine a person’s current health status is very important. This is why Manivannan S wanted to build his very own COVID Patient Health Assessment Device that could take several data points from various vitals and make a prediction about what they indicate. The pocket-sized system features a Nano 33 BLE Sense at its core, along with a Maxim Integrated MAX30102 pulse oximeter/heart-rate sensor to measure oxygen saturation and pulse.
From this incoming health data, Manivannan developed a simple algorithm that generates a “Health Index” score by plugging in factors such as SpO2, respiration rate, heart rate, and temperature into a linear regression. Once some sample data was created, he sent it to Edge Impulse and trained a model that uses a series of health indices to come up with a plausible patient condition.
After deploying the model to the Nano 33 BLE Sense, Manivannan put some test data on it to simulate a patient’s vital signs and see the resulting inferences. As expected, his model successfully identified each one and displayed it on an OLED screen. To read more about how this device works, plus a few potential upgrades, you can visit its write-up on Hackster.io here or check out the accompanying video below.
For the hearing impaired, communicating with others can be a real challenge, and this is especially problematic when it is a deaf parent trying to understand what their child needs, as the child is too young to learn sign language. Mithun Das was able to come up with a novel solution that combines a mobile app, machine learning, and a Neosensory Buzz wristband to enable this channel of communication.
Called the “Baby Connect”, Das’ system involves using a mobile app with a series of images that correspond to various feelings, actions, or wants/needs of a child. When something is requested, such as wanting to take a nap, the action is mapped to a sort of Morse code language that buzzes the four haptic motors on the Neosensory Buzz in a certain pattern. For instance, dislike is mapped to a dot, dash, and then dot, while yes is a single dot.
The Baby Connect also has some more advanced features including baby activity monitoring and environmental logging. Because deaf parents are unable to hear the difference between certain cries, the Nano 33 BLE Sense that controls the device runs a model trained with Edge Impulse that can distinguish between cries for pain, hunger, and general malaise. Finally, there’s the ability to use the app as a speech-to-text converter that takes words and changes them automatically into mapped vibrations.
Epilepsy can be a very terrifying and dangerous condition, as sufferers often experience seizures that can result in a loss of motor control and even consciousness, which is why one team of developers wanted to do something about it. They came up with a simple yet clever way to detect when someone is having a convulsive seizure and then send out an alert to a trusted person. The aptly named Epilet (Epilepsy + bracelet) system uses a Nano 33 BLE Sense along with its onboard accelerometer to continually read data and infer if the sensor is picking up unusual activity.
The Epilet was configured to leverage machine learning for seizure detection, trained using data captured from its accelerometer within Edge Impulse’s Studio. The team collected 30 samples each of both normal, everyday activities and seizures. From this, they trained a model that is able to correctly classify a seizure 97.8% of the time.
In addition to the physical device itself is an accompanying mobile app that handles the communication. When it receives seizure activity that lasts for at least 10 seconds from the Nano 33 BLE Sense, the app sends an SMS message to a contact of the user’s choice. The Epilet has a lot of potential to help people suffering from epilepsy, and it will be exciting to see what other features get added to it in the future.
Bike locks have not changed that much in the last few decades, even though our devices have gotten far smarter, so they seem in need of an update. Designed with this in mind, the TapLock is able to intelligently lock and unlock from either Bluetooth or taps on the enclosure. It uses a Nano 33 BLE Sense to detect tap patterns via an onboard accelerometer as well as BLE capabilities to communicate with the owner’s phone.
Because taps are not necessarily directional, the TapLock’s creators took an average of each accelerometer axis and charted the time between the peaks. After collecting a large sample of data, they used Edge Impulse to process the data and then train a model with an accuracy of 96.4%. This allows the owner to have some wiggle room when trying to lock or unlock the bike.
The team also developed a mobile app, which provides another way for the bike’s owner to lock or unlock the bike, along with some extra features too. After connecting to the TapLock, the app loads the previous state of the lock device and updates itself if needed. If the user wants to lock the bike, the app will send a “lock” command to the TapLock and store the current location to show on a map. This way the owner won’t forget where their bike is when trying to retrieve it.
Currently, the TapLock doesn’t have a physical locking mechanism, but the team states that one can be added and then electronically activated from one of the Nano 33 BLE Sense’s GPIO pins. You can see a demo of this project in the video below and read about it on Hackster.io.
The Arduino Uno is famous for its ease of use and compact size, but its microcontroller, the ATmega328P, is quite limited. The 328P contains a mere 32KB of flash storage for programs and 2KB of RAM, which has traditionally made it unsuitable for machine learning applications.
However, a team at the Institute of Physics and Technology at Petrozavodsk State University was able to cram an algorithm onto the board that can recognize the handwritten digits within the MNIST dataset. Without getting too complicated, the Uno takes in an array of pixels that range in value from 0 to 255, or one byte each. The entire 28 by 28 grid is then flattened to a single array of 784 elements that is passed into a reservoir holding the weights for each pixel. As the model continues to be trained, these weights are gradually adjusted until the output matches the correct digit.
Input data is read from the serial port and stored within an array, where it is then used within the LogNNet library to compute the layer values. Once everything has been passed through the neural network, the resulting digit is printed to the serial monitor. Overall, the neural network’s variables in RAM are quite space-efficient and account for just over a kilobyte.
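As a rough illustration of the input path only (the reservoir math itself lives in the LogNNet library), a sketch like this could collect the 784 flattened pixel bytes over serial; the buffer name and framing are assumptions.

// Collect one flattened 28x28 MNIST image (784 bytes, one byte per pixel) from serial
const int IMAGE_SIZE = 28 * 28;
uint8_t pixels[IMAGE_SIZE];

void setup() {
  Serial.begin(115200);
}

void loop() {
  if (Serial.available() > 0) {
    int count = 0;
    while (count < IMAGE_SIZE) {
      if (Serial.available() > 0) {
        pixels[count++] = (uint8_t)Serial.read();
      }
    }
    // Here the LogNNet code would map the pixels through the reservoir weights
    // and print the predicted digit to the serial monitor.
  }
}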
As seen below, the researchers were able to achieve an accuracy of 82% with an inferencing time of around seven seconds, which is quite impressive for such a small chip. To read more about how the LogNNet reservoir neural network operates, be sure to check out the team’s paper.
In this deep dive article, performance optimization specialist Larry Bank (a.k.a The Performance Whisperer) takes a look at the work he did for the Arduino team on the latest version of the Arduino_OV767x library.
Arduino recently announced an update to the Arduino_OV767x camera library that makes it possible to run machine vision using TensorFlow Lite Micro on your Arduino Nano 33 BLE board.
If you just want to try this and run machine learning on Arduino, you can skip to the project tutorial.
The rest of this article is going to look at some of the lower level optimization work that made this all possible. There are higher performance, industrial-targeted options like the Arduino Portenta available for machine vision, but the Arduino Nano 33 BLE has sufficient performance, with TensorFlow Lite Micro support ready in the Arduino IDE. Combined with an OV767X module, it makes a low-cost machine vision solution for lower frame-rate applications like the person detection example in TensorFlow Lite Micro.
Need for speed
Recent optimizations done by Google and Arm to the CMSIS-NN library also improved the TensorFlow Lite Micro inference speed by over 16x, bringing inference time down from 19 seconds to just 1.2 seconds on the Arduino Nano 33 BLE boards. By selecting the person_detection example in the Arduino_TensorFlowLite library, you are automatically including CMSIS-NN underneath and benefitting from these optimizations. The only difference you should see is that it runs a lot faster!
The CMSIS-NN library provides optimized neural network kernel implementations for all Arm’s Cortex-M processors, ranging from Cortex-M0 to Cortex-M55. The library utilizes the processor’s capabilities, such as DSP and M-Profile Vector (MVE) extensions, to enable the best possible performance.
The Arduino Nano 33 BLE board is powered by Arm Cortex-M4, which supports DSP extensions. That will enable the optimized kernels to perform multiple operations in one cycle using SIMD (Single Instruction Multiple Data) instructions. Another optimization technique used by the CMSIS-NN library is loop unrolling. These techniques combined will give us the following example where the SIMD instruction, SMLAD (Signed Multiply with Addition), is used together with loop unrolling to perform a matrix multiplication y=a*b, where
a = [1, 2]
and
b = [[3, 5],
     [4, 6]]
a, b are 8-bit values and y is a 32-bit value. With regular C, the code would look something like this:
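// Plain C version: y[i] = a[0]*b[0][i] + a[1]*b[1][i]
for (i = 0; i < 2; ++i) {
  y[i] = 0;
  for (j = 0; j < 2; ++j) {
    y[i] += a[j] * b[j][i];
  }
}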
However, using loop unrolling and SIMD instructions, the loop will end up looking like this:
a_operand = a[0] | (a[1] << 16); // pack a[0] and a[1] into one 32-bit variable
for (i = 0; i < 2; ++i) {
    b_operand = b[0][i] | (b[1][i] << 16); // pack column i of b the same way
    y[i] = __SMLAD(a_operand, b_operand, y[i]); // two multiply-accumulates in one instruction
}
This code will save cycles due to:
fewer for-loop checks
__SMLAD performing two multiply-accumulate operations in one cycle
This is a simplified example of how two of the CMSIS-NN optimization techniques are used.
Figure 1: Performance with initial versions of libraries
Figure 2: Performance with CMSIS-NN optimizations
This improvement means the image acquisition and preprocessing stages now have a proportionally bigger impact on machine vision performance. So at Arduino, our objective was to improve the overall performance of machine vision inferencing on the Arduino Nano 33 BLE Sense by optimizing the Arduino_OV767X library while maintaining the same library API, usability and stability.
Figure 3: Performance with CMSIS-NN and camera library optimizations
For this, we enlisted the help of Larry Bank who specializes in embedded software optimization. Larry’s work got the camera image read down from 1500ms to just 393ms for a QCIF (176×144 pixel) image. This was a great improvement!
Let’s have a look at how Larry approached the camera library optimization and how some of these techniques can apply to your Arduino code in general.
Performance optimizing Arduino code
It’s rarely practical or necessary to optimize every line of code you write. In fact there are very good reasons to prioritize readable, maintainable code. Being readable and optimized don’t necessarily have to be mutually exclusive. However, embedded systems have constrained resources, and when applications demand more performance, some trade-offs might have to be made. Sometimes it is necessary to restructure algorithms, pay attention to compiler behavior, or even analyze timing of machine code instructions in order to squeeze the most out of a microcontroller. In some cases this can make the code less readable — but the beauty of an Arduino library is that this can be abstracted (hidden) from user sketch code beneath the cleaner library function APIs.
What does “Camera.readFrame” do?
We’ve connected a camera to the Arduino. The Arduino_OV767X library sets up the camera and lets us transfer the raw image data from the camera into the Arduino Nano BLE memory. The smallest resolution setting, QCIF, is 176 x 144 pixels. Each pixel is encoded in 2 bytes. We therefore need to transfer at least 50688 bytes (176 x 144 x 2 ) every time we capture an image with Camera.readFrame. Because the function is performing a byte read operation over 50 thousand times per frame, the way it’s implemented has a big impact on performance. So let’s have a look at how we can most efficiently connect the camera to the Arduino and read a byte of data from it.
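In sketch terms, that works out to a buffer like the one below; the variable names are illustrative, but the size follows directly from the resolution and the 2 bytes per pixel, and the QCIF and RGB565 constants are the ones the library exposes.

#include <Arduino_OV767X.h>

// QCIF frame buffer: 176 x 144 pixels x 2 bytes per pixel = 50688 bytes
const int bytesPerFrame = 176 * 144 * 2;
uint8_t frameBuffer[bytesPerFrame];

void setup() {
  Camera.begin(QCIF, RGB565, 1); // resolution, color format, frame rate
}

void loop() {
  Camera.readFrame(frameBuffer); // fills the buffer with one raw image
  // ...process the 50688 bytes here...
}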
Philosophy
I tend to see the world of code through the “lens” of optimization. I’m not advocating for everyone to share my obsession with optimization. However, when it does become necessary, it’s helpful to understand details of the target hardware and CPU. What I often encounter with my clients is that their code implements their algorithm neatly and is very readable, but it’s not necessarily ‘performance friendly’ to the target machine. I assume this is because most people see code from a top-down approach: they think in terms of the abstract math and how to process the data. My history in working with very humble machines and later turning that into a career has flipped that narrative on its head. I see software from the bottom up: I think about how the memory, I/O and CPU registers interact to move and process the data used by the algorithm. It’s often possible to make dramatic improvements to the code execution speed without losing any of its readability. When your readable/maintainable solution still isn’t fast enough, the next phase is what I call ‘uglification.’ This involves writing code that takes advantage of specific features of the CPU and is nearly always more difficult to follow (at least at first glance!).
Optimization methodology
Optimization is an iterative process. I usually work in this order:
Test assumptions in the algorithm (sometimes requires tracing the data)
Make innocuous changes in the logic to better suit the CPU (e.g. change modulus to logical AND)
Flatten the hierarchy or simplify overly nested classes/structures
Test any slow/fast paths (aka statistics of the data — e.g. is 99% of the incoming data 0?)
Go back to the author(s) and challenge their decisions on data precision / storage
Make the code more suitable for the target architecture (e.g. 32 vs 64-bit CPU registers)
If necessary (and permitted by the client) use intrinsics or other CPU-specific features
Go back and test every assumption again
If you would like to investigate this topic further, I’ve written a more detailed presentation on Writing Performant C++ code.
Depending on the size of the project, sometimes it’s hard to know where to start if there are too many moving parts. If a profiler is available, it can help narrow the search for the “hot spots” or functions which are taking the majority of the time to do their work. If no profiler is available, then I’ll usually use a time function like micros() to read the current tick counter to measure execution speed in different parts of the code. Here is an example of measuring absolute execution time on Arduino:
unsigned long lTime, iTime;
lTime = micros();
<do the work>
iTime = micros() - lTime; // elapsed time in microseconds
Serial.printf("Time to execute xxx = %d microseconds\n", (int)iTime);
I’ve also used a profiler for my optimization work with OpenMV. I modified the embedded C code to run as a MacOS command line app to make use of the excellent XCode Instruments profiler. When doing that, it’s important to understand how differently code executes on a PC versus embedded — this is mostly due to the speed of the CPU compared to the speed of memory.
Pins, GPIO and PORTs
One of the most powerful features of the Arduino platform is that it presents a consistent API to the programmer for accessing hardware and software features that, in reality, can vary greatly across different target architectures. For example, the features found in common on most embedded devices like GPIO pins, I2C, SPI, FLASH, EEPROM, RAM, etc. have many diverse implementations and require very different code to initialize and access them.
Let’s look at the first in our list, GPIO (General Purpose Input/Output pins). On the original Arduino Uno (AVR MCU), the GPIO lines are arranged in groups of 8 bits per “PORT” (it’s an 8-bit CPU after all) and each port has a data direction register (determines if it’s configured for input or output), a read register and a write register. The newer Arduino boards are all built around various Arm Cortex-M microcontrollers. These MCUs have GPIO pins arranged into groups of 32 bits per “PORT” (hmm – it’s a 32-bit CPU, I wonder if that’s the reason). They have a similar set of control mechanisms, but add a twist — they include registers to SET or CLR specific bits without disturbing the other bits of the port (e.g. port->CLR = 1; will clear GPIO bit 0 of that port). From the programmer’s view, Arduino presents a consistent set of functions (pinMode, digitalRead, digitalWrite and friends, all documented on Arduino.cc) to access these pins on these diverse platforms.
For me, this is the most powerful idea of Arduino. I can build and deploy my code to an AVR, a Cortex-M, ESP8266 or an ESP32 and not have to change a single line of code nor maintain multiple build scripts. In fact, in my daily work (both hobby and professional), I’m constantly testing my code on those 4 platforms. For example, my LCD/OLED display library (OneBitDisplay) can control various monochrome LCD and OLED displays and the same code runs on all Arduino boards and can even be built on Linux.
One downside to having these ‘wrapper’ functions hide the details of the underlying implementation is that performance can suffer. For most projects it’s not an issue, but when you need to get every ounce of speed out of your code, it can make a huge difference.
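As a hedged illustration of the trade-off, the sketch below toggles the same pin two ways: first through the portable Arduino call, then by writing the Cortex-M SET/CLR registers directly (shown here with nRF52-style register names for the Nano 33 BLE, which assumes the core exposes them; other Cortex-M parts name these registers differently).

const int ledPin = LED_BUILTIN; // the built-in LED, P0.13 on the Nano 33 BLE

void setup() {
  pinMode(ledPin, OUTPUT);
}

void loop() {
  // Portable and readable: works unchanged on any Arduino board
  digitalWrite(ledPin, HIGH);
  digitalWrite(ledPin, LOW);

  // Direct port access: a single store per change, no pin lookup or safety checks
  NRF_P0->OUTSET = (1UL << 13); // drive P0.13 high
  NRF_P0->OUTCLR = (1UL << 13); // drive P0.13 low
}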
Camera data capture
One of the biggest challenges of this project was that the original OV7670 library was only able to run at less than 1 frame per second (FPS) when talking to the Nano 33. The reason for the low data rate is that the Nano 33 doesn’t expose any hardware which can directly capture the parallel image data, so it must be done ‘manually’ by testing the sync signals and reading the data bits through GPIO pins (e.g. digitalRead) using software loops. The Arduino pin functions (digitalRead, digitalWrite) actually contain a lot of code which checks that the pin number is valid, uses a lookup table to convert the pin number to the I/O port address and bit value and may even disable interrupts before reading or changing the pin state. If we were to use the digitalRead function for an application like this, it would limit the data capture rate to be too slow to operate the camera. You’ll see this further down when we examine the actual code used to capture the data.
First, a quick review of the OV7670 camera module: According to its datasheet, it’s capable of capturing a VGA (640×480) color image at up to 30 FPS. The kit used for this project has the camera mounted to a small PCB and presents an 8-bit parallel data bus and various sync signals.
It requires an external “master clock” (MCLK in the photo) to drive its internal state machine which is used to generate all of the other timing signals. The Nano 33 can provide this external clock source by using its I2S clock. The OV767X library sets this master clock to 16MHz (the camera can handle up to 48MHz) and then there is a set of configuration registers to divide this value to arrive at the desired frame rate. Only a few possible frame rates are available (1, 5, 10, 15, 20, and 30 FPS).
Above is one of the timing diagrams from the OV7670 datasheet. This particular drawing shows the timing of the data for each byte received along each image row. The HREF signal is used to signal the start and end of a row and then each byte is clocked in with the PCLK signal. The original library code read each bit (D0-D7) in a loop and combined them together to form each data byte. The image data comes quickly, so we have very little time to read each byte. Assembling them one bit at a time is not very efficient. You might be thinking that it’s not that hard of a problem to solve on the Nano 33. After all, it has 22 GPIO pins and the Cortex-M inside it has 32-bit wide GPIO ports, so just hook up the data bits sequentially and you’ll be able to read the 8 data bits in one shot, then Mission Accomplished™. If only things were that easy. The Nano 33 does have plenty of GPIO pins, but there isn’t a continuous sequence of 8 bits available using any of the pins! I’m guessing that the original code did it one bit at a time because it didn’t look like there was a better alternative. In the pinout diagram below, please notice the P0.xx and P1.xx numbers. These are the bit numbers within the Cortex-M GPIO ports 0 and 1 (other Cortex-M processors would label them PA and PB).
I wasn’t going to let this little bump in the road stop me from making use of bit parallelism. If you look carefully at the bit positions, the best continuous run we can get is 6 bits in a row with P1.10 through P1.15. It’s not possible to read the 8 data bits in one shot…or is it? If we connect D0/D1 of the camera to P1.02/P1.03 and D2-D7 to P1.10-P1.15, we can do a single 32-bit read from port P1 and get all 8 bits in one shot. The bits are in order, but will have a gap between D1 and D2 (P1.04 to P1.09). Luckily the Arm CPU has what’s called a barrel shifter. It also has a smart instruction set which allows data to be shifted ‘for free’ at the same time the instruction is doing something else. Let’s take a look at how and why I changed the code:
Original:
uint8_t in = 0;
for (int k = 0; k < 8; k++) {
bitWrite(in, k, (*_dataPorts[k] & _dataMasks[k]) != 0);
}
Optimized:
uint32_t in = port->IN; // read all bits in parallel
in >>= 2;               // place bits 0 and 1 at the "bottom" of the register
in &= 0x3f03;           // isolate the 8 bits we care about
in |= (in >> 6);        // combine the upper 6 and lower 2 bits
Code analysis
If you’re not interested in the nitty gritty details of the code changes I made, you can skip this section and go right to the results below. First, let’s look at what the original code did. When I first looked at it, I didn’t recognize bitWrite; apparently it’s not a well-known Arduino bit manipulation macro. It’s defined as:
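// From Arduino.h:
#define bitSet(value, bit) ((value) |= (1UL << (bit)))
#define bitClear(value, bit) ((value) &= ~(1UL << (bit)))
#define bitWrite(value, bit, bitvalue) ((bitvalue) ? bitSet(value, bit) : bitClear(value, bit))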
This macro was written with the intention of being used on GPIO ports (the variable value) where the logical state of bitvalue would be turned into a single write of a 0 or 1 to the appropriate bit. It makes less sense to be used on a regular variable because it inserts a branch to switch between the two possible outcomes. For the task at hand, it’s not necessary to use bitClear() on the in variable since it’s already initialized to 0 before the start of each byte loop. A better choice would be:
if (*_dataPorts[k] & _dataMasks[k]) in |= (1 << k);
The arrays _dataPorts[] and _dataMasks[] contain the memory mapped GPIO port addresses and bit masks to directly access the GPIO pins (bypassing digitalRead). So here’s a play-by-play of what the original code was doing:
Set in to 0
Set k to 0
Read the address of the GPIO port from _dataPorts[] at index k
Read the bit mask of the GPIO port from _dataMasks[] at index k
Read 32-bit data from the GPIO port address
Logical AND the data with the mask
Shift 1 left by k bits to prepare for bitClear and bitSet
Compare the result of the AND to zero
Branch to bitSet() code if true or use bitClear() if false
bitClear or bitSet depending on the result
Increment loop variable k
Compare k to the constant value 8
Branch if less back to step 3
Repeat steps 3 through 13, 8 times
Store the byte in the data array (not shown above)
The new code does the following:
Read the 32-bit data from the GPIO port address
Shift it right by 2 bits
Logical AND (mask) the 8 bits we’re interested in
Shift and OR the results to form 8 continuous bits
Store the byte in the data array (not shown above)
Each of the steps listed above basically translates into a single Arm instruction. If we assume that each instruction takes roughly the same amount of time to execute (mostly true on Cortex-M), then old vs. new is 91 versus 5 instructions to capture each byte of camera data, an 18x improvement! If we’re capturing a QVGA frame (320x240x2 = 153600 bytes), that becomes many millions of extra instructions.
Results
The optimized byte capture code translates into 5 Arm instructions and allows the capture loop to now handle a setting of 5 FPS instead of 1 FPS. The FPS numbers don’t seem to be exact, but the original capture time (QVGA @ 1 FPS) was 1.5 seconds while the new capture time when set to 5 FPS is 0.393 seconds. I tested 10 FPS, but readFrame() doesn’t read the data correctly at that speed. I don’t have an oscilloscope handy to probe the signals to see why it’s failing. The code may be fast enough now (I think it is), but the sync signals may become too unstable at that speed. I’ll leave this as an exercise to the readers who have the equipment to see what happens to the signals at 10 FPS.
For the work I did on the OV767X library, I created a test fixture to make sure that the camera data was being received correctly. For ML/data processing applications, it’s not necessary to do this. The built-in camera test pattern can be used to confirm the integrity of the data by using a CRC32.
My tinned protoboard test fixture with 320×240 LCD
Note: The frames come one immediately after another. If you capture a frame and then do some processing and then try to capture another frame, you may hit the middle of the next frame when you call readFrame(). The code will then wait until the next VSync signal, so that frame’s capture time could be as much as 2x as long as a single frame time.
More tips
I enjoy testing the limits of embedded hardware, especially when it involves bits, bytes and pixels. I’ve written a few blog posts that explore the topics of speed and power usage if you’re interested in learning more about it.
Conclusion
The embedded microcontrollers available today are capable of handling jobs that were unimaginable just a few years ago.
Optimized ML solutions from Google and Edge Impulse are capable of running on low-cost, battery-powered boards (vision, vibration, audio, whatever sensor you want to monitor).
Python and Arduino programming environments can test your project idea with little effort.
Software can be written an infinite number of ways to accomplish the same task, but one constant remains: TANSTATFC (there ain’t no such thing as the fastest code).
Never assume the performance you’re seeing is what you’re stuck with. Think of existing libraries and generic APIs available through open source libraries and environments as a starting point.
Knowing a bit of info about the target platform can be helpful, but it’s not necessary to read the MCU datasheet. In the code above, the larger concept of Arm Cortex-M 32-bit GPIO ports was sufficient to accomplish the task without knowing the specifics of the nRF52’s I/O hardware.
Don’t be afraid to dig a little deeper and test every assumption.
If you encounter difficulties, the community is large and there are a ton of resources out there. Asking for help is a sign of strength, not weakness.
If you’re interested in embedded machine learning (TinyML) on the Arduino Nano 33 BLE Sense, you’ll have found a ton of on-board sensors — digital microphone, accelerometer, gyro, magnetometer, light, proximity, temperature, humidity and color — but you’ll also have realized that for vision you need to attach an external camera.
In this article, we will show you how to get image data from a low-cost VGA camera module. We’ll be using the Arduino_OV767X library to make the software side of things simpler.
Hardware setup
To get started, you will need:
You can of course get a board without headers and solder instead, if that’s your preference.
The one downside to this setup is that (in module form) there are a lot of jumpers to connect. It’s not hard but you need to take care to connect the right cables at either end. You can use tape to secure the wires once things are done, lest one comes loose.
You need to connect the wires as follows:
Software setup
First, install the Arduino IDE or register for Arduino Create tools. Once you install and open your environment, the camera library is available in the library manager.
Tools > Manage Libraries and search for the OV767X library
Press the Install button
Now, we will use the example sketch to test the cables are connected correctly:
Examples > Arduino_OV767X > CameraCaptureRawBytes
Uncomment (remove the //) from line 48 to display a test pattern
Camera.testPattern();
Compile and upload to your board
Your Arduino is now outputting raw image binary over serial. To view this as an image, we’ve included a special Processing application that displays the image output from the camera.
Processing is a simple programming environment that was created by graduate students at MIT Media Lab to make it easier to develop visually oriented applications with an emphasis on animation and providing users with instant feedback through interaction.
Edit lines 31-37 to match the machine and serial port your Arduino is connected to
Hit the play button in Processing and you should see a test pattern (image update takes a couple of seconds):
If all goes well, you should see the striped test pattern above!
Next we will go back to the Arduino IDE and edit the sketch so the Arduino sends a live image from the camera in the Processing viewer:
Return to the Arduino IDE
Comment out line 48 of the Arduino sketch
// We've disabled the test pattern and will display a live image
// Camera.testPattern();
Compile and upload to the board
Once the sketch is uploaded hit the play button in Processing again
After a few seconds you should now have a live image:
Considerations for TinyML
The full VGA (640×480 resolution) output from our little camera is way too big for current TinyML applications. uTensor runs handwriting detection with MNIST using 28×28 images. The person detection example in TensorFlow Lite for Microcontrollers uses 96×96, which is more than enough. Even state-of-the-art ‘Big ML’ applications often only use 320×320 images (see the TinyML book). Also consider that an 8-bit grayscale VGA image occupies about 300KB uncompressed and the Nano 33 BLE Sense has 256KB of RAM. We have to do something to reduce the image size!
Camera format options
The OV7670 module supports lower resolutions through configuration options. The options modify the image data before it reaches the Arduino. The configurations currently available via the library are:
VGA – 640 x 480
CIF – 352 x 240
QVGA – 320 x 240
QCIF – 176 x 144
This is a good start as it reduces the amount of time it takes to send an image from the camera to the Arduino. It reduces the size of the image data array required in your Arduino sketch as well. You select the resolution by changing the value in Camera.begin. Don’t forget to change the size of your array too.
Camera.begin(QVGA, RGB565, 1)
The camera library also offers different color formats: YUV422, RGB444 and RGB565. These define how the color values are encoded and all occupy 2 bytes per pixel in our image data. We’re using the RGB565 format which has 5 bits for red, 6 bits for green, and 5 bits for blue:
Converting the 2-byte RGB565 pixel to individual red, green, and blue values in your sketch can be accomplished as follows:
// Convert from RGB565 to 24-bit RGB
uint16_t pixel = (high << 8) | low;
int red = ((pixel >> 11) & 0x1f) << 3;
int green = ((pixel >> 5) & 0x3f) << 2;
int blue = ((pixel >> 0) & 0x1f) << 3;
Resizing the image on the Arduino
Once we get our image data onto the Arduino, we can then reduce the size of the image further. Just removing pixels will give us a jagged (aliased) image. To do this more smoothly, we need a downsampling algorithm that can interpolate pixel values and use them to create a smaller image.
The techniques used to resample images are an interesting topic in themselves. We found that this downsampling example from Eloquent Arduino works fine with the Arduino_OV767X camera library output (see animated GIF above).
Applications like the TensorFlow Lite Micro Person Detection example that use CNN-based models on Arduino for machine vision may not need any further preprocessing of the image — other than averaging the RGB values to produce 8-bit grayscale data per pixel.
However, if you do want to perform normalization, iterating across pixels using the Arduino max and min functions is a convenient way to obtain the upper and lower bounds of input pixel values. You can then use map to scale the output pixel values to a 0-255 range.
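A minimal sketch of that normalization pass, assuming the grayscale pixels are already sitting in a pixels array (the names and the 96×96 size are illustrative, not taken from the tutorial code):

// Stretch 8-bit grayscale pixel values to the full 0-255 range
const int numPixels = 96 * 96; // assumed image size after downsampling
uint8_t pixels[numPixels];

void normalizePixels() {
  int lo = 255, hi = 0;
  for (int i = 0; i < numPixels; i++) {
    lo = min(lo, (int)pixels[i]); // lower bound of the input values
    hi = max(hi, (int)pixels[i]); // upper bound of the input values
  }
  if (hi == lo) return; // flat image: nothing to rescale
  for (int i = 0; i < numPixels; i++) {
    pixels[i] = map(pixels[i], lo, hi, 0, 255); // rescale to 0-255
  }
}

void setup() {}

void loop() {
  normalizePixels();
}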
This was an introduction to how to connect an OV7670 camera module to the Arduino Nano 33 BLE Sense and some considerations for obtaining data from the camera for TinyML applications. There’s a lot more to explore on the topic of machine vision on Arduino — this is just a start!
Bike signal display keeps riders safe with machine learning
Arduino Team — June 21st, 2020
Cycling can be fun, not to mention great exercise, but is also dangerous at times. In order to facilitate safety and harmony between road users on his hour-plus bike commute in Marseille, France, Maltek created his own LED backpack signaling setup.
The device uses a hand mounted Arduino Nano 33 BLE Sense to record movement via its onboard IMU and runs a TinyML gesture recognition model to translate this into actual road signals. Left and right rotations of the wrist are passed along to the backpack unit over BLE, which shows the corresponding turn signal on its LED panel.
Other gestures include a backward twist for stop and a forward twist to say “merci,” while a green forward-scrolling arrow is displayed as the default state.
Machine learning (ML) algorithms come in all shapes and sizes, each with their own trade-offs. We continue our exploration of TinyML on Arduino with a look at the Arduino KNN library.
In addition to powerful deep learning frameworks like TensorFlow for Arduino, there are also classical ML approaches suitable for smaller data sets on embedded devices that are useful and easy to understand — one of the simplest is KNN.
One advantage of KNN is that once the Arduino has some example data, it is instantly ready to classify! We’ve released a new Arduino library so you can include KNN in your sketches quickly and easily, with no off-device training or additional tools required.
In this article, we’ll take a look at KNN using the color classifier example. We’ve shown the same application with deep learning before — KNN is a faster and lighter-weight approach by comparison, but it won’t scale as well to larger, more complex datasets.
Color classification example sketch
In this tutorial, we’ll run through how to classify objects by color using the Arduino_KNN library on the Arduino Nano 33 BLE Sense.
Select ColorClassifier from File > Examples > Arduino_KNN
Compile this sketch and upload to your Arduino board
The Arduino_KNN library
The example sketch makes use of the Arduino_KNN library. The library provides a simple interface to make use of KNN in your own sketches:
#include <Arduino_KNN.h>

// Create a new KNNClassifier
KNNClassifier myKNN(INPUTS);
In our example INPUTS=3 – for the red, green and blue values from the color sensor.
Sampling object colors
When you open the Serial Monitor you should see the following message:
Arduino KNN color classifier
Show me an example Apple
The Arduino board is ready to sample an object color. If you don’t have an Apple, Pear and Orange to hand you might want to edit the sketch to put different labels in. Keep in mind that the color sensor works best in a well lit room on matte, non-shiny objects and each class needs to have distinct colors! (The color sensor isn’t ideal to distinguish between an orange and a tangerine — but it could detect how ripe an orange is. If you want to classify objects by shape you can always use a camera.)
When you put the Arduino board close to the object it samples the color and adds it to the KNN examples along with a number labelling the class the object belongs to (i.e. numbers 0,1 or 2 representing Apple, Orange or Pear). ML techniques where you provide labelled example data are also called supervised learning.
The code in the sketch to add the example data to the KNN function is as follows:
readColor(color);

// Add example color to the KNN model
myKNN.addExample(color, currentClass);
The red, green and blue levels of the color sample are also output over serial:
The sketch takes 30 color samples for each object class. You can show it one object and it will sample the color 30 times — you don’t need 30 apples for this tutorial! (Although a broader dataset would make the model more generalized.)
Classification
With the example samples acquired the sketch will now ask to guess your object! The example reads the color sensor using the same function as it uses when it acquired training data — only this time it calls the classify function which will guess an object class when you show it a color:
readColor(color);

// Classify the object
classification = myKNN.classify(color, K);
You can try showing it an object and see how it does:
Let me guess your object
0.44,0.28,0.28
You showed me an Apple
Note: It will not be 100% accurate especially if the surface of the object varies or the lighting conditions change. You can experiment with different numbers of examples, values for k and different objects and environments to see how this affects results.
How does KNN work?
Although the Arduino_KNN library does the math for you it’s useful to understand how ML algorithms work when choosing one for your application. In a nutshell, the KNN algorithm classifies objects by comparing how close they are to previously seen examples. Here’s an example chart with average daily temperature and humidity data points. Each example is labelled with a season:
To classify a new object (the “?” on the chart) the KNN classifier looks for the most similar previous example(s) it has seen. As there are two inputs in our example the algorithm does this by calculating the distance between the new object and each of the previous examples. You can see the closest example above is labelled “Winter”.
The k in KNN is just the number of closest examples the algorithm considers. With k=3 it counts the three closest examples. In the chart above the algorithm would give two votes for Spring and one for Winter — so the result would change to Spring.
One disadvantage of KNN is the larger the amount of training example data there is, the longer the KNN algorithm needs to spend checking each time it classifies an object. This makes KNN less feasible for large datasets and is a major difference between KNN and a deep learning based approach.
Classifying objects by color
In our color classifier example there are three inputs from the color sensor. The example colors from each object can be thought of as points in three dimensional space positioned on red, green and blue axes. As usual the KNN algorithm guesses objects by checking how close the inputs are to previously seen examples, but because there are three inputs this time it has to calculate the distances in three dimensional space. The more dimensions the data has the more work it is to compute the classification result.
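For intuition, the distance at the heart of the algorithm is just the Euclidean distance in however many dimensions the inputs have; a sketch of it for the three color channels might look like this (the function name is illustrative; the Arduino_KNN library handles this internally).

#include <math.h>

// Euclidean distance between two RGB readings, each an array of 3 floats
float colorDistance(const float a[3], const float b[3]) {
  float sum = 0;
  for (int i = 0; i < 3; i++) {
    float d = a[i] - b[i];
    sum += d * d;
  }
  return sqrtf(sum); // smaller distance means a more similar color
}

With k=3, the classifier simply finds the three smallest of these distances among the stored examples and lets their labels vote.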
Further thoughts
This is just a quick taste of what’s possible with KNN. You’ll find an example for board orientation in the library examples, as well as a simple example for you to build on. You can use any sensor on the BLE Sense board as an input, and even combine KNN with other ML techniques.
Of course, there are other machine learning resources available for Arduino, including TensorFlow Lite tutorials as well as support from professional tools such as Edge Impulse and Qeexo. We’ll be inviting more experts to explore machine learning on Arduino in the coming weeks.