Inferring Grasp Intentions from Arm Trajectories via Deep Learning to Enable Functional Movement in Quadriplegia

Background Cervical spinal cord injury (SCI) severely impairs the grasping ability of its survivors. Fortunately, many individuals with quadriplegia retain residual arm movements that allow them to reach for objects. We propose a wearable technology that uses pattern recognition and deep learning methods to automatically classify arm trajectories and infer grasping intentions. Combined with neuromuscular stimulation, this technology can enable individuals with SCI to grasp objects without assistance. Methods Two participants with cervical SCI performed various reaching movements and smooth trajectories in space, which were recorded using an inertial sensor worn on the wrist. Time series classifiers were trained to recognize the trajectories using either a Dynamic Time Warping (DTW) algorithm or a Long Short-Term Memory (LSTM) recurrent neural network. Successful real-time trajectory prediction was demonstrated using DTW, which, when combined with a high-density neuromuscular stimulation sleeve with textile electrodes, enabled participants to perform functional grasps. Conclusions We demonstrate the feasibility of inferring intention from reaching trajectories using wearable sensors.


Introduction
In the United States alone, there are more than 17,700 new cases of spinal cord injury every year (1). A majority of these injuries result in incomplete (48%) or complete (12%) quadriplegia, which severely affects the arm and hand movements of survivors and undermines their quality of life.
Neuromuscular stimulation offers a viable solution to assist with arm and hand movements to increase independence, but often users find it challenging to efficiently control such stimulation devices for everyday use. Therefore, several different modalities have been developed to extract user intent for controlling neuromuscular stimulation devices in order to restore grasping. These modalities range from conventional push button or shoulder position control (2,3), to implanted muscle sensors (4), and most recently brain implants (5,6).
Grasping an object is often preceded by reaching for it. Indeed, previous studies have shown that the grasping intentions of amputees and able-bodied participants can be inferred from their muscle activity (electromyogram signals) during reaching (7). Quadriplegia is most often caused by damage at the C5 vertebra and, importantly, individuals with injuries at C5 and below retain sufficient control over their deltoid and biceps muscles to reach for objects (8,9). We therefore propose a non-invasive approach that infers the grasping intentions of individuals with quadriplegia from their reaching movements and other novel arm trajectories. Unlike previous studies that used multi-channel surface electromyography to decipher reaching movements (7), we used a single low-cost, wearable, easy-to-set-up inertial sensor. Further, we combined our non-invasive grasp inference technique with a custom-built neuromuscular stimulator and sleeve to facilitate hand opening and closing in participants with quadriplegia, enabling them to perform functional movements (e.g. eating a granola bar).
In recent years, inertial measurement units (IMUs) have been used extensively for human-computer interaction, particularly for gesture recognition and wearable sensing (10). With advances in portable computing devices, sophisticated machine learning algorithms such as recurrent neural networks can be readily deployed to decipher IMU data (11). In this study, we compared a well-known pattern recognition algorithm, Dynamic Time Warping (DTW), with a recurrent neural network for time series classification, Long Short-Term Memory (LSTM), and classified reaching trajectories in 2-dimensional (2D) and 3-dimensional (3D) space. We hypothesized that while DTW-based techniques are easily deployable and computationally inexpensive, LSTM networks, with their inherent ability to capture long-term dependencies, would perform more consistently across multiple days.
Section 2 presents the methods, describing the experimental setup, study protocol, and training of the machine learning algorithms. Section 3 presents results from offline and online validation of the algorithms, based on data from two SCI participants, and discusses their significance.

A Participants
Two participants with quadriplegia were recruited for the study after providing informed consent. The arm trajectories were decoded online (in real time) and used to drive a custom neuromuscular stimulator with textile-based electrodes housed in a sleeve (12). This in turn allowed the participant to perform functional movements (e.g. eating a granola bar). Participant 2 was a 28-year-old male, injured 10 years prior, with a C4/C5 ASIA A injury. He participated in 3 sessions: 2 training sessions and 1 online testing session.

B Experiment Setup and Data Collection
Participants were seated with their hands initially resting on a table. A wireless sensor module was attached to the wrist using a Velcro strap. While both participants were bilaterally impaired, each retained residual movement that allowed reaching with at least one arm, and that arm was used for the study. The sensor module consisted of a 32-bit ARM microcontroller unit (MCU) from Adafruit (Feather Huzzah32) and a Bosch Sensortec BNO055 9-axis IMU. The IMU has a built-in processor and algorithms that estimate its orientation and perform gravity compensation in real time to produce linear acceleration in three orthogonal directions. Linear acceleration along the X, Y, and Z axes was available externally via an I2C interface. A flexible printed circuit board was designed to interconnect the IMU with the MCU, as shown in Fig. 1B. Data were continuously streamed from the MCU at 50 Hz via Bluetooth to MATLAB 2019a running on a desktop PC and stored for offline processing.
During the experiments, verbal cues associated with different 2D and 3D movement trajectories were called out to the participant in random order. The participants were instructed to perform the reaching trajectories starting from the edge or corner of the table and moving towards the center, using smooth movements lasting up to one second. Three different 3D reaching trajectories were trained: a sideways arc (e.g. reaching for a cup or bottle, Fig. 1A), a vertical arc (e.g. reaching for a pen or marker lying on a table), and a corkscrew motion. Additionally, four 2D trajectories (performed in the horizontal plane) corresponding to well-known English and Greek letters were trained: S, ε (epsilon or E), γ (gamma), and M. Experiments were conducted in blocks of 18-20 trials, with sufficient breaks between blocks to minimize participant fatigue. Initially, the participants were asked to perform only the S and ε trajectories because these were simple to learn and did not cause fatigue. Later, once the participants became comfortable with moving their arm, we included additional 2D and 3D trajectories. Thus, our final datasets contained a higher percentage of 2D trajectories (especially S and ε) than of the remaining trajectories. Figure 1C shows still images of an SCI participant using a simple 2D trajectory (e.g. M) to grasp and eat a granola bar with his paralyzed hand.

C Data Processing and Machine Learning
The 3-axis linear acceleration obtained from the IMU was band-pass filtered (Butterworth, 8th order, 0.2-6 Hz) and processed offline to identify training samples. Movement onset was identified where the magnitude of the 3-axis acceleration vector crossed a threshold of 0.95 g. The movement onsets were then used to segment the acceleration data along the X, Y, and Z axes into windows spanning −0.1 s to 0.9 s with respect to onset. Each trial was visually inspected, and trials that contained noise artifacts or exceeded the 1 s window were excluded from further analysis. Next, two time series classifiers, based on either a Dynamic Time Warping (DTW) distance measure or a Long Short-Term Memory (LSTM) network, were trained separately for 2D and 3D trajectories.
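The onset detection and segmentation steps can be illustrated with a minimal Python sketch. It is not the authors' MATLAB code: the function names, the refractory period that suppresses repeated detections, and the handling of recording edges are assumptions made for illustration, and the band-pass filter is assumed to have been applied already.

```python
import math

FS_HZ = 50                  # sampling rate stated in the text
PRE_S, POST_S = 0.1, 0.9    # window of -0.1 s to 0.9 s around onset
THRESHOLD_G = 0.95          # magnitude threshold stated in the text


def find_onsets(acc, threshold=THRESHOLD_G, refractory=FS_HZ):
    """Return sample indices where the acceleration magnitude crosses the threshold.

    acc is a list of (x, y, z) tuples in units of g. The refractory period
    (an assumption, not from the source) avoids detecting the same movement twice.
    """
    onsets = []
    last = -refractory
    prev_above = False
    for i, (x, y, z) in enumerate(acc):
        above = math.sqrt(x * x + y * y + z * z) > threshold
        if above and not prev_above and i - last >= refractory:
            onsets.append(i)
            last = i
        prev_above = above
    return onsets


def segment(acc, onsets, fs=FS_HZ, pre_s=PRE_S, post_s=POST_S):
    """Cut one window per onset; onsets too close to the recording edges are skipped."""
    pre = round(pre_s * fs)     # 5 samples at 50 Hz
    post = round(post_s * fs)   # 45 samples at 50 Hz
    windows = []
    for i in onsets:
        if i - pre >= 0 and i + post <= len(acc):
            windows.append(acc[i - pre:i + post])
    return windows
```

At 50 Hz each window holds 50 samples per axis, which matches the 1 s trial length used for training.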
The DTW algorithm optimally aligns a sample trajectory with a previously determined template trajectory such that the Euclidean distance between the two trajectories is minimized. This is achieved by iteratively stretching or compressing the time axis until an optimal match is obtained. For multivariate data such as acceleration, the algorithm simultaneously minimizes the distance along the different dimensions using dependent time warping (13). In our DTW-based classifier, this algorithm was used to compute the optimal distance between a test sample and pre-defined templates associated with the 2D and 3D trajectories. The template with the smallest optimal distance to the test sample was selected as the classifier's output. Since the classifier's output depends on the quality of its templates, we used an internal optimization loop to select the best template trajectory from a set of training trajectories. Within this loop, the DTW score of each training sample with respect to every other training sample was computed. The training sample with the lowest aggregate DTW score was then chosen as the template for that trajectory.
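A minimal Python sketch of a DTW-based classifier of this kind is given below. It assumes the textbook dynamic-programming form of dependent DTW with a Euclidean local cost and no warping-window constraint; the function names are illustrative and the study's MATLAB implementation may differ in these details.

```python
import math


def dtw_distance(a, b):
    """Dependent multivariate DTW between two sequences of points.

    All axes share a single warping path; the local cost is the Euclidean
    distance between the points a[i] and b[j].
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def select_template(samples):
    """Pick the training sample with the smallest summed DTW distance to all others."""
    totals = [sum(dtw_distance(s, t) for t in samples) for s in samples]
    return samples[totals.index(min(totals))]


def classify(sample, templates):
    """Return the label of the nearest template; templates maps label -> trajectory."""
    return min(templates, key=lambda label: dtw_distance(sample, templates[label]))
```

Choosing the sample with the lowest aggregate score amounts to picking the medoid of the training set under the DTW distance, so the template is always an actually recorded trajectory and not an average.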
To implement the LSTM network, we used the MATLAB R2019b Deep Learning Toolbox with default values for most parameters. Specifically, an LSTM network comprising a single bidirectional layer with 10 hidden units was used. This transformed the 2D or 3D linear acceleration data into inputs for a fully connected layer whose outcome was binary, i.e. 0 or 1. Next, a softmax layer was used to determine the probability of each output class. Finally, the network output mode was set to 'last', so that a decision was generated only after the final time step had passed. This allowed the LSTM classifier to behave similarly to DTW and classify trajectory windows. During training of the LSTM network weights, an adaptive moment estimation (ADAM) solver was used with a gradient threshold of 1 and a maximum of 200 epochs. Since all the training and validation data were 1 second long, zero padding was not used.
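For reference, the final classification step can be written in the standard softmax form. The symbols below are assumptions introduced for illustration and do not appear in the original text: $h_T$ is the output of the bidirectional layer at the last time step $T$, $W$ and $b$ are the weights and bias of the fully connected layer, and $K$ is the number of trajectory classes.

```latex
z = W h_T + b, \qquad
p_k = \frac{\exp(z_k)}{\sum_{j=1}^{K} \exp(z_j)}, \qquad
\hat{y} = \arg\max_{k \in \{1, \dots, K\}} p_k
```

Because only $h_T$ enters the expression, a single decision is produced per 1 s window, which is what makes the LSTM output comparable to the DTW classifier's output.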
During real-time classification of arm trajectories, the linear acceleration signals were filtered and processed in real time using a MATLAB script that looped at 50 Hz. Within the loop, the acceleration data were divided into 1-second-long segments with 98% overlap. To demonstrate proof of concept, only the DTW-based classifier was implemented, and it was designed to compare the incoming acceleration windows with 2D trajectories. If the optimal distance between trajectories was below 10 units (empirically determined), a positive classification was issued, which triggered our custom neuromuscular stimulator to perform a complete movement sequence of opening and closing the hand.
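The windowing arithmetic is worth making explicit: a 1 s window at 50 Hz holds 50 samples, and a 98% overlap leaves a step of one sample, so one window is evaluated per loop iteration. The Python sketch below illustrates this under stated assumptions; `detect_stream` and the `distance_to_template` callable are hypothetical names, and the real-time loop in the study was a MATLAB script.

```python
from collections import deque

FS_HZ = 50
WINDOW = FS_HZ * 1                      # 1 s window -> 50 samples
OVERLAP = 0.98                          # overlap stated in the text
STEP = round(WINDOW * (1 - OVERLAP))    # 1 sample between consecutive windows
THRESHOLD = 10.0                        # distance threshold reported in the text


def detect_stream(samples, distance_to_template, threshold=THRESHOLD):
    """Yield the index of the last sample of each window that scores below threshold.

    distance_to_template is a hypothetical callable mapping a window (a list of
    samples) to a distance, for example a DTW distance to a stored template.
    """
    buf = deque(maxlen=WINDOW)
    for i, sample in enumerate(samples):
        buf.append(sample)
        window_full = len(buf) == WINDOW
        if window_full and (i + 1 - WINDOW) % STEP == 0:
            if distance_to_template(list(buf)) < threshold:
                yield i
```

In a deployed system, a hold-off period after each positive classification would likely be needed so that one movement does not trigger the stimulator repeatedly; this sketch does not include one.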

Results and Discussion
Over 250 training samples across 7 movement trajectories were recorded for participant 1, and 96 samples across 5 movement trajectories were recorded for participant 2. Trials with noisy sensor data or incorrect labels were visually identified and removed from the training set. Table 1 shows the distribution of samples across the different 2D and 3D trajectories for both participants. The top row also shows the estimated relative position of the participant's hand in space, which was obtained by double integration of IMU data.
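Double integration of acceleration can be sketched as follows. This is a minimal illustration using the trapezoidal rule on a single axis, assuming acceleration in m/s^2 (values in g would first be multiplied by about 9.81) and zero initial velocity and position; the function names are hypothetical and the integration scheme used in the study is not stated.

```python
def integrate(values, dt):
    """Cumulative trapezoidal integral of a sampled signal, starting from zero."""
    out = [0.0]
    for prev, cur in zip(values, values[1:]):
        out.append(out[-1] + 0.5 * (prev + cur) * dt)
    return out


def relative_position(acc, fs=50):
    """Integrate one axis of acceleration twice to estimate displacement."""
    dt = 1.0 / fs
    velocity = integrate(acc, dt)
    return integrate(velocity, dt)
```

Because integration accumulates sensor bias and noise, such estimates drift quickly and are usually meaningful only over short windows, such as the 1 s trajectories considered here.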
Given the unequal distribution of samples in our dataset, a 5-fold stratified cross-validation scheme was selected for evaluating the DTW- and LSTM-based classifiers. Figure 2 shows the mean ± standard deviation (SD) classification accuracy for the 2 participants. In the offline scenario, both the DTW- and LSTM-based classifiers performed well for 2D trajectories, achieving 94 ± 5% and 98 ± 3% accuracy, respectively. For offline 3D trajectories, however, LSTM outperformed DTW, obtaining 99 ± 3% accuracy compared with 83 ± 16%. According to a two-sided Wilcoxon rank sum test, LSTM-based classification accuracy was significantly better than DTW-based accuracy (p < 0.05) in both cases. Also shown in Fig. 2 is the online performance of the DTW-based classifier for 2D trajectories. During online classification, we compared either 2 trajectories (e.g. S vs. ε) or a single trajectory and rest (e.g. M vs. rest) and achieved 79 ± 5% accuracy.
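Stratification keeps the class proportions of the full dataset in every fold, which matters when some trajectories have far fewer samples than others. The sketch below shows one simple way to build such folds in Python; the round-robin assignment, the function name, and the fixed seed are illustrative assumptions and not the procedure used by the authors.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds that approximately preserve class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds
```

Each fold then serves once as the test set while the remaining four are used for training, and the per-fold accuracies are summarized as mean ± SD.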
To further evaluate each classifier's performance in terms of type I and type II errors, we calculated cumulative confusion matrices by adding the confusion matrices from each fold for each participant. The resulting confusion matrices for both classifiers and for both types of trajectories are shown in Fig. 3.
For the DTW-based classifier, type I errors occurred more frequently for 3D than for 2D trajectories. The highest percentage of type I errors occurred for the corkscrew trajectory (37.8%), followed by the vertical arc (14%), ε (10.2%), and M (10%) trajectories. In terms of type II errors, the DTW-based classifier misclassified the vertical arc (14.5%), side arc (13.8%), and S (8.33%) trajectories more often than the remaining classes. For the LSTM-based classifier, the type I and II errors were very low, ranging from 0-3% for almost all trajectories, with the exception of the M trajectory, which had a type I error rate of 40%. This is probably because we had only 10 trials of the M trajectory for training, which were not enough for the LSTM classifier to distinguish it from other classes that had a larger number of samples.
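The cumulative confusion matrix and the per-class error rates can be computed as in the Python sketch below. The definitions are assumptions, since the text does not spell them out: the type I error of a class is taken as the share of predictions of that class that were wrong, and the type II error as the share of true samples of that class that were missed. The function names are illustrative.

```python
def confusion_matrix(true, pred, classes):
    """Count outcomes with rows as true classes and columns as predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(true, pred):
        m[index[t]][index[p]] += 1
    return m


def add_matrices(a, b):
    """Element-wise sum, used to accumulate confusion matrices across folds."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def error_rates(m, k):
    """Return (type I, type II) error rates for class index k under the assumed definitions."""
    tp = m[k][k]
    predicted_k = sum(row[k] for row in m)   # column sum: everything labeled as k
    actual_k = sum(m[k])                     # row sum: everything that truly is k
    type1 = (predicted_k - tp) / predicted_k if predicted_k else 0.0
    type2 = (actual_k - tp) / actual_k if actual_k else 0.0
    return type1, type2
```

Summing the per-fold matrices before computing rates gives every test sample equal weight, which is useful for classes with only a handful of trials per fold.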
A potential limitation of this study is that the LSTM-based classifier has not been validated during online testing. This is still under development and will be reported in a future publication. Nonetheless, the LSTM's highly robust offline performance suggests that its online performance will be at least as good as that of DTW. Another limitation is that a reasonable degree of residual arm movement must be preserved for the deep learning algorithms to reliably infer grasp intentions. However, given that most individuals with quadriplegia have injuries at C5 or lower levels and retain sufficient arm movement, a majority of SCI survivors should be able to operate this technology.

Conclusions
This study demonstrates the feasibility of inferring grasp intentions solely from reaching and other novel arm motions of individuals with cervical SCI, enabling them to benefit from neuromuscular stimulation-based assistance. This approach has clinical viability and could be deployed in rehabilitation centers in the future, not only for SCI patients but also for individuals living with paralysis from stroke, multiple sclerosis, traumatic brain injury, or other injuries or diseases.
