Schwarz posted an update 7 months, 2 weeks ago
The method shows a maximum accuracy of 98.63% and an area under the curve of 0.981 using a Random Forest classifier with ten-fold cross-validation.

Tracking a liquid or food bolus in videofluoroscopic images during X-ray-based diagnostic swallowing examinations is a dominant clinical approach to assessing human swallowing function during the oral, pharyngeal, and esophageal stages of swallowing. This tracking is a highly challenging problem for clinicians because swallowing is a rapid action. Therefore, we developed a computer-aided method to automate bolus detection and tracking in order to alleviate issues associated with human factors. Specifically, we applied a state-of-the-art deep learning model, Mask R-CNN, to detect and segment the bolus in videofluoroscopic image sequences. We trained the algorithm with 450 swallow videos and evaluated it on an independent dataset of 50 videos. The algorithm was able to detect and segment the bolus with a mean average precision of 0.49 and an intersection over union of 0.71. The proposed method produced robust detection results that can help improve the speed and accuracy of the clinical decision-making process.

Vocal folds (VFs) play a critical role in breathing, swallowing, and speech production. VF dysfunctions caused by various medical conditions can significantly reduce patients' quality of life and lead to life-threatening conditions such as aspiration pneumonia, caused by food and/or liquid "invasion" into the windpipe. Laryngeal endoscopy is routinely used in clinical practice to inspect the larynx and to assess VF function. Unfortunately, the resulting videos are only visually inspected, leading to loss of valuable information that could be used for early diagnosis and for disease or treatment monitoring. In this paper, we propose a deep learning-based image analysis solution for automated detection of laryngeal adductor reflex (LAR) events in laryngeal endoscopy videos.
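The intersection-over-union score reported for the bolus segmentation can be computed directly from binary masks. The sketch below is illustrative only (NumPy, with a made-up `mask_iou` helper), not the authors' evaluation code:

```python
import numpy as np

def mask_iou(pred, gt):
    """Intersection over union of two binary segmentation masks."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 0.0  # both masks empty
    inter = np.logical_and(pred, gt).sum()
    return inter / union

# toy 4x4 masks: predicted region covers 4 of the 6 ground-truth pixels
pred = np.zeros((4, 4), dtype=bool); pred[1:3, 1:3] = True  # 4 pixels
gt = np.zeros((4, 4), dtype=bool);   gt[1:3, 1:4] = True    # 6 pixels
print(mask_iou(pred, gt))  # 4 / 6 ≈ 0.667
```

A reported IoU of 0.71 therefore means that, on average, 71% of the pixel area covered by either mask is covered by both.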
Laryngeal endoscopy image analysis is a challenging task because of anatomical variations and various imaging problems. Analysis of LAR events is further complicated by data imbalance, since these are rare events. To tackle this problem, we propose a deep learning system that consists of a two-stream network with a novel orthogonal region selection subnetwork. To the best of our knowledge, this is the first deep learning network that learns to directly map its input to a VF open/close state without first segmenting or tracking the VF region, which drastically reduces the labor-intensive manual annotation needed for mask or track generation. The proposed two-stream network and the orthogonal region selection subnetwork allow the integration of local and global information for improved performance. The experimental results show promising performance for the automated, objective, and quantitative analysis of LAR events from laryngeal endoscopy videos.

Clinical relevance: This paper presents an objective, quantitative, and automatic deep learning-based system for the detection of laryngeal adductor reflex (LAR) events in laryngoscopy videos.

Different approaches have been proposed in the literature to detect falls in elderly people. In this paper, we propose a fall detection method based on the classification of parameters extracted from depth images. Three supervised learning methods are compared: decision tree, K-Nearest Neighbors (K-NN), and Random Forests (RF). The methods were tested on a database of depth images recorded in a nursing home over a period of 43 days. The Random Forests-based method yields the best results, achieving 93% sensitivity and 100% specificity when we restrict our study to the area around the bed.
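The sensitivity and specificity figures reported for the fall detector are simple functions of the binary confusion counts. A minimal, hypothetical illustration (toy labels, not the study's data):

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity (true positive rate) and specificity (true negative rate).
    Label 1 = fall, label 0 = no fall."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)

# toy example: one missed fall, no false alarms
y_true = [1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0]
print(sensitivity_specificity(y_true, y_pred))  # (0.666..., 1.0)
```

In a fall-detection setting, sensitivity measures how many real falls are caught, while specificity measures how rarely normal activity triggers a false alarm.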
Furthermore, this paper also proposes a 37-day follow-up of the person, to try to estimate his or her daily habits.

Cervical spinal cord injury (cSCI) causes paralysis of the upper and lower limbs and trunk, significantly reducing the quality of life and community participation of affected individuals. The functional use of the upper limbs is the top recovery priority of people with cSCI, and wearable vision-based systems have recently been proposed to extract objective outcome measures that reflect hand function in a natural context. However, previous studies were conducted in controlled environments and may not be indicative of the actual hand use of people with cSCI living in the community. Thus, we propose a deep learning algorithm for automatically detecting hand-object interactions in egocentric videos recorded by participants with cSCI during their daily activities at home. The proposed approach is able to detect hand-object interactions with good accuracy (F1-score up to 0.82), demonstrating the feasibility of this system in uncontrolled situations (e.g., unscripted activities and variable illumination). This result paves the way for the development of an automated tool for measuring hand function in people with cSCI living in the community.

Exercising has various health benefits, and it has become an integral part of the contemporary lifestyle. However, some workouts are complex and require a trainer to demonstrate their steps, so various workout video tutorials are available online. Having access to these, people can independently learn to perform these workouts by imitating the poses of the trainer in the tutorial. However, people may injure themselves if they do not perform the workout steps accurately. Therefore, previous work suggested providing visual feedback to users by detecting 2D skeletons of both the trainer and the learner and then using the detected skeletons for pose accuracy estimation.
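The F1-score quoted for hand-object interaction detection is the harmonic mean of precision and recall. A minimal illustration with hypothetical counts (not the study's actual data):

```python
def f1_score(tp, fp, fn):
    """F1 from raw counts: harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# e.g. 82 true positives, 18 false positives, 18 false negatives
print(round(f1_score(82, 18, 18), 2))  # 0.82
```

Because F1 ignores true negatives, it is a common choice when the positive class (frames with an interaction) is the one of interest.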
Using 2D skeletons for comparison may be unreliable due to highly variable body shapes, which complicate their alignment and pose accuracy estimation. To address this challenge, we propose to estimate 3D rather than 2D skeletons and then measure the differences between the joint angles of the 3D skeletons. Leveraging recent advancements in deep latent variable models, we are able to estimate 3D skeletons from videos. Furthermore, a positive-definite kernel based on a diversity-encouraging prior is introduced to provide more accurate pose estimation. Experimental results show the superiority of our proposed 3D pose estimation over state-of-the-art baselines.
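Comparing joint angles between two 3D skeletons reduces, per joint, to the angle between two bone vectors meeting at that joint. A minimal NumPy sketch of that idea (the function name and joint layout are illustrative, not taken from the paper):

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle in degrees at joint b, formed by segments b->a and b->c,
    for 3D keypoints a, b, c."""
    v1 = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    v2 = np.asarray(c, dtype=float) - np.asarray(b, dtype=float)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    # clip guards against tiny floating-point overshoot outside [-1, 1]
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

# e.g. an elbow at the origin with perpendicular upper and lower arm
print(joint_angle([1, 0, 0], [0, 0, 0], [0, 1, 0]))  # ~90 degrees
```

Because each angle is intrinsic to the skeleton, trainer and learner poses can be compared joint by joint without first aligning body scale or camera viewpoint, which is exactly the weakness of raw 2D keypoint comparison noted above.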