ICRA 2026 Workshop — Enabling Autonomy and Independence in Aging Societies through Advanced Robotics and AI
Parkinson's disease (PD) gait assessment remains dominated by subjective clinical rating, while the instrumented gait laboratories that could substitute for it are inaccessible to most patients.
We present MuscleLens, a pipeline that lifts monocular video into a phase-locked $30 \times 80$ muscle activation code on a unified musculoskeletal model. The stack combines SMPL recovery, GMR retargeting to MyoFullBody, and a frozen MuscleMimic policy that emits 80-dimensional muscle activations at $100$ Hz. Evaluated on roughly 4 800 subject-level samples from CARE-PD, augmenting SMPL-H kinematics with muscle features (i) raises 5-fold PD balanced accuracy from $0.807$ to $0.814$ at essentially unchanged AUC ($0.938$ vs. $0.937$); (ii) consistently lowers cohort-normalised UPDRS MAE across PCA dimensions, reaching $0.344$ versus $0.349$ at 128 dimensions; and (iii) improves leave-one-cohort-out balanced accuracy on six of seven cohorts.
Beyond accuracy, the actuator-space code exposes co-activation patterns that joint kinematics alone cannot encode, providing a biomechanically interpretable view of PD gait. The monocular branch demonstrates the end-to-end deployment route: about 1 minute from a 15-second clip to a complete muscle code on a single NVIDIA RTX 5090, bringing muscle-level analysis within reach of routine clinical-style acquisition.
MuscleLens accepts either monocular video or SMPL motion. For video, we recover SMPL-H with GVHMR; for CARE-PD we use the released SMPL-H sequences directly. The motion is then retargeted to the MyoFullBody musculoskeletal model via GMR, and replayed under the frozen mm-fullbody-base MuscleMimic policy, which emits 80-dimensional muscle activations at $100$ Hz. A lightweight coordinate-frame harmonisation precedes retargeting on the CARE-PD branch to keep cross-source processing consistent.
Activations are segmented into gait cycles, resampled to 30 phase bins per cycle, and averaged across valid cycles to form a code $\mathbf{A}\in\mathbb{R}^{30\times 80}$, where $A_{p,m}=\mathbb{E}_k[a_m(t_p^{(k)})]$. We compare a 550-dimensional NMF summary of this code, a 5 310-dimensional kinematic descriptor, and their concatenation.
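The phase-locking step can be sketched in a few lines of NumPy. This is a minimal illustration, not the released implementation: the function name `phase_locked_code`, the `cycle_starts` argument, and the use of linear interpolation are assumptions, and gait-cycle detection and the validity criterion for cycles are left outside the sketch.

```python
import numpy as np

def phase_locked_code(activations, cycle_starts, n_bins=30):
    """Average per-cycle activations on a common phase grid.

    activations : (T, M) array of muscle activations (M = 80 in the text).
    cycle_starts: increasing frame indices (all < T) marking gait-cycle
                  boundaries; how they are detected is not covered here.
    Returns an (n_bins, M) array A with A[p, m] = mean over cycles of a_m
    evaluated at phase p / n_bins.
    """
    activations = np.asarray(activations, dtype=float)
    phases = np.arange(n_bins) / n_bins  # phase grid in [0, 1)
    cycles = []
    for start, end in zip(cycle_starts[:-1], cycle_starts[1:]):
        if end - start < 2:  # assumed validity rule: skip degenerate cycles
            continue
        # Include the closing frame so that phase 1.0 is the next cycle start.
        seg = activations[start:end + 1]
        src = np.linspace(0.0, 1.0, len(seg))
        # Linear interpolation of every muscle channel onto the phase grid.
        resampled = np.stack(
            [np.interp(phases, src, seg[:, m]) for m in range(seg.shape[1])],
            axis=1,
        )
        cycles.append(resampled)
    if not cycles:
        raise ValueError("no valid gait cycles")
    return np.mean(cycles, axis=0)  # expectation over cycles k
```

With 100 Hz activations and a one-second stride, each cycle contributes about 100 frames that are reduced to 30 phase bins before averaging, so clips of different length and cadence map to the same $30\times 80$ shape.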
The full inference stack runs on a single NVIDIA RTX 5090. MuscleMimic inference takes roughly 12 s per SMPL clip; on the video branch, end-to-end processing of a 15-second clip (GVHMR → SMPL → GMR → MuscleMimic) completes in about 1 minute, returning both kinematics and muscle activations in a single pass. The headline number is the end-to-end budget a downstream user would experience, which places MuscleLens within reach of routine clinical-style acquisition without specialised gait laboratories.
A single walk-forward clip carried through the full MuscleLens pipeline.
Stage 1 — Monocular video → SMPL-H (GVHMR). Left: in-camera SMPL-H projection overlaid on the raw RGB clip. Right: the same motion rendered in a world-grounded global frame.
Stage 2 — SMPL-H → MyoFullBody (GMR). Joint trajectories are retargeted onto the muscle-actuated skeleton.
Stage 3 — MyoFullBody → muscle activations (MuscleMimic). A frozen MuscleMimic policy tracks the retargeted motion in MuJoCo and emits 80-dimensional muscle activations at $100$ Hz.
All quantitative results below use the same SMPL-H input, isolating the effect of muscle augmentation from any upstream reconstruction error. Kinematics remain the strongest single modality; muscle features add a small but consistent gain on top of kinematics, most defensibly once cohort structure and feature dimensionality are controlled. The gain appears in three settings: random folds, cohort-normalised PCA, and leave-one-cohort-out (LOCO) balanced accuracy.
| Modality | PD AUC | PD BAcc | UPDRS MAE | 128-D MAE (cohort-norm) |
|---|---|---|---|---|
| Muscle NMF | 0.836 ± 0.023 | 0.751 ± 0.027 | 0.454 ± 0.037 | 0.490 |
| Kinematic | 0.938 ± 0.030 | 0.807 ± 0.011 | 0.372 ± 0.013 | 0.349 |
| Muscle + Kinematic | 0.937 ± 0.029 | 0.814 ± 0.025 | 0.382 ± 0.022 | 0.344 |
5-fold logistic regression on 4 669 PD-labelled and 2 559 UPDRS-labelled subject samples; mean ± std across folds. The last column is the cohort-normalised PCA setting at 128 dimensions. It is the strongest evidence that muscle features contribute complementary signal, with the improvement holding across 8, 16, 32, 64, and 128 PCA dimensions.
Modality comparison. (a) Random five-fold PD classification; (b) Random five-fold UPDRS-gait prediction; (c) After cohort-wise z-scoring and equal-dimensional PCA, fusion yields consistently lower UPDRS MAE than kinematics across PCA dimensions.
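The cohort-normalised setting in panel (c) can be illustrated as follows. This is a sketch under assumptions: the helper names `cohort_zscore` and `pca_project` are hypothetical, PCA is done with a plain SVD, and the text does not say whether the projection is fitted per training fold or once on all data.

```python
import numpy as np

def cohort_zscore(X, cohorts, eps=1e-8):
    """Standardise each feature within each cohort (zero mean, unit variance)."""
    X = np.asarray(X, dtype=float)
    cohorts = np.asarray(cohorts)
    Z = np.empty_like(X)
    for c in np.unique(cohorts):
        mask = cohorts == c
        mu = X[mask].mean(axis=0)
        sd = X[mask].std(axis=0)
        Z[mask] = (X[mask] - mu) / (sd + eps)  # eps guards constant features
    return Z

def pca_project(Z, n_components):
    """Project the rows of Z onto the top principal components via SVD."""
    Zc = Z - Z.mean(axis=0)
    _, _, Vt = np.linalg.svd(Zc, full_matrices=False)
    return Zc @ Vt[:n_components].T
```

Projecting the kinematic descriptor and the fused descriptor to the same number of components (for example 128) removes raw dimensionality as an explanation for any difference in MAE, which is the point of the equal-dimensional comparison.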
| Cohort | N | % PD | Kin. BAcc | Fus. BAcc | Δ |
|---|---|---|---|---|---|
| 3DGait | 88 | 72.7% | 0.466 | 0.503 | +0.036 |
| BMCLab | 779 | 100% | 0.981 | 0.985 | +0.004 |
| DNE | 303 | 38.3% | 0.480 | 0.506 | +0.026 |
| E-LC | 162 | 90.1% | 0.483 | 0.479 | −0.003 |
| KUL-DT-T | 735 | 100% | 0.899 | 0.905 | +0.005 |
| PD-GaM | 1 692 | 100% | 0.690 | 0.717 | +0.027 |
| T-LTC | 910 | 100% | 0.618 | 0.663 | +0.045 |
Fusion improves balanced accuracy on 6 of 7 cohorts (mean $\Delta=+0.020$). Pooled LOCO PD AUC looks alarmingly low for both modalities (kinematics $0.282$, fusion $0.340$), but this is a between-cohort probability-calibration artefact: four of the seven cohorts are 100% PD, so per-cohort AUC is undefined for them, and the pooled value is driven by cross-cohort calibration shift rather than by ranking failure inside any single cohort. BAcc, invariant to that shift, is the meaningful summary on this corpus.
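A toy example shows how pooling can produce a low AUC even when ranking inside every cohort is sound. The scores below are invented for illustration and are not taken from CARE-PD; `auc` is a small pairwise implementation written for this sketch.

```python
def auc(scores, labels):
    """Probability that a random positive outscores a random negative.

    Ties count as 0.5. Returns None when only one class is present,
    because the quantity is undefined in that case.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Cohort A: mixed labels, perfectly ranked, scores calibrated high.
a_scores = [0.70, 0.75, 0.90, 0.95]
a_labels = [0, 0, 1, 1]

# Cohort B: 100% PD, scores shifted low by a cohort-specific offset.
b_scores = [0.30, 0.35, 0.40, 0.45]
b_labels = [1, 1, 1, 1]

auc_a = auc(a_scores, a_labels)  # perfect ranking inside cohort A
auc_b = auc(b_scores, b_labels)  # undefined: no controls in cohort B
auc_pooled = auc(a_scores + b_scores, a_labels + b_labels)
```

Here cohort A is ranked perfectly and cohort B has no defined AUC, yet the pooled value falls to one third, because every PD sample from cohort B scores below the controls from cohort A.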
(a) HDBSCAN finds one dominant cluster and a small side cluster. (b) Cohort labels still retain source structure. (c) UPDRS-gait labels show a weak ordering rather than clean separation.
The fused unsupervised embedding shows one dominant cluster and a smaller side cluster, with severity labels forming only a weak gradient. Weighted UPDRS purity is $0.472$ for muscle, $0.448$ for kinematics, and $0.447$ for fusion; PD purity sits at ≈ 0.96 across all modalities, essentially the labelled PD base rate of 95.1 %. The unsupervised view is therefore not diagnostic on its own; the supervised analyses above remain where the muscle-space contribution is most credible.
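The near-identity between PD purity and the base rate follows from how purity behaves when one cluster dominates. The sketch below assumes the standard majority-vote definition of weighted purity, which the text does not spell out, and uses a single trivial cluster in place of the HDBSCAN output. PD counts are reconstructed from the rounded N and % PD columns of the LOCO table.

```python
from collections import Counter

def weighted_purity(cluster_ids, labels):
    """Fraction of samples that carry the majority label of their cluster."""
    by_cluster = {}
    for c, y in zip(cluster_ids, labels):
        by_cluster.setdefault(c, Counter())[y] += 1
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(labels)

# (N, fraction PD) per cohort, copied from the LOCO table.
cohorts = {
    "3DGait": (88, 0.727), "BMCLab": (779, 1.0), "DNE": (303, 0.383),
    "E-LC": (162, 0.901), "KUL-DT-T": (735, 1.0), "PD-GaM": (1692, 1.0),
    "T-LTC": (910, 1.0),
}
n_total = sum(n for n, _ in cohorts.values())
n_pd = sum(round(n * frac) for n, frac in cohorts.values())

# Trivial partition: every sample in one cluster (stand-in for the
# dominant cluster; not actual HDBSCAN assignments).
labels = [1] * n_pd + [0] * (n_total - n_pd)
purity = weighted_purity([0] * n_total, labels)
```

Under these assumptions a partition with no structure at all already reaches a PD purity of about 0.951, so a value near 0.96 carries little diagnostic information.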
MuscleLens establishes a shared route from monocular video or SMPL-H motion to phase-locked muscle activation codes, and provides initial evidence that the actuator-space representation carries information complementary to joint kinematics: five-fold PD balanced accuracy improves, cohort-normalised UPDRS regression improves at every PCA dimension tested, and LOCO balanced accuracy improves on six of seven cohorts. The activation code also surfaces co-activation structure that joint kinematics alone cannot encode — a step toward more interpretable PD gait analysis.
Three limitations bound the present claim:
@inproceedings{zhu2026musclelens,
title = {MuscleLens: A Shared Muscle-Space Pipeline for Parkinsonian Gait Analysis},
author = {Zhu, Yinglei and Wang, Bozhao and Zhao, Huichan},
booktitle = {ICRA 2026 Workshop on Enabling Autonomy and Independence in
Aging Societies through Advanced Robotics and AI},
year = {2026}
}