ICRA 2026 Workshop — Enabling Autonomy and Independence in Aging Societies through Advanced Robotics and AI

MuscleLens:
A Shared Muscle-Space Pipeline for Parkinsonian Gait Analysis

Yinglei Zhu1†, Bozhao Wang1†, Huichan Zhao1*
1Tsinghua University
†Equal contribution.  *Corresponding author.
MuscleLens pipeline overview

MuscleLens lifts monocular video into a phase-locked 30×80 muscle activation code on a unified musculoskeletal model.

Abstract

Parkinson's disease (PD) gait assessment remains dominated by subjective clinical rating, while the instrumented gait laboratories that could substitute for it are inaccessible to most patients.

We present MuscleLens, a pipeline that lifts monocular video into a phase-locked $30 \times 80$ muscle activation code on a unified musculoskeletal model. The stack combines SMPL recovery, GMR retargeting to MyoFullBody, and a frozen MuscleMimic policy that emits 80-dimensional muscle activations at $100$ Hz. Evaluated on roughly 4 800 subject-level samples from CARE-PD, augmenting SMPL-H kinematics with muscle features (i) raises 5-fold PD balanced accuracy from $0.807$ to $0.814$ at essentially unchanged AUC ($0.937$ versus $0.938$ for kinematics alone); (ii) consistently lowers cohort-normalised UPDRS MAE across PCA dimensions, reaching $0.344$ versus $0.349$ at 128 dimensions; and (iii) improves leave-one-cohort-out balanced accuracy on six of seven cohorts.

Beyond accuracy, the actuator-space code exposes co-activation patterns that joint kinematics alone cannot encode, providing a biomechanically interpretable view of PD gait. The monocular branch demonstrates the end-to-end deployment route: about 1 minute from a 15-second clip to a complete muscle code on a single NVIDIA RTX 5090, bringing muscle-level analysis within reach of routine clinical-style acquisition.

TL;DR.  MuscleLens cascades GVHMR → GMR → a frozen MuscleMimic policy into a one-shot pipeline from RGB video to phase-locked muscle activations. On CARE-PD, adding muscle features to SMPL-H kinematics gives small but consistent gains in PD BAcc, cohort-normalised UPDRS MAE, and LOCO transfer, and surfaces co-activation structure that kinematics alone do not encode.

Pipeline

MuscleLens accepts either monocular video or SMPL motion. For video, we recover SMPL-H with GVHMR; for CARE-PD we use the released SMPL-H sequences directly. The motion is then retargeted to the MyoFullBody musculoskeletal model via GMR, and replayed under the frozen mm-fullbody-base MuscleMimic policy, which emits 80-dimensional muscle activations at $100$ Hz. A lightweight coordinate-frame harmonisation precedes retargeting on the CARE-PD branch to keep cross-source processing consistent.

Activations are segmented into gait cycles, resampled to 30 phase bins per cycle, and averaged across valid cycles to form a code $\mathbf{A}\in\mathbb{R}^{30\times 80}$, where $A_{p,m}=\mathbb{E}_k[a_m(t_p^{(k)})]$, with $a_m$ the activation of muscle $m$ and $t_p^{(k)}$ the time at which cycle $k$ reaches phase bin $p$. We compare a 550-dimensional NMF summary of this code, a 5 310-dimensional kinematic descriptor, and their concatenation.
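The phase-locking step can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `phase_locked_code`, the `cycle_starts` input (gait-cycle boundaries from some unspecified event detector), the use of linear interpolation, and the synthetic demo signal are all hypothetical.

```python
import numpy as np

def phase_locked_code(activations, cycle_starts, n_bins=30, min_len=2):
    """Average per-cycle activations on a shared phase grid.

    activations:  (T, M) array of muscle activations at a fixed sample rate.
    cycle_starts: increasing frame indices marking gait-cycle boundaries
                  (hypothetical input; the event detector is not specified).
    Returns an (n_bins, M) array A with A[p, m] = mean over cycles of a_m at bin p.
    """
    activations = np.asarray(activations, dtype=float)
    phase = np.linspace(0.0, 1.0, n_bins, endpoint=False)  # bin phases in [0, 1)
    cycles = []
    for start, end in zip(cycle_starts[:-1], cycle_starts[1:]):
        length = end - start
        if length < min_len:
            continue  # skip degenerate cycles
        src = np.arange(length) / length  # phase of each frame within the cycle
        seg = activations[start:end]
        # Linearly interpolate every muscle channel onto the shared phase grid.
        resampled = np.stack(
            [np.interp(phase, src, seg[:, m]) for m in range(seg.shape[1])],
            axis=1,
        )
        cycles.append(resampled)
    if not cycles:
        raise ValueError("no valid gait cycles")
    return np.mean(cycles, axis=0)

# Synthetic demo: 5 s at 100 Hz, 80 channels, one gait cycle every 100 frames.
t = np.arange(500)
demo = 0.5 + 0.5 * np.sin(
    2 * np.pi * t[:, None] / 100 + np.linspace(0, np.pi, 80)[None, :]
)
A = phase_locked_code(demo, [0, 100, 200, 300, 400, 500])
print(A.shape)  # (30, 80)
```

Because every cycle is mapped onto the same 30 phase bins before averaging, cycles of different durations contribute equally to the resulting code.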

Compute & runtime

The full inference stack runs on a single NVIDIA RTX 5090. The MuscleMimic policy takes roughly 12 s per SMPL clip to emit activations; on the video branch, end-to-end processing of a 15-second clip (GVHMR → SMPL → GMR → MuscleMimic) completes in about 1 minute, returning both kinematics and muscle activations in a single pass. The headline number is the end-to-end budget a downstream user would experience, which places MuscleLens within reach of routine clinical-style acquisition without specialised gait laboratories.

Video → SMPL → Muscle walkthrough

A single walk-forward clip carried through the full MuscleLens pipeline.

Stage 1 — Monocular video → SMPL-H (GVHMR). Left: in-camera SMPL-H projection overlaid on the raw RGB clip. Right: the same motion rendered in a world-grounded global frame.

Stage 2 — SMPL-H → MyoFullBody (GMR). Joint trajectories are retargeted onto the muscle-actuated skeleton.

Stage 3 — MyoFullBody → muscle activations (MuscleMimic). A frozen MuscleMimic policy tracks the retargeted motion in MuJoCo and emits 80-dimensional muscle activations at $100$ Hz.

Key supervised results

All quantitative results below use the same SMPL-H input, isolating the effect of muscle augmentation from any upstream reconstruction error. Kinematics remain the strongest single modality; muscle features add a small but consistent gain on top of kinematics, most defensibly once cohort structure and feature dimensionality are controlled. The gain appears in three independent settings: random folds, cohort-normalised PCA, and LOCO balanced accuracy.

| Modality | PD AUC | PD BAcc | UPDRS MAE | 128-D MAE (cohort-norm) |
|---|---|---|---|---|
| Muscle NMF | 0.836 ± 0.023 | 0.751 ± 0.027 | 0.454 ± 0.037 | 0.490 |
| Kinematic | 0.938 ± 0.030 | 0.807 ± 0.011 | 0.372 ± 0.013 | 0.349 |
| Muscle + Kinematic | 0.937 ± 0.029 | 0.814 ± 0.025 | 0.382 ± 0.022 | 0.344 |

5-fold logistic regression on 4 669 PD-labelled (resp. 2 559 UPDRS-labelled) subject samples; mean ± std across folds. The last column is the cohort-normalised PCA setting at 128 dimensions, the strongest evidence that muscle features contribute complementary signal, with the improvement holding across 8 / 16 / 32 / 64 / 128 PCA dimensions.

Modality comparison

Modality comparison. (a) Random five-fold PD classification; (b) Random five-fold UPDRS-gait prediction; (c) After cohort-wise z-scoring and equal-dimensional PCA, fusion yields consistently lower UPDRS MAE than kinematics across PCA dimensions.
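The cohort-wise z-scoring and equal-dimensional PCA in panel (c) could look roughly like the NumPy sketch below. It assumes per-cohort standardisation of every feature and an SVD-based PCA fitted on the training split only; the helper names `cohort_zscore` and `pca_project`, the synthetic feature sizes, and the split are illustrative assumptions, and the source does not state how these steps were implemented.

```python
import numpy as np

def cohort_zscore(X, cohorts, eps=1e-8):
    """Standardise each feature separately within every cohort."""
    X = np.asarray(X, dtype=float)
    cohorts = np.asarray(cohorts)
    Z = np.empty_like(X)
    for c in np.unique(cohorts):
        mask = cohorts == c
        mu = X[mask].mean(axis=0)
        sd = X[mask].std(axis=0)
        Z[mask] = (X[mask] - mu) / (sd + eps)
    return Z

def pca_project(X_train, X_test, n_components):
    """Fit PCA on the training split via SVD and project both splits."""
    mean = X_train.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    basis = vt[:n_components].T
    return (X_train - mean) @ basis, (X_test - mean) @ basis

# Synthetic stand-ins for the two modalities (sizes are arbitrary).
rng = np.random.default_rng(0)
kin = rng.normal(size=(200, 300))
mus = rng.normal(size=(200, 50))
cohorts = np.repeat([0, 1], 100)
kin[100:] += 5.0  # a cohort-level offset that z-scoring should remove
fused = np.hstack([kin, mus])

Z_kin = cohort_zscore(kin, cohorts)
Z_fused = cohort_zscore(fused, cohorts)

idx = rng.permutation(200)
train, test = idx[:150], idx[150:]
# Both modalities are reduced to the same dimensionality before comparison.
kin_tr, kin_te = pca_project(Z_kin[train], Z_kin[test], 16)
fus_tr, fus_te = pca_project(Z_fused[train], Z_fused[test], 16)
print(kin_tr.shape, fus_tr.shape)  # (150, 16) (150, 16)
```

Projecting both descriptors to the same number of components removes raw feature count as an explanation for any difference between kinematics and fusion.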

Leave-one-cohort-out transfer

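| Cohort | N | % PD | Kinematic BAcc | Fusion BAcc | Δ BAcc |
|---|---|---|---|---|---|
| 3DGait | 88 | 72.7% | 0.466 | 0.503 | +0.036 |
| BMCLab | 779 | 100% | 0.981 | 0.985 | +0.004 |
| DNE | 303 | 38.3% | 0.480 | 0.506 | +0.026 |
| E-LC | 162 | 90.1% | 0.483 | 0.479 | −0.003 |
| KUL-DT-T | 735 | 100% | 0.899 | 0.905 | +0.005 |
| PD-GaM | 1 692 | 100% | 0.690 | 0.717 | +0.027 |
| T-LTC | 910 | 100% | 0.618 | 0.663 | +0.045 |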

Fusion improves balanced accuracy on 6 of 7 cohorts (mean $\Delta=+0.020$). Pooled LOCO PD AUC looks alarmingly low for both modalities (kinematics $0.282$, fusion $0.340$), but this is a between-cohort probability-calibration artefact: four of seven cohorts are 100% PD, so per-cohort AUC is undefined for them and the pooled value is driven by cross-cohort calibration shift rather than ranking failure inside any single cohort. BAcc, invariant to that shift, is the meaningful summary on this corpus.
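A toy example makes the pooled-AUC artefact concrete. The scores below are invented for illustration and are not CARE-PD outputs; `auc` is a hypothetical helper that computes AUC as the fraction of (positive, negative) pairs ranked correctly. Cohort A is mixed and perfectly ranked internally, while cohort B is all PD but scored on a systematically lower scale.

```python
import numpy as np

def auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties = 0.5)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        return float("nan")  # undefined when only one class is present
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

# Cohort A: 4 controls and 4 PD, every PD sample scored above every control.
y_a = np.array([0, 0, 0, 0, 1, 1, 1, 1])
s_a = np.array([0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90])

# Cohort B: all PD, but with scores shifted downward (calibration shift).
y_b = np.ones(6, dtype=int)
s_b = np.array([0.20, 0.25, 0.30, 0.35, 0.40, 0.45])

print(auc(y_a, s_a))  # 1.0: ranking inside cohort A is perfect
print(auc(y_b, s_b))  # nan: a single-class cohort has no defined AUC
pooled = auc(np.concatenate([y_a, y_b]), np.concatenate([s_a, s_b]))
print(pooled)         # 0.4: below chance once the cohorts are pooled
```

Of the 40 positive-negative pairs in the pool, only the 16 pairs drawn entirely from cohort A are ranked correctly, so pooled AUC falls below 0.5 even though no cohort contains a single within-cohort ranking error.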

Fusion embedding overview

(a) HDBSCAN finds one dominant cluster and a small side cluster. (b) Cohort labels still retain source structure. (c) UPDRS-gait labels show a weak ordering rather than clean separation.

The fused unsupervised embedding shows one dominant cluster and a smaller side cluster, with severity labels forming only a weak gradient. Weighted UPDRS purity is $0.472$ for muscle, $0.448$ for kinematics, and $0.447$ for fusion; PD purity sits at ≈ 0.96 across all modalities, barely above the labelled PD base rate of 95.1%. The unsupervised view is therefore not diagnostic on its own; the supervised analyses above remain where the muscle-space contribution is most credible.
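Why PD purity near the base rate is uninformative can be seen with a small sketch. It assumes "weighted purity" means the size-weighted average of each cluster's majority-label fraction, which the source does not define explicitly; `weighted_purity` and the cluster counts below are hypothetical, chosen only to mimic one dominant cluster, one side cluster, and a 95.1% PD base rate.

```python
import numpy as np

def weighted_purity(cluster_ids, labels):
    """Size-weighted purity: sum of per-cluster majority counts over N.

    Assumed definition. Any noise label (e.g. -1 from a density-based
    clusterer) is treated here as just another cluster.
    """
    cluster_ids = np.asarray(cluster_ids)
    labels = np.asarray(labels)
    majority_total = 0
    for c in np.unique(cluster_ids):
        _, counts = np.unique(labels[cluster_ids == c], return_counts=True)
        majority_total += counts.max()
    return majority_total / labels.size

# Dominant cluster: 856 PD + 44 controls; side cluster: 95 PD + 5 controls.
labels = np.array([1] * 856 + [0] * 44 + [1] * 95 + [0] * 5)
clusters = np.array([0] * 900 + [1] * 100)

purity = weighted_purity(clusters, labels)
print(purity)         # 0.951
print(labels.mean())  # 0.951, the PD base rate
```

Whenever every cluster is PD-majority, weighted purity collapses to the base rate exactly, so a value close to the base rate says little about whether the clustering separates PD from controls.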

Conclusion & limitations

MuscleLens establishes a shared route from monocular video or SMPL-H motion to phase-locked muscle activation codes, and provides initial evidence that the actuator-space representation carries information complementary to joint kinematics: five-fold PD balanced accuracy improves, cohort-normalised UPDRS regression improves at every PCA dimension tested, and LOCO balanced accuracy improves on six of seven cohorts. The activation code also surfaces co-activation structure that joint kinematics alone cannot encode — a step toward more interpretable PD gait analysis.

Three limitations bound the present claim:

  1. The monocular branch is a deployment demonstration, not yet a clinically validated front end. Large-scale patient-video evaluation remains future work; the qualitative video → SMPL → muscle clips above document the current state.
  2. The musculoskeletal model and tracking policy are not yet calibrated to older adults with PD-specific movement strategies, which likely caps co-activation fidelity.
  3. The residual cohort-transfer gap on pooled LOCO PD AUC reflects a between-cohort probability-calibration problem more than a within-cohort ranking failure. Domain-invariant training and cohort-calibrated scoring are natural next steps and the most direct route to strengthening the present claim.

BibTeX

@inproceedings{zhu2026musclelens,
  title     = {MuscleLens: A Shared Muscle-Space Pipeline for Parkinsonian Gait Analysis},
  author    = {Zhu, Yinglei and Wang, Bozhao and Zhao, Huichan},
  booktitle = {ICRA 2026 Workshop on Enabling Autonomy and Independence in
               Aging Societies through Advanced Robotics and AI},
  year      = {2026}
}