Junhua Liu

Undergraduate Student
Research assistant at FNii; Research intern at SenseTime
Second Floor, Zhixin Building, Shenzhen, Guangdong, China, 518172

junhua.liu.0@usc.edu,junhualiu@+cuhk.edu.cn, berkeley.edu, cuhk.edu.hk

Email  /  CV (Up to Oct. 2023)  /  Google Scholar  /  GitHub  /  Twitter


I am an incoming PhD student at the University of Southern California, advised by Prof. Feng Qian and supported by the Viterbi Fellowship. I received my B.S. from the Chinese University of Hong Kong in 2024, advised by Prof. Fangxin Wang and Prof. Shuguang Cui. I also work closely with Prof. Bo Han. During my undergraduate years, I interned at SenseTime, CMU, Harvard, and AIRS, where I worked with Prof. Mallesham Dasari, Prof. Yan Wang, Prof. Huazhe Xu, and Prof. Zeyu Wang.

My interests lie at the intersection of systems, multimedia, and HCI, with a focus on 1) immersive video and networked VR/AR/XR systems; 2) digital media and content creation; 3) ML for networked systems and systems for ML deployment.

I am always open to potential collaborations. Please contact me if you are interested.


News

Dec 2022
One paper was accepted by IEEE VR 2023 as first author.
June 2022
I joined the Video Codec Group at SenseTime Research Institute as a Research Intern.
June 2022
One paper was accepted by ACM MM 2022 as co-first author.
Mar 2022
I joined Harvard University as a Research Assistant.
Sep 2022
I joined the Intelligent Networking and Multimedia Lab as a Research Assistant.
July 2022
I joined the NLP group at AIRS as a Research Assistant.

Publications

Representative papers are highlighted. *: Equal contribution.

Undisclosed Mobile Streaming through Neural-based 3D Representation Paper Arxiv 24

Junhua Liu, Mallesham Dasari, Bo Han, Yufeng Wang, Fangxin Wang
In submission to Mobicom 2024.

Undisclosed Neural-enhanced 3D Video Conferencing Paper Arxiv 24

Junhua Liu, Ruizhi Cheng, Bo Han, Mallesham Dasari, Fangxin Wang
In submission to NSDI 2024. Arxiv, 2024.

A Networking Perspective of Volumetric Video Service: Architecture, Opportunities and Case Study Network 24

Yili Jin*, Junhua Liu*, Kaiyuan Hu, Fangxin Wang
IEEE Network, 2024.

Documenting Dunhuang Dance Using Motion Capture Technology EG'24

Zeyu Wang, Chengan He, Zhe Yan, Yingke Wang, Jiashun Wang, Junhua Liu, Anzhi Shen, Mengying Zeng, Holly Rushmeier, Huazhe Xu, Borou Yu, Chenchen Lu, Eugene Wang
Eurographics & Journal on Computing and Cultural Heritage, 2024.
A collaborative project among scientists, musicians, and dancers from Harvard, Yale, Stanford, and CMU. See Cave Dance for more details.

Fumos: Neural Compression and Progressive Refinement for Continuous Point Cloud Video Streaming. VR' 24

Junhua Liu*, Zhicheng Liang*, Mallesham Dasari, Fangxin Wang
IEEE VR, 2024. Also selected to appear in the TVCG special issue.

Point cloud video (PCV) offers viewing experiences in photorealistic 3D scenes with six degrees of freedom (6-DoF), enabling a variety of VR and AR applications. The user's field of view (FoV) is more volatile under 6-DoF movement than under 3-DoF movement in 360-degree video. PCV streaming is extremely bandwidth-intensive: current streaming systems still need hundreds of Mbps to transmit, exceeding the bandwidth capabilities of commodity devices. To save bandwidth, FoV-adaptive streaming predicts a user's FoV and downloads only the point cloud data falling in the predicted FoV. However, it is difficult to accurately predict the user's FoV even 2-3 seconds before playback due to 6-DoF movement. Misprediction of the FoV or dips in network bandwidth result in frequent stalls. To avoid rebuffering, existing systems deliver an incomplete FoV, which degrades the user's quality of experience (QoE). In this paper, we describe Fumos, a novel system that preserves the interactive experience by avoiding playback stalls while maintaining high perceptual quality and a high compression rate. We identify a research gap in the use of inter-frame redundancy and in progressive mechanisms. Fumos has three crucial designs: (1) a neural compression framework with inter-frame coding, namely N-PCC, which achieves both bandwidth efficiency and high fidelity; (2) a progressive refinement streaming framework that enables continuous playback by incrementally upgrading a fetched portion to a higher quality; and (3) system-level adaptation that employs Lyapunov optimization to jointly optimize the long-term user QoE. Experimental results demonstrate that Fumos significantly outperforms Draco, achieving an average decoding rate acceleration of over 260×. Moreover, the proposed compression framework N-PCC attains remarkable BD-Rate gains, averaging 91.7% and 51.7% against the state-of-the-art point cloud compression methods G-PCC and V-PCC, respectively.

Hulk: Human-Centered Live Volumetric Video Streaming System VR' 24

Kaiyuan Hu, Yongting Chen, Kaiying Han, Junhua Liu, Yili Jin, Boyan Li, Fangxin Wang.
IEEE VR (Poster), 2024. The full version is under revision at IEEE Transactions on Mobile Computing.

Volumetric video has emerged as a prominent medium within the realm of eXtended Reality (XR) with the advancements in computer graphics and depth capture hardware. Users can fully immerse themselves in volumetric video with the ability to switch their viewport in six degrees of freedom (DoF), including three rotational dimensions (yaw, pitch, roll) and three translational dimensions (X, Y, Z). Unlike traditional 2D videos, which are composed of pixel matrices, volumetric videos employ point clouds, meshes, or voxels to represent a volumetric scene, resulting in significantly larger data sizes. While previous works have successfully achieved volumetric video streaming in video-on-demand scenarios, the live streaming of volumetric video remains an unresolved challenge due to limited network bandwidth and stringent latency constraints. In this paper, we propose, for the first time, a holistic live volumetric video streaming system, LiveVV, which achieves multi-view capture, scene segmentation and reuse, adaptive transmission, and rendering. LiveVV contains multiple lightweight volumetric video capture modules that can be deployed without prior preparation. To reduce bandwidth consumption, LiveVV processes static and dynamic volumetric content separately by reusing static data with low disparity and decimating data with low visual saliency. In addition, to deal with network fluctuation, LiveVV integrates a volumetric video adaptive bitrate streaming algorithm (VABR) to enable smooth playback with the maximum quality of experience. Extensive real-world experiments show that LiveVV can achieve live volumetric video streaming at a frame rate of 24 fps with a latency of less than 350 ms.

Understanding User Behavior in Volumetric Video Watching: Dataset, Analysis and Prediction MM' 23

Kaiyuan Hu, Haowen Yang, Yili Jin, Junhua Liu, Yongting Chen, Miao Zhang, Fangxin Wang
ACM Multimedia (Oral), 2023.

Volumetric video has emerged as an attractive new video paradigm in recent years since it provides an immersive and interactive 3D viewing experience with six degrees of freedom (DoF). Unlike traditional 2D or panoramic videos, volumetric videos require dense point clouds, voxels, meshes, or huge neural models to depict volumetric scenes, which results in a prohibitively high bandwidth burden for video delivery. Analysis of users' behavior, especially viewport and gaze analysis, therefore plays a significant role in prioritizing the content streamed within users' viewports and degrading the remaining content to maximize user QoE with limited bandwidth. Although understanding user behavior is crucial, to the best of our knowledge, there are no available 3D volumetric video viewing datasets containing fine-grained user interactivity features, not to mention further analysis and behavior prediction. In this paper, we release, for the first time, a volumetric video viewing behavior dataset with a large scale, multiple dimensions, and diverse conditions. We conduct an in-depth analysis to understand user behaviors when viewing volumetric videos. Interesting findings on user viewport, gaze, and motion preference related to different videos and users are revealed. Finally, we design a transformer-based viewport prediction model that fuses the features of both gaze and motion, which is able to achieve high accuracy at various

CaV3: Cache-assisted Viewport Adaptive Volumetric Video Streaming VR 23

Junhua Liu, Boxiang Zhu, Fangxin Wang, Yili Jin, Wenyi Zhang, Zihan Xu, Shuguang Cui.
IEEE VR, 2023.

Volumetric video (VV) has recently emerged as a new form of video application providing a photorealistic immersive 3D viewing experience with 6 degrees of freedom (DoF), which empowers many applications such as VR, AR, and the Metaverse. A key problem therein is how to stream the enormous volume of VV data through a network with limited bandwidth. Existing works mostly focus on predicting the viewport for tiling-based adaptive VV streaming, which, however, has only a limited effect on resource saving. We argue that content repeatability in the viewport can be further leveraged and, for the first time, propose a client-side cache-assisted strategy that buffers VV tiles that will repeatedly appear in the near future so as to reduce redundant VV content transmission. The key challenges lie in three aspects: (1) feature extraction and mining in the 6-DoF VV context, (2) accurate long-term viewing pattern estimation, and (3) optimal caching scheduling with limited capacity. In this paper, we propose CaV3, an integrated cache-assisted viewport adaptive VV streaming framework to address these challenges. CaV3 employs a Long-short term Sequential prediction model (LSTSP) that achieves accurate short-term, mid-term, and long-term viewing pattern prediction with a multi-modal fusion model by capturing the viewer's behavior inertia, current attention, and subjective intention. In addition, CaV3 contains a contextual MAB-based caching adaptation algorithm (CCA) to fully utilize the viewing pattern and solve the optimal caching problem with a proven upper bound on regret. In contrast to existing VV datasets, which contain only single or co-located objects, we collect, for the first time, a comprehensive dataset of practical unbounded 360° scenes. Extensive evaluation on the dataset confirms the superiority of CaV3, which outperforms the SOTA algorithm by 15.6%-43% in viewport prediction and 13%-40% in system utility.

Internet Streaming with Implicit Neural Codecs Arxiv' 23

Junhua Liu, Yuanyuan Wang, Fangxin Wang, Mallesham Dasari
In submission, 2023. (Preprint accessible by Email)

Multimedia services have evolved explosively in recent years to provide not only higher resolution and frame rate but also a richer experience of immersion and interactivity, which empowers a plethora of modern applications such as panoramic and emerging volumetric videos. The ensuing challenge lies in a predicament from the perspective of the streaming process, i.e., the increasing demands of users on video quality and delay, and the fast-growing transmission resource overhead required by current media forms. Existing works mainly treat videos as discrete and explicit frames, and a series of optimization strategies, such as super-resolution, have been established on this basis. However, the current methods are high-cost, bulky, and compute-intensive. In this paper, we propose Uno, the first implicit streaming design, which delivers neural networks rather than explicit frames. It is able to: (1) better support the overall streaming pipeline (storage, upload, and download) and address the recent challenges of immersive multimedia, (2) overcome the limitations of existing neural-based techniques in the explicit scheme and integrate them seamlessly, and (3) outperform traditional methods in cost and functionality and demonstrate unique solutions to emerging multimedia challenges. We further give an in-depth research agenda and discuss the open issues, limitations, and opportunities for the proposed research roadmap. We envision that Uno will be the foundation for future multimedia transport.

FSVVD: A Dataset of Full Scene Volumetric Video MMSys'23

Kaiyuan Hu, Yili Jin, Haowen Yang, Junhua Liu, Fangxin Wang
ACM Multimedia Systems, 2023.

Mobile Volumetric Video Streaming System through Implicit Neural Representation EMS'23

Junhua Liu, Yuanyuan Wang, Mallesham Dasari, Yan Wang, Yufeng Wang, Shuguang Cui, Fangxin Wang
ACM SIGCOMM EMS, 2023. The full version is in submission.

A Large-Scale Dataset of Head and Gaze Behavior for 360-Degree Videos and a Pilot Study. MM 22

Yili Jin*, Junhua Liu*, Fangxin Wang, Shuguang Cui
ACM Multimedia, 2022.

Ebublio: Edge Assisted Multi-user 360-Degree Video Streaming IoTJ 23

Yili Jin, Junhua Liu, Fangxin Wang, Shuguang Cui
IEEE Internet of Things Journal, 2023.


Academic Service

  • Conference Reviewer: IEEE VR (2023-2024), ACM MM (2023-2024), UbiComp/ISWC (2023), CSCW (2023), CHI (2023-2025), ICML (2024), NeurIPS (2024), ICLR (2025), ICASSP (2023-2025)
  • Journal Reviewer: IEEE Internet of Things Journal, IEEE Open Journal of the Communications Society, IEEE Transactions on Image Processing, IEEE Transactions on Mobile Computing, IEEE Transactions on Wireless Communications.

Teaching

  • To be updated

Code and Dataset

Volumetric Video
Full Scene Dataset, Watching User Behavior Dataset
360-degree Video
Large-Scale Behavior Dataset, Edge-assisted Multi-user Streaming [IoTJ'23]

Talks


Experience

Carnegie Mellon University, WiSE Lab (Aug. 2023 - Dec. 2023)
Research Assistant, advised by Prof. Mallesham Dasari and Prof. Anthony Rowe.
Future Network of Intelligence Institute, INML (Dec. 2021 - Now)
Research Assistant, advised by Prof. Fangxin Wang and Prof. Shuguang Cui.
SenseTime Technology, ISP&Codec Group (Aug. 2022 - Now)
Research Intern, advised by Prof. Yan Wang and Dr. Yuanyuan Wang.
Harvard University, CAMLab (Mar. 2022 - Jun. 2022)
Research Assistant, advised by Profs. Eugene Y. Wang, Huazhe Xu, and Zeyu Wang.
Shenzhen Institute of Artificial Intelligence and Robotics (Jan. 2022 - Feb. 2022)
Research Intern, hosted by Prof. Yan Song and Prof. Huihuan Qian.
