Federated learning is a popular distributed learning approach for training a machine learning model without disclosing raw data. It consists of a parameter server and a possibly large collection of clients (e.g., in cross-device federated learning) that may operate in congested and changing environments. In this paper, we study federated learning in the presence of stochastic and dynamic communication failures wherein the uplink between the parameter server and client i is on with unknown probability $p_i^t$ in round t. Furthermore, we allow the dynamics of $p_i^t$ to be arbitrary. We first demonstrate that when the $p_i^t$'s vary across clients, the most widely adopted federated learning algorithm, Federated Average (FedAvg), experiences significant bias. To address this observation, we propose Federated Postponed Broadcast (FedPBC), a simple variant of FedAvg. It differs from FedAvg in that the parameter server postpones broadcasting the global model to the clients with active uplinks till the end of each training round. Despite uplink failures, we show that FedPBC converges to a stationary point of the original non-convex objective. On the technical front, postponing the global model broadcasts enables implicit gossiping among the clients with active links in round t. In spite of the time-varying nature of $p_i^t$, we can bound the perturbation of the global model dynamics using techniques to control gossip-type information mixing errors. Extensive experiments have been conducted on real-world datasets over diversified unreliable uplink patterns to corroborate our analysis.
@article{xiang2024empowering,author={Xiang, Ming and Ioannidis, Stratis and Yeh, Edmund and Joe-Wong, Carlee and Su, Lili},journal={IEEE Transactions on Signal Processing},title={Empowering Federated Learning With Implicit Gossiping: Mitigating Connection Unreliability Amidst Unknown and Arbitrary Dynamics},year={2025},volume={73},number={},pages={766-780},keywords={Servers;Uplink;Federated learning;Vehicle dynamics;Training;Stochastic processes;Distance learning;Computer aided instruction;Smart phones;Convergence;Federated learning;communication failures;gossiping;non-convex optimization;fault-tolerance},doi={10.1109/TSP.2025.3526782},}
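The contrast drawn in the abstract above between FedAvg and postponed broadcasting can be illustrated with a toy simulation. This is a minimal sketch based only on the abstract's description, not the authors' implementation. The following are all assumptions made for illustration: the scalar quadratic local objectives f_i(x) = 0.5 (x - c_i)^2, the single local gradient step per round, the fixed (rather than time-varying) uplink probabilities, the function name `simulate`, and the use of the mean of all client models as a diagnostic.

```python
import numpy as np


def simulate(rounds=2000, eta=0.05, seed=0):
    """Toy comparison of FedAvg and a FedPBC-style update (illustrative only)."""
    rng = np.random.default_rng(seed)
    m = 10
    c = np.linspace(0.0, 1.0, m)   # assumed local minimizers; global optimum is c.mean()
    p = np.linspace(0.1, 0.9, m)   # assumed uplink probabilities, correlated with c

    x_avg = 0.0                    # FedAvg: single global model
    x_pbc = np.zeros(m)            # FedPBC-style: one local model per client
    avg_trace = []

    for _ in range(rounds):
        active = rng.random(m) < p  # which uplinks are on this round

        # FedAvg: every client starts from the broadcast global model,
        # and the server averages only the uploads it receives.
        local = x_avg - eta * (x_avg - c)
        if active.any():
            x_avg = local[active].mean()
        avg_trace.append(x_avg)

        # FedPBC-style: every client steps from its own local model; the
        # server averages the active uploads and sends that average back
        # to the active clients only, at the end of the round.
        x_pbc = x_pbc - eta * (x_pbc - c)
        if active.any():
            x_pbc[active] = x_pbc[active].mean()

    fedavg_estimate = float(np.mean(avg_trace[rounds // 2:]))  # time-averaged global model
    pbc_estimate = float(x_pbc.mean())                         # mean of all client models
    return float(c.mean()), fedavg_estimate, pbc_estimate


if __name__ == "__main__":
    optimum, fedavg_estimate, pbc_estimate = simulate()
    print("optimum:", optimum)
    print("FedAvg (time-averaged global model):", fedavg_estimate)
    print("FedPBC-style (mean of client models):", pbc_estimate)
```

In this toy setting the averaging step among active clients preserves the mean of the client models, so that mean follows a plain gradient-descent recursion toward the true optimum, whereas the FedAvg iterate is pulled toward the clients whose uplinks are on more often. The general non-convex, time-varying case is what the paper analyzes; this sketch does not reproduce it.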
2024
NeurIPS 2024
Efficient Federated Learning against Heterogeneous and Non-stationary Client Unavailability
Addressing intermittent client availability is critical for the real-world deployment of federated learning algorithms. Most prior work either overlooks the potential non-stationarity in the dynamics of client unavailability or requires substantial memory/computation overhead. We study federated learning in the presence of heterogeneous and non-stationary client availability, which may occur when the deployment environments are uncertain or the clients are mobile. The impacts of the heterogeneity and non-stationarity in client unavailability can be significant, as we illustrate using FedAvg, the most widely adopted federated learning algorithm. We propose FedAPM, which includes novel algorithmic structures that (i) compensate for missed computations due to unavailability with only O(1) additional memory and computation with respect to standard FedAvg, and (ii) evenly diffuse local updates within the federated learning system through implicit gossiping, despite being agnostic to non-stationary dynamics. We show that FedAPM converges to a stationary point of even non-convex objectives while achieving the desired linear speedup property. We corroborate our analysis with numerical experiments over diversified client unavailability dynamics on real-world data sets.
@inproceedings{xiang2024efficient,selected=true,booktitle={The 38th Annual Conference on Neural Information Processing Systems (NeurIPS)},title={Efficient Federated Learning against Heterogeneous and Non-stationary Client Unavailability},author={Xiang, Ming and Ioannidis, Stratis and Yeh, Edmund and Joe-Wong, Carlee and Su, Lili},year={2024},}
2023
Submitted
Federated Learning in the Presence of Adversarial Client Unavailability
Federated learning (FL) is a decentralized machine learning framework wherein a parameter server (PS) and a collection of clients collaboratively train a model. Communication bandwidth is a scarce resource. In each round, the PS aggregates the updates from a subset of clients only. In this paper, we consider non-uniform and time-varying communication between the PS and the clients. Specifically, in each round t, the link between the PS and client i is active with probability $p_i^t$, which is unknown to both the PS and the clients. This arises when the channel conditions are heterogeneous across clients and are changing over time. We show that when the $p_i^t$'s are not uniform (i.e., not identical over i), Federated Average (FedAvg) – the most widely adopted FL algorithm – fails to minimize the global objective. Observing this, we propose Federated Postponed Broadcast (FedPBC), which is a simple variant of FedAvg; it differs from FedAvg in that the PS postpones broadcasting the global model till the end of each round. FedPBC converges to a stationary point. Moreover, the staleness is mild and there is no significant slowdown. Both theoretical analysis and numerical results are provided. On the technical front, postponing the global model broadcasts enables implicit gossiping among the clients with active links at round t. Consequently, we are able to control the perturbation of the global model dynamics caused by non-uniform and time-varying $p_i^t$ via the techniques of controlling gossip-type information mixing errors.
@inproceedings{xiang2023towards,selected=true,booktitle={2023 IEEE 62nd Conference on Decision and Control (CDC)},title={Towards Bias Correction of FedAvg over Nonuniform and Time-Varying Communications},author={Xiang, Ming and Ioannidis, Stratis and Yeh, Edmund and Joe-Wong, Carlee and Su, Lili},year={2023},publisher={IEEE}}
Submitted
Federated SGD with Differentially Private and Byzantine Resilient One-Bit Compressors
State-of-health prognosis for lithium-ion batteries considering the limitations in measurements via maximal information entropy and collective sparse variational Gaussian process
Ming Xiang, Yigang He, Hui Zhang, Chaolong Zhang, Lei Wang, Chenyuan Wang, and Chunsong Sui
@article{xiang2020state,title={State-of-health prognosis for lithium-ion batteries considering the limitations in measurements via maximal information entropy and collective sparse variational gaussian process},author={Xiang, Ming and He, Yigang and Zhang, Hui and Zhang, Chaolong and Wang, Lei and Wang, Chenyuan and Sui, Chunsong},journal={IEEE Access},volume={8},pages={188199--188217},year={2020},publisher={IEEE},}