In this blog post, we’ll guide you through creating your first custom experiment using FLEET and Hydra, a powerful configuration management system. Hydra simplifies the management of experimental parameters by organizing them into Python dataclasses and YAML configuration files. This approach ensures a clean separation between default values (in the code) and user-defined customizations (in YAML).
By the end of this post, you'll know:
- what Hydra is and how FLEET organizes its configurations;
- the default values of each configuration section;
- how to run the default experiment;
- how to override parameters in YAML files and from the command line.
Hydra is a Python library for managing configurations. In FLEET, configurations are defined as Python dataclasses and grouped into logical sections such as dataset, fl_client, fl_server, and net. These dataclasses act as the single source of truth for all default values.
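The composition idea can be sketched with plain dataclasses (no Hydra needed for the illustration). The class below is a hypothetical, trimmed-down stand-in for a FLEET config section, not the real one: the dataclass carries the defaults, and an override, whether it comes from a YAML file or the command line, replaces individual fields on top of them.

```python
from dataclasses import dataclass, replace

# Hypothetical, trimmed-down stand-in for a FLEET config section.
@dataclass
class DatasetConfig:
    name: str = "cifar10"
    test_size: float = 0.2

defaults = DatasetConfig()                   # defaults come from the dataclass
overridden = replace(defaults, name="imdb")  # what a YAML/CLI override does

print(defaults.name)    # cifar10
print(overridden.name)  # imdb
```

Note that the override produces a new value rather than mutating the defaults, so the dataclass remains the single source of truth.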
Each configuration is defined in Python as a dataclass. Below is a list of all available configurations and their default values:
DatasetConfig

Defined in common/dataset_utils.py, this configuration controls dataset-related parameters.
from dataclasses import dataclass, field

@dataclass
class DatasetConfig:
    path: str = "static/data"                     # Directory to store datasets
    name: str = "cifar10"                         # Dataset name (e.g., "cifar10", "imdb")
    partitioner_cls_name: str = "IidPartitioner"  # Partitioner class name
    partitioner_kwargs: dict = field(default_factory=dict)  # Partitioner arguments
    force_create: bool = False                    # Recreate dataset if it already exists
    test_size: float = 0.2                        # Fraction of test data
    server_eval: bool = True                      # Enable server-side evaluation
    train_split_key: str = "train"                # Key for the training split
    test_split_key: str = "test"                  # Key for the test split
ClientConfig

Defined in flcode_pytorch/utils/configs.py, this configuration controls federated learning (FL) client settings.
from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class ClientConfig:
    log_to_stream: bool = True         # Enable console logging
    logging_level: str = "INFO"        # Logging level: DEBUG, INFO, etc.
    train_batch_size: int = 32         # Training batch size
    val_batch_size: int = 128          # Validation batch size
    local_epochs: int = 1              # Number of local training epochs
    learning_rate: float = 0.001       # Learning rate for optimization
    log_interval: int = 100            # Steps between logging metrics
    collect_metrics: bool = False      # Enable client-side metric collection
    collect_metrics_interval: int = 5  # Metric collection interval (seconds)
    server_address: str = "tcp://localhost:5555"  # FL server address
    zmq: Dict[str, Any] = field(default_factory=lambda: {
        "enable": False,               # Enable ZMQ communication
        "host": "localhost",
        "port": 5555,
    })
    extra: Dict[str, Any] = field(default_factory=dict)  # Additional client parameters
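One detail worth noting: mutable fields such as zmq and extra use field(default_factory=...) rather than a plain dict default, so each config instance gets its own dictionary. A quick illustration, using a hypothetical class trimmed to the one field that matters here:

```python
from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class ClientConfig:  # trimmed illustration, not the full FLEET class
    zmq: Dict[str, Any] = field(
        default_factory=lambda: {"enable": False, "host": "localhost", "port": 5555}
    )

a = ClientConfig()
b = ClientConfig()
a.zmq["enable"] = True                   # mutate only one instance's dict
print(a.zmq["enable"], b.zmq["enable"])  # True False
```

With a shared mutable default, the change to `a` would silently leak into `b`; default_factory is what prevents that.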
ServerConfig

Defined in flcode_pytorch/utils/configs.py, this configuration governs the FL server's behavior.
from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class ServerConfig:
    log_to_stream: bool = True          # Enable console logging
    logging_level: str = "INFO"         # Logging level: DEBUG, INFO, etc.
    strategy: str = "FedAvg"            # FL strategy (default: FedAvg)
    min_fit_clients: int = 1            # Minimum clients for training
    min_evaluate_clients: int = 1       # Minimum clients for evaluation
    min_available_clients: int = 1      # Minimum available clients
    num_rounds: int = 1                 # Number of FL rounds
    fraction_fit: float = 1.0           # Fraction of clients to fit
    fraction_evaluate: float = 1.0      # Fraction of clients to evaluate
    server_eval: bool = False           # Enable server-side evaluation
    val_batch_size: int = 128           # Validation batch size
    server_param_init: bool = True      # Initialize server parameters
    stop_by_accuracy: bool = False      # Stop training by accuracy
    accuracy_level: float = 0.8         # Accuracy level at which to stop training
    collect_metrics: bool = False       # Enable server-side metric collection
    collect_metrics_interval: int = 60  # Metric collection interval (seconds)
    zmq: Dict[str, Any] = field(default_factory=lambda: {
        "enable": False,                # Enable ZMQ communication
        "host": "localhost",
        "port": 5555,
    })
    extra: Dict[str, Any] = field(default_factory=dict)  # Additional server parameters
NetConfig

Defined in containernet_code/config.py, this configuration handles the network topology and background traffic.
from dataclasses import dataclass, field

# TopologyConfig, FLClientConfig, BGConfig, and SDNConfig are defined in the
# same module; TopologyConfig and BGConfig are shown below.
@dataclass
class NetConfig:
    topology: TopologyConfig = field(default_factory=TopologyConfig)  # Topology settings
    fl: FLClientConfig = field(default_factory=FLClientConfig)        # FL client settings
    bg: BGConfig = field(default_factory=BGConfig)                    # Background traffic settings
    sdn: SDNConfig = field(default_factory=SDNConfig)                 # SDN settings
TopologyConfig

from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class TopologyConfig:
    source: str = "topohub"           # Topology source: topohub/custom
    topohub_id: Optional[str] = None  # Topohub ID for predefined topologies
    custom_topology: Dict = field(default_factory=lambda: {
        "path": "", "class_name": ""
    })  # Path to a custom topology class
    link_util_key: str = "deg"        # Key for link utilization (degree)
    link_config: Dict = field(default_factory=dict)  # Link-specific configurations
    switch_config: Dict = field(default_factory=lambda: {
        "failMode": "standalone", "stp": True
    })  # Switch-specific configurations
    extra: Dict = field(default_factory=dict)  # Additional topology parameters
BGConfig

from dataclasses import dataclass, field
from typing import Dict

@dataclass
class BGConfig:
    enabled: bool = False             # Enable background traffic
    image: str = "bg-traffic:latest"  # Docker image for background traffic
    network: str = "10.1.0.0/16"      # IP range for background traffic
    clients_limits: Dict = field(default_factory=lambda: {
        "cpu": 0.5, "mem": 256
    })  # CPU/memory limits for clients
    generator_config: Dict = field(default_factory=lambda: {
        "name": "iperf"               # Traffic generator (e.g., iperf)
    })
    pattern_config: Dict = field(default_factory=lambda: {
        "name": "poisson",            # Traffic pattern (e.g., Poisson)
        "parallel_streams": 1,
        "max_rate": 100.0, "min_rate": 1.0
    })
    extra: Dict = field(default_factory=dict)  # Additional background traffic settings
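Because NetConfig nests the other dataclasses, overrides address individual fields with dotted paths (for instance, net.topology.topohub_id). A minimal stdlib-only sketch of that nesting, using hypothetical trimmed versions of the classes above:

```python
from dataclasses import dataclass, field

@dataclass
class TopologyConfig:  # trimmed illustration
    source: str = "topohub"

@dataclass
class NetConfig:       # trimmed illustration
    topology: TopologyConfig = field(default_factory=TopologyConfig)

net = NetConfig()
# The attribute chain mirrors the dotted override path net.topology.source.
print(net.topology.source)  # topohub
```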
Before customizing, let's run the default experiment provided in main.yaml. This uses:
- the CIFAR-10 dataset;
- the FedAvg strategy;
- a Topohub topology.

Run the following command:
sudo .venv/bin/python main.py
Hydra allows you to override any parameter in the Python dataclasses by modifying the YAML files. For example:
To use the IMDB sentiment classification dataset, update static/config/dataset/default.yaml:
name: "imdb"
To increase the number of training rounds from 30 to 50, edit static/config/fl_server/default.yaml:
num_rounds: 50
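For context, the edited file might look roughly like the following. The exact layout of static/config/fl_server/default.yaml in your checkout may differ, so treat this as a sketch; the field names mirror ServerConfig above:

```yaml
# static/config/fl_server/default.yaml (assumed layout)
strategy: "FedAvg"
num_rounds: 50        # raised from 30
min_fit_clients: 1
min_available_clients: 1
```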
Hydra makes it easy to override any parameter directly from the command line without editing YAML files. For example:
To use the IMDB sentiment classification dataset instead of CIFAR-10:
sudo .venv/bin/python main.py dataset.name=imdb
To increase the number of training rounds from 30 to 50 and add more clients:
sudo .venv/bin/python main.py fl_server.num_rounds=50 net.fl.clients_number=20
To use a different Topohub topology:
sudo .venv/bin/python main.py net.topology.topohub_id=ibm/10/0
In the next post, we’ll dive deeper into dataset management, including how to partition data for IID and non-IID experiments. Stay tuned!