Monitor job

This page explains how to monitor a submitted job.

After you submit a job on our platform (training or validation), you can view live metrics for your most recently submitted job on our website at the /livetraining route. The same route is used for both training and validation jobs. From /livetraining you can also download the job as a ZIP file or abort it.

Training

Metrics available:

  • Status - The current stage of the submitted job, such as INITIALIZATION, IN PROCESS, TRAINING, ERROR OCCURED, UPLOADING RESULTS, or FINISHED.

  • Site-wise steps - Once training starts, the dashboard shows a table with the number of steps completed out of the total for each client site.

  • Validation accuracy vs epochs - Under the Show Metrics button, you can see TensorBoard plots of the global model's validation accuracy at each site, by epoch.

  • Validation loss vs steps - Under the Show Metrics button, you can see TensorBoard plots of the validation loss against local steps for each site.

  • Communication Logs - Under the Show Logs button, you can see the live communication logs of the federated learning process, which give insight into its current state.
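
The site-wise steps table reports completed versus total steps for each client site. If you want to turn those numbers into percentages, the Python sketch here does the arithmetic. The SiteProgress class, the overall_percent helper, the site names and the step counts are all hypothetical and chosen for illustration; the platform does not expose them, so you would copy the values from the dashboard table yourself.

```python
from dataclasses import dataclass


@dataclass
class SiteProgress:
    """One row of the site-wise steps table (hypothetical structure)."""
    site: str
    completed_steps: int
    total_steps: int

    def percent(self) -> float:
        # Guard against a site whose total is not known yet (0 steps).
        if self.total_steps == 0:
            return 0.0
        return 100.0 * self.completed_steps / self.total_steps


def overall_percent(rows: list[SiteProgress]) -> float:
    """Progress across all sites, weighted by each site's total steps."""
    total = sum(r.total_steps for r in rows)
    if total == 0:
        return 0.0
    return 100.0 * sum(r.completed_steps for r in rows) / total


if __name__ == "__main__":
    # Illustrative numbers only; read the real values off the dashboard table.
    rows = [
        SiteProgress("site-1", 150, 500),
        SiteProgress("site-2", 400, 500),
    ]
    for r in rows:
        print(f"{r.site}: {r.percent():.1f}%")
    print(f"overall: {overall_percent(rows):.1f}%")
```

With these illustrative numbers, the script prints 30.0% for site-1, 80.0% for site-2 and 55.0% overall. Weighting by each site's total steps means a site with more steps counts more toward the overall figure; a plain average of the per-site percentages gives the same result here only because both sites have equal totals.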

Validation

Metrics available:

  • Status - The current stage of the submitted job, such as INITIALIZATION, IN PROCESS, VALIDATION, ERROR OCCURED, UPLOADING RESULTS, or FINISHED.

  • Communication Logs - Under the Show Logs button, you can see the live communication logs of the federated validation process, which give insight into its current state.
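
The Status values for training and validation jobs share every stage except the job-specific one (TRAINING or VALIDATION). If you keep notes or scripts that refer to these labels, a minimal Python sketch of them as an enum might look like this. The JobStatus and is_terminal names are hypothetical, and treating FINISHED and ERROR OCCURED as final states is an assumption; this page does not say which status an aborted job shows.

```python
from enum import Enum


class JobStatus(str, Enum):
    # Labels spelled exactly as listed on this page (hypothetical enum).
    INITIALIZATION = "INITIALIZATION"
    IN_PROCESS = "IN PROCESS"
    TRAINING = "TRAINING"          # training jobs only
    VALIDATION = "VALIDATION"      # validation jobs only
    ERROR_OCCURED = "ERROR OCCURED"
    UPLOADING_RESULTS = "UPLOADING RESULTS"
    FINISHED = "FINISHED"


# Assumption: once a job reaches one of these statuses, it stops changing.
TERMINAL = {JobStatus.FINISHED, JobStatus.ERROR_OCCURED}


def is_terminal(label: str) -> bool:
    """Return True if the displayed status label is assumed to be final.

    Raises ValueError for a label that is not in JobStatus.
    """
    return JobStatus(label) in TERMINAL


if __name__ == "__main__":
    for label in ("IN PROCESS", "VALIDATION", "FINISHED"):
        print(label, "->", "final" if is_terminal(label) else "still running")
```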
