Workshop Description
For MLOps engineers and AI platform security teams. Covers quantum cryptographic exposure of model weights, training checkpoints, NCCL/MPI cluster communications, model registry signing, and PQC migration using FIPS 203/204/205 for distributed AI training infrastructure.
Training a frontier LLM costs tens of millions of dollars in GPU compute. The resulting model weights are signed with ECDSA or RSA, stored encrypted in cloud object stores, and transmitted across multi-node GPU clusters using NCCL AllReduce or MPI collectives. Every public-key operation in this chain (the ECDSA/RSA signatures, and the key exchange that protects TLS transport and wraps storage-encryption keys) is breakable by a quantum computer running Shor's algorithm; the symmetric ciphers themselves face only Grover's quadratic speedup. The harvest-now-decrypt-later threat is particularly acute for model weights: an adversary recording encrypted checkpoint transfers and their quantum-vulnerable key exchanges today could decrypt them once quantum capability arrives, gaining access to proprietary model architectures and training data encoded in the weights.

This workshop maps the full cryptographic dependency chain of a distributed training pipeline, from data ingest through GPU cluster communications to checkpoint storage and model registry. We then build a phased PQC migration plan using FIPS 203 (ML-KEM) for key encapsulation, FIPS 204 (ML-DSA) for model signing, and FIPS 205 (SLH-DSA) for long-lived provenance attestation, addressing the specific performance constraints of GPU cluster communications and large-artefact signing.
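A phased migration typically runs hybrid for a transition period, deriving the checkpoint encryption key from both a classical ECDH secret and an ML-KEM shared secret, so the key stays safe if either scheme survives. The sketch below shows one common construction, a concatenation combiner fed through HKDF (RFC 5869), using only the Python standard library; the function names, labels, and the random stand-ins for the ECDH/ML-KEM outputs are illustrative, not from any particular product.

```python
import hashlib
import hmac
import os

def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract (RFC 5869) with SHA-256."""
    return hmac.new(salt, ikm, hashlib.sha256).digest()

def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand (RFC 5869) with SHA-256."""
    okm, block = b"", b""
    for i in range((length + 31) // 32):
        block = hmac.new(prk, block + info + bytes([i + 1]), hashlib.sha256).digest()
        okm += block
    return okm[:length]

def hybrid_checkpoint_key(ecdh_secret: bytes, mlkem_secret: bytes) -> bytes:
    # Concatenation combiner: the derived key remains secret as long as
    # EITHER the classical ECDH secret or the ML-KEM secret is unbroken.
    prk = hkdf_extract(salt=b"pqc-hybrid-v1", ikm=ecdh_secret + mlkem_secret)
    return hkdf_expand(prk, info=b"checkpoint-aes-256-key", length=32)

# Random stand-ins for real ECDH / ML-KEM shared secrets (illustrative only):
key = hybrid_checkpoint_key(os.urandom(32), os.urandom(32))
assert len(key) == 32  # 256-bit key for AES-256 checkpoint encryption
```

In production the two secrets would come from an actual X25519/P-256 exchange and an ML-KEM-768 decapsulation; the combiner itself is the part that matters for the migration plan.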
What participants cover
- Model artefact cryptographic exposure: how weight signing (ECDSA/RSA), checkpoint key wrapping, and container image signing fall to Shor's algorithm
- Distributed training cluster security: NCCL AllReduce, MPI collectives, and parameter server gRPC TLS under quantum threat
- Model registry integrity: cosign/Notary v2 signing, SBOM attestation, and supply chain verification for AI artefacts
- FIPS 203/204/205 for AI: ML-KEM for checkpoint encryption, ML-DSA for model signing, SLH-DSA for long-lived provenance
- Performance constraints: signature size impact on model registry throughput, key encapsulation overhead for large-artefact storage
- Compliance alignment: NIST AI RMF model integrity requirements, EU AI Act Article 15, cloud GPU platform PQC readiness