AI reading

reading groups
distributed training
stas00 (updated Nov 29, 2023)
Training LLMs at Scale - Deepak Narayanan | Stanford MLSys #83
Episode 83 of the Stanford MLSys Seminar Series.

Speaker: Deepak Narayanan

Abstract: Training LLMs efficiently is challenging for a few reasons: training can require yottaFLOPs of compute, and accelerators have limited memory capacity, making it impossible to fit large models on even a multi-GPU server. Consequently, new methods of model parallelism, such as tensor and pipeline parallelism, have been proposed. Unfortunately, naive usage of these methods leads to scaling issues at thousands of GPUs. In this talk, I describe various systems innovations incorporated into Megatron-LM (https://github.com/nvidia/megatron-lm) that allow us to run training iterations for models with up to a trillion parameters on thousands of GPUs.

Bio: Deepak is a Senior Applied Deep Learning Research Scientist in the ADLR group at NVIDIA, where he builds software systems to more efficiently train and serve LLMs. He graduated from Stanford with a Ph.D. in Computer Science in September 2021, where he was advised by Prof. Matei Zaharia.

Stanford MLSys Seminar hosts: Simran Arora (https://twitter.com/simran_s_arora), Dan Fu (https://twitter.com/realDanFu)
Schedule: http://mlsys.stanford.edu
Mailing list: https://groups.google.com/forum/#!forum/stanford-mlsys-seminars/join
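The talk above covers tensor parallelism among other techniques. As a reminder of the core idea, here is a minimal NumPy sketch (an illustration of the general column-parallel scheme, not Megatron-LM's actual implementation): a linear layer's weight matrix is split column-wise across devices, each device computes its shard's output independently, and concatenating the partial outputs reproduces the full result.

```python
import numpy as np

# Hypothetical toy example: simulate two "devices" holding column shards
# of one weight matrix W for a linear layer Y = X @ W.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))    # activations: batch of 4, hidden dim 8
W = rng.standard_normal((8, 16))   # full weight matrix, output dim 16

shards = np.split(W, 2, axis=1)            # column shard per "device"
partials = [X @ w for w in shards]         # each device computes locally
Y_parallel = np.concatenate(partials, axis=1)  # "all-gather" the columns

Y_full = X @ W
assert np.allclose(Y_parallel, Y_full)     # sharded result matches full matmul
```

Pipeline parallelism is the complementary idea: instead of splitting within a layer, consecutive layers are placed on different devices and microbatches are streamed through them.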
open source projects
NVIDIA (updated Nov 29, 2023)
neural architecture design
loss spikes?
great resume example