Why not just use a smaller batch size instead of GradientAccumulation? What is the rule of thumb for picking batch sizes? How about adjusting the learning rate according to the batch size?

More questions: it should be `count >= 64` in the code above when doing GradientAccumulation, and `lr_find` uses the batch size from the DataLoader. What are the implications of using GradientAccumulation? How much do the numeric results differ with and without GradientAccumulation, and what is the main cause of the difference?

What is the problem with using a smaller batch size? (A smaller batch size brings larger volatility in the learning rate and weights.) How can we make a model train with a small batch size as if it were using a large one? How can GradientAccumulation be explained in code? (See the first sketch below.)

What if a model crashes with a CUDA out-of-memory error? What is GradientAccumulation? What is integer division (`//`)?

How much memory does the convnext-small model take? Which line of code does Jeremy use to find out how much GPU memory the model used? Which two lines of code does Jeremy use to free unnecessarily occupied GPU memory, so you don't need to restart the kernel to run the next model? (See the second sketch below.) Jeremy then trained different models to see how much memory each used.

How did Jeremy use a 24 GB GPU to find out what a 16 GB GPU can do? How did Jeremy find out how much GPU memory a model will use? How did Jeremy choose the smallest subgroup of images as the training set? Will training the model longer take up more memory? (No.) So the smallest training set plus one epoch of training can quickly tell us how much memory a model needs.

How big is the Kaggle GPU? Do you sometimes have to run notebooks on Kaggle, for example in code competitions? Why is it good and fair to use a Kaggle notebook to win the leaderboard?

What are the benefits of using larger models? What are the problems with larger models? (They use up GPU memory, and a GPU is not as clever as a CPU at finding ways to free memory on its own, so large models need very expensive GPUs.) What can we do when the GPU runs out of memory? First, restart the notebook; then Jeremy shows us a trick that lets us train extra-large models on Kaggle.

Review of the notebook Road to the Top, part 2, and congrats to the fastai students who beat Jeremy for 1st and 2nd place.

In this lecture we will focus on tweaking the first and last layers; in the next few weeks we will look at tweaking the middle of the neural net. We explored the simplest neural net, fully connected linear layers, in earlier lectures.
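To make the GradientAccumulation idea concrete, here is a minimal sketch in plain PyTorch rather than Jeremy's fastai callback; the names `model`, `dataloader`, `optimizer`, and `loss_fn` are hypothetical and assumed to be defined elsewhere. The point is that gradients from several small batches are summed before a single optimizer step, so a small physical batch behaves like a large effective one.

```python
# Minimal gradient-accumulation sketch (plain PyTorch, hypothetical names).
accum_target = 64   # effective batch size we want to emulate
count = 0           # number of samples whose gradients have been accumulated

for xb, yb in dataloader:            # each xb is a small physical batch, e.g. 16 images
    loss = loss_fn(model(xb), yb)
    loss.backward()                  # gradients are summed into .grad across iterations
    count += len(xb)
    if count >= accum_target:        # the "count >= 64" check mentioned above
        optimizer.step()             # one weight update, as if the batch size were 64
        optimizer.zero_grad()        # reset gradients for the next accumulation cycle
        count = 0
```

Because only the optimizer step is delayed, the result is nearly the same as training with a genuinely larger batch; the small numeric differences typically come from batch-size-dependent layers such as batch normalization.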
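For the GPU-memory questions, a helper along the following lines answers both: one line reports the GPU memory in use, and two lines free memory that is no longer needed so the kernel does not have to be restarted. This is a sketch built from standard PyTorch calls, not a verbatim copy of the lesson notebook, and the name `report_gpu` is illustrative.

```python
import gc
import torch

def report_gpu():
    # Report GPU memory currently held by processes on this machine
    # (torch.cuda.memory_allocated() is an alternative that covers only this process).
    print(torch.cuda.list_gpu_processes())
    # The two lines that free unneeded GPU memory without restarting the kernel:
    gc.collect()               # drop Python objects (e.g. an old Learner) still holding tensors
    torch.cuda.empty_cache()   # release cached, unused CUDA memory so it shows up as free

# Typical use between experiments (assuming a fastai Learner named `learn`):
# del learn
# report_gpu()
```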