I've implemented my detector not from scratch, but based on an SSD port to Keras/TensorFlow (from here), and I have already trained it in several variations (Belgium from scratch, pretrained on MS COCO, transferred to Germany, convolutional layers frozen, fine-tuned to Germany). After weeks of training I can say that Belgium with random weights from scratch converges fastest (after only 40 epochs/2 days, my custom SSD loss function is down to a value of 3), while all other variations need much more time and more epochs, and their loss never falls below a value of 9.

I also found pretrained weights for traffic sign classification with VGG16, which I thought would be the ideal base for transfer learning on this topic, but that detector performed worst of all so far (the loss stagnated at 11 even when the learning rate was changed, and after 100 epochs it overfitted). It seems that transfer learning or fine-tuning has no advantage at all on these detectors.

It's likely that I am doing something wrong, or that I misunderstand the purpose of transfer learning (I thought it should speed up training, since most layers are not trainable and therefore no weight updates are computed for them). I don't know if this is the proper platform for discussing this topic; perhaps you know a Slack or Gitter channel where it belongs. I just don't know if I am stuck, or if I am doing something horribly wrong.

Transfer learning is when a model developed for one task is reused to work on a second task. Fine-tuning is one approach to transfer learning, in which you change the model's output to fit the new task and train only that output part.

In transfer learning or domain adaptation, we train the model on one dataset. Then we train the same model on another dataset that has a different distribution of classes, or even classes that did not appear in the first training dataset.

In fine-tuning, an approach to transfer learning, we have a dataset and use, say, 90% of it for training. Then we train the same model on the remaining 10%. Usually we switch to a smaller learning rate, so the new training does not have a significant impact on the already-adjusted weights.

You can also take a base model that works on a similar task and freeze some of its layers to keep the old knowledge while running the new training session on the new data. The output layer can also be different, with parts of it frozen during training.

In my experience, learning from scratch leads to better results, but it is much more costly than the other approaches, especially in time and resource consumption.
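To make the fine-tuning recipe above concrete, here is a minimal Keras sketch under stated assumptions: a VGG16 base pretrained on ImageNet, its convolutional layers frozen to keep the old knowledge, a new output head for the new task, and a reduced learning rate. The input size and `num_classes` value are hypothetical placeholders, not settings taken from the experiments described above.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

num_classes = 43  # hypothetical: e.g. the number of traffic-sign classes

# Pretrained convolutional base, without the original ImageNet head.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))

# Freeze the base so this training phase only fits the new head.
base.trainable = False

# New output layers that fit the new task.
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),
])

# A small learning rate, so later unfrozen layers are not disturbed much.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
```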
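If the frozen features turn out to be insufficient, a common follow-up (continuing from the sketch above) is to unfreeze only the top convolutional block and keep training with an even smaller learning rate. The `block5` prefix relies on Keras's VGG16 layer naming; the choice of which layers to unfreeze is an assumption, not something prescribed by the post.

```python
# Second phase (continuing from the sketch above): unfreeze only the
# last convolutional block of the base, so most pretrained knowledge
# stays intact while the top features adapt to the new data.
base.trainable = True
for layer in base.layers:
    # Keras names VGG16 layers block1_conv1 ... block5_conv3, block5_pool.
    if not layer.name.startswith("block5"):
        layer.trainable = False

# Recompile with an even smaller learning rate so the change in
# trainable weights takes effect.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
```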