When I moved my code to a much more powerful machine with lots of memory, CPUs, and GPUs, I got a GPU-enabled TensorFlow binary. That sped things up, but the CPU was still a bottleneck.
At that point TensorFlow was complaining that the CPU supports instructions the binary was not built with. Before building TF from source, I decided to update CUDA and the other Nvidia software. After a dance through outdated tutorials, I got it done. Speed improved, but while the CPUs were at 100%, the GPUs were barely loaded.
Then I realised that resizing images on the fly might be the main reason for the slowdown. This idea came to me on Friday afternoon. The end of the day was intense, but resizing the images once and saving them at the sizes I need improved the speed. Still, the GPU is not 100% busy. Now I'm looking forward to Monday, to see the results!
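The resize-once idea can be sketched roughly like this (a minimal sketch using Pillow; the directory layout, target size, and function name are my assumptions, not the actual setup described above):

```python
from pathlib import Path

from PIL import Image

# Target size is an assumption; use whatever the network expects.
TARGET_SIZE = (224, 224)


def resize_once(src_dir, dst_dir, size=TARGET_SIZE):
    """Resize every image in src_dir once and save it to dst_dir,
    so training no longer pays the resize cost on every epoch."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for path in src.iterdir():
        if path.suffix.lower() not in {".jpg", ".jpeg", ".png"}:
            continue
        with Image.open(path) as img:
            # LANCZOS is a good-quality filter for downscaling.
            img.resize(size, Image.LANCZOS).save(dst / path.name)
```

One nice property: since Pillow-SIMD is a drop-in replacement for Pillow, this same code runs unchanged with either installed.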
CPU during training, or during preprocessing/data manipulation?
Training. I used PIL to resize images, so it's just CPU. Actually Pillow-SIMD, to get easy speedups.
If you train a prebuilt network like AlexNet, do you see the same behavior? That might help you determine whether something in your particular implementation is causing the slowdown.
I didn't try AlexNet. My network is much smaller, so this behavior might be expected.