I picked up Keras, so the code for the network itself is simple. The complexity lies elsewhere: dataset preparation, evaluation and model selection.
Initially I worked on my laptop with extremely small values; for example, images were resized to 128 by 128, which feels too small. However, I managed to get reasonable performance numbers.
When I moved my code to a much more powerful machine with lots of memory, CPUs and GPUs, I got a GPU-enabled TensorFlow binary. That sped things up, but the CPU was still a bottleneck.
Then I realised that resizing images on the fly might be the main reason for the slowdown. This idea came to me on Friday afternoon. The end of the day was intense, but resizing images once and saving the sizes I need improved the speed. Still, the GPU is not 100% busy. Now I'm looking forward to Monday, to see the results!
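The "resize once and save" idea could look roughly like the sketch below. It is only an illustration, not the code I actually ran: the function names, the one-folder-per-size layout and the use of Pillow for the resize itself are all assumptions. The resize step is passed in as a function, so any other image library could be swapped in.

```python
from pathlib import Path


def pillow_resize(src, dst, size):
    """Resize one image with Pillow (assumption: Pillow is installed)."""
    from PIL import Image  # imported lazily, only needed for this default

    with Image.open(src) as img:
        # size is (width, height); convert to RGB so every output has 3 channels
        img.convert("RGB").resize(size).save(dst)


def preresize(src_dir, dst_root, size, resize_fn=pillow_resize):
    """Write a resized copy of every file under src_dir, once.

    Copies go to dst_root/<width>x<height>/ and keep the same relative
    paths. Files that already have a copy are skipped, so the function
    can be re-run cheaply. dst_root should live outside src_dir.
    Returns the number of files resized in this call.
    """
    src_dir = Path(src_dir)
    out_dir = Path(dst_root) / f"{size[0]}x{size[1]}"
    done = 0
    for src in sorted(p for p in src_dir.rglob("*") if p.is_file()):
        dst = out_dir / src.relative_to(src_dir)
        if dst.exists():
            continue  # resized in an earlier run
        dst.parent.mkdir(parents=True, exist_ok=True)
        resize_fn(src, dst, size)
        done += 1
    return done
```

Training would then read from the pre-resized folder (for example `resized/128x128/`, a hypothetical path), so the CPU no longer has to resize every image on every epoch.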
The server crashed shortly after I left, so no great results.
vmtouch is a nice tool to cache file content into memory.
If you train a prebuilt network like AlexNet, do you see the same behavior? That might help you determine whether something in your particular implementation is causing the slowdown.
I didn't try AlexNet. My network is much smaller, so this behavior might be expected.