SnapML Tutorial Model Training
Hi,
I am trying to use the sample code in the tutorial here: SnapML Overview - Lens Studio by Snap Inc. (snapchat.com)
I am trying to train a model using the default configuration with the style_image.jpg that Snapchat provides in the tutorial. When I run the script, everything works up to the cell that says "Now we can run the main training loop," which is where it is supposed to enter the training loop. A progress bar is created using tqdm, but it never advances beyond 0%. I am training on my own computer, using my GPU (CUDA). It seems odd that it would be stuck at 0% for 5+ hours: it never prints progress or updates the progress bar, and it seems to run indefinitely. Does this sound like a bug, or is this normal behavior?
I have tried the following configuration to make the training a little bit faster:
DATASET_SIZE = 100 # How many images to use for training
NUM_TRAINING_STEPS = 10 # number of steps for training, longer is better
LOGGING_FREQUENCY = 1 # log validation every N steps
BATCH_SIZE = 1 # number of images per batch
NUM_WORKERS = 1 # number of CPU threads available for image preprocessing
With these settings I get the error:
PicklingError: Can't pickle <function <lambda> at 0x000002B42A59B9D0>: attribute lookup <lambda> on __main__ failed
The error occurs at this line:
for batch_num, batch in enumerate(pbar):
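The error message itself can be reproduced with the standard library alone. The sketch below is only an illustration: `scale_lambda`, `scale_named`, and `is_picklable` are hypothetical names, not taken from the tutorial. It assumes the failure comes from a lambda being handed to a worker process, which is likely when NUM_WORKERS is 1 or more on Windows, since worker processes there are started fresh and receive their inputs through pickle.

```python
import pickle

# Hypothetical preprocessing step written two ways (not from the tutorial).
scale_lambda = lambda pixel: pixel / 255.0


def scale_named(pixel):
    """Same computation as scale_lambda, but as a named module-level function."""
    return pixel / 255.0


def is_picklable(obj):
    """Return True if obj can be serialized with pickle, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# Functions are pickled by name, and a lambda has no importable name.
print("lambda picklable:", is_picklable(scale_lambda))
print("named function picklable:", is_picklable(scale_named))
```

If the notebook passes a lambda to its data loading code, replacing it with a named function defined at the top level of a script may be enough to clear the PicklingError.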
I am pretty sure this is a bug. If I re-download the script from the Snapchat website and replace all of the model training code in the cell labeled "Now we can run the main training loop" with a print statement, the progress bar still never advances.
The progress bar still freezes, so something about "enumerate(pbar)" is not working.
For anyone else who finds this in the future: I was able to get it working by converting the Jupyter notebook to a script, moving all of the training code into "main", and keeping the global variables in the script body. My GPU has 24 GB of memory, and the most I can run is 4 threads with batches of 16 images.
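A rough sketch of that script layout, using only the standard library as a stand-in for the tutorial's data loading: `preprocess`, `batches`, and the body of `main` are hypothetical placeholders, and the "spawn" start method is chosen here to mimic how Windows starts worker processes. The point is the structure: settings at module level, work inside `main`, and a guard so that workers re-importing the file do not start training again.

```python
import multiprocessing as mp

# Global settings stay in the script body, as in the notebook.
DATASET_SIZE = 100
BATCH_SIZE = 16
NUM_WORKERS = 4


def preprocess(index):
    """Hypothetical stand-in for loading and preprocessing one image."""
    return index * 2


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def main():
    # "spawn" is the start method Windows uses; each worker re-imports this file.
    ctx = mp.get_context("spawn")
    with ctx.Pool(NUM_WORKERS) as pool:
        data = pool.map(preprocess, range(DATASET_SIZE))

    steps = 0
    for batch_num, batch in enumerate(batches(data, BATCH_SIZE)):
        steps += 1  # a real training step would go here
    print(f"processed {steps} batches")
    return steps


if __name__ == "__main__":
    # Without this guard, every spawned worker would run main() again on import.
    main()
```

Save this as a file and run it with the Python interpreter rather than pasting it into a notebook cell, since spawned workers need to import the file to find `preprocess`.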