SnapML Tutorial Model Training

Hi,

I am trying to use the sample code in the tutorial here: SnapML Overview - Lens Studio by Snap Inc. (snapchat.com)

I am trying to train a model using the default configuration with the style_image.jpg that Snapchat provides in the tutorial. When I run the script, everything works up to the cell that says "Now we can run the main training loop." The training loop is supposed to start here. A progress bar is created using tqdm, but it never advances beyond 0%. I am training on my own computer, using my GPU (CUDA). It seems odd that it would be stuck at 0% for 5+ hours: it never prints progress or updates the progress bar, and it seems to run indefinitely. Does this sound like a bug, or is this normal behavior?

I have tried the following configuration to make the training a little bit faster:

DATASET_SIZE = 100      # how many images to use for training
NUM_TRAINING_STEPS = 10 # number of steps for training, longer is better
LOGGING_FREQUENCY = 1   # log validation every N steps
BATCH_SIZE = 1          # number of images per batch
NUM_WORKERS = 1         # number of CPU threads available for image preprocessing

With these settings I get the error:

PicklingError: Can't pickle <function <lambda> at 0x000002B42A59B9D0>: attribute lookup <lambda> on __main__ failed

The error occurs at this line:

for batch_num, batch in enumerate(pbar):
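
For context, this message is what Python's pickle module reports for any lambda: pickle stores functions by name, and a lambda has no importable name. A likely trigger here (an assumption, since the tutorial's data-loading code is not shown in this post) is that the worker processes started on the first pass through enumerate(pbar) need a pickled copy of something that holds a lambda. A minimal standard-library sketch of the underlying behavior, where can_pickle and to_float are hypothetical names used only for illustration:

```python
import pickle


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


# Hypothetical transform written as a lambda. Its internal name is
# "<lambda>", so pickle cannot look it up by name.
to_float = lambda x: float(x)

print(can_pickle(to_float))  # the lambda
print(can_pickle(float))     # a named callable that does the same job
```

If this is the cause, replacing each lambda with a named function defined at the top level of an importable script should avoid the error.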

Permanently deleted user

Snap Lens Network Member Lens Studio Team

February 19, 2021 17:26
Edited

I am pretty sure this is a bug. If I re-download the script from the Snapchat website and replace all of the model training code in the cell with "Now we can run the main training loop" with a print statement, the progress bar still never advances.

for batch_num, batch in enumerate(pbar):
    print("test")

The progress bar still freezes, so something about "enumerate(pbar)" is not working.

February 20, 2021 21:10

For anyone else who finds this in the future: I was able to get it working by converting the Jupyter notebook to a script, moving all of the training code into "main", and keeping the global variables in the script body. My GPU has 24 GB of memory, and my limit is 4 threads with batches of 16 images.
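
The script layout described above can be sketched roughly as follows. This is not the tutorial's code: it swaps in the standard-library multiprocessing module for the real data-loading workers, and preprocess, make_batches, and the DATASET_SIZE value are hypothetical placeholders. The point is only the structure: configuration and function definitions at module level, and everything that starts worker processes inside main() behind a __main__ guard.

```python
import multiprocessing as mp

# Global configuration stays in the script body.
BATCH_SIZE = 16    # images per batch
NUM_WORKERS = 4    # worker processes for preprocessing
DATASET_SIZE = 64  # placeholder value for this sketch


def preprocess(index):
    """Hypothetical stand-in for per-image preprocessing."""
    return index * 2


def make_batches(items, batch_size):
    """Split a list into consecutive batches of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def main():
    # Worker processes are created only here, never at import time.
    with mp.Pool(NUM_WORKERS) as pool:
        samples = pool.map(preprocess, range(DATASET_SIZE))
    for batch_num, batch in enumerate(make_batches(samples, BATCH_SIZE)):
        print(f"batch {batch_num}: {len(batch)} samples")


if __name__ == "__main__":
    main()
```

The guard matters on platforms that start workers by spawning a fresh interpreter (the default on Windows): each worker re-imports the main module, so any training code left at module level would run again in every worker. That is probably also why the notebook version stalled, although this post does not confirm the cause.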
