
Eating up all your memory in Python? How to use ChatGPT to help out.



Recently, I was working on a project where I needed to optimize a hefty real-time AI edge compute task. The iterations were running around 15 seconds each at steady state. After some optimizations, the task was running in less than a second.


But... after about 150 iterations, the program slowed to a crawl: 15, 30, 60, then 95 seconds per iteration. After checking with the jtop utility (a fantastic tool if you use NVIDIA Jetson devices), it was pretty clear that my Python memory management was bad.


So I went to ChatGPT for help. Here's how you can too, by giving it the right context and asking the right questions.


Set up profiling


This part is really simple. Import the Python memory profiler:

from memory_profiler import profile

and then tag on the profiler decorator to the functions you are interested in:

import multiprocessing as mp
from pathlib import Path
from typing import Callable

@profile
def process_frame(
    stop_processes: mp.Value = None,
    queue_of_interesting_data: mp.Queue = None,
    model_weights_path: str = None,
    function_to_make_a_pretty_graph: Callable[[float], None] = None,
    data_input_fpath: Path = None,
):
    # ...code...
    return

Execute the script


Then run your script (on a Jetson Nano, if you are still living in the late 20-teens like me):

(.venv) jetson@nano:~/dev/$ python3.8 scripts/my_nn_script.py  --weights_path models/my_model.tar --input_frame_path ../rt_data/ > profile.log

Once your script has finished, the end of profile.log will contain something that looks like this:

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
   333    278.1 MiB    278.1 MiB           1   @profile
   334                                         def process_frame(
   335                                             stop_processes: mp.Value = None,
   336                                             queue_of_interesting_data: mp.Queue = None,
   337                                             model_weights_path: str = None,
   338                                             function_to_make_a_pretty_graph: Callable[[float], None] = None,
   339                                             data_input_fpath: Path = None,
   340                                         )
   
...
   369                                                 # TODO: optimize this code
   370                                                 # create a simple map of uuids to the frames contained therein
   371   2287.5 MiB  -3613.1 MiB          50           check_map = {}
   372   2287.5 MiB -19290.5 MiB         204           for id in identities.values():
   373   2287.5 MiB -102576.4 MiB         919               check_map[id.uuid] = set([obs.frame.uuid for obs in id.observations])
   374   2287.5 MiB -15677.2 MiB         154               if id.uuid not in stale_identity_map:
   375   2287.5 MiB   -658.0 MiB           9                   stale_identity_map[str(id.uuid)] = [
   376   2287.5 MiB  -2701.0 MiB          38                       set([obs.frame.uuid for obs in id.observations]),
   377   2287.5 MiB   -658.0 MiB           9                       0,
   378                                                         ]
   ...
   421   2112.6 MiB   -168.2 MiB           1       return
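
If the log is long, it can help to pull out the rows with the biggest memory changes before pasting anything into a prompt. Here is a minimal sketch using only the standard library. It assumes the row layout shown above (line number, memory usage, increment, occurrences, line contents). parse_profile, largest_changes, and SAMPLE are hypothetical names made up for this example, not part of memory_profiler.

```python
import re
from typing import List, NamedTuple

# Assumed row layout, taken from the output above:
# <line no>  <mem> MiB  <increment> MiB  <occurrences>  <line contents>
ROW = re.compile(
    r"^\s*(?P<line>\d+)\s+(?P<mem>-?\d+(?:\.\d+)?) MiB\s+"
    r"(?P<inc>-?\d+(?:\.\d+)?) MiB\s+(?P<occ>\d+)\s+(?P<code>.*)$"
)


class ProfiledLine(NamedTuple):
    line_no: int
    mem_mib: float
    increment_mib: float
    occurrences: int
    code: str


def parse_profile(text: str) -> List[ProfiledLine]:
    """Keep only the rows that carry memory numbers; skip everything else."""
    rows = []
    for raw in text.splitlines():
        match = ROW.match(raw)
        if match:
            rows.append(
                ProfiledLine(
                    int(match["line"]),
                    float(match["mem"]),
                    float(match["inc"]),
                    int(match["occ"]),
                    match["code"].strip(),
                )
            )
    return rows


def largest_changes(rows: List[ProfiledLine], top: int = 3) -> List[ProfiledLine]:
    """Sort by absolute increment so big negative swings are not hidden."""
    return sorted(rows, key=lambda r: abs(r.increment_mib), reverse=True)[:top]


# A few rows copied from the log above, used as sample input.
SAMPLE = """\
   371   2287.5 MiB  -3613.1 MiB          50           check_map = {}
   372   2287.5 MiB -19290.5 MiB         204           for id in identities.values():
   373   2287.5 MiB -102576.4 MiB         919               check_map[id.uuid] = set([obs.frame.uuid for obs in id.observations])
   378                                                         ]
   421   2112.6 MiB   -168.2 MiB           1       return
"""

for row in largest_changes(parse_profile(SAMPLE)):
    print(row.line_no, row.increment_mib, row.code)
```

In practice you would read profile.log from disk and pass its contents to parse_profile instead of SAMPLE.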

Ask AI about it


Go to chat.openai.com. (Honestly, after a few hours with it, bard.google.com doesn't seem worth your time at this point: it is less accurate and less thorough. That's probably a blog for another day, though. Forgive me, Google SEO overlords...)


The most effective question I found after some trials was:

how do I optimize this Python code given this profiling output from using the memory_profiler Python library? <paste profile.log output for a function here>

The response that I got was:


Based on the profiling output, it seems that the main bottleneck in the process_frame function is in lines 372-373, where a dictionary is being created with UUIDs as keys and sets of UUIDs as values. This operation is causing a significant increase in memory usage.
To optimize this code, one possible approach is to use a defaultdict instead of a regular dictionary to avoid checking if the key already exists in the dictionary. Another approach is to use a set comprehension instead of a for loop to create the set of UUIDs for each identity.
Here's an example of how you could modify lines 372-373 to use a defaultdict and a set comprehension:

and then it dropped this code sample:

import collections

# create a defaultdict with sets as default values
check_map = collections.defaultdict(set)

# use a set comprehension to create the set of UUIDs for each identity
for id in identities.values():
    check_map[id.uuid].update(obs.frame.uuid for obs in id.observations)

That's a reasonable suggestion for saving memory: passing a generator expression to update() avoids building the temporary list that set([...]) creates for every identity. Keep in mind that the response was written without any context on the functions being called under the hood. If I gave ChatGPT more information, I would expect a dramatically improved suggestion about my data structures and implementations.
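To check a suggestion like this before trusting it, you can measure both versions side by side. The sketch below uses the standard library's tracemalloc rather than memory_profiler, and the Frame, Observation, and Identity classes are hypothetical stand-ins for my real objects, so treat the numbers as illustrative only.

```python
import tracemalloc
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List


# Hypothetical stand-ins; the real classes from the project are not shown here.
@dataclass
class Frame:
    uuid: str


@dataclass
class Observation:
    frame: Frame


@dataclass
class Identity:
    uuid: str
    observations: List[Observation] = field(default_factory=list)


def build_with_list(identities):
    """Original pattern: build a temporary list, then copy it into a set."""
    check_map = {}
    for identity in identities.values():
        check_map[identity.uuid] = set(
            [obs.frame.uuid for obs in identity.observations]
        )
    return check_map


def build_with_generator(identities):
    """Suggested pattern: feed a generator expression straight into the set."""
    check_map = defaultdict(set)
    for identity in identities.values():
        check_map[identity.uuid].update(
            obs.frame.uuid for obs in identity.observations
        )
    return check_map


def peak_bytes(build, identities):
    """Run build(identities) and return (result, peak bytes traced during the call)."""
    tracemalloc.start()
    try:
        result = build(identities)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def make_identities(n_identities=4, n_observations=20_000):
    """Create synthetic data before tracing starts, so only the map is measured."""
    identities = {}
    for i in range(n_identities):
        observations = [
            Observation(Frame(f"frame-{i}-{j}")) for j in range(n_observations)
        ]
        identities[f"id-{i}"] = Identity(f"id-{i}", observations)
    return identities


identities = make_identities()
list_map, list_peak = peak_bytes(build_with_list, identities)
gen_map, gen_peak = peak_bytes(build_with_generator, identities)

print(f"list version peak:      {list_peak / 1024:.0f} KiB")
print(f"generator version peak: {gen_peak / 1024:.0f} KiB")
print("same result:", list_map == gen_map)
```

On CPython I would expect the list version to show the higher peak, since it holds a throwaway list alongside the set, but the exact figures depend on your interpreter and your data.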


Hope this helps!


If you need help moving large ML/AI models to the real time edge, shoot us an email at info@depotanalytics.co for a free consultation. We'd love to help!
