Streamlit is an amazing tool that provides an easy-to-use framework for developing machine learning and data science applications. However, it has serious limitations when integrating with data sources whose APIs require OAuth 2.0 for authentication. OAuth 2.0 is a delegated authorization framework that allows apps to obtain limited access to a user's data without the user giving away their password. After the user completes the authentication process, the API usually returns them to the registered redirect URI. Since a Streamlit script is executed linearly from top to bottom, it requires some engineering to connect the beginning and end of the authentication process within a single Streamlit site.
Problem to solve: Integrate an API's redirect URI into a Streamlit application to create a single program for authenticating a user and accessing their data.
Example use case: I have created an application to access and download a user's videos from Vimeo using the Vimeo API. Each user needs to authenticate their own Vimeo account to access their videos.
Here is the login page, built with Streamlit and hosted on Streamlit Cloud:
Note: It is possible to host the program locally and still integrate it with an API that sends users to a redirect URI. However, this requires temporarily exposing your localhost to the internet and using a webhook. This can be accomplished by combining a tunneling service like ngrok with a small Flask script.
For security, session states are used and all backend data is saved in an encrypted Google Sheet. To find out more about how to create a multipage Streamlit app using session states and integrating a backend Google Sheet, visit the Streamlit site (or we'd be happy to get you moving in the right direction - info@depotanalytics.co).
Also, note that Streamlit supports multi-page apps with separate files, but that forces some conventions that are undesirable for this use case.
Here is a simple example of the code required to run a multi-page Streamlit app with session states:
# Assumes `import streamlit as st` and that the page functions are defined above
if __name__ == "__main__":
    if "page" not in st.session_state:
        st.session_state["page"] = "home"
    if st.session_state["page"] == "home":
        home()
    if st.session_state["page"] == "terms_and_conditions":
        terms_and_conditions()
    if st.session_state["page"] == "user_info":
        user_info()
    if st.session_state["page"] == "login_page":
        vimeo_login()
    if st.session_state["page"] == "select_videos":
        select_videos()
    if st.session_state["page"] == "end_of_app":
        end_of_app()
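The routing idea behind the chain of if-statements can also be sketched as a lookup table. This is only an illustration, not the app's actual code: a plain dict stands in for st.session_state, and the route helper and the string-returning page functions are hypothetical.

```python
from typing import Callable, Dict, MutableMapping


def route(
    session_state: MutableMapping,
    pages: Dict[str, Callable[[], str]],
    default: str = "home",
) -> str:
    """Run the page function named by session_state["page"]."""
    if "page" not in session_state:
        session_state["page"] = default
    return pages[session_state["page"]]()


# Hypothetical page functions; real ones would draw Streamlit widgets.
def home() -> str:
    return "home rendered"


def vimeo_login() -> str:
    return "login rendered"


PAGES = {"home": home, "login_page": vimeo_login}

state: dict = {}                # stands in for st.session_state
first = route(state, PAGES)     # no page set yet, so "home" is used
state["page"] = "login_page"    # what a button handler would do
second = route(state, PAGES)
```

One difference worth knowing: the table runs exactly one page per script run, whereas the sequence of separate if-statements lets a page that changes st.session_state["page"] hand over to a later page within the same run.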
If a new user wants to use the application, they are directed to the terms_and_conditions page and asked to connect their Vimeo account:
After a user clicks "here", they are directed to Vimeo's site to provide consent that this Streamlit app can access their Vimeo data.
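The link behind "here" is an OAuth 2.0 authorization URL. A minimal sketch of how such a link could be built follows. The endpoint, the parameter names, and the "private" scope are assumptions based on the standard authorization code flow and should be checked against Vimeo's API documentation; build_authorize_url and the client ID are hypothetical.

```python
import secrets
from typing import Optional, Tuple
from urllib.parse import urlencode

# Assumed endpoint; confirm it in Vimeo's API documentation.
AUTHORIZE_URL = "https://api.vimeo.com/oauth/authorize"


def build_authorize_url(
    client_id: str,
    redirect_uri: str,
    scope: str = "private",
    state: Optional[str] = None,
) -> Tuple[str, str]:
    """Return the consent-page URL and the state value embedded in it."""
    # A random state value lets the app check that the redirect it
    # receives answers a request it actually started.
    state = state or secrets.token_urlsafe(16)
    params = {
        "response_type": "code",  # ask for an authorization code
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "state": state,
        "scope": scope,
    }
    return f"{AUTHORIZE_URL}?{urlencode(params)}", state


url, state = build_authorize_url(
    client_id="my-client-id",  # hypothetical value
    redirect_uri="https://depot-analytics-vda.streamlit.app/",
)
```

The redirect_uri passed here has to match the one registered with the API, which is why the Streamlit app's own address is used.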
After the user clicks "Allow", the redirect URI registered with the Vimeo API is opened in a new tab with the authorization code attached to the end of the URL. For example:
https://depot-analytics-vda.streamlit.app/?code=f095e36678a076dc1d2a0d40dcb78f027c3c4567
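The code is an ordinary query parameter, so outside of Streamlit the same URL can be taken apart with Python's standard library. The sketch below is only meant to show what is being extracted; extract_code is a hypothetical helper, not part of the app.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_code(redirect_url: str) -> Optional[str]:
    """Return the value of the `code` query parameter, or None if absent."""
    # parse_qs maps each parameter name to a list of values,
    # because a name may appear more than once in a URL.
    query = parse_qs(urlparse(redirect_url).query)
    values = query.get("code")
    return values[0] if values else None


example = (
    "https://depot-analytics-vda.streamlit.app/"
    "?code=f095e36678a076dc1d2a0d40dcb78f027c3c4567"
)
code = extract_code(example)
```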
This is where the difficulty comes in: the redirect opens a new Streamlit session, so none of the session state variables set before authentication carry over to the new tab. On top of that, the code has to be pulled out of the URL before it can be exchanged for an OAuth 2.0 access token. Let's tackle these one at a time:
Solution 1: Accessing the query parameters from the redirect URI
# st.query_params is a dict-like attribute (Streamlit 1.30+), not a function;
# it exposes the URL parameters from the redirect URI
url_params = st.query_params
if "code" in url_params:
    # Each value is already a string, so no [0] indexing is needed
    code = url_params["code"]
    st.session_state.code = code
    st.session_state["page"] = "login_page"
The code above reads the current URL's query parameters as a dictionary. If the URL has no query parameters, the dictionary is empty and the code within the if-statement is not run. To learn more, check out this Streamlit guide.
Note: Integrating the above code into a larger home function allows the home page to load when there are not any query parameters and the login_page to load when there are (See below).
Once acquired, the code can be exchanged for the user's access token. The access token can then be used to get the current user's information stored in the Vimeo API.
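The exchange itself is a single POST request. Here is a hedged sketch using only the standard library. The token endpoint, the JSON body fields, and the use of HTTP Basic authentication with the client ID and secret are assumptions drawn from the usual OAuth 2.0 authorization code flow, so verify them against Vimeo's documentation. Both function names are hypothetical.

```python
import base64
import json
import urllib.request

# Assumed endpoint; confirm it in Vimeo's API documentation.
TOKEN_URL = "https://api.vimeo.com/oauth/access_token"


def build_token_request(
    code: str, client_id: str, client_secret: str, redirect_uri: str
) -> urllib.request.Request:
    """Build (but do not send) the request that trades a code for a token."""
    credentials = base64.b64encode(
        f"{client_id}:{client_secret}".encode()
    ).decode()
    body = json.dumps(
        {
            "grant_type": "authorization_code",
            "code": code,
            "redirect_uri": redirect_uri,
        }
    ).encode()
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def exchange_code(
    code: str, client_id: str, client_secret: str, redirect_uri: str
) -> dict:
    """Send the request and return the parsed JSON response."""
    request = build_token_request(code, client_id, client_secret, redirect_uri)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Keeping the request construction separate from the network call makes the first half easy to check without contacting the API. In the standard flow the response carries the token under an "access_token" key, but that too should be confirmed for Vimeo.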
Solution 2: Maintaining variables between two pages
def home():
    # Create the Google Sheets connection once and reuse it on later reruns
    if "gs_instance" not in st.session_state:
        st.session_state.gs_instance = GoogleSheets.from_default_config()
    gs_instance = st.session_state.gs_instance
    # Instantiate all session state variables each time the home page
    # is loaded
    # All session state variables (some removed for brevity)
    if "primary_email" not in st.session_state:
        st.session_state.primary_email = ""
    if "user_id" not in st.session_state:
        st.session_state.user_id = ""
    if "access_token" not in st.session_state:
        st.session_state.access_token = ""
    if "code" not in st.session_state:
        st.session_state.code = 0
    # st.query_params is a dict-like attribute (Streamlit 1.30+), not a function
    url_params = st.query_params
    if "code" in url_params:
        # A code in the URL means Vimeo has just redirected the user back
        code = url_params["code"]
        st.session_state.code = code
        st.session_state["page"] = "login_page"
Once the login_page function is run, the variables for the current user that were established in the original tab need to be connected to the redirect page. One way to accomplish this is to store each user's backend information in a Google Sheet and use a variable that is unique to each user to connect the two.
For this site, the unique user ID generated by Vimeo is used. Each time someone completes the Vimeo authentication process, the information from the Vimeo API is compared to the information stored in the Google Sheet. If the user ID is not stored in the sheet, then it is the user's first time using the Streamlit site, and a new row is populated in the sheet. If they are a returning user, then the corresponding row is pulled and updated.
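The add-or-update rule described above can be sketched independently of Google Sheets. In this illustration a dict keyed by user ID stands in for the sheet, and upsert_user is a hypothetical helper, not a method of the GoogleSheets class used in this app.

```python
from typing import Dict


def upsert_user(rows: Dict[str, dict], user_id: str, fields: dict) -> bool:
    """Insert or update a user's row; return True if the user is new."""
    is_new = user_id not in rows
    if is_new:
        # First visit: create a row keyed by the Vimeo user ID
        rows[user_id] = {"user_id": user_id, **fields}
    else:
        # Returning user: overwrite only the fields that were supplied
        rows[user_id].update(fields)
    return is_new


sheet: Dict[str, dict] = {}  # stands in for the backend Google Sheet
created = upsert_user(sheet, "123", {"primary_email": "a@example.com", "token": "t1"})
updated = upsert_user(sheet, "123", {"token": "t2"})
```

Because the key comes from Vimeo and not from the browser session, the same row is found no matter which tab or session the user arrives in.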
Since the session state variables were re-instantiated at the top of the home function and set to blank values, they can be repopulated once the code has been exchanged for an access token and the token has been used to fetch the user's information.
# Runs inside the login page after the code has been exchanged, so user_id,
# primary_email, code, token and gs_instance are already defined at this point
user_series: pd.DataFrame = gs_instance.locate_user(
    search_id=str(user_id), search_col="user_id"
)
if user_series.empty:
    # First visit: add a new row for this user
    user_dictionary = {
        "primary_email": primary_email,
        "code": str(code),
        "token": token
    }
    gs_instance.add_user(user_dictionary)
else:
    # Returning user: refresh the stored row
    user_series["primary_email"] = primary_email
    user_series["code"] = code
    user_series["token"] = token
    gs_instance.update_user(user_series)
# Reload the Google Sheet instance so that it reflects the update
gs_instance_updated = GoogleSheets.from_default_config()
st.session_state.gs_instance = gs_instance_updated
df_current_user: pd.DataFrame = gs_instance_updated.locate_user(
    search_id=str(user_id), search_col="user_id"
)
access_token = df_current_user["token"].iloc[0]
user_id = df_current_user["user_id"].iloc[0]
if access_token != "error":
    video_data = get_video_metadata(access_token=access_token)
    # Repopulate the session state for this tab
    st.session_state.access_token = access_token
    st.session_state.user_id = user_id
    st.session_state.primary_email = primary_email
    st.balloons()
    button_text = "Successfully connected to your Vimeo account - click here to continue"
    go_to_user_info = st.button(button_text)
Have additional questions? Trying to solve similar problems in your organization? Let us know and we'll help out - info@depotanalytics.co