Office365-REST-Python-Client icon indicating copy to clipboard operation
Office365-REST-Python-Client copied to clipboard

Direct upload to SharePoint from dataframe - either cannot overwrite or format error

Open mrranck opened this issue 3 years ago • 7 comments

Python 3.8 Office365-REST-Python-Client 2.3.11

I am able to upload pandas dataframes directly to a .csv file in an Azure blob and am seeking to do the same thing in Sharepoint. I've run into the issue that if I upload it this way:

file_content=df.to_csv(index=True,encoding='utf-8')
target_folder = ctx.web.get_folder_by_server_relative_url(folderurl)
target_file = target_folder.upload_file('filename', file_content).execute_query()

It works great but it won't overwrite an existing file. But if I upload it this way:

file_content=df2_daily.to_csv(index=True,encoding='utf-8')
target_folder = ctx.web.get_folder_by_server_relative_url(folderurl)
info = FileCreationInformation()
info.content = file_content
info.url='filename'
info.overwrite = True
target_file = target_folder.files.add(info)
ctx.execute_query()

The resulting .csv opens in with everything in the first row due to the line_terminator being generated as \r\n.

Thoughts?

mrranck avatar Mar 16 '22 11:03 mrranck

I have the same issue. Can you please answer that @vgrem ?

vietluu-collab avatar Apr 03 '22 07:04 vietluu-collab

@mrranck @vgrem can you please show me exactly what to put into the 'file name' in bold below?

target_folder = ctx.web.get_folder_by_server_relative_url(folderurl) info = FileCreationInformation() info.content = file_content info.url='filename' info.overwrite = True target_file = target_folder.files.add(info) ctx.execute_query()`

vietluu-collab avatar Apr 04 '22 02:04 vietluu-collab

It's the name of the file being uploaded to sharepoint. In this case it was a csv file called 'MostRecentDaily_export9.csv'

mrranck avatar Apr 04 '22 12:04 mrranck

@mrranck thank you very much

vietluu-collab avatar Apr 04 '22 14:04 vietluu-collab

@mrranck I have a suggestion. Instead of uploading the file, why don't you just overwrite file_content directly to a a file on sharepoint. However, this is only theory and I donnot know how to do this. Hi @vgrem, can you also show us the code to implement my idea?

vietluu-collab avatar Apr 04 '22 16:04 vietluu-collab

Hi @vietluu-collab,

as demonstrated in @mrranck example, the following property:

info.overwrite = True

instructs whether file content needs to be replaced if it already exist.

Another option to consider, namely to upload a file (and replace if existing one) is utilize Folder.upload_file method (syntactic sugar to add method but without the need to instantiate FileCreationInformation object):

path = "./Report.xlsx"
with open(path, 'rb') as content_file:
    file_content = content_file.read()

target_folder = ctx.web.get_folder_by_server_relative_path("/sites/team/Shared Documents")
name = os.path.basename(path)
target_file = target_folder.upload_file(name, file_content).execute_query()

vgrem avatar Apr 04 '22 18:04 vgrem

Thanks! I just tried using the Folder.upload_file command but it didn't overwrite the existing file. I see an updated timestamp in the sharepoint folder, but the file contents are still from the old version.

mrranck avatar Apr 04 '22 19:04 mrranck

I know this is an old thread, but I came across this due to the issue from the OP regarding the CSV saving with the data all on the first row once uploaded to Sharepoint. I didn't see a specific reference to a solution other than an alternative way to upload the file, so I wanted to share mine.

Basically, if you add the .encode() function to the results of the .to_csv() function, it will properly convert the resulting CSV to a binary string, which is what Sharepoint is looking for in the file content. The default line terminators \r\n, won't be interpreted as special characters if the file content is delivered as a native string type. Even though the .to_csv() function includes an option to specify encoding as a parameter, it will still only output a native string type (not binary string).

In my example below, I used .encode('utf-8') due to additional unicode characters in the contents. This fixed the data all on a first line issue for me.

#Retrieve Dataframe from source (Snowflake)
snow_ctx = connect_snowflake()
df = snowflake_query_df(snow_ctx, """SELECT * FROM TABLE1;""")
snow_ctx.close	

#Convert DF to CSV encoded to utf-8 for Sharepoint
fileContent= df.to_csv(quoting = csv.QUOTE_NONNUMERIC, index=False).encode("utf-8")

#Establish connection to Sharepoint target folder
sp_auth = AuthenticationContext(siteurl) 
sp_auth.acquire_token_for_user(username, password)
sp_client = ClientContext(siteurl, sp_auth) 

#Upload file content to designated folder
resp = sp_client.web.get_folder_by_server_relative_url(remotepath).upload_file(fileName, fileContent).execute_query()

eciccarone avatar Feb 07 '23 01:02 eciccarone