simple-salesforce
simple-salesforce copied to clipboard
Simpler output format for successful queries
I've found that when I want the output of an SOQL query in Python, it's usually to get down and dirty with the actual values of the data. I don't typically need the "metadata" included in Salesforce.queryall()
's output.
I'd love to contribute code to give simple_salesforce
an option to return "ready-to-crunch" data, but I'm not sure if there's anyone out there but me looking for something like this.
I haven't finished this (never did get it working w/ OrderedDict
output), though it's "good enough" for personal use since I'm fine with processing normal dict
s (I now write code like allcontacts = stripJunkSimpleSalesforce(sf.query_all(queryString))
, but before I work on "tidying it up" for sharing with the world (e.g. adding some error handling for when there was an error returning the data, figuring out how to return it as an OrderedDict for consistency with the normal output, etc.), I guess I'm curious:
Is a function like this something that the simple_salesforce
user community would be interested in seeing as part of native simple_salesforce
functionality?
(Either as an added API call that can be wrapped around return results the way I am, or perhaps as something that gets called if an optional parameter is passed to a query()
or query_all()
command.)
My code's functionality is to:
- Strip out excessive nesting, making everything a much simpler "list of dicts, perhaps with some more lists of dicts inside where there were SELECT subqueries"
- Make "lookup fields" from a "parent" record simply a key-value pair in the record that has the lookup -- more like the way you type the SOQL query in the first place
- (i.e. if it's a "Contact" query and you said "Account.Name" in the SOQL query, simply let the "Contact" dict have an "Account.Name" key, rather than burying it in a "Name" property of a separate dict under 'Account')
- Strip out any key-value pairs that aren't data as requested in the SOQL query
My code so far is:
def stripJunkSimpleSalesforce(structure):
from collections import OrderedDict
import json
def fixdict(struc, keyChainList, parentListRecord, parentListAddColsAndVals, parentListKillCols, passingContext):
if 'records' in struc.keys() and type(struc['records']) is list:
parentListRecord[keyChainList[0]] = struc['records']
fixlist(parentListRecord[keyChainList[0]], 'fixDict')
else:
if 'attributes' in struc.keys():
struc.pop('attributes')
for k in struc.keys():
if k != 'attributes':
if type(struc[k]) in [dict, OrderedDict]:
parentListAddColsAndVals, parentListKillCols = fixdict(struc[k], keyChainList + [k], parentListRecord, parentListAddColsAndVals, parentListKillCols, 'fixDict')
if type(struc[k]) not in [None, list, dict, OrderedDict] and len(keyChainList) > 0:
parentListAddColsAndVals['.'.join(keyChainList+[k])] = struc[k]
if passingContext == 'fixDict':
parentListKillCols.add(keyChainList[0])
return parentListAddColsAndVals, parentListKillCols
def fixlist(struc, passingContext):
if type(struc) is list and len(struc) > 0:
for x in struc:
if type(x) in [dict, OrderedDict]:
listAddColsAndVals, listKillCols = fixdict(x, [], x, {}, set(), 'fixList')
if len(listAddColsAndVals) > 0:
for k,v in listAddColsAndVals.items():
x[k] = v
if len(listKillCols) > 0:
for k in listKillCols:
if k in x.keys():
x.pop(k)
return
outerStructure = None
# I can't get this working for Ordered Dicts, so giving up and having it return a plain dict. Leaving some junk from earlier attempt behind.
if type(structure) in [dict, OrderedDict] and 'records' in structure.keys() and type(structure['records']) is list:
if type(structure) is OrderedDict:
outerStructure = json.loads(json.dumps(structure))['records']
else:
outerStructure = structure['records']
if type(outerStructure) is list:
fixlist(outerStructure, 'outermost')
return outerStructure
I would very much love to see this function in the package. Right now it just returns a huge OrderDict with dirty nested and nested data. Ideally what I want to do is get the data and load it into pandas.DataFrame
You can just do a json.dumps(result)
to get the JSON structure back out of the OrderedDict
. I have seen a couple examples on StackOverflow of using pandas or another library to ingest that information.
You can just do a
json.dumps(result)
to get the JSON structure back out of theOrderedDict
. I have seen a couple examples on StackOverflow of using pandas or another library to ingest that information.
json.dumps(result)
returns a JSON string, but still the junk response is there. I use this function to clean up the response:
def parse_sf_records(nested_dict):
"""Recursively parse the nested dictionaries returned by Salesforce Simple API library,
:param nested_dict: Nested dictionary object
:return: Flattened dictionary representing record
"""
for k, v in nested_dict.items():
if k == 'attributes':
nested_dict.pop(k)
elif isinstance(v, dict):
clean_dict = parse_sf_records(v)
for child_key, clean_value in clean_dict.items():
clean_key = "{}.{}".format(k, child_key)
nested_dict[clean_key] = clean_value
nested_dict.pop(k)
return nested_dict
I agree there needs to be some work on how the responses are formatted, especially with the difference between "results" and "records" depending on whether you did a query
or a POST
.
I would love to see it more similar to the Requests API for handling responses, have a resp.status_code for the return code of the call, resp.json to get the raw result, and maybe resp.text to nicely format any error messages that were triggered/returned by the request if they weren't JSON.
Your data cleaning function seems pretty useful as well ensuring that things are consistent and usable.
@espoelstra I too would like to go this route. I'm leaning towards all calls returning the requests.response
object and then adding convenience flags for some of the methods to return cleaned/scrubbed data. This will give the most flexibility and predictability for users.
I agree that the ordered dictionary format is unpleasant to work with, to say the least. Here's a quick function I created to convert the ordered dictionary format from Simple Salesforce's query/query_all methods into a Pandas DataFrame:
import pandas as pd
def convert_simple_salesforce_ordered_dictionary_to_pandas_dataframe(ordered_dictionary):
# Get records list from the ordered dictionary
ord_dict_records = ordered_dictionary['records']
dictionary = {}
# Create the dictionary with the proper formatting to import to the Pandas Dataframe
# Note, special case for 'attributes' key since it's a nested ordered dictionary
# Example format: d = {'col1': [1, 2], 'col2': [3, 4]}
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html
for key, value in ord_dict_records[0].items():
if key == 'attributes':
for attribute_key, attribute_value in value.items():
dictionary[attribute_key] = []
else:
dictionary[key] = []
# populate the dictionary. The keys being the future pandas dataframe headers
# and the values being appended to the dictionary value list
for record in ord_dict_records:
for key, value in record.items():
if key == 'attributes':
for attribute_key, attribute_value in value.items():
dictionary[attribute_key].append(attribute_value)
else:
dictionary[key].append(value)
return pd.DataFrame(data=dictionary)
Adding my code here in case it helps someone writing their own function, definitely could be improved upon upon but does what I need it to. By default the variable parents is set to a list with the value I most commonly use, I just left it as an example. Would be interested to see how people are optimizing for larger queries.
import pandas as pd
def ordered_dict_to_dataframe(results, parents=['Licensee_Report__r']):
"""
results: OrderedDict from SOQL simple_salesforce query
parents: List of strings, names of parent objects
output: data frame of query results
"""
out_df = pd.DataFrame(results['records'])
for parent in parents:
parent_data = out_df[parent].values
map(lambda x: x.pop('attributes'), parent_data)
parent_df = pd.DataFrame(list(parent_data))
out_df = pd.concat([out_df,parent_df ], axis=1)
out_df.drop(parents+['attributes'], axis=1, inplace=True)
return out_df
Going to drop this in here - hope it helps someone!
def make_dict(query):
query = json.loads(json.dumps(query))
return query
Hi all,
The below code will work for "bottom up" queries. That means if you select from a child up.
Example:
SELECT Account.Name, Opportunity.StageName FROM Opportinity
Top down approach won't work:
Example
SELECT Name, (SELECT StageName FROM Opportunities) FROM Account
Here is the code:
def recursive_walk(od_field: OrderedDict):
"""
Recursively flattens each row the results of simple salesforce.
Only works for bottom up queries.
:param od_field: results returned by simple salesforce (multiple objects)
:return: returns a flattened list of dictionaries
"""
d = {}
for k in od_field.keys():
if isinstance(od_field[k], OrderedDict) & (k != 'attributes'):
if 'attributes' in od_field[k].keys():
ret_df = recursive_walk(od_field[k])
d = {**d, **ret_df}
else:
if k != 'attributes':
obj = ''.join([char for char in od_field['attributes']['type'] if char.isupper()])
d[f'{obj}.{k}'] = od_field[k]
return d
def transform_sf_result_set_rec(query_results: OrderedDict):
"""
Recursively flattens the results of simple salesforce. It needs flattening when selecting
multiple objects.
:param query_results:
:return:
"""
data = []
for res in query_results:
d = recursive_walk(res)
data.append(d)
return data
This is how you call it. And make sure to pass only the ['records']
of your result.
# dict_results < -- This is the result of your SOQL
array_dicts = transform_sf_result_set_rec(dict_results['records'])
pd.DataFrame(array_dicts)
Hi all,
The below code will work for "bottom up" queries. That means if you select from a child up. Example:
SELECT Account.Name, Opportunity.StageName FROM Opportinity
Hey @orion512 - thanks so much for sharing this, it's helped me a lot. One thing I've noticed is that if two fields are selected from the same relationship, only one will be returned in the dataframe.
For example, applying the function to the following query will only return Customer Number and Created By columns.
SELECT CustomerNumber, CreatedBy.Name, LastModifiedBy.Name FROM Customer
The raw query results look like this -
CreatedBy.Name
OrderedDict([('attributes', OrderedDict([('type', 'User'), ('url', '/services/data/v42.0/sobjects/User/00524000001mXXXXX')])), ('Name', 'Example Name One')])
LastModifiedBy.Name
OrderedDict([('attributes', OrderedDict([('type', 'User'), ('url', '/services/data/v42.0/sobjects/User/0051p00000AXXXXX')])), ('Name', 'Example Name Two')])
Do you know how the function could be amended to resolve this?
If you are still looking for a solution, I wrote this today. Works for relationship / bottom up queries. 🍻
def query_all_as_dataframe(self, query, include_deleted=False, include_parent_relationship=False, include_attributes=False, **kwargs):
attributes = search(r"(?<=SELECT|select)(.*)(?=FROM|from)", query).group()
relationships = findall(r"([A-z]+\.[A-z]+)", attributes)
records = list(self.query_all_iter(query, include_deleted=include_deleted, **kwargs))
dataframe = DataFrame(records)
rm = []
for relationship in relationships:
object_name, attr = relationship.split(".")
dataframe[f"{object_name}_{attr}"] = DataFrame(list(dataframe[object_name]))[0].apply(lambda x: x if x == None else x[attr])
if object_name not in rm: rm.append(object_name)
if not include_attributes: dataframe = dataframe.drop(columns=["attributes"])
if not include_parent_relationship: dataframe = dataframe.drop(columns=rm)
return dataframe
Stumbled upon this thread because I ran into the same issue.
I used this post to come up with a workable solution => FreeCodeCamp
This works well for relationship / bottom-up queries.
from collections.abc import MutableMapping
# generator function
def _flatten_dict_gen(d, parent_key, sep, root_key="Root", drop_keys=[]):
for k, v in d.items():
if k not in drop_keys:
new_key = f"{parent_key}{sep}{k}" if parent_key else f"{root_key}{sep}{k}"
if isinstance(v, MutableMapping):
yield from flatten_dict(v, new_key, sep=sep, drop_keys=drop_keys).items()
else:
yield new_key, v
def flatten_dict(d: MutableMapping, parent_key: str = '', sep: str = '.', root_key="", drop_keys=[]):
return dict(_flatten_dict_gen(d, parent_key, sep, root_key=root_key, drop_keys=drop_keys))
calling the function
# convert to dict and flatten
flattened_records = [
flatten_dict(x, root_key='RootObject', drop_keys=['attributes'])
for x in query_results['records']
]