function: 'lowest' common type
Sometimes going through a whole array is not needed: you already have the types of the subsets of the array and just want a data type that is compatible with all of them.
A common scenario when assembling horrible CSVs is that the same column is inferred as different types in different files. For example, one file gives int and another gives float, so the column should become float (float <-- int). The worst case is to 'fall back' to string.
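As a rough illustration of the request, independent of visions: assuming a simple linear promotion order (int, then float, then string), the 'lowest' common type of several per-file results is the widest one seen. `PROMOTION_ORDER` and `lowest_common_type` are names made up for this sketch, not library API.

```python
# Hypothetical promotion order, narrowest to widest; not part of visions.
PROMOTION_ORDER = ["int", "float", "string"]


def lowest_common_type(inferred_types):
    """Return the narrowest type that every inferred type can be promoted to."""
    if not inferred_types:
        raise ValueError("need at least one inferred type")
    # Types missing from the order fall back to the widest type,
    # mirroring the "worst case is string" rule.
    widest = len(PROMOTION_ORDER) - 1
    ranks = [
        PROMOTION_ORDER.index(t) if t in PROMOTION_ORDER else widest
        for t in inferred_types
    ]
    return PROMOTION_ORDER[max(ranks)]


# The same column as inferred from three different CSVs.
print(lowest_common_type(["int", "float", "int"]))
# An unknown type forces the string fallback.
print(lowest_common_type(["int", "datetime"]))
```

A real type system is a graph rather than a single chain, so this only covers the int/float/string case described above.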
Hey Majid, great observation. Although it’s not exactly what you’re looking for, we have a performance enhancement that leverages this fact in ‘visions.typesets.typeset’ called ‘traverse_graph_with_sampled_series’, which you can invoke directly for a quick speed-up.
More broadly, if you use ‘detect’ instead of the ‘detect_type’ method (and likewise the ‘infer’ counterparts), you can pull the full inference path, which is a list of nodes from the root type to the final type. You can then find the intersections between the paths of the same column across your discrete data sets to determine a best representation.
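One way to read "find the intersections" is to take the longest prefix shared by the per-file paths; the last node of that prefix is a type every file agrees on. The sketch below uses plain strings in place of visions types, and the paths are made up for illustration, so it shows only the comparison step and not the actual return format of ‘detect’. `common_path` is a hypothetical helper name.

```python
def common_path(paths):
    """Return the longest prefix shared by every root-to-leaf path."""
    if not paths:
        return []
    prefix = []
    # zip stops at the shortest path; break at the first disagreement.
    for nodes in zip(*paths):
        if any(node != nodes[0] for node in nodes):
            break
        prefix.append(nodes[0])
    return prefix


# Illustrative paths for one column across three files (made-up labels).
paths = [
    ["Generic", "Float", "Integer"],
    ["Generic", "Float"],
    ["Generic", "Float", "Integer"],
]
shared = common_path(paths)
# The deepest shared node is the candidate common type.
print(shared[-1] if shared else None)
```

If the paths share nothing beyond the root, the result is just the root type, which signals that a fallback such as string has to be chosen some other way.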
(In reply to Majid alDosari's message of Tue, Dec 22 2020, 12:37 PM; thread: https://github.com/dylan-profiler/visions/issues/157)
I should add: if you are interested in making a PR for this use case, it would be more than welcome.
A basic implementation would look something like this:
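def cast_along_path(series, graph, path, state=None):
    # Avoid sharing one mutable default dict between calls.
    state = {} if state is None else state
    base_type = path[0]
    for vision_type in path[1:]:
        # The edge base_type -> vision_type stores the relation to apply.
        relation = graph[base_type][vision_type]["relationship"]
        series = relation.transform(series, state)
        # Advance, so the next lookup starts from the type just reached.
        base_type = vision_type
    return series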
It could be invoked like this:
import pandas as pd
from visions.types import Generic, Object, String

T = typeset  # your typeset instance
s = pd.Series([your data])
path = [Generic, Object, String]

# Type Detection
new_s = cast_along_path(s, T.base_graph, path)

# Type Inference
new_s = cast_along_path(s, T.relation_graph, path)
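To try the path-walking idea without a typeset at hand, the following stand-in may help. It assumes a nested dict in place of the graph, a plain list in place of the series, and a tiny hypothetical `ToyRelation` class that mimics only the `transform(series, state)` call; the type names and transformers are invented for the example, and `walk_path` is a made-up name.

```python
class ToyRelation:
    """Hypothetical stand-in exposing only transform(series, state)."""

    def __init__(self, transformer):
        self.transformer = transformer

    def transform(self, series, state):
        return self.transformer(series, state)


def walk_path(series, graph, path, state=None):
    """Apply the transformer on each consecutive edge of path."""
    state = {} if state is None else state
    # Pair each node with its successor so every hop uses the right edge.
    for source, target in zip(path, path[1:]):
        relation = graph[source][target]["relationship"]
        series = relation.transform(series, state)
    return series


# Toy graph: int -> float -> string, indexed as graph[source][target].
toy_graph = {
    "int": {
        "float": {"relationship": ToyRelation(lambda s, _: [float(x) for x in s])}
    },
    "float": {
        "string": {"relationship": ToyRelation(lambda s, _: [str(x) for x in s])}
    },
}

print(walk_path([1, 2], toy_graph, ["int", "float"]))
print(walk_path([1, 2], toy_graph, ["int", "float", "string"]))
```

Pairing each node with its successor keeps every lookup on an edge that actually exists in the graph, which matters as soon as a path has more than two nodes.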