function: 'lowest' common type

Open majidaldo opened this issue 5 years ago • 2 comments

Sometimes going through a whole array is not needed. You have the types of the subsets of the array and you just want to get a compatible data type for all subsets.

A common scenario when assembling horrible csvs is that the same column might be inferred as different types in different csvs. For example, (float <-- int). Worst case is to 'fall back' to string.
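To make the request concrete, here is a minimal sketch of the fallback idea. It assumes a simple linear widening order (int, then float, then string); the `WIDENING_ORDER` list and the `lowest_common_type` name are hypothetical and not part of visions.

```python
# Hypothetical widening order, narrowest first; not part of visions.
WIDENING_ORDER = ["int", "float", "string"]


def lowest_common_type(inferred_types):
    """Return the narrowest type in WIDENING_ORDER that can hold every input type."""
    if not inferred_types:
        raise ValueError("need at least one inferred type")
    # The widest type seen across the subsets is the one all of them fit into.
    ranks = [WIDENING_ORDER.index(t) for t in inferred_types]
    return WIDENING_ORDER[max(ranks)]


# The same column as inferred from three different CSVs.
print(lowest_common_type(["int", "float", "int"]))     # float
print(lowest_common_type(["int", "string", "float"]))  # string
```

Only the per-subset type labels are needed here; the underlying arrays are never scanned again.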

majidaldo avatar Dec 22 '20 17:12 majidaldo

Hey Majid, great observation. Although it’s not exactly what you’re looking for, we have a performance enhancement that leverages this fact: ‘traverse_graph_with_sampled_series’ in ‘visions.typesets.typeset’, which you can invoke directly for a quick speed-up.

More broadly, if you use ‘detect’ instead of the ‘detect_type’ method (and likewise for the infer counterparts), you can pull the full inference path, which is a list of nodes from root to final type. You can then find the intersection of those paths for a column across your separate data sets to determine the best common representation.
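As a rough sketch of that intersection step, assuming each path is an ordered list running from the root to the final type: the longest shared prefix ends at the most specific node all data sets agree on. The `common_path` helper is hypothetical, and the strings below only stand in for visions type nodes; they are not actual output from the library.

```python
def common_path(paths):
    """Longest shared prefix of several root-to-final inference paths."""
    shared = []
    # zip stops at the shortest path, so the prefix can never overrun any of them.
    for nodes in zip(*paths):
        if any(node != nodes[0] for node in nodes):
            break
        shared.append(nodes[0])
    return shared


# Illustrative paths only: the same column as seen in three data sets.
paths = [
    ["Generic", "Float", "Integer"],
    ["Generic", "Float"],
    ["Generic", "Float", "Integer"],
]
shared = common_path(paths)
print(shared[-1])  # Float
```

The last element of the shared prefix is the candidate common type, and the prefix itself is a path the columns could then be cast along.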

ieaves avatar Dec 23 '20 13:12 ieaves

I should add: if you're interested in making a PR for this use case, it would be more than welcome.

A basic implementation would look something like this:

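def cast_along_path(series, graph, path, state=None):
    # Avoid a shared mutable default; each call gets its own state dict.
    state = {} if state is None else state
    # Walk consecutive (source, target) pairs so each lookup uses the edge
    # between the previous node and the next one, not always path[0].
    for base_type, vision_type in zip(path, path[1:]):
        relation = graph[base_type][vision_type]["relationship"]
        series = relation.transform(series, state)
    return series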

Which could be invoked like this:

import pandas as pd

T = typeset  # any visions typeset instance
s = pd.Series([your data])
path = [Generic, Object, String]  # root-to-target types from visions.types

# Type Detection
new_s = cast_along_path(s, T.base_graph, path)

# Type Inference
new_s = cast_along_path(s, T.relation_graph, path)

ieaves avatar Dec 23 '20 15:12 ieaves