document-api-python icon indicating copy to clipboard operation
document-api-python copied to clipboard

Creating folders and adding items/fields to folders

Open KernpunktAnalytics opened this issue 8 years ago • 11 comments

I have the following requirements:

  • The ability to create and modify folders in a Tableau Datasource (TDS).
  • The ability to add fields to an existing folder.

KernpunktAnalytics avatar Nov 06 '16 16:11 KernpunktAnalytics

Felix,

Thanks for filing this.

Do you have any rough sketches of how you want to expose this in the API?

Here's some rambling that is not comprehensive or thought out, but may help:

  • In the Schema Viewer, Folders are a first-class object that groups fields
  • Fields can only be a member of one folder
  • We also have hierarchies, which may want a similar API?
  • We also have Table Names, which could have a similar group-by requirement
  • Folders can only contain fields and hierarchies
  • You can have a Folder that you put Fields in, or Fields with a folder attribute, and then helper methods that group/collect/filter.

I'm not sure what I'd prefer, but wanted to brain-dump some of those thoughts!

t8y8 avatar Nov 07 '16 02:11 t8y8

Hey Tyler,

thanks a lot for the input. I'll just brain-dump some thoughts back:

  • I'm not familiar with hierarchies, so I'd gladly follow any recommendations from you on this one.
  • Table Names: I assumed that sort by folders and sort by table names are mutually exclusive (in the Tableau UI), with the latter being dependent on information from the metadata-record's.
  • Since folders are represented by a dedicated folder-tag, I'd like to expose it in the same way. Looking at the XML, folders are a sub-element of the datasource and not of fields. My PR currently contains a "datasource.add_folder" and "datasource.add_to_folder" functions. However, there are some things (like "fields can only belong to one folder") that are currently not safeguarded against. This would be easier using the approach you mentioned (using fields with a folder attribute).
  • Overall, it might be good to create a dedicated "Folder" wrapper object similar to field.py. This is currently my preferred solution and would allow us to treat folders similar to fields (like calling my_tds.folders["myFolder"].add("[myField]") )

I'll leave the branch as-is for now. Once the best solution has been agreed on, I'll add some safeguards, some ways to remove folders and fields from folders and then update the PR.

Cheers Felix

KernpunktAnalytics avatar Nov 07 '16 07:11 KernpunktAnalytics

Thanks Felix for the counter-dump!

After some thought, here's a bit of what I'm thinking more coherently:

  • I like that add_folder or a similar method exists off of datasource, and it should take a string that's the name of the folder.
  • folders are exposed via ds.folders
  • I think that add_to_folder should live on the folder object, not the datasource. So it would be
profit_field = ...  # code that gets the field, or creates a new one
sales_folder = ds1.make_folder('Sales Metrics')  # Or something similar
sales_folder.add(profit_field)  

The profit_field object can check itself it it's a member of a folder already, either via folder calling something like if not profit_field.folder: ...; else raise MultipleFolderException, a special method, or something like that.

I need more thinking time on hierarchies and the other questions I raised.

This might sound silly, but would you write some sample code (using the methodology I describe above) that would be typical of how you plan to use the folders API? It might help it shake out the details and I think you've got a use case in mind that's better than my toy example above

t8y8 avatar Nov 11 '16 20:11 t8y8

Hey Tyler,

thanks for the response. I definitely agree with the points you've brought up.

I've used your feedback and created an updated test (test/test_folders.py) documenting how I'd like the API to behave. I've included them in the updated PR. I hope this fits your idea of some sample code well enough ;)

There are still some details that are missing (like the type of exceptions that should be thrown), but it should be good enough as a foundation for the next steps.

The only thing that I'm still on the fence about is something like profit_field.folder: I'm not sure whether a field should have the knowledge about what folder it is assigned to. This knowledge, and any logic that comes with it, should IMHO sit with the folder object, which is accessed only via the datasource it belongs to. I hope this explanation makes some sense.

Enjoy your weekend! -Felix

KernpunktAnalytics avatar Nov 12 '16 11:11 KernpunktAnalytics

Felix,

Thanks, I'll take a look at the test ASAP now that I'm back from vacation!

On the question you raise -- I proposed the field have some knowledge so that we can always check if it's a already a member of a folder.

Do you have an idea of how we could do that check a different way? (I'm definitely open to an alternative)

t8y8 avatar Nov 29 '16 19:11 t8y8

Hey Tyler,

The way I see it, we have two alternative approaches that we can take.

  1. We put a reference to the folder (or at least its name) on the field object. The downside is, we now have the folder information in two places (folder object and field) and need to keep this in sync at all times. IMHO there are quite a few edge cases and such, and implementing this in a way that works consistently in all situations is not trivial.

  2. We put a function on the field object that can dynamically check whether this field is assigned to a folder somewhere. I think this is a better approach than the first since we don't have to worry about keeping folder information in two places. On the other hand, it requires the field to keep a reference to the containing datasource, and I'm not sure whether this is a good design approach.

Based on those two alternatives, my preferred approach would be to not allow folder information to be retrieved via the field object. Maybe a way to make it more comfortable for end users would be a function get_folder_for_field(fieldname) on the datasource? Not too sure about this either.

Let me know what you think.

  • Felix

KernpunktAnalytics avatar Nov 30 '16 07:11 KernpunktAnalytics

Thanks!

I'm mostly interested in how this code could work:

random_field = ds.fields.pop()
awesome_folder = ds.add_folder('awesome')
awesome_folder.add_field(random_field)
>>> MemberOfMultipleFoldersException: Field is already in folder 'abc'

With option 1, even if it's not meant to be a public api, the folder can simply check in the add code like this:

def add_field(self, field):
    if not field._folder:
        # do the add
        field._folder = self.folder_name
    else:
        raise MemberOfMultipleFoldersException

With option two, are you proposing the folder has a registry, and the help function for the field just get a list of all folders in the ds, loop through, and check to see if it's already in any of them?

t8y8 avatar Nov 30 '16 18:11 t8y8

Hey Tyler,

thanks for the use case.

With option two, are you proposing the folder has a registry, and the help function for the field just get a list of all folders in the ds, loop through, and check to see if it's already in any of them?

That was roughly the idea. But seeing your use case example, there might be an easier way to fulfill your requirements. How about this:

  1. When creating a folder object, it receives a reference to the parent datasources "folders"-property (a list of all folders).
  2. When adding a field to a folder, the folder can check all other folders if the field is already in them and raise an exception if needed.
  3. Since this "folders" reference is a property and not an attribute, it's automatically kept up to date by the data source.

The folder implementation would then look something like this:

class Folder:

   def __init__(self, other_things, all_folders):
       self._all_folders = all_folders
       ...

   def add_field(self, field):
       if any(field in folder for folder in self._all_folders):
           raise MemberOfMultipleFoldersException
       ...

How does that approach sound for you?

-Felix

KernpunktAnalytics avatar Dec 01 '16 07:12 KernpunktAnalytics

Felix, what about a reference to the parent data source instead, and then it would follow the same logic.

class Folder:

    def __init__(self, name, parent_datasource):
        self.ds = parent_datasource
        self.name = name
        ....

    def add_field(self, field):
       if any(field in folder for folder in self.ds.folders):
           raise MemberOfMultipleFoldersException
       ...

For that to work we'd need to implement __contains__ on the folder object, but there's an elegance to saying: if field not in folder: do_stuff() ... I think I'm coming around! :)

t8y8 avatar Dec 01 '16 21:12 t8y8

Sounds good to me!

My train of thought was to expose as little information as possible about the parent datasource to the folder objects. Thats why I proposed to only expose the datasource.folders property to the folder instead of the whole datasource. But if exposing the data source is all right with you, I won't complain about it :)

Are there other open points about the folders feature on your side? Otherwise, I'd go ahead and start implementing the things we've discussed. Probably at some point next week or so.

Enjoy your weekend! -Felix

KernpunktAnalytics avatar Dec 02 '16 07:12 KernpunktAnalytics

I think it's ready for the first crack, we might tweak exposing ds vs folders (but it's an easy item to play with) or some other small items (Should the list of folders be ordered? It is in the UI but I'm not sure it really matters that much...)

Have at it!

t8y8 avatar Dec 02 '16 17:12 t8y8