OpenBBTerminal icon indicating copy to clipboard operation
OpenBBTerminal copied to clipboard

[Enhancement] Get Field Descriptions from OBBject

Open deeleeramone opened this issue 4 months ago • 9 comments

  1. Why? (1-3 sentences or a bullet point list):

The purpose is to have a way to easily extract the field descriptions (and their keys) from the Python interface. This method can be used to generate tool tips and read the metadata associated with each field returned within the results.

  1. What? (1-3 sentences or a bullet point list):

It's a dictionary where the keys are the fields returned, and the values are descriptions from the provider data model.

Fields automatically created by the alias generator will not have a description.

data = obb.some.function("DOGE")

data.get_field_descriptions()
  1. Impact (1-2 sentences or a bullet point list):
  • Not a breaking change.
  • Relies on the model definitions for output.
  1. Testing Done:
  • Added integration and unit tests.
  • A bunch of functions with a variety of output, obb.some.function().get_field_descriptions()
In [10]: obb.equity.discovery.filings().get_field_descriptions()
Out[10]: 
{'symbol': 'Symbol representing the entity requested in the data.',
 'cik': 'Central Index Key (CIK) for the requested entity.',
 'title': 'Title of the filing.',
 'date': 'The date of the data.',
 'form_type': 'The form type of the filing',
 'link': 'URL to the filing page on the SEC site.'}

deeleeramone avatar Feb 23 '24 03:02 deeleeramone

Agree with @hjoaquim, model_fields yields the pydantic fields or model_json_schema() gets you that in a json format.

Having it as a method here is not reliable because we don't impose any restriction on the shape of results, it can be a list, dict or any nested serializable object.

If we really need it, it will be safer to inject the schema into OBBject itself, like in the .extra during execution.

o = obb.equity.price.historical("AAPL")
o.results[0].model_fields

o.results[0].model_json_schema()

image

montezdesousa avatar Feb 23 '24 10:02 montezdesousa

Aside from that, an idea might be to grab the descriptions from our maps like the ProviderInterface or the like which might make it easier than dumping the model and pulling things out from it?

The purpose of this is to get them post-request, so you only get descriptions for what is returned and is specific to that particular object.

deeleeramone avatar Feb 23 '24 16:02 deeleeramone

I tend to like the idea but, isn't it a lot of overhead to have that method in the OBBject when one can simply do something like this:

Yes, that's too much work and setup. None of those outputs are a clean dictionary of key:value that are ready-to-go out of the box.

deeleeramone avatar Feb 23 '24 16:02 deeleeramone

Agree with @hjoaquim, model_fields yields the pydantic fields or model_json_schema() gets you that in a json format.

Having it as a method here is not reliable because we don't impose any restriction on the shape of results, it can be a list, dict or any nested serializable object.

If it is a model, then it will work. If it is not a model, then there are no descriptions. The advantage of doing a to_df() move is that it looks at the fields across all the models in a list of models and accounts for instances where the fields are not None in every model within the list.

If you rely on the first item in results only it will not be an accurate representation of the shape cast by to_df().

deeleeramone avatar Feb 23 '24 16:02 deeleeramone

If we really need it, it will be safer to inject the schema into OBBject itself, like in the .extra during execution.

This would duplicate schemas, because they already exist under the results object, and add unnecessary bloat to the OBBject by serializing a whole bunch of text. We simply need a method for retrieving the information that already exists within the object. Please keep in mind that this is specifically for the Python interface, and not for the API. API users do not get the benefits of the OBBject methods.

deeleeramone avatar Mar 01 '24 19:03 deeleeramone

If we really need it, it will be safer to inject the schema into OBBject itself, like in the .extra during execution.

This would duplicate schemas, because they already exist under the results object, and add unnecessary bloat to the OBBject by serializing a whole bunch of text. We simply need a method for retrieving the information that already exists within the object. Please keep in mind that this is specifically for the Python interface, and not for the API. API users do not get the benefits of the OBBject methods.

True, nvm that idea.

Anyway the schema is easily accessible o.results[0].model_json_schema() and I don't see why we need a method just to get field descriptions specifically after the command is executed (otherwise obb.coverage provides that info pre-execution). Seems a very specific use case that should be handled by the user. That's my opinion but whatever team decides.

montezdesousa avatar Mar 02 '24 18:03 montezdesousa

Anyway the schema is easily accessible o.results[0].model_json_schema()

if you rely on the first entry in results only, you will not get an accurate representation of the shape cast by to_df().

Wading through obb.coverage to get the specific information that's relevant to the object returned is not convenient.

obb.coverage.command_schemas(filter_by_provider="yfinance")

filter_by_provider doesn't seem to work. Even then, keying through that to get relevant information is not at all simple or intuitive.

The point is to very quickly and reliably extract information, not the entire schema. Only the returned field's descriptions - if they exist - as a dictionary. Nothing more, nothing less.

Frontend components interacting with the Python interface will find this useful.

Getting that same information (what's only relevant to the actual object returned) from obb.coverage would be quite a struggle, and would require stitching together multiple nested items.

Screenshot 2024-03-02 at 3 07 14 PM

deeleeramone avatar Mar 02 '24 22:03 deeleeramone

Hi @deeleeramone ,

I think that this might be a custom OBBject extension and that we shouldn't have it there by default as we don't really see any use-case or need from users for this. There are coverage endpoints, maps, and documentation that all facilitate this. Also, @hjoaquim has shown how to easily get what is needed.

"filter_by_provider doesn't seem to work. Even then, keying through that to get relevant information is not at all simple or intuitive." -> For me, this works without issues.

obb.coverage.command_schemas(filter_by_provider="yfinance")[".equity.price.historical"]["input"].model_fields

At this stage, we should prioritise more what the users need and our OKRs which means adding less features and focusing more at tasks at hand.

IgorWounds avatar Mar 05 '24 13:03 IgorWounds

"filter_by_provider doesn't seem to work. Even then, keying through that to get relevant information is not at all simple or intuitive." -> For me, this works without issues.

obb.coverage.command_schemas(filter_by_provider="yfinance")[".equity.price.historical"]["input"].model_fields

I'm looking for the output, but even then it does not fulfill the purpose of this method. The "coverage" endpoints are not relevant to the actual returned object.

This compiled description is not relevant to the actual returned object. The purpose is to get only relevant information, not all the extra noise and things that you have to wade through and filter out.

description='The adjusted close price. (provider: alpha_vantage, fmp);\n    Adjusted closing price during the period. (provider: intrinio);\n    Adjusted closing price during the period. (provider: tiingo)')

There is definitely a use-case, "tooltips", and @hjoaquim's solution cannot be applied universally. I'm struggling to understand what the harm of cleanly extracting field descriptions is.

By this same logic, there is no purpose of "to_dict()" because you could just do, "data.model_dump().get("results")"

deeleeramone avatar Mar 06 '24 21:03 deeleeramone