
Trino column validation errors during metadata table ingestion

Open SumanMaharana opened this issue 7 months ago • 0 comments

Affected module

Ingestion Framework

Describe the bug

[2025-02-19, 18:59:36 UTC] {sql_column_handler.py:308} WARNING - Unexpected exception processing column [{'name': 'stores_config', 'type': ROW([('app_name', VARCHAR()), ('app_icon', VARCHAR()), ('play_store', ROW([('application_id', VARCHAR()), ('listing', ROW([('full_description', VARCHAR()), ('short_description', VARCHAR()), ('copyright', VARCHAR()), ('developer_id', VARCHAR()), ('contact', ROW(...
.
.
.
-- IT'S A HIGHLY COMPLEX NESTED FIELD --


2 validation errors for Column
children.2.children.1.children.9.dataLength
  Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='varchar', input_type=str]
    For further information visit https://errors.pydantic.dev/2.7/v/int_parsing
children.3.children.2.children.6.dataLength
  Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='varchar', input_type=str]
    For further information visit https://errors.pydantic.dev/2.7/v/int_parsing
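
Each error location above is a dotted path into the nested column: `children` selects the list of sub-fields and the number that follows is an index into it. The snippet below is a minimal sketch for resolving such a path to field names. The helper `resolve_loc` and the small column dict are hypothetical and are not part of the OpenMetadata ingestion code; the tree is trimmed, so the indices differ from those in the log.

```python
from typing import Any


def resolve_loc(column: dict[str, Any], loc: str) -> tuple[list[str], Any]:
    """Follow a dotted pydantic error location through a nested column dict.

    Returns the field names visited on the way and the value found at the end.
    """
    names = [column["name"]]
    node: Any = column
    for part in loc.split("."):
        if part.isdigit():
            node = node[int(part)]  # list index, e.g. children[2]
            names.append(node["name"])
        else:
            node = node[part]  # dict key, e.g. "children" or "dataLength"
    return names, node


# Hypothetical, heavily trimmed stand-in for the parsed stores_config column.
column = {
    "name": "stores_config",
    "children": [
        {"name": "app_name", "dataLength": 1},
        {"name": "app_icon", "dataLength": 1},
        {
            "name": "play_store",
            "children": [
                {"name": "application_id", "dataLength": 1},
                {
                    "name": "listing",
                    "children": [
                        {"name": "full_description", "dataLength": 1},
                        # A string where an integer is expected, as in the log.
                        {"name": "localized_info", "dataLength": "varchar"},
                    ],
                },
            ],
        },
    ],
}

path, value = resolve_loc(column, "children.2.children.1.children.1.dataLength")
print(" -> ".join(path), "=", repr(value))
```

Run against the real column payload, the same walk would name the two fields whose `dataLength` arrives as the string 'varchar'.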

To Reproduce

Here is the table DDL that causes this issue:

CREATE TABLE domain_events.mobile.v1_branded_app_crud (
   _id varchar,
   _entity_fqdn varchar,
   _slug varchar,
   _entity_id varchar,
   _event_time timestamp(6),
   _triggered_by_anonymize_request boolean,
   _originated_from varchar,
   _kafka_dc varchar,
   _kafka_cluster varchar,
   _artifact_name varchar,
   _bot_name varchar,
   _brand_id varchar,
   _browser_family varchar,
   _browser_name varchar,
   _browser_version varchar,
   _bsi varchar,
   _bsi_page_number varchar,
   _city varchar,
   _client_app_def_id varchar,
   _client_id varchar,
   _client_instance_id varchar,
   _country_code varchar,
   _device_type varchar,
   _gsi varchar,
   _identity_data varchar,
   _identity_response varchar,
   _impersonating_user_id varchar,
   _ip varchar,
   _ip_hash varchar,
   _isp varchar,
   _lng varchar,
   _logged_account_id varchar,
   _logged_session_uuid varchar,
   _logged_user_id varchar,
   _mobile_app_id varchar,
   _mobile_app_prod_variant varchar,
   _mobile_app_version varchar,
   _msid varchar,
   _os_name varchar,
   _pod varchar,
   _postal_code varchar,
   _reg_color varchar,
   _region varchar,
   _request_id varchar,
   _site_branch_id varchar,
   _target_account_id varchar,
   _user_agent varchar,
   _user_agent_enricher varchar,
   _visitor_id varchar,
   id varchar COMMENT ' [ALPHA][INTERNAL]',
   stores_config ROW(app_name varchar, app_icon varchar, play_store ROW(application_id varchar, listing ROW(full_description varchar, short_description varchar, copyright varchar, developer_id varchar, contact ROW(email varchar, phone varchar, website varchar), default_language varchar, category varchar, tags array(varchar), app_content ROW(content_rating ROW(email_address varchar, questionnaire ROW(v_i_o_l_e_n_c_e boolean, s_e_x_o_r_n_u_d_i_t_y boolean, o_f_f_e_n_s_i_v_e_l_a_n_g_u_a_g_e boolean, i_l_l_e_g_a_l_d_r_u_g_s boolean, u_s_e_r_t_o_u_s_e_r_i_n_t_e_r_a_c_t_i_o_n_s boolean, s_h_a_r_e_l_o_c_a_t_i_o_n_w_i_t_h_u_s_e_r_s boolean, a_l_l_o_w_s_p_u_r_c_h_a_s_e_o_f_d_i_g_i_t_a_l_g_o_o_d_s boolean), content_rating_category varchar), privacy_policy_url varchar, contains_ads boolean, news_app boolean, target_audience ROW(target_ages array(varchar))), localized_info map(varchar, ROW(full_description varchar, short_description varchar, title varchar, video varchar, graphics ROW(app_icon varchar, feature_graphic varchar, phone_screenshots array(varchar), tablet_seven_inch_screenshots array(varchar), tablet_ten_inch_screenshots array(varchar))))), download_link varchar, certificate_fingerprint varchar, certificate_fingerprint_sha256 varchar, publish_info ROW(is_published boolean, published_version varchar, first_publish_date timestamp(6), last_publish_date timestamp(6)), developer_name varchar, cloud_project_id varchar, automation_status ROW(play_console_connected boolean, cloud_platform_connected boolean), wix_member_email varchar), app_store ROW(bundle_id varchar, team_id varchar, listing ROW(full_description varchar, short_description varchar, copyright varchar, rating ROW(alcohol_tobacco_or_drug_use_or_references varchar, gambling_and_contests boolean, gambling_simulated varchar, kids_age_band varchar, medical_or_treatment_information varchar, profanity_or_crude_humor varchar, sexual_content_graphic_and_nudity varchar, sexual_content_or_nudity varchar, 
horror_or_fear_themes varchar, mature_or_suggestive_themes varchar, unrestricted_web_access boolean, violence_cartoon_or_fantasy varchar, violence_realistic_prolonged_graphic_or_sadistic varchar, violence_realistic varchar), categories ROW(primary_category varchar, primary_subcategory_one varchar, primary_subcategory_two varchar, secondary_category varchar, secondary_subcategory_one varchar, secondary_subcategory_two varchar), app_icon varchar, localized_info map(varchar, ROW(name varchar, description varchar, subtitle varchar, screenshots ROW(screenshot_sets array(ROW(type varchar, images array(varchar)))), preview ROW(preview_sets array(ROW(type varchar, images array(varchar)))), keywords array(varchar), marketing_url varchar, promotional_text varchar, support_url varchar, whats_new varchar, privacy_policy_url varchar))), download_link varchar, publish_info ROW(is_published boolean, published_version varchar, first_publish_date timestamp(6), last_publish_date timestamp(6)), in_app_purchase ROW(products array(ROW(subscription ROW(duration varchar), origin ROW(paid_plan ROW(id varchar, name varchar, description varchar, perks ROW("values" array(varchar)), pricing ROW(subscription ROW(cycle_duration ROW(count bigint, unit varchar), cycle_count bigint), single_payment_for_duration ROW(count bigint, unit varchar), single_payment_unlimited boolean, price ROW(value varchar, currency varchar), free_trial_days bigint, fee_configs array(ROW(fee ROW(name varchar, amount varchar), applied_at varchar))), public boolean, archived boolean, primary boolean, has_orders boolean, created_date timestamp(6), updated_date timestamp(6), slug varchar, max_purchases_per_buyer bigint, allow_future_start_date boolean, buyer_can_cancel boolean, terms_and_conditions varchar, client_data map(varchar, varchar), form_id varchar), bookings_response ROW(service ROW(id varchar, info ROW(name varchar, description varchar, images array(ROW(id varchar, url varchar, height bigint, width bigint, 
alt_text varchar, url_expiration_date timestamp(6), filename varchar, size_in_bytes varchar, focal_point ROW(x double, y double, height bigint, width bigint))), tag_line varchar, media ROW(items array(ROW(image ROW(id varchar, url varchar, height bigint, width bigint, alt_text varchar, url_expiration_date timestamp(6), filename varchar, size_in_bytes varchar, focal_point ROW(x double, y double, height bigint, width bigint)))), main_media ROW(image ROW(id varchar, url varchar, height bigint, width bigint, alt_text varchar, url_expiration_date timestamp(6), filename varchar, size_in_bytes varchar, focal_point ROW(x double, y double, height bigint, width bigint))), cover_media ROW(image ROW(id varchar, url varchar, height bigint, width bigint, alt_text varchar, url_expiration_date timestamp(6), filename varchar, size_in_bytes varchar, focal_point ROW(x double, y double, height bigint, width bigint))))), policy ROW(max_participants_per_booking bigint, book_up_to_x_minutes_before bigint, cancel_reschedule_up_to_x_minutes_before bigint, is_book_online_allowed boolean, is_cancel_allowed boolean, is_reschedule_allowed boolean, future_bookings_policy ROW(should_limit boolean, limit_x_minutes_to_the_future bigint), waiting_list_policy ROW(is_enabled boolean, capacity bigint, time_window_minutes bigint), bookings_approval_policy ROW(is_business_approval_required boolean, requests_affects_availability boolean), override_business_policy_fields array(varchar), cancellation_policy varchar), payment_options ROW(wix_pay_online boolean, wix_pay_in_person boolean, custom boolean, wix_paid_plan boolean), category_id varchar, booking_form_id varchar, schedule_ids array(varchar), custom_properties map(varchar, varchar), status varchar, sort_order bigint, advanced_seo_data ROW(tags array(ROW(type varchar, props map(varchar, varchar), meta map(varchar, varchar), children varchar, custom boolean, disabled boolean)), settings ROW(prevent_auto_redirect boolean, keywords array(ROW(term 
varchar, is_main boolean, origin varchar)))), seo_data ROW(tags array(ROW(type varchar, props map(varchar, varchar), meta map(varchar, varchar), children varchar, custom boolean, disabled boolean)), settings ROW(prevent_auto_redirect boolean, keywords array(ROW(term varchar, is_main boolean, origin varchar)))), include_conference_option boolean), schedules array(ROW(id varchar, schedule_owner_id varchar, first_session_start timestamp(6), last_session_end timestamp(6), time_zone varchar, intervals array(ROW(id varchar, start timestamp(6), "end" timestamp(6), interval ROW(days_of_week varchar, hour_of_day bigint, minute_of_hour bigint, duration bigint), frequency ROW(repetition bigint), affected_schedules array(ROW(schedule_id varchar, transparency varchar, schedule_owner_id varchar, schedule_owner_name varchar, schedule_owner_user_id varchar)), interval_type varchar)), title varchar, tags array(varchar), location ROW(location_type varchar, custom_address ROW(street_address ROW(number varchar, name varchar, apt varchar), country varchar, subdivision varchar, city varchar, postal_code varchar, hint varchar, geocode ROW(latitude double, longitude double), subdivisions array(ROW(code varchar, name varchar))), business_location ROW(id varchar, name varchar, description varchar, default boolean, status varchar, location_type varchar, fax varchar, time_zone varchar, business_schedule ROW(periods array(ROW(open_day varchar, open_time varchar, close_day varchar, close_time varchar)), special_hour_period array(ROW(start_date varchar, end_date varchar, is_closed boolean, comment varchar))), revision bigint, archived boolean, location_types array(varchar))), capacity bigint, rate ROW(price_text varchar, default_varied_price ROW(amount varchar, currency varchar, down_pay_amount varchar), labeled_price_options map(varchar, ROW(amount varchar, currency varchar, down_pay_amount varchar))), availability ROW(start timestamp(6), "end" timestamp(6), linked_schedules 
array(ROW(schedule_id varchar, transparency varchar, schedule_owner_id varchar, schedule_owner_name varchar, schedule_owner_user_id varchar)), constraints ROW(slot_durations array(bigint), time_between_slots bigint, split_interval bigint, slots_split_interval ROW(same_as_duration boolean, value_in_minutes bigint)), locations array(ROW(location_type varchar, custom_address ROW(street_address ROW(number varchar, name varchar, apt varchar), country varchar, subdivision varchar, city varchar, postal_code varchar, hint varchar, geocode ROW(latitude double, longitude double), subdivisions array(ROW(code varchar, name varchar))), business_location ROW(id varchar, name varchar, description varchar, default boolean, status varchar, location_type varchar, fax varchar, time_zone varchar, business_schedule ROW(periods array(ROW(open_day varchar, open_time varchar, close_day varchar, close_time varchar)), special_hour_period array(ROW(start_date varchar, end_date varchar, is_closed boolean, comment varchar))), revision bigint, archived boolean, location_types array(varchar)))), use_default_location boolean), total_number_of_participants bigint, participants array(ROW(id varchar, contact_id varchar, name varchar, phone varchar, email varchar, party_size bigint, approval_status varchar, inherited boolean)), external_calendar_overrides ROW(title varchar, description varchar), status varchar, created timestamp(6), updated timestamp(6), version bigint, versions ROW(schedule_version bigint, participants_version bigint), inherited_fields array(varchar), conference_provider ROW(provider_id varchar), calendar_conference ROW(id varchar, external_id varchar, conference_id varchar, provider_id varchar, host_url varchar, guest_url varchar, password varchar, description varchar, conference_type varchar, account_owner_id varchar), schedule_owner_name varchar, schedule_owner_user_id varchar))), stores_response map(varchar, varchar), videos_response map(varchar, varchar), programs_response 
map(varchar, varchar), events_response map(varchar, varchar), vertical varchar, id varchar), apple_product ROW(id varchar, type varchar, reference_name varchar, availability varchar), display_name varchar, description varchar, price varchar, status varchar, apple_price_to_schedule varchar, apple_currency_to_schedule varchar, required_product_type varchar, apple_pricing ROW(value varchar, currency varchar)))), developer_name varchar, app_id varchar, wix_member_email varchar, has_pending_agreement boolean), screenshots_customized boolean) COMMENT 'Marketing [ALPHA][INTERNAL]',
   build_config ROW(splash ROW(color varchar, icon varchar, custom_story_board varchar, custom_android_splash varchar), android ROW(firebase_app_id varchar, google_services_json varchar, whitelisted_audience varchar, android_client_id varchar, build_preview ROW(download_url varchar, app_version varchar)), name varchar, app_icon varchar, url_scheme varchar, universal_link varchar, universal_links array(varchar), ios ROW(ios_client_id varchar)) COMMENT 'IDs and keys required for build [ALPHA][INTERNAL]',
   mobile_config ROW(login ROW(social_login_enabled boolean, welcome_screen ROW(top_logo_asset varchar, style ROW(background_color varchar, content_primary_color varchar, content_secondary_color varchar, content_inactive_color varchar, cta_main_color varchar, cta_secondary_color varchar), subtitle varchar, sign_up_text varchar, login_text varchar, show_invite boolean, center_image_asset ROW(uri varchar), title varchar), verify_password_screen ROW(title varchar), select_auth_method_screen ROW(existing_user_subtitle_text varchar, new_user_subtitle_text varchar), enter_email_screen ROW(email_placeholder_text varchar), captch_screen ROW(email_placeholder_text varchar), simple_login ROW(welcome_screen_c_t_a_text varchar), social_login ROW(ios ROW(google boolean, facebook boolean, apple boolean), android ROW(facebook boolean, google boolean)), extended_signup ROW(method_id varchar), disclaimer ROW(privacy_policy varchar, terms_of_use varchar), extended_login ROW(method_id varchar)), suspended boolean, default_locale varchar, in_app_purchase ROW(opt_in_product_ids array(varchar)), inbox ROW(chat_type varchar), locale ROW(white_list array(varchar))) COMMENT 'How the app itself behaves [ALPHA][INTERNAL]',
   date_created timestamp(6) COMMENT ' [ALPHA][INTERNAL]',
   date_updated timestamp(6) COMMENT ' [ALPHA][INTERNAL]',
   is_upgraded boolean COMMENT ' [ALPHA][INTERNAL]',
   moved_to_trash boolean COMMENT 'Indicates whether deleted entity was moved to trash',
   _modified_fields map(varchar, varchar) COMMENT 'Map of modified fields during update. Full field path to new value',
   _kafka_partition varchar,
   _entity_event_sequence varchar
)
WITH (
   format = 'PARQUET',
   format_version = 2,
   location = 's3a://wix-bi-domain-events/mobile.db/v1_branded_app_crud',
   partitioning = ARRAY['_slug','day(_event_time)']
)
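
To relate the `children.N` indices in the validation errors to field names in this DDL, the direct fields of a `ROW(...)` type can be listed by splitting only on commas that sit outside nested parentheses. The sketch below does that with two hypothetical helpers, `split_top_level` and `row_fields` (not OpenMetadata functions), and it does not handle quoted identifiers that contain spaces or commas. The `listing` string is an abbreviated copy of `stores_config.app_store.listing` with the inner ROWs shortened. Assuming the children are indexed in DDL field order, counting fields by hand suggests that both failing paths (`children.2.children.1.children.9` and `children.3.children.2.children.6`) land on a `localized_info map(varchar, ROW(...))` field.

```python
def split_top_level(body: str) -> list[str]:
    """Split a comma-separated field list, ignoring commas inside parentheses."""
    parts: list[str] = []
    depth = 0
    start = 0
    for i, ch in enumerate(body):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(body[start:i].strip())
            start = i + 1
    tail = body[start:].strip()
    if tail:
        parts.append(tail)
    return parts


def row_fields(row_type: str) -> list[tuple[str, str]]:
    """Return (name, type) pairs for the direct fields of a ROW(...) type string."""
    text = row_type.strip()
    if not (text.upper().startswith("ROW(") and text.endswith(")")):
        raise ValueError("expected a ROW(...) type string")
    fields = []
    for part in split_top_level(text[4:-1]):
        name, _, field_type = part.partition(" ")
        fields.append((name, field_type.strip()))
    return fields


# Abbreviated from the DDL above: nested ROW bodies are shortened for readability.
listing = (
    "ROW(full_description varchar, short_description varchar, copyright varchar, "
    "rating ROW(gambling_and_contests boolean), "
    "categories ROW(primary_category varchar), "
    "app_icon varchar, "
    "localized_info map(varchar, ROW(name varchar, description varchar)))"
)

for index, (name, field_type) in enumerate(row_fields(listing)):
    print(index, name, field_type)
```

In this abbreviated listing, index 6 is `localized_info`, which matches the last index of the second error path.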

Version:

  • Python version:
  • OpenMetadata version: 1.5.12
  • OpenMetadata Ingestion package version: 1.5.12.1

SumanMaharana, Apr 08 '25 08:04