[docs] - Document the DAGSTER_GRPC_MAX_RX_BYTES environment variable
Summary
Document the DAGSTER_GRPC_MAX_RX_BYTES environment variable, which raises the maximum gRPC message size (10485760 bytes, i.e. 10 MB, in the error below).
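A minimal sketch of the workaround discussed in the conversation below; the 20000000-byte value comes from that conversation and should be sized to your payloads. In practice the variable is usually exported in the shell or container environment of every Dagster process involved (e.g. DAGSTER_GRPC_MAX_RX_BYTES=20000000 dagit ...); the programmatic form shown here is only an illustration and only affects the process it runs in.

import os

# Hypothetical sketch: raise the gRPC receive limit to ~20 MB. This must run before
# any Dagster gRPC client or server is created in this process; exporting the variable
# in the deployment environment is the more typical approach.
os.environ.setdefault("DAGSTER_GRPC_MAX_RX_BYTES", "20000000")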
Conversation
This issue was generated from the slack conversation at: https://dagster.slack.com/archives/C01U954MEER/p1634497569283600?thread_ts=1634497569.283600&cid=C01U954MEER
Conversation excerpt:
U0290A48WCD: Quite frequently, I am unable to retry solid errors and I receive the error below. It seems to happen with solids that are downstream of dynamic outputs, but that may not always be the case. At a loss as to where to start troubleshooting and am hoping someone can point me in the right direction. Right now I am running this on my local machine using the multiprocess executor. Full codebase is here: https://github.com/xmarcosx/dagster-etl
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.RESOURCE_EXHAUSTED
details = "Received message larger than max (10985123 vs. 10485760)"
debug_error_string = "{"created":"@1634491696.811821461","description":"Received message larger than max (10985123 vs. 10485760)","file":"src/core/ext/filters/message_size/message_size_filter.cc","file_line":206,"grpc_status":8}"
U016C4E5CP8: Hi Marcos - is there possibly a longer stack trace in the dagit process output when this happens? If so would you mind sharing it?
U0290A48WCD:
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.RESOURCE_EXHAUSTED
details = "Received message larger than max (10985123 vs. 10485760)"
debug_error_string = "{"created":"@1634491696.811821461","description":"Received message larger than max (10985123 vs. 10485760)","file":"src/core/ext/filters/message_size/message_size_filter.cc","file_line":206,"grpc_status":8}"
>
File "/usr/local/lib/python3.8/site-packages/dagster_graphql/implementation/utils.py", line 34, in _fn
return fn(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/dagster_graphql/implementation/execution/launch_execution.py", line 11, in launch_pipeline_reexecution
return _launch_pipeline_execution(graphene_info, execution_params, is_reexecuted=True)
File "/usr/local/lib/python3.8/site-packages/dagster_graphql/implementation/execution/launch_execution.py", line 50, in _launch_pipeline_execution
run = do_launch(graphene_info, execution_params, is_reexecuted)
File "/usr/local/lib/python3.8/site-packages/dagster_graphql/implementation/execution/launch_execution.py", line 34, in do_launch
pipeline_run = create_valid_pipeline_run(graphene_info, external_pipeline, execution_params)
File "/usr/local/lib/python3.8/site-packages/dagster_graphql/implementation/execution/run_lifecycle.py", line 48, in create_valid_pipeline_run
external_execution_plan = get_external_execution_plan_or_raise(
File "/usr/local/lib/python3.8/site-packages/dagster_graphql/implementation/external.py", line 115, in get_external_execution_plan_or_raise
return graphene_info.context.get_external_execution_plan(
File "/usr/local/lib/python3.8/site-packages/dagster/core/workspace/context.py", line 190, in get_external_execution_plan
return self.get_repository_location(
File "/usr/local/lib/python3.8/site-packages/dagster/core/host_representation/repository_location.py", line 620, in get_external_execution_plan
execution_plan_snapshot_or_error = sync_get_external_execution_plan_grpc(
File "/usr/local/lib/python3.8/site-packages/dagster/api/snapshot_execution_plan.py", line 36, in sync_get_external_execution_plan_grpc
api_client.execution_plan_snapshot(
File "/usr/local/lib/python3.8/site-packages/dagster/grpc/client.py", line 153, in execution_plan_snapshot
res = self._query(
File "/usr/local/lib/python3.8/site-packages/dagster/grpc/client.py", line 110, in _query
response = getattr(stub, method)(request_type(**kwargs), timeout=timeout)
File "/usr/local/lib/python3.8/site-packages/grpc/_channel.py", line 946, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/usr/local/lib/python3.8/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
raise _InactiveRpcError(state)
U0290A48WCD: Hopefully that's helpful! I can dig up more details if needed
U016C4E5CP8: Definitely helpful - and just to confirm, this is the latest version of dagster? Appears to be from looking at your GitHub repo
U0290A48WCD: Yup, that's right
U016C4E5CP8: Hey marcos - just trying to reproduce this with some stubbed out data. You said 'It seems to happen with solids that are downstream of dynamic outputs' - do you recall a specific solid where it happened? Thanks
U016C4E5CP8: and if you remember how many dynamic outputs were being collected in that particular step, that would also be helpful
U016C4E5CP8: the github repo is really helpful - if sending over a dump of your runs and event_logs tables (over DM or email) is an option, that would definitely be enough to reproduce the problem
U016C4E5CP8: Ah, actually, I think there is an env var that you can set (which we should document) that increases the limit that you're running into - try setting DAGSTER_GRPC_MAX_RX_BYTES to 20000000
U016C4E5CP8: <@U018K0G2Y85> docs Document the DAGSTER_GRPC_MAX_RX_BYTES environment variable to increase gRPC memory limits
U0290A48WCD: Thank you, setting the DAGSTER_GRPC_MAX_RX_BYTES environment variable did it! For background info: I have an external API request that returned 1,200 unique ids. Those ids are dynamic, and for each one I need to hit a set of additional API endpoints (endpoints A, B, and C). I had a solid that runs the GET to receive the ~1,200 ids and dynamically outputs them. I put those results through .map() functions to hit endpoints A, B, and C, which can all be run in parallel. Dagster has allowed ETL pipelines that once took days to take only several hours to complete.
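For context on the failure mode: the stack trace above shows the RESOURCE_EXHAUSTED error being raised while fetching an execution plan snapshot over gRPC during re-execution, and with roughly 1,200 dynamic outputs fanned out to three downstream steps that snapshot can plausibly exceed the 10485760-byte limit. Below is a minimal sketch of the pattern U0290A48WCD describes, written against the current op/job API rather than the solid API used in the conversation; the op names and the range(1200) stand-in are hypothetical.

from dagster import DynamicOut, DynamicOutput, job, op

@op(out=DynamicOut())
def fetch_ids():
    # Stand-in for the external API call that returns ~1,200 unique ids.
    for record_id in range(1200):
        yield DynamicOutput(record_id, mapping_key=f"id_{record_id}")

@op
def fetch_endpoint_a(record_id: int):
    ...  # GET against endpoint A for one id

@op
def fetch_endpoint_b(record_id: int):
    ...  # GET against endpoint B for one id

@op
def fetch_endpoint_c(record_id: int):
    ...  # GET against endpoint C for one id

@job
def etl_job():
    ids = fetch_ids()
    ids.map(fetch_endpoint_a)
    ids.map(fetch_endpoint_b)
    ids.map(fetch_endpoint_c)

With a fan-out like this, the execution plan snapshot requested on retry presumably has to describe every mapped step, which is why raising the limit via DAGSTER_GRPC_MAX_RX_BYTES=20000000 resolved the error in the conversation.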
Message from the maintainers:
Are you looking for the same documentation content? Give it a :thumbsup:. We factor engagement into prioritization.