lakeFS icon indicating copy to clipboard operation
lakeFS copied to clipboard

Esti: TestImport sporadically fails

Open N-o-Z opened this issue 2 years ago • 7 comments

Initial investigation show's there might be a race which causes writer to close before operation is completed

Aborting write to range: %!w(*fmt.wrapError=&{sstable file close: pebble: writer is closed

Logs:

lakefs_1    | time="2022-05-25T17:30:12Z" level=debug msg="performing API action" func="pkg/api.(*Controller).LogAction" file="build/pkg/api/controller.go:3427" action=ingest_range host="lakefs:8000" message_type=action method=POST path=/api/v1/repositories/testimport/branches/ranges request_id=b6beab6f-13c0-4dac-bb08-6056c9b21527 service=api_gateway service_name=rest_api
lakefs_1    | time="2022-05-25T17:30:15Z" level=error msg="error making request request" func="pkg/block/s3.(*Adapter).streamToS3" file="build/pkg/block/s3/adapter.go:232" error="Put \"[https://esti-system-testing.s3.amazonaws.com/866-local-api-key/7421/ca76f95eu8ninpebbm90/testimport/_lakefs/3bcd5ee790dbc4318c09ae4aded17db86c92d3179c45ed70b31be3822bff897d\](https://esti-system-testing.s3.amazonaws.com/866-local-api-key/7421/ca76f95eu8ninpebbm90/testimport/_lakefs/3bcd5ee790dbc4318c09ae4aded17db86c92d3179c45ed70b31be3822bff897d/)": net/http: HTTP/1.x transport connection broken: read tcp 172.19.0.3:51966->52.216.136.180:443: read: connection reset by peer" host="lakefs:8000" method=POST operation=PutObject path=/api/v1/repositories/testimport/branches/ranges request_id=b6beab6f-13c0-4dac-bb08-6056c9b21527 service_name=rest_api url="https://esti-system-testing.s3.amazonaws.com/866-local-api-key/7421/ca76f95eu8ninpebbm90/testimport/_lakefs/3bcd5ee790dbc4318c09ae4aded17db86c92d3179c45ed70b31be3822bff897d"
lakefs_1    | time="2022-05-25T17:30:15Z" level=error msg="Aborting write to range: %!w(*fmt.wrapError=&{sstable file close: pebble: writer is closed 0xc00e352cf0})" func="pkg/logging.(*logrusEntryWrapper).Errorf" file="build/pkg/logging/logger.go:250"
lakefs_1    | time="2022-05-25T17:30:15Z" level=debug msg="HTTP call ended" func=pkg/httputil.DebugLoggingMiddleware.func1.1 file="build/pkg/httputil/logging.go:78" host="lakefs:8000" method=POST path=/api/v1/repositories/testimport/branches/ranges request_id=b6beab6f-13c0-4dac-bb08-6056c9b21527 sent_bytes=618 service_name=rest_api status_code=500 took=2.866843068s
esti_1      | time="2022-05-25T17:30:15Z" level=info msg="Ingest range response body: %s[123 34 109 101 115 115 97 103 101 34 58 34 119 114 105 116 105 110 103 32 114 97 110 103 101 32 102 114 111 109 32 101 110 116 114 121 32 105 116 101 114 97 116 111 114 58 32 99 108 111 115 105 110 103 32 119 114 105 116 101 114 58 32 115 115 116 97 98 108 101 32 115 116 111 114 101 32 40 51 98 99 100 53 101 101 55 57 48 100 98 99 52 51 49 56 99 48 57 97 101 52 97 100 101 100 49 55 100 98 56 54 99 57 50 100 51 49 55 57 99 52 53 101 100 55 48 98 51 49 98 101 51 56 50 50 98 102 102 56 57 55 100 41 58 32 97 100 97 112 116 101 114 32 112 117 116 32 115 51 58 47 47 101 115 116 105 45 115 121 115 116 101 109 45 116 101 115 116 105 110 103 47 56 54 54 45 108 111 99 97 108 45 97 112 105 45 107 101 121 47 55 52 50 49 47 99 97 55 54 102 57 53 101 117 56 110 105 110 112 101 98 98 109 57 48 47 116 101 115 116 105 109 112 111 114 116 32 51 98 99 100 53 101 101 55 57 48 100 98 99 52 51 49 56 99 48 57 97 101 52 97 100 101 100 49 55 100 98 56 54 99 57 50 100 51 49 55 57 99 52 53 101 100 55 48 98 51 49 98 101 51 56 50 50 98 102 102 56 57 55 100 58 32 80 117 116 32 92 34 104 116 116 112 115 58 47 47 101 115 116 105 45 115 121 115 116 101 109 45 116 101 115 116 105 110 103 46 115 51 46 97 109 97 122 111 110 97 119 115 46 99 111 109 47 56 54 54 45 108 111 99 97 108 45 97 112 105 45 107 101 121 47 55 52 50 49 47 99 97 55 54 102 57 53 101 117 56 110 105 110 112 101 98 98 109 57 48 47 116 101 115 116 105 109 112 111 114 116 47 95 108 97 107 101 102 115 47 51 98 99 100 53 101 101 55 57 48 100 98 99 52 51 49 56 99 48 57 97 101 52 97 100 101 100 49 55 100 98 56 54 99 57 50 100 51 49 55 57 99 52 53 101 100 55 48 98 51 49 98 101 51 56 50 50 98 102 102 56 57 55 100 92 34 58 32 110 101 116 47 104 116 116 112 58 32 72 84 84 80 47 49 46 120 32 116 114 97 110 115 112 111 114 116 32 99 111 110 110 101 99 116 105 111 110 32 98 114 111 107 101 110 58 32 114 101 97 100 32 116 99 112 32 49 55 50 46 49 57 46 48 46 51 58 53 49 57 54 54 45 92 117 48 48 51 101 53 50 46 50 49 54 46 49 51 54 46 49 56 48 58 52 52 51 58 32 114 101 97 100 58 32 99 111 110 110 101 99 116 105 111 110 32 114 101 115 101 116 32 98 121 32 112 101 101 114 34 125 10]" func=github.com/treeverse/lakefs/esti.TestImport file="/lakefs/esti/import_test.go:68" testName=TestImport

@itai-david FYI

N-o-Z avatar May 25 '22 17:05 N-o-Z

Added more logging in #3615 , this doesn't happen very often and hopefully the new log will catch premature context cancelling.

itaiad200 avatar Jul 12 '22 07:07 itaiad200

Another fresh example, the added logs did not catch it..

itaiad200 avatar Jul 13 '22 15:07 itaiad200

Previous examples: 1, 2

itaiad200 avatar Jul 13 '22 15:07 itaiad200

image

https://github.com/treeverse/lakeFS/runs/7598489344?check_suite_focus=true

itaiad200 avatar Jul 31 '22 12:07 itaiad200

https://github.com/treeverse/lakeFS/runs/7730265083?check_suite_focus=true

itaiad200 avatar Aug 08 '22 16:08 itaiad200

:( https://github.com/treeverse/lakeFS/runs/7785306745?check_suite_focus=true

itaiad200 avatar Aug 11 '22 12:08 itaiad200

https://github.com/treeverse/lakeFS/runs/8067326411?check_suite_focus=true

itaiad200 avatar Aug 29 '22 10:08 itaiad200

No recent failures, closing.

itaiad200 avatar Sep 28 '22 13:09 itaiad200