pre-aggregations fail in developer mode with local remote_dir "<...> can't be listed in remote fs" on 0.35.x

Open codingchili opened this issue 1 year ago • 4 comments

Describe the bug

When pre-aggregations are enabled in developer mode, table creation fails with the error "Location temp://<file.csv.gz> can't be listed in remote_fs", raised in import/mod.rs::location_file_size. This is a blocker because we cannot develop with pre-aggregations in our schemas.

To Reproduce

  1. Start the dev server.
  2. Query the ClientTracker.path dimension, for example in the Cube Playground.
  3. Check the logs for the pre-aggregation error.

Expected behavior

The error should not occur and the pre-aggregation should build correctly. I can see that the <file.csv.gz> file exists in the temp-uploads folder.

Screenshots

image

Error log

> [email protected] dev C:\Users\xxx\IdeaProjects\xxx\server
> env-cmd -f .env.local cubejs-dev-server

🔥 Cube Store (0.35.11) is assigned to 3030 port.
🔓 Authentication checks are disabled in developer mode. Please use NODE_ENV=production to enable it.
🦅 Dev environment available at http://localhost:4000
🔗 Cube SQL (pg) is listening on 0.0.0.0:15432
🚀 Cube.js server (0.35.11) is listening on 4000

<....>

Executing Load Pre Aggregation SQL: scheduler-568f70dd-260a-4bc8-9759-9d38f17a7736 
--
  SELECT
      `client_tracker`.path `client_tracker__path`
    FROM
      (select CURRENT_TIMESTAMP() as dt, "/website" as path) AS `client_tracker`  GROUP BY 1
--
2024-04-15 09:14:44,486 ERROR [cubestore::http] <pid:132364> Error processing HTTP command: Internal: Location temp://dev_pre_aggregations.client_tracker_paths_vcvuyp0c_baakftn_1j1pkv0-0.csv.gz can't be listed in remote_fs
Uploading external pre-aggregation error: scheduler-568f70dd-260a-4bc8-9759-9d38f17a7736
--
  SELECT
      `client_tracker`.path `client_tracker__path`
    FROM
      (select CURRENT_TIMESTAMP() as dt, "/website" as path) AS `client_tracker`  GROUP BY 1
--
{
  "queryKey": [
    [
      "SELECT\n      `client_tracker`.path `client_tracker__path`\n    FROM\n      (select CURRENT_TIMESTAMP() as dt, \"/website\" as path) AS `client_tracker`  GROUP BY 1",
      []
    ],
    [
      [
        {
          "refresh_key": "28552754"
        }
      ]
    ]
  ],
  "targetTableName": "dev_pre_aggregations.client_tracker_paths_vcvuyp0c_baakftn_1j1pkv0",
  "newVersionEntry": {
    "table_name": "dev_pre_aggregations.client_tracker_paths",
    "structure_version": "baakftn",
    "content_version": "vcvuyp0c",
    "last_updated_at": 1713165280626,
    "naming_version": 2
  }
}
Error: Error during create table: CREATE TABLE dev_pre_aggregations.client_tracker_paths_vcvuyp0c_baakftn_1j1pkv0 (`client_tracker__path` varchar(255)) LOCATION ?: Internal: Location temp://dev_pre_aggregations.client_tracker_paths_vcvuyp0c_baakftn_1j1pkv0-0.csv.gz can't be listed in remote_fs
    at WebSocket.<anonymous> (C:\Users\ext.robin.duda\IdeaProjects\extracto-central\server\node_modules\.pnpm\@[email protected][email protected]\node_modules\@cubejs-backend\cubestore-driver\src\WebSocketConnection.ts:121:30)
    at WebSocket.emit (node:events:517:28)
    at Receiver.receiverOnMessage (C:\Users\ext.robin.duda\IdeaProjects\extracto-central\server\node_modules\.pnpm\[email protected]\node_modules\ws\lib\websocket.js:1068:20)
    at Receiver.emit (node:events:517:28)
    at Receiver.dataMessage (C:\Users\ext.robin.duda\IdeaProjects\extracto-central\server\node_modules\.pnpm\[email protected]\node_modules\ws\lib\receiver.js:502:14)
    at Receiver.getData (C:\Users\ext.robin.duda\IdeaProjects\extracto-central\server\node_modules\.pnpm\[email protected]\node_modules\ws\lib\receiver.js:435:17)
    at Receiver.startLoop (C:\Users\ext.robin.duda\IdeaProjects\extracto-central\server\node_modules\.pnpm\[email protected]\node_modules\ws\lib\receiver.js:143:22)
    at Receiver._write (C:\Users\ext.robin.duda\IdeaProjects\extracto-central\server\node_modules\.pnpm\[email protected]\node_modules\ws\lib\receiver.js:78:10)
    at writeOrBuffer (node:internal/streams/writable:392:12)
    at _write (node:internal/streams/writable:333:10)
    at Receiver.Writable.write (node:internal/streams/writable:337:10)
    at Socket.socketOnData (C:\Users\ext.robin.duda\IdeaProjects\extracto-central\server\node_modules\.pnpm\[email protected]\node_modules\ws\lib\websocket.js:1162:35)
    at Socket.emit (node:events:517:28)
    at addChunk (node:internal/streams/readable:368:12)
    at readableAddChunk (node:internal/streams/readable:341:9)
    at Socket.Readable.push (node:internal/streams/readable:278:10)
    at TCP.onStreamRead (node:internal/stream_base_commons:190:23)

Minimally reproducible Cube Schema

In case your bug report is data modelling related, please put your minimally reproducible Cube Schema here. You can use selects without tables in order to achieve that, as follows.

cube(`ClientTracker`, {
  sql: 'select CURRENT_TIMESTAMP() as dt, "/website" as path',
  refreshKey: {
    every: "1 minute"
  },
  preAggregations: {
    paths: {
      external: true,
      type: "rollup",
      dimensions: ["ClientTracker.path"],
      refresh_key: {
        every: "1 minute"
      },
    },
  },
  dimensions: {
    dt: {
      sql: "dt",
      type: "time",
      title: "Date",
    },
    path: {
      sql: `path`,
      type: `string`,
      title: `path`,
    },
  },
});

Version: 0.35.[5..11]

Additional context

  • Pre-aggregations are working fine when running in cubestore cluster with GCS bucket.
  • Pre-aggregations work when external: false in dev mode.
  • Tested to fail on Windows 11 and in WSL2, Ubuntu 22.04.
  • CUBESTORE_* environment variables do not seem to have an effect in dev mode?
  • Issue reproduces consistently.
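
Since the rollup builds when external: false is set in dev mode, one possible stopgap is to switch the flag off only while developing. The sketch below is an illustration, not a confirmed fix: withDevModeExternal is a hypothetical helper (not part of Cube's API), and it assumes that keeping the rollup in the source database during development is acceptable.

```javascript
// Hypothetical helper, not part of Cube's API: returns a copy of a
// pre-aggregation definition with `external` forced to false in dev mode.
function withDevModeExternal(preAggregation, devMode) {
  return {
    ...preAggregation,
    // Per the observation above, non-external pre-aggregations build in dev mode.
    external: devMode ? false : preAggregation.external,
  };
}

// Fields copied from the `paths` pre-aggregation in the schema above
// (refresh key omitted for brevity).
const paths = {
  external: true,
  type: "rollup",
  dimensions: ["ClientTracker.path"],
};

// CUBEJS_DEV_MODE=true is what the reporter's configuration sets.
const devMode = process.env.CUBEJS_DEV_MODE === "true";
console.log(withDevModeExternal(paths, devMode));
```

Whether process.env can be read inside a data model file depends on the Cube version, so treat that line as an assumption and check it before relying on it.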

Related

  • https://github.com/cube-js/cube/issues/3510
  • https://github.com/cube-js/cube/issues/6765
  • https://github.com/cube-js/cube/issues/6788

Configuration

CUBEJS_DB_BQ_PROJECT_ID=<>
CUBEJS_DB_BQ_KEY_FILE=<>
CUBEJS_DEV_MODE=true
CUBEJS_DB_TYPE=bigquery
CUBEJS_API_SECRET=<>
CUBEJS_DB_BQ_LOCATION=EU
CUBEJS_SCHEMA_PATH=schema
PORT=4000

codingchili avatar Apr 15 '24 07:04 codingchili

@codingchili Are you running Cube on Windows? Are you using WSL?

From the docs:

Using Windows? We strongly recommend using WSL2 for Windows 10 to run the following commands.

igorlukanin avatar Apr 17 '24 17:04 igorlukanin

Yes, this happens when running cubejs-dev-server directly on Windows 11 and also when running it in WSL 2. Is this unsupported, meaning I must run the dev server/standard deployment within Docker?

codingchili avatar Apr 18 '24 06:04 codingchili

We recommend always using Docker. This has been the recommended way to run Cube since 2020: https://cube.dev/blog/cubejs-loves-docker

igorlukanin avatar Apr 18 '24 16:04 igorlukanin

Okay, thanks. Should we consider clarifying the documentation to state more explicitly that running it directly on Windows 11/WSL is unsupported or does not work?

In any case, the Docker approach works fine for me, so this issue can be closed.


In case anyone else comes across this: I'm running it with nodemon to hot-reload model files, where "cube-api" is the container's name.

nodemon -e js --exec docker cp ./model cube-api:/model

nodemon.json configuration:

{
  "verbose": true,
  "ignore": [
    "node_modules/**",
    ".cubestore/**"
  ]
}

codingchili avatar Apr 25 '24 10:04 codingchili