docs
Fix an error in importing sample data
What is changed, added or deleted? (Required)
The LOAD DATA INFILE step does not work with tiup client, so this PR replaces it with another method.
close #6647
Which TiDB version(s) do your changes apply to? (Required)
Tips for choosing the affected version(s):
By default, CHOOSE MASTER ONLY so your changes will be applied to the next TiDB major or minor releases. If your PR involves a product feature behavior change or a compatibility change, CHOOSE THE AFFECTED RELEASE BRANCH(ES) AND MASTER.
For details, see tips for choosing the affected versions.
- [x] master (the latest development version)
- [x] v6.4 (TiDB 6.4 versions)
- [x] v6.3 (TiDB 6.3 versions)
- [x] v6.1 (TiDB 6.1 versions)
- [ ] v5.4 (TiDB 5.4 versions)
- [ ] v5.3 (TiDB 5.3 versions)
- [ ] v5.2 (TiDB 5.2 versions)
- [ ] v5.1 (TiDB 5.1 versions)
- [ ] v5.0 (TiDB 5.0 versions)
What is the related PR or file link(s)?
- This PR is translated from:
- Other reference link(s):
Do your changes match any of the following descriptions?
- [ ] Delete files
- [ ] Change aliases
- [ ] Need modification after applied to another branch
- [ ] Might cause conflicts after applied to another branch
[REVIEW NOTIFICATION]
This pull request has not been approved.
To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.
The full list of commands accepted by this bot can be found here.
Reviewer can indicate their review by submitting an approval review. Reviewer can cancel approval by submitting a request changes review.
I'll review this next week. I think some more changes are needed.
I think we should consider the following:
- There are two documents about this with different procedures
- https://docs.pingcap.com/tidbcloud/import-sample-data
- https://docs.pingcap.com/tidb/dev/import-example-data
We should consider whether to combine these. My take is that we should not, because the TiDB Cloud procedure imports through the cloud UI and uses data that was already converted to Dumpling format instead of the original CSV data. We could use the S3-hosted files for the on-prem example as well, since tidb-lightning supports S3.
- Client strategy
The referenced issue links to https://docs.pingcap.com/tidb/stable/quick-start-with-tidb, which has examples for "MySQL Client" and tiup client. On TiDB Cloud we have WebSQL, which is based on https://github.com/xo/usql, just like tiup client. However, there isn't an example for tiup client on the TiDB Cloud connect page, and it doesn't look like tiup client was intended to be used for anything except local playgrounds. On that dialog we also list https://github.com/dbcli/mycli. I often use "MySQL Shell", which is listed on https://docs.pingcap.com/tidb/stable/dev-guide-connect-to-tidb#mysql-shell as well. There are also many GUI clients that people use, like https://github.com/dbeaver/dbeaver and https://github.com/mysql/mysql-workbench. We should make sure the method we choose here aligns with the long-term plans for client support.

Side note: Maybe we should add a note about tiup client on https://docs.pingcap.com/tidb/stable/sql-statement-load-data
- Audience
People that are just starting with TiDB are likely to come from a MySQL background. This means that they might already be familiar with "MySQL Client", LOAD DATA..., etc. but not have experience with tidb-lightning.
While showing the functionality of Lightning it might also be good to not overwhelm them with new tools.
Maybe we should list both the tidb-lightning method and the LOAD DATA... method.
- TiDB Cloud
If we use LOAD DATA... or tidb-lightning with tidb backend then this procedure should also work on TiDB Cloud. Let's try to keep all of this working with TiDB Cloud and document if anything special is needed for that.
- Using a schema file.
The example uses no-schema = true and manually creates the schema. We could put the SQL in a file and let tidb-lightning take care of this.
- The goal
I think the goal of importing the sample data is:
- Giving people some data in TiDB to allow them to play with this with SQL.
- Maybe add some example queries?
- What about adding indexes?
- Using the data in dumpling format from S3 might be faster
- Having people learn how to import data into TiDB
- Maybe link to https://docs.pingcap.com/tidb/stable/migrate-small-mysql-to-tidb ?
- The S3 data is in dumpling format, which may be similar to what they eventually use if they use Dumpling to dump data from MySQL.
- Newer data
https://s3.amazonaws.com/capitalbikeshare-data/index.html has data from 2010 until last month (2022-09). Maybe we should use the newer data. Note that the newer files have different fields.
- Data amount
To see the full benefits of TiDB, it may be necessary to load more years of data than the example uses. Maybe list the commands to import all data as well.
- TiFlash.
It might be good to give a TiFlash example with this data (adding a replica, using EXPLAIN, etc.).
- OSS Insight.
OSS Insight is a full example, as it has both the data and an open source application. Maybe we want to either use OSS Insight data here or create a small example frontend for the bikeshare data to give a more complete example.
If we do this we could use some of the example queries from the TiDB Cloud Playground functionality.
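A rough sketch of what the schema-file approach from the "Using a schema file" point could look like. It assumes tidb-lightning's file naming convention for schema files in the data source directory (`<db>-schema-create.sql` and `<db>.<table>-schema.sql`) and assumes that `no-schema` is left unset or set to `false` in the Lightning configuration. The directory name `bikeshare-data` and the `bikeshare.trips` table definition are assumptions based on the current on-prem example and should be checked against the document before use.

```shell
#!/bin/sh
set -eu

# Hypothetical data source directory that tidb-lightning would be pointed at.
DATA_DIR="./bikeshare-data"
mkdir -p "$DATA_DIR"

# Database DDL. Assumed naming convention: <db>-schema-create.sql
cat > "$DATA_DIR/bikeshare-schema-create.sql" <<'EOF'
CREATE DATABASE IF NOT EXISTS bikeshare;
EOF

# Table DDL. Assumed naming convention: <db>.<table>-schema.sql
# The column list is an assumption based on the existing example document.
cat > "$DATA_DIR/bikeshare.trips-schema.sql" <<'EOF'
CREATE TABLE IF NOT EXISTS trips (
  trip_id BIGINT NOT NULL PRIMARY KEY AUTO_INCREMENT,
  duration INTEGER NOT NULL,
  start_date DATETIME,
  end_date DATETIME,
  start_station_number INTEGER,
  start_station VARCHAR(255),
  end_station_number INTEGER,
  end_station VARCHAR(255),
  bike_number VARCHAR(255),
  member_type VARCHAR(255)
);
EOF

# List the generated schema files next to where the CSV files would go.
ls -1 "$DATA_DIR"
```

With files like these in place, the manual CREATE DATABASE and CREATE TABLE steps could be dropped from the procedure, leaving only the tidb-lightning invocation.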
Many of these points are probably out of scope for this specific PR. For this PR, I think we should:
- Keep both methods (LOAD DATA... and tidb-lightning)
- Consider putting the SQL in a file for tidb-lightning.
- Add some example queries. Maybe add an index.
- Load newer data
- List optional steps to load more data
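To make the "example queries", "index", and TiFlash suggestions more concrete, here is a minimal sketch. It only writes a SQL file and does not connect to a cluster. The file name `bikeshare-examples.sql`, the index name, and the `bikeshare.trips` columns (`member_type`, `start_station`, `start_date`) are assumptions based on the current example document, and the queries themselves are illustrative, not a proposal for the final text.

```shell
#!/bin/sh
set -eu

# Hypothetical file holding example statements for the imported data.
QUERY_FILE="./bikeshare-examples.sql"

cat > "$QUERY_FILE" <<'EOF'
USE bikeshare;

-- Example query: number of trips per member type.
SELECT member_type, COUNT(*) AS trips
FROM trips
GROUP BY member_type;

-- Example query: the ten busiest start stations.
SELECT start_station, COUNT(*) AS trips
FROM trips
GROUP BY start_station
ORDER BY trips DESC
LIMIT 10;

-- Optional: an index to speed up date-range lookups (index name is illustrative).
CREATE INDEX idx_start_date ON trips (start_date);

-- Optional: add a TiFlash replica and check its availability.
ALTER TABLE trips SET TIFLASH REPLICA 1;
SELECT TABLE_NAME, AVAILABLE, PROGRESS
FROM information_schema.tiflash_replica
WHERE TABLE_SCHEMA = 'bikeshare';

-- Once the replica is available, the plan may show TiFlash tasks.
EXPLAIN SELECT member_type, COUNT(*) FROM trips GROUP BY member_type;
EOF

echo "Wrote $QUERY_FILE"

# To run it against a local playground (assuming port 4000 and user root):
#   mysql --host 127.0.0.1 --port 4000 -u root < ./bikeshare-examples.sql
```

Keeping the statements in a plain SQL file means they can be fed to whichever client the document ends up recommending.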
PR needs rebase.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@hfxsd: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| pull-verify | 01b16412f49238e7b1f00153a67b434ddb23c635 | link | true | /test pull-verify |