incubator-devlake icon indicating copy to clipboard operation
incubator-devlake copied to clipboard

[Bug][Github] Incorrect datetime value: '0000-00-00' for column 'started_at'

Open gbrancaglione opened this issue 7 months ago • 7 comments

Search before asking

  • [x] I had searched in the issues and found no similar issues.

What happened

When running the pipeline against a github repo that was working normally, this started to appear. I don't know why this started. The other repos here (5+) are all working fine.

Image

What do you expect to happen

I would expect this not to happen.

How to reproduce

I do not know how to reproduce it. The other projects are not having this error.

Anything else

No response

Version

v1.0.1

Are you willing to submit PR?

  • [x] Yes I am willing to submit a PR!

Code of Conduct

gbrancaglione avatar May 15 '25 14:05 gbrancaglione

Hello @gbrancaglione! I'm here to help you tackle any bugs, answer questions, and assist you in becoming a contributor while you wait for a human maintainer. Let's get to the bottom of this! 🤖

The MySQL error "Incorrect datetime value: '0000-00-00' for column 'started_at'" is likely caused by the StartedAt field in the DbCheckRun struct being set to a zero date value ('0000-00-00'). MySQL does not accept '0000-00-00' as a valid date unless the SQL mode is configured to allow it, which is not the default behavior.

To resolve this issue, ensure that the StartedAt field is initialized with a valid datetime value before it is inserted into the database. You can add a check in the Extract function to handle cases where checkRun.StartedAt is '0000-00-00' and set it to a valid default datetime or handle it according to your application's requirements. Here's a code snippet to handle this:

if checkRun.StartedAt == "0000-00-00" {
    checkRun.StartedAt = "1970-01-01" // or any other default valid datetime
}

This check should be added before the githubJob struct is created in the Extract function [1].

To continue talking to Dosu, mention @dosu.


Help Dosu learn! Give it feedback: Great Response | Irrelevant Answer | Incorrect Sources | Too Verbose | Hallucination | Bug Report | Other

dosubot[bot] avatar May 15 '25 14:05 dosubot[bot]

Try the latest beta. https://github.com/apache/incubator-devlake/releases/tag/v1.0.2-beta8

klesh avatar May 16 '25 03:05 klesh

This bug is related to #8214. It is solved, but it is not associated with any version. What's the procedure for adding it to a release?

zvika-finally avatar May 19 '25 20:05 zvika-finally

We've the same issue. Would be nice to know for sure, which version or release-candidate will contain the fix.

Joerg-Tech avatar Jun 06 '25 09:06 Joerg-Tech

Try the latest beta. v1.0.2-beta8 (release)

It fixed the initial sync (latest 14 days), but then I tried to get older data (last year) and it started failing again. Even in v1.0.2-beta8

vadim-ilchenko avatar Jun 06 '25 12:06 vadim-ilchenko

Maybe we should skip those records without started_at value? What do you guys think.

klesh avatar Jun 09 '25 07:06 klesh

Maybe we should skip those records without started_at value? What do you guys think.

That sounds good. Is it possible to configure somehow directly while importing records? I've set 'skip failed tasks' after the initial sync and increased timeline from "last 14 days" to "last year". While showing 'partial success' in tasks it still doesn't update values in grafana dashboards (they show last 14 days).

What extra steps can I take to load only what loads correctly and see the full picture? Thank you in advance for your support

Image Image Image Image

vadim-ilchenko avatar Jun 09 '25 09:06 vadim-ilchenko

Try the latest beta. v1.0.2-beta8 (release)

It fixed the initial sync (latest 14 days), but then I tried to get older data (last year) and it started failing again. Even in v1.0.2-beta8

I can confirm this issue still exists in v1.0.2-beta9.

dsiebel avatar Jun 23 '25 15:06 dsiebel

Also still exists in v1.0.3-beta1. Can we skip records for now that have invalid date time?

Brend-Smits avatar Jul 02 '25 09:07 Brend-Smits

Maybe we should skip those records without started_at value? What do you guys think.

I've opened a small PR with your proposed fix, would you be able to have a look at that please? https://github.com/apache/incubator-devlake/pull/8488

Brend-Smits avatar Jul 02 '25 10:07 Brend-Smits

@Brend-Smits . Good job, thanks for the PR.

klesh avatar Jul 07 '25 03:07 klesh

I'm getting this error with v1.0.2-beta9 with the "published_at" column.

attached stack trace -- stack trace: | github.com/apache/incubator-devlake/server/services.runPipeline | /app/server/services/pipeline_runner.go:79 | [...repeated from below...] Wraps: (2) Error running pipeline 9. Wraps: (3) attached stack trace -- stack trace: | github.com/apache/incubator-devlake/server/services.RunTasksStandalone | /app/server/services/task.go:217 | github.com/apache/incubator-devlake/server/services.(*pipelineRunner).runPipelineStandalone.func1 | /app/server/services/pipeline_runner.go:42 | github.com/apache/incubator-devlake/core/runner.runPipelineTasks | /app/core/runner/run_pipeline.go:90 | github.com/apache/incubator-devlake/core/runner.RunPipeline | /app/core/runner/run_pipeline.go:54 | github.com/apache/incubator-devlake/server/services.(*pipelineRunner).runPipelineStandalone | /app/server/services/pipeline_runner.go:38 | github.com/apache/incubator-devlake/server/services.runPipeline | /app/server/services/pipeline_runner.go:76 | github.com/apache/incubator-devlake/server/services.RunPipelineInQueue.func1 | /app/server/services/pipeline.go:360 | runtime.goexit | /usr/local/go/src/runtime/asm_arm64.s:1172 Wraps: (4) attached stack trace | -- stack trace: | | github.com/apache/incubator-devlake/server/services.RunTasksStandalone.func1 | | /app/server/services/task.go:189 | Wraps: (2) Error running task 56. | Wraps: (3) attached stack trace | -- stack trace: | | github.com/apache/incubator-devlake/core/runner.RunPluginSubTasks | | /app/core/runner/run_task.go:333 | | [...repeated from below...] | Wraps: (4) subtask Extract Releases ended unexpectedly | Wraps: (5) attached stack trace | -- stack trace: | | github.com/apache/incubator-devlake/impls/dalgorm.(*Dalgorm).convertGormError | | /app/impls/dalgorm/dalgorm.go:547 | | github.com/apache/incubator-devlake/impls/dalgorm.(*Dalgorm).CreateOrUpdate | | /app/impls/dalgorm/dalgorm.go:265 | | github.com/apache/incubator-devlake/helpers/pluginhelper/api.(*BatchSave).flushWithoutLocking | | /app/helpers/pluginhelper/api/batch_save.go:131 | | github.com/apache/incubator-devlake/helpers/pluginhelper/api.(*BatchSave).Close | | /app/helpers/pluginhelper/api/batch_save.go:147 | | github.com/apache/incubator-devlake/helpers/pluginhelper/api.(*BatchSaveDivider).Close | | /app/helpers/pluginhelper/api/batch_save_divider.go:102 | | github.com/apache/incubator-devlake/helpers/pluginhelper/api.(*ApiExtractor).Execute | | /app/helpers/pluginhelper/api/api_extractor.go:141 | | github.com/apache/incubator-devlake/plugins/github_graphql/tasks.ExtractReleases | | /app/plugins/github_graphql/tasks/release_extractor.go:74 | | github.com/apache/incubator-devlake/core/runner.runSubtask | | /app/core/runner/run_task.go:425 | | github.com/apache/incubator-devlake/core/runner.RunPluginSubTasks | | /app/core/runner/run_task.go:330 | | github.com/apache/incubator-devlake/core/runner.RunPluginTask | | /app/core/runner/run_task.go:165 | | github.com/apache/incubator-devlake/core/runner.RunTask | | /app/core/runner/run_task.go:139 | | github.com/apache/incubator-devlake/server/services.runTaskStandalone | | /app/server/services/task_runner.go:114 | | github.com/apache/incubator-devlake/server/services.RunTasksStandalone.func1 | | /app/server/services/task.go:187 | | runtime.goexit | | /usr/local/go/src/runtime/asm_arm64.s:1172 | Wraps: (6) Error 1292 (22007): Incorrect datetime value: '0000-00-00' for column 'published_at' at row 3 (500) | Wraps: (7) Error 1292 (22007): Incorrect datetime value: '0000-00-00' for column 'published_at' at row 3 | Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *mysql.MySQLError Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.leafError

charltonaustin avatar Jul 31 '25 20:07 charltonaustin

Hi, I'm still getting the error with "started_at" with v1.0.3-beta3@e474b4e . I didn't see the bug on the release notes, but the diff looks like the fix included?

Joerg-Tech avatar Aug 05 '25 06:08 Joerg-Tech

Hi, I'm still getting the error with "started_at" with v1.0.3-beta3@e474b4e . I didn't see the bug on the release notes, but the diff looks like the fix included?

I can confirm that. Issue still happens with v1.0.3-beta3 As far as I can tell, the fix implemented in #8488 only covers the cicd_job_convertor. It seems that this also needs to be added to the cicd_job_extractor.

Example Stacktrace
attached stack trace
  -- stack trace:
  | github.com/apache/incubator-devlake/core/runner.RunPluginSubTasks
  | 	/app/core/runner/run_task.go:333
  | github.com/apache/incubator-devlake/core/runner.RunPluginTask
  | 	/app/core/runner/run_task.go:165
  | github.com/apache/incubator-devlake/core/runner.RunTask
  | 	/app/core/runner/run_task.go:139
  | github.com/apache/incubator-devlake/server/services.runTaskStandalone
  | 	/app/server/services/task_runner.go:114
  | github.com/apache/incubator-devlake/server/services.RunTasksStandalone.func1
  | 	/app/server/services/task.go:187
Wraps: (2) subtask Extract Jobs ended unexpectedly
Wraps: (3) attached stack trace
  -- stack trace:
  | github.com/apache/incubator-devlake/helpers/pluginhelper/api.(*ApiExtractor).Execute
  | 	/app/helpers/pluginhelper/api/api_extractor.go:134
  | [...repeated from below...]
Wraps: (4) error adding result to batch (500)
Wraps: (5) attached stack trace
  -- stack trace:
  | github.com/apache/incubator-devlake/impls/dalgorm.(*Dalgorm).convertGormError
  | 	/app/impls/dalgorm/dalgorm.go:547
  | github.com/apache/incubator-devlake/impls/dalgorm.(*Dalgorm).CreateOrUpdate
  | 	/app/impls/dalgorm/dalgorm.go:265
  | github.com/apache/incubator-devlake/helpers/pluginhelper/api.(*BatchSave).flushWithoutLocking
  | 	/app/helpers/pluginhelper/api/batch_save.go:131
  | github.com/apache/incubator-devlake/helpers/pluginhelper/api.(*BatchSave).Add
  | 	/app/helpers/pluginhelper/api/batch_save.go:109
  | github.com/apache/incubator-devlake/helpers/pluginhelper/api.(*ApiExtractor).Execute
  | 	/app/helpers/pluginhelper/api/api_extractor.go:132
  | github.com/apache/incubator-devlake/plugins/github/tasks.ExtractJobs
  | 	/app/plugins/github/tasks/cicd_job_extractor.go:99
  | github.com/apache/incubator-devlake/core/runner.runSubtask
  | 	/app/core/runner/run_task.go:425
  | github.com/apache/incubator-devlake/core/runner.RunPluginSubTasks
  | 	/app/core/runner/run_task.go:330
  | github.com/apache/incubator-devlake/core/runner.RunPluginTask
  | 	/app/core/runner/run_task.go:165
  | github.com/apache/incubator-devlake/core/runner.RunTask
  | 	/app/core/runner/run_task.go:139
  | github.com/apache/incubator-devlake/server/services.runTaskStandalone
  | 	/app/server/services/task_runner.go:114
  | github.com/apache/incubator-devlake/server/services.RunTasksStandalone.func1
  | 	/app/server/services/task.go:187
  | runtime.goexit
  | 	/usr/local/go/src/runtime/asm_amd64.s:1598
Wraps: (6) Error 1292 (22007): Incorrect datetime value: '0000-00-00' for column 'started_at' at row 451 (500)
Wraps: (7) Error 1292 (22007): Incorrect datetime value: '0000-00-00' for column 'started_at' at row 451
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *mysql.MySQLError

For the issue mentioned by @charltonaustin I'm not sure. It's from the release_extractor, and I think releases don't always have a published_at set (pre-releases, drafts). EDIT:Sounds like it could be this: #8214

dsiebel avatar Aug 08 '25 12:08 dsiebel

I can confirm with a fresh project on the 1.0.2 release, even without the fix. Tested on a fresh project, does not affect all repos. I added some debugging (also trying Time.IsZero() to get some clues (as dlv will be more painful for now), will keep you updated.

petkostas avatar Aug 12 '25 17:08 petkostas

Closing as new issue ##8490 was opened.

petkostas avatar Aug 12 '25 17:08 petkostas

Closing as new issue ##8490 was opened.

I don't think the two are strictly the same. #8490 addresses a similar issue in the GitHub GraphQL plugin. The REST implementation remains broken.

dsiebel avatar Aug 14 '25 07:08 dsiebel

@dsiebel @Joerg-Tech @charltonaustin Seems like you are getting errors from different tables/entities. Can you all share the detailed log like @charltonaustin ? Thanks

klesh avatar Aug 14 '25 09:08 klesh

The error in the task...github_graphql.log is not very detailed: time="2025-08-14 00:05:07" level=info msg=" [pipeline service] [pipeline #585] [task #7658] [Extract Jobs] finished records: 479(not exactly)" time="2025-08-14 00:05:15" level=info msg=" [pipeline service] [pipeline #585] [task #7658] subtask Extract Jobs finished in 17137 ms" time="2025-08-14 00:05:15" level=error msg=" [pipeline service] [pipeline #585] [task #7658] subtask Extract Jobs ended unexpectedly\n\tWraps: (2) error adding result to batch (500)\n\tWraps: (3) Error 1292 (22007): Incorrect datetime value: '0000-00-00' for column 'started_at' at row 117 (500)\n\tWraps: (4) Error 1292 (22007): Incorrect datetime value: '0000-00-00' for column 'started_at' at row 117\n\tError types: (1) *hintdetail.withDetail (2) *hintdetail.withDetail (3) *hintdetail.withDetail (4) *mysql.MySQLError"

Please let me know, if there's another log file location, where I should look.

Joerg-Tech avatar Aug 14 '25 09:08 Joerg-Tech

@Joerg-Tech This is coming from the gitthub_graphql extractor. The fix I submitted addresses the behavior in the collector, so I suspect entries collected prior to that fix might still crash due to the incremental nature of the data, though that’s just a hypothesis for now and needs validation.

@charltonaustin Your issue seems to point to the release collector, which suggests there might be another scenario where we’re receiving empty dates from GitHub. Would it be possible for you to check that specific release via the GitHub GraphQL API and let us know what type of release it was that had the empty date?

petkostas avatar Aug 14 '25 13:08 petkostas

@dsiebel Thanks for the input! I’ll look into adding checks for both REST and GraphQL. We’re aware of the issue, but I didn’t anticipate there would be so many different cases where GitHub returns empty dates. It’s also unclear whether this behavior is new or has changed recently, something we might need to investigate further.

petkostas avatar Aug 14 '25 13:08 petkostas

I will prune everything on my local setup and try with a new project in order to determine possible scenarios. I have two or three candidates for testing.

petkostas avatar Aug 14 '25 14:08 petkostas

Update 👇🏼 I have confirmed that nullable date fields in GraphQL are always returning a 0001-01-01 value due to the GO implementation. What this means is that even if we are using *time.Time actually translates to the following

type ChecRun struct {
    StartedAt *time.Time
}

func main() {
    instance := &ChecRun{}
    test_payload := []byte(`{"StartedAt": "0001-01-01"}`)
    _ = json.Unmarshal(test_payload, instance)
    anotherTest := &ChecRun{
        StartedAt: instance.StartedAt,
    }
    fmt.Println("Is Nil:", instance.StartedAt == nil)
    fmt.Println("Is Nil:", anotherTest.StartedAt == nil)
    fmt.Println(instance)
    fmt.Println(anotherTest)
}

Which produces:

Is Nil: false
Is Nil: false
&{0001-01-01 00:00:00 +0000 UTC}
&{0001-01-01 00:00:00 +0000 UTC}

According to tickets and resources I could find, GORM does not treat a pointer to zero time as a "missing value." Only a real nil pointer is considered NULL and stored as SQL NULL. Next to that there is another issue, the Apache Devlake GitHub models in their majority have a *time.Time field, which makes it easy to track which fields should be conditionally set to nil in the collectors and extractors (this is required for existing data in the DB as the extractor will crash). The only drawback / time consuming case here is that I have identified at least one GitHub model which has a non nullable datetime (releases) while the documentation of the GraphQL API states the opposite, which means that if I assign a nil the pipeline will fail again.


type GithubRelease struct {
	...
	PublishedAt      time.Time `json:"publishedAt"`
	...
}

According to GitHub:

Image

So we do need to escan models and update date fields, but at the same time how do we want to proceed with the current situation? @klesh should we set the fields to nil? (will require a DB migration) or should we skip those? I believe those cases relate to the time of the polling, as jobs may have been queued (as an example) which is logical not to have a startedAt I believe also the same applies to releases (release was started but not finished as example).

petkostas avatar Aug 16 '25 16:08 petkostas

@petkostas Great investigation, much appreciated.

I personally prefer using *time.Time for fields that are nullable and setting them to nil accordingly. For cases where we discover mismatches, we can handle them through a database migration.

That said, how do you suggest we address the issue of GraphQL always returning 0001-01-01 for null datetime values?

klesh avatar Aug 18 '25 15:08 klesh

@klesh I did a quick check on the DB schema, and the fields are nullable :) so we should be able to just replace the time.Time with *time.Time in the respective locations without needing a migration. I am going to collect all the date fields that we have under GitHub and do a run against the documentation to see if there are any other entries that need to be updated.

petkostas avatar Aug 18 '25 17:08 petkostas

Sounds good to me.

@klesh I did a quick check on the DB schema, and the fields are nullable :) so we should be able to just replace the time.Time with *time.Time in the respective locations without needing a migration. I am going to collect all the date fields that we have under GitHub and do a run against the documentation to see if there are any other entries that need to be updated.

klesh avatar Aug 22 '25 03:08 klesh

@charltonaustin Your issue seems to point to the release collector, which suggests there might be another scenario where we’re receiving empty dates from GitHub. Would it be possible for you to check that specific release via the GitHub GraphQL API and let us know what type of release it was that had the empty date?

Yes

charltonaustin avatar Sep 16 '25 16:09 charltonaustin

I have released an alpha image and looking for some extra test runs :) Please check the details in #8538

petkostas avatar Sep 22 '25 16:09 petkostas

@dsiebel @Joerg-Tech @charltonaustin Seems like you are getting errors from different tables/entities. Can you all share the detailed log like @charltonaustin ? Thanks

@klesh Apologies for the delay. Error is essentially the same as @Joerg-Tech, so from the github extractor:

Stack Trace
attached stack trace
  -- stack trace:
  | github.com/apache/incubator-devlake/core/runner.RunPluginSubTasks
  | 	/app/core/runner/run_task.go:333
  | github.com/apache/incubator-devlake/core/runner.RunPluginTask
  | 	/app/core/runner/run_task.go:165
  | github.com/apache/incubator-devlake/core/runner.RunTask
  | 	/app/core/runner/run_task.go:139
  | github.com/apache/incubator-devlake/server/services.runTaskStandalone
  | 	/app/server/services/task_runner.go:114
  | github.com/apache/incubator-devlake/server/services.RunTasksStandalone.func1
  | 	/app/server/services/task.go:187
Wraps: (2) subtask Extract Jobs ended unexpectedly
Wraps: (3) attached stack trace
  -- stack trace:
  | github.com/apache/incubator-devlake/helpers/pluginhelper/api.(*ApiExtractor).Execute
  | 	/app/helpers/pluginhelper/api/api_extractor.go:134
  | [...repeated from below...]
Wraps: (4) error adding result to batch (500)
Wraps: (5) attached stack trace
  -- stack trace:
  | github.com/apache/incubator-devlake/impls/dalgorm.(*Dalgorm).convertGormError
  | 	/app/impls/dalgorm/dalgorm.go:547
  | github.com/apache/incubator-devlake/impls/dalgorm.(*Dalgorm).CreateOrUpdate
  | 	/app/impls/dalgorm/dalgorm.go:265
  | github.com/apache/incubator-devlake/helpers/pluginhelper/api.(*BatchSave).flushWithoutLocking
  | 	/app/helpers/pluginhelper/api/batch_save.go:131
  | github.com/apache/incubator-devlake/helpers/pluginhelper/api.(*BatchSave).Add
  | 	/app/helpers/pluginhelper/api/batch_save.go:109
  | github.com/apache/incubator-devlake/helpers/pluginhelper/api.(*ApiExtractor).Execute
  | 	/app/helpers/pluginhelper/api/api_extractor.go:132
  | github.com/apache/incubator-devlake/plugins/github_graphql/tasks.ExtractJobs
  | 	/app/plugins/github_graphql/tasks/job_extractor.go:92
  | github.com/apache/incubator-devlake/core/runner.runSubtask
  | 	/app/core/runner/run_task.go:425
  | github.com/apache/incubator-devlake/core/runner.RunPluginSubTasks
  | 	/app/core/runner/run_task.go:330
  | github.com/apache/incubator-devlake/core/runner.RunPluginTask
  | 	/app/core/runner/run_task.go:165
  | github.com/apache/incubator-devlake/core/runner.RunTask
  | 	/app/core/runner/run_task.go:139
  | github.com/apache/incubator-devlake/server/services.runTaskStandalone
  | 	/app/server/services/task_runner.go:114
  | github.com/apache/incubator-devlake/server/services.RunTasksStandalone.func1
  | 	/app/server/services/task.go:187
  | runtime.goexit
  | 	/usr/local/go/src/runtime/asm_amd64.s:1598
Wraps: (6) Error 1292 (22007): Incorrect datetime value: '0000-00-00' for column 'started_at' at row 14 (500)
Wraps: (7) Error 1292 (22007): Incorrect datetime value: '0000-00-00' for column 'started_at' at row 14
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *mysql.MySQLError

And it still happens in v1.0.3-beta6

dsiebel avatar Sep 22 '25 20:09 dsiebel

@dsiebel is it possible to do a test run using the 1.0.4-alpha image? I have linked also the issue above, since this seems to be similar (nullable GitHub GraphQL days).

petkostas avatar Sep 23 '25 14:09 petkostas