The following error occurred while developing a Kafka + Spark Structured Streaming application.
java.lang.IllegalStateException: Set() are gone. Kafka option 'kafka.group.id' has been set on this query, it is
not recommended to set this option. This option is unsafe to use since multiple concurrent
queries or sources using the same group id will interfere with each other as they are part
of the same consumer group. Restarted queries may also suffer interference from the
previous run having the same group id. The user should have only one query per group id,
and/or set the option 'kafka.session.timeout.ms' to be very small so that the Kafka
consumers from the previous query are marked dead by the Kafka group coordinator before the
restarted query starts running.
The setup: a Spark Structured Streaming application (packaged and run as a jar) was consuming data in real time from a specific Kafka topic, and its checkpoint location was stored on an HDFS path.
The error was caused by a mismatch inside that checkpoint location: the number of entries in the offsets folder did not match the number of entries in the commits folder.
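To check for this kind of mismatch, the batch entries in the two folders can be listed and compared. Spark writes a file named after the batch id (0, 1, 2, ...) to offsets/ before it runs a micro-batch, and to commits/ after that batch completes. The sketch below is a hypothetical helper, not part of Spark. It assumes the checkpoint has been copied to local disk first (for example with `hdfs dfs -get <checkpoint path> <local dir>`), and it reports both counts plus the batch ids that have an offset entry but no commit entry. A single trailing uncommitted batch can also appear after an ordinary stop, so treat the output as a starting point for inspection rather than a verdict.

```python
import os
import re

# Batch log entries are plain files named by batch id ("0", "1", ...).
# Checksum files such as ".0.crc" and any temp files are ignored.
BATCH_FILE = re.compile(r"\d+")


def batch_ids(log_dir):
    """Return the sorted batch ids found in one checkpoint log directory."""
    if not os.path.isdir(log_dir):
        return []
    return sorted(
        int(name) for name in os.listdir(log_dir) if BATCH_FILE.fullmatch(name)
    )


def compare_checkpoint(checkpoint_dir):
    """Compare the offsets/ and commits/ logs of a (locally copied) checkpoint.

    Hypothetical helper for illustration; it only reads file names.
    """
    offsets = batch_ids(os.path.join(checkpoint_dir, "offsets"))
    commits = batch_ids(os.path.join(checkpoint_dir, "commits"))
    return {
        "offsets": len(offsets),
        "commits": len(commits),
        "latest_offset": offsets[-1] if offsets else None,
        "latest_commit": commits[-1] if commits else None,
        # Batches that were planned (offset logged) but never committed.
        "uncommitted": sorted(set(offsets) - set(commits)),
    }
```

Calling compare_checkpoint on the local copy returns a small dict, so the two counts and any uncommitted batch ids can be printed or logged before deciding what to do with the checkpoint.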
The problem was resolved by deleting the checkpoint location and restarting the Spark streaming job.
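Deleting the checkpoint discards the Kafka offsets the query had stored, so on the next start the query falls back to its startingOffsets option (for streaming queries the default is latest), which can skip records produced while the job was down. A more cautious variant is to move the old checkpoint aside instead of deleting it, so it can still be inspected afterwards. On HDFS this would be `hdfs dfs -mv <checkpoint> <checkpoint>.bak` (or `hdfs dfs -rm -r <checkpoint>` to delete it). The sketch below shows the same idea for a checkpoint on a local path; archive_checkpoint is a hypothetical helper and the ".bak-<timestamp>" naming is an assumption.

```python
import os
import shutil
import time


def archive_checkpoint(checkpoint_dir):
    """Move a checkpoint directory aside instead of deleting it outright.

    Hypothetical helper: after the move, the streaming query finds no
    checkpoint at its configured location and starts fresh, while the old
    offsets/ and commits/ logs stay available for inspection.
    Returns the new path, or None if there was nothing to move.
    """
    if not os.path.isdir(checkpoint_dir):
        return None
    stamp = time.strftime("%Y%m%d%H%M%S")
    target = f"{checkpoint_dir.rstrip(os.sep)}.bak-{stamp}"
    shutil.move(checkpoint_dir, target)
    return target
```

Run it while the streaming query is stopped, then restart the job with the same checkpoint location setting.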