CDC (change data capture) is an integration pattern widely used for delivering data changes to other systems. The concept is very simple: recognize when data has changed in the source system, capture those changes, and apply them to some target system. Imagine the following scenario: I have a table in SQL Server in the cloud and I want to keep the same table updated in a local database. The tool identifies INSERT, UPDATE, and DELETE changes in the source table, takes the changed rows, and at the destination performs a check based on the table's ID: if record `A` does not exist in the destination table, it performs an INSERT; if the record already exists and the row changed, it performs an UPDATE on the changed fields; if the row was deleted at the source, it deletes it from the target table.

When we talk about Big Data, things tend to change a little. All the changes we capture are saved in some file format (Parquet, Avro, etc.) together with the operation the data underwent at the origin. For example, when we capture data from MySQL (via the BinLog), the file comes with a column called `Type` that informs the type of operation (INSERT/UPDATE/DELETE), so it is easy to run the loading process taking only the operations != 'DELETE'.
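To make that concrete, here is a minimal sketch of how those captured changes could be applied to a Delta table with a MERGE. The paths (`/mnt/delta/customers`, `/mnt/raw/customers_cdc`), the join key `id`, and the column names are hypothetical; the only assumption carried over from the scenario above is a `Type` column holding INSERT/UPDATE/DELETE.

```python
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = (
    SparkSession.builder
    .appName("cdc-merge-sketch")
    .getOrCreate()
)

# Target Delta table we want to keep in sync with the source.
target = DeltaTable.forPath(spark, "/mnt/delta/customers")

# Captured changes (e.g. exported from the MySQL BinLog) saved as Parquet,
# carrying a `Type` column with the operation: INSERT, UPDATE, or DELETE.
changes = spark.read.format("parquet").load("/mnt/raw/customers_cdc")

(
    target.alias("t")
    .merge(changes.alias("c"), "t.id = c.id")
    # Row was deleted at the source: delete it at the destination too.
    .whenMatchedDelete(condition="c.Type = 'DELETE'")
    # Row exists in both and was changed: update the destination row.
    .whenMatchedUpdateAll(condition="c.Type = 'UPDATE'")
    # Row does not exist yet and was not deleted: insert it.
    .whenNotMatchedInsertAll(condition="c.Type != 'DELETE'")
    .execute()
)
```

The `whenNotMatchedInsertAll` condition is what implements the "take only operations != 'DELETE'" rule from the paragraph above: a DELETE record for a row that never reached the destination is simply ignored.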